Fugu-MT: Translations of arXiv papers

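This site translates into Japanese those arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not under a CC license, or that are too long, only the metadata is translated (arXiv metadata is CC 0). The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.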

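The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility whatsoever for any consequences arising from the use of this site (including all information and translations).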

Papers with a publication date of 2023-12-29.

Title	Authors	Abstract	Publication date / translation date
# Robust softmax aggregation on blockchain based federated learning with convergence guarantee ( http://arxiv.org/abs/2311.07027v2 ) License: see link	Huiyu Wu, Diego Klabjan	Blockchain-based federated learning is a distributed learning scheme that allows model training without participants sharing their local datasets. Unlike traditional federated learning algorithms, it needs no trusted central server, because the blockchain components take over that role. In this paper we propose a softmax aggregation framework for blockchain-based federated learning. First, we propose a new blockchain-based federated learning architecture. It uses the well-tested proof-of-stake consensus mechanism of an existing blockchain network to select validators and miners, which aggregate the participants' updates and compute the blocks. Second, to ensure the robustness of the aggregation process, we design a novel softmax aggregation method based on approximated population loss values, which relies on our specific blockchain architecture. Additionally, we show that our softmax aggregation technique converges to the global minimum in the convex setting under non-restrictive assumptions. Our comprehensive experiments show that our framework outperforms existing robust aggregation algorithms by large margins in various settings.	Translated: 2024-03-18 23:32:03 Published: 2023-12-29
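This is a minimal sketch of what loss-based softmax aggregation could look like. It assumes two things:

- Each participant's update comes with an approximated population loss, for example a validator's loss estimate on held-out data.
- Weights are proportional to exp(-loss / T) for a temperature T.

The temperature, the negative-loss form, and the function names are illustrative assumptions, not the authors' exact rule. The blockchain and proof-of-stake parts are omitted.

```python
import math
from typing import List, Sequence


def softmax_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-update loss estimates into weights: lower loss -> larger weight."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    lowest = min(losses)
    # Shifting by the minimum loss keeps exp() numerically safe;
    # the shift cancels out after normalization.
    scores = [math.exp(-(loss - lowest) / temperature) for loss in losses]
    total = sum(scores)
    return [s / total for s in scores]


def softmax_aggregate(updates: Sequence[Sequence[float]],
                      losses: Sequence[float],
                      temperature: float = 1.0) -> List[float]:
    """Weighted average of flat parameter vectors with softmax(-loss / T) weights."""
    if not updates or len(updates) != len(losses):
        raise ValueError("need exactly one loss per update")
    weights = softmax_weights(losses, temperature)
    dim = len(updates[0])
    return [sum(w * u[j] for w, u in zip(weights, updates)) for j in range(dim)]
```

With a low temperature, an update whose estimated loss is far above the others gets almost no weight. This is the intuition behind using softmax weights for robustness. With equal losses, the rule reduces to a plain average.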
# Difficulties in Dynamic Analysis of Drone Firmware and Its Solutions ( http://arxiv.org/abs/2312.16818v2 ) License: see link	Yejun Kim, Kwangsoo Cho, Seungjoo Kim	With the advancement of Internet of Things (IoT) technology, its applications span sectors such as public, industrial, private and military. In particular, the drone sector has gained significant attention for both commercial and military purposes. As a result, there has been a surge in research focused on vulnerability analysis of drones. However, most security research aimed at mitigating threats to IoT devices has focused primarily on networks, firmware and mobile applications. Of these, using fuzzing to analyse firmware security requires emulating the firmware. For drone firmware, however, the industry lacks emulation and automated fuzzing tools, largely because of challenges such as limited input interfaces, firmware encryption and signatures. While it may be tempting to assume that existing emulators and automated analysers for IoT devices can be applied to drones, practical applications have proven otherwise. In this paper, we discuss the challenges of dynamically analysing drone firmware and propose potential solutions. In addition, we demonstrate the effectiveness of our methodology by applying it to DJI drones, which have the largest market share.	Translated: 2024-03-18 11:18:35 Published: 2023-12-29
# Simple client-side encryption of personal information with Web Assembly ( http://arxiv.org/abs/2312.17689v1 ) License: see link	Marco Falda, Angela Grassi	The HTTPS protocol provides a higher level of robustness against several attacks. However, the required certificates are not easy to set up on intranets. HTTPS also offers no protection when the server cannot be trusted with confidential data, as with cloud services, or when the server could be compromised. A simple method is proposed to encrypt data on the client side using Web Assembly, so that data are never transferred to the server as clear text. Searching fields on the server is made possible by an encoding scheme that ensures a stable prefix correspondence between ciphertext and plaintext. The method was developed for a semantic medical database. It allows personal data to be accessed with an additional password while non-sensitive information is kept in clear form. Web Assembly was chosen for two reasons: it guarantees fast and efficient execution of the encryption/decryption operations, and the modules it produces are very robust against reverse engineering. The code is available at https://github.com/mfalda/client-encdec.	Translated: 2024-03-18 11:08:48 Published: 2023-12-29
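The sketch below illustrates what a "stable prefix correspondence" means. It uses a hypothetical keyed encoding in which the encoding of any plaintext prefix is itself a prefix of the full encoding, so the server can answer prefix queries with a plain `startswith` check.

This is not the authors' scheme: theirs runs in Web Assembly and supports decryption. The per-character HMAC tokens below (`prefix_tokens`, `TOKEN_HEX` are illustrative names) are one-way. They only show the searchable-index idea, and they reveal which stored values share a prefix.

```python
import hashlib
import hmac

TOKEN_HEX = 8  # hypothetical token length per character, in hex digits


def prefix_tokens(key: bytes, text: str) -> str:
    """Encode text so that the encoding of any prefix is a prefix of the encoding.

    Token i is an HMAC-SHA256 of the first i + 1 characters, so it depends on
    the whole prefix rather than on a single character, and it cannot be
    computed without the client-side key.
    """
    tokens = []
    for i in range(len(text)):
        mac = hmac.new(key, text[: i + 1].encode("utf-8"), hashlib.sha256)
        tokens.append(mac.hexdigest()[:TOKEN_HEX])
    return "".join(tokens)


def matches_prefix(key: bytes, stored: str, query: str) -> bool:
    """Prefix search on encoded values: a plain startswith on the server side.

    In a real deployment the client would encode the query and send only the
    encoded form; here both steps run locally for illustration.
    """
    return stored.startswith(prefix_tokens(key, query))
```

Because the tokens are deterministic, equal prefixes always produce equal tokens. That is what makes the search work, and it is also exactly what an observer of the server can learn.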
# Continuum limit of the Green function in scaled affine $\varphi^4_4$ quantum Euclidean covariant relativistic field theory ( http://arxiv.org/abs/2402.10903v1 ) License: see link	Riccardo Fantoni	We prove through path integral Monte Carlo computer experiments that the affine quantization of the $\varphi_4^4$ scaled Euclidean covariant relativistic scalar field theory is a valid quantum field theory with a well-defined continuum limit of the one- and two-point functions. Affine quantization leads to a completely satisfactory quantization of field theories in situations that involve scaled behavior. It gives rise to an unexpected $\hbar^2/\varphi^2$ term that arises only in the quantum aspects.	Translated: 2024-03-18 07:28:31 Published: 2023-12-29
# Enabling Technologies for Web 3.0: A Comprehensive Survey ( http://arxiv.org/abs/2401.10901v1 ) License: see link	Md Arif Hassan, Mohammad Behdad Jamshidi, Bui Duc Manh, Nam H. Chu, Chi-Hieu Nguyen, Nguyen Quang Hieu, Cong T. Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Nguyen Van Huynh, Mohammad Abu Alsheikh and Eryk Dutkiewicz	Web 3.0 represents the next stage of Internet evolution, aiming to empower users with increased autonomy, efficiency, quality, security, and privacy. This evolution can potentially democratize content access by utilizing the latest developments in enabling technologies. In this paper, we conduct an in-depth survey of enabling technologies in the context of Web 3.0 and of their roles in shaping it. These technologies include blockchain, the semantic web, the 3D interactive web, the Metaverse, virtual reality/augmented reality, and Internet of Things technology. We begin by providing a comprehensive background on Web 3.0, including its concept, basic architecture, potential applications, and industry adoption. Subsequently, we examine recent breakthroughs in IoT, 5G, and blockchain technologies that are pivotal to Web 3.0 development. Following that, other enabling technologies, including AI, the semantic web, and the 3D interactive web, are discussed. Utilizing these technologies can effectively address the critical challenges in realizing Web 3.0. These include ensuring decentralized identity, platform interoperability, and data transparency, reducing latency, and enhancing the system's scalability. Finally, we highlight significant challenges associated with Web 3.0 implementation, emphasizing potential solutions and providing insights into future research directions in this field.	Translated: 2024-01-28 16:07:58 Published: 2023-12-29
# Holonic Learning: A Flexible Agent-based Distributed Machine Learning Framework ( http://arxiv.org/abs/2401.10839v1 ) License: see link	Ahmad Esmaeili, Zahra Ghorrati, Eric T. Matson	The ever-increasing ubiquity of data and computational resources over the last decade has propelled a notable shift in the machine learning paradigm toward more distributed approaches. Such a shift seeks not only to tackle scalability and resource distribution challenges but also to address pressing privacy and security concerns. To contribute to the ongoing discourse, this paper introduces Holonic Learning (HoL), a collaborative and privacy-focused learning framework designed for training deep learning models. By leveraging holonic concepts, the HoL framework establishes a structured, self-similar hierarchy in the learning process. This hierarchy enables more nuanced control over collaboration through each holon's individual model aggregation approach, along with its intra-holon commitment and communication patterns. In its general form, HoL offers extensive design flexibility. For empirical analysis and to demonstrate its effectiveness, this paper implements HoloAvg, a special variant of HoL that employs weighted averaging for model aggregation across all holons. The convergence of the proposed method is validated through experiments in both IID and non-IID settings of the standard MNIST dataset. Furthermore, the performance of HoL is investigated under various holarchical designs and data distribution scenarios. The presented results affirm HoL's ability to deliver competitive performance, particularly in the context of non-IID data distributions.	Translated: 2024-01-28 16:06:33 Published: 2023-12-29
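Here is a small sketch of hierarchical weighted averaging in the spirit of HoloAvg. It assumes each leaf holon holds a flat parameter vector and that each parent averages its sub-holons weighted by their total sample counts. The `Holon` class and the sample-count weighting are hypothetical, for illustration only: the abstract says only that HoloAvg uses weighted averaging across all holons.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Holon:
    """A node in the holarchy: a leaf with a local model, or a group of sub-holons."""
    params: Optional[List[float]] = None      # set for leaf holons only
    n_samples: int = 0                        # local dataset size (leaves)
    children: List["Holon"] = field(default_factory=list)


def aggregate(holon: Holon) -> Tuple[List[float], int]:
    """Return (sample-weighted average params, total samples) for a subtree."""
    if not holon.children:
        if holon.params is None:
            raise ValueError("leaf holon needs params")
        return list(holon.params), holon.n_samples
    results = [aggregate(child) for child in holon.children]
    total = sum(n for _, n in results)
    if total == 0:
        raise ValueError("subtree has no samples")
    dim = len(results[0][0])
    avg = [sum(p[j] * n for p, n in results) / total for j in range(dim)]
    return avg, total


if __name__ == "__main__":
    group_a = Holon(children=[Holon(params=[0.0, 0.0], n_samples=1),
                              Holon(params=[2.0, 2.0], n_samples=1)])
    root = Holon(children=[group_a, Holon(params=[4.0, 4.0], n_samples=2)])
    print(aggregate(root))  # -> ([2.5, 2.5], 4)
```

With sample-count weights, the nested average equals a flat weighted average over all leaves. The hierarchy therefore matters mainly when holons use different aggregation rules or communication patterns.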
# ReliCD: A Reliable Cognitive Diagnosis Framework with Confidence Awareness ( http://arxiv.org/abs/2401.10749v1 ) License: see link	Yunfei Zhang, Chuan Qin, Dazhong Shen, Haiping Ma, Le Zhang, Xingyi Zhang, Hengshu Zhu	During the past few decades, cognitive diagnosis modeling, which can quantify students' learning status and knowledge mastery levels, has attracted increasing attention in the computational education community. Recent advances in neural networks have greatly enhanced the performance of traditional cognitive diagnosis models by learning deep representations of students and exercises. Nevertheless, existing approaches often suffer from overconfidence when predicting students' mastery levels. This is caused primarily by the unavoidable noise and sparsity in real-world student-exercise interaction data, and it severely hinders the educational application of diagnostic feedback. To address this, we propose a novel Reliable Cognitive Diagnosis (ReliCD) framework, which can quantify the confidence of diagnostic feedback and is flexible enough to accommodate different cognitive diagnostic functions. Specifically, we first propose a Bayesian method to explicitly estimate the state uncertainty of different knowledge concepts for students, which enables confidence quantification of diagnostic feedback. In particular, to account for potential differences, we model individual prior distributions for the latent variables of different ability concepts using a pre-trained model. Additionally, we introduce a logical hypothesis for ranking confidence levels. Along this line, we design a novel calibration loss that optimizes the confidence parameters by modeling the process of student performance prediction. Finally, extensive experiments on four real-world datasets clearly demonstrate the effectiveness of our ReliCD framework.	Translated: 2024-01-28 16:06:10 Published: 2023-12-29
# Boosting Defect Detection in Manufacturing using Tensor Convolutional Neural Networks ( http://arxiv.org/abs/2401.01373v1 ) License: see link	Pablo Martin-Ramiro, Unai Sainz de la Maza, Roman Orus, Samuel Mugel	Defect detection is one of the most important yet challenging tasks in the quality control stage of manufacturing. In this work, we introduce a Tensor Convolutional Neural Network (T-CNN). We examine its performance on a real defect detection application for one of the components of the ultrasonic sensors produced at Robert Bosch's manufacturing plants. Our quantum-inspired T-CNN operates on a reduced model parameter space to substantially improve the training speed and performance of an equivalent CNN model without sacrificing accuracy. More specifically, we demonstrate that T-CNNs reach the same performance as classical CNNs, as measured by quality metrics, with up to fifteen times fewer parameters and 4% to 19% faster training times. Our results show that the T-CNN greatly outperforms traditional human visual inspection, providing value in a current real-world manufacturing application.	Translated: 2024-01-15 09:54:35 Published: 2023-12-29
# Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online ( http://arxiv.org/abs/2401.02974v1 ) License: see link	Taeksoo Kwon (Algorix Convergence Research Office), Connor Kim (Centennial High School)	This paper examines the efficacy of utilizing large language models (LLMs) to detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to gather post titles from a popular Korean online community, comprising 500 non-threat examples and 20 threats. Various LLMs (GPT-3.5, GPT-4, PaLM) were prompted to classify individual posts as either "threat" or "safe." Statistical analysis found that all models demonstrated strong accuracy, passing chi-square goodness-of-fit tests for both threat and non-threat identification. GPT-4 performed best overall, with 97.9% non-threat and 100% threat accuracy. An affordability analysis also showed PaLM API pricing to be highly cost-efficient. The findings indicate that LLMs can effectively augment human content moderation at scale to help mitigate emerging online risks. However, biases, transparency, and ethical oversight remain vital considerations before real-world implementation.	Translated: 2024-01-15 09:31:35 Published: 2023-12-29
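The abstract does not say how the expected counts for the chi-square goodness-of-fit tests were set. The following is therefore only a generic sketch of Pearson's statistic for two categories, such as correct vs. incorrect classifications. With two categories there is 1 degree of freedom, so the p-value can be computed from the complementary error function. The counts in the usage lines are hypothetical, not the paper's data.

```python
import math
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("length mismatch")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts, for illustration only.
stat = chi_square_statistic([48, 52], [50, 50])  # about 0.16
p = p_value_df1(stat)                             # about 0.69
```

For more than two categories, the p-value needs the general chi-square survival function, for example from a statistics library.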
# ANALYTiC: Understanding Decision Boundaries and Dimensionality Reduction in Machine Learning ( http://arxiv.org/abs/2401.05418v1 ) License: see link	Salman Haidri	The advent of compact, handheld devices has given us a pool of tracked movement data that can be used to infer useful trends and patterns. This flood of trajectory data from animals, humans, vehicles, and so on gave rise to the idea of ANALYTiC: using active learning to infer semantic annotations from trajectories by learning from sets of labeled data. This study explores the application of dimensionality reduction and decision boundaries in combination with the existing active learning, highlighting patterns and clusters in the data. We test these features on three different trajectory datasets with the objective of exploiting the already-labeled data and enhancing its interpretability. Our experimental analysis illustrates the potential of these combined methodologies to improve the efficiency and accuracy of trajectory labeling. This study serves as a stepping stone toward the broader integration of machine learning and visual methods in the context of movement data analysis.	Translated: 2024-01-15 08:20:54 Published: 2023-12-29
# Wavelet Dynamic Selection Network for Inertial Sensor Signal Enhancement ( http://arxiv.org/abs/2401.05416v1 ) License: see link	Yifeng Wang, Yi Zhao	As attitude and motion sensing components, inertial sensors are widely used in various portable devices. However, the severe errors of inertial sensors restrict their function, especially trajectory recovery and semantic recognition. The wavelet transform is a mainstream signal processing method, hailed as the mathematical microscope of signals because of its plentiful and diverse wavelet basis functions. However, the complicated noise types and application scenarios of inertial sensors make selecting a wavelet basis difficult. To this end, we propose a wavelet dynamic selection network (WDSNet), which intelligently selects an appropriate wavelet basis for variable inertial signals. In addition, existing deep learning architectures excel at extracting features from input data but neglect to learn the characteristics of target categories. Learning these characteristics is essential for enhancing category awareness and thereby improving the selection of the wavelet basis. Therefore, we propose a category representation mechanism (CRM), which enables the network to extract and represent category features without increasing the number of trainable parameters. Furthermore, CRM transforms the common fully connected network into category representations. These provide closer supervision to the feature extractor than distant and trivial one-hot classification labels. We call this process of imposing interpretability on a network and using it to supervise the feature extractor the feature supervision mechanism, and we demonstrate its effectiveness experimentally and theoretically. The enhanced inertial signal supports tasks that are impracticable with the original signal, such as trajectory reconstruction. Both quantitative and visual results show that WDSNet outperforms existing methods. Remarkably, WDSNet, although weakly supervised, achieves state-of-the-art performance relative to all the compared fully supervised methods.	Translated: 2024-01-15 08:20:39 Published: 2023-12-29
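WDSNet chooses among many wavelet bases. As a reminder of what a single discrete wavelet transform step does, here is a one-level orthonormal Haar transform with its inverse and a hard-threshold denoising step. The Haar basis and the fixed threshold are illustrative choices only. The learned basis selection and the category representation mechanism are not modelled.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]  # low-pass
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]  # high-pass
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert haar_dwt exactly (up to floating-point rounding)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def hard_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Zero out detail coefficients with magnitude <= t (a crude denoiser)."""
    return [c if abs(c) > t else 0.0 for c in coeffs]
```

A denoised signal would be `haar_idwt(a, hard_threshold(d, t))`. Because the transform is orthonormal, it preserves signal energy, so the threshold can be set on the same scale as the noise level.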
# The Role of Generative AI in Global Diplomatic Practices: A Strategic Framework ( http://arxiv.org/abs/2401.05415v1 ) License: see link	Muneera Bano, Zahid Chaudhri, Didar Zowghi	As Artificial Intelligence (AI) transforms the domain of diplomacy in the 21st century, this research addresses the pressing need to evaluate the dual nature of these advancements, unpacking both the challenges they pose and the opportunities they offer. It has been almost a year since OpenAI launched ChatGPT, which revolutionised various work domains with its capabilities. The scope for applying these capabilities to diplomacy has yet to be fully explored or understood. Our research objective is to systematically examine the current discourse on Digital and AI Diplomacy. This examination informs the development of a comprehensive framework for the role of Generative AI in modern diplomatic practices. Through a systematic analysis of 230 scholarly articles, we identified a spectrum of opportunities and challenges. These culminate in a strategic framework that captures the multifaceted concepts involved in integrating Generative AI and sets a course for future research and innovation in diplomacy.	Translated: 2024-01-15 08:20:15 Published: 2023-12-29
# Scale-invariant critical dynamics at eigenstate transitions ( http://arxiv.org/abs/2309.16005v2 ) License: see link	Miroslav Hopjan, Lev Vidmar	The notion of scale-invariant dynamics is well established at late times in quantum chaotic systems, as illustrated by the emergence of a ramp in the spectral form factor (SFF). Building on the results of the preceding Letter [Phys. Rev. Lett. 131, 060404 (2023)], we explore features of the scale-invariant dynamics of the survival probability and the SFF at criticality, i.e., at eigenstate transitions from quantum chaos to localization. We show that, in contrast to the quantum chaotic regime, the quantum dynamics at criticality exhibit scale invariance not only at late times but also at much shorter times, which we refer to as mid-time dynamics. Our results apply to both quadratic and interacting models. For quadratic models, we study Anderson models in dimensions three to five and power-law random banded matrices. For interacting models, we study the quantum sun model and the ultrametric model. We also study the Rosenzweig-Porter model.	Translated: 2024-01-03 19:52:20 Published: 2023-12-29
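For readers unfamiliar with the two observables, the sketch below computes them from a given list of energies (with $\hbar = 1$):

- The survival probability $P(t)=\left|\sum_n |c_n|^2 e^{-iE_n t}\right|^2$.
- A normalized spectral form factor $K(t)=\left|\sum_n e^{-iE_n t}\right|^2/N^2$.

$K(t)$ coincides with the survival probability of an equal-weight superposition of all $N$ eigenstates. Normalization conventions vary across papers. Practical SFF studies also unfold spectra and average over disorder realizations; this sketch leaves both out.

```python
import cmath
from typing import Sequence


def survival_probability(energies: Sequence[float], weights: Sequence[float],
                         t: float) -> float:
    """P(t) = |sum_n w_n exp(-i E_n t)|^2 with w_n = |<n|psi0>|^2 (summing to 1)."""
    amp = sum(w * cmath.exp(-1j * e * t) for e, w in zip(energies, weights))
    return abs(amp) ** 2


def spectral_form_factor(energies: Sequence[float], t: float) -> float:
    """Normalized SFF K(t) = |sum_n exp(-i E_n t)|^2 / N^2, so that K(0) = 1.

    Implemented as the survival probability of an equal-weight superposition.
    """
    n = len(energies)
    return survival_probability(energies, [1.0 / n] * n, t)
```

Evaluating `spectral_form_factor` on a logarithmic grid of times, and averaging over many disorder realizations, is what reveals the ramp or the scale-invariant regimes discussed in the abstract.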
# Hybrid Modeling Design Patterns ( http://arxiv.org/abs/2401.00033v1 ) License: see link	Maja Rudolph, Stefan Kurz, Barbara Rakitsch	Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While both approaches have complementary advantages, there are often multiple ways to combine them into a hybrid model, and the appropriate solution depends on the problem at hand. In this paper, we provide four base patterns that can serve as blueprints for combining data-driven components with domain knowledge into a hybrid approach. In addition, we present two composition patterns that govern the combination of base patterns into more complex hybrid models. Each design pattern is illustrated by typical use cases from application areas such as climate modeling, engineering, and physics.	Translated: 2024-01-03 19:18:58 Published: 2023-12-29
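As one concrete illustration of combining first-principles and data-driven components, here is a residual-style hybrid model. It adds a known physics term to a linear correction fitted by least squares to what the physics misses. The abstract does not say whether this matches one of the paper's four base patterns. The quadratic "physics" model and the linear residual are hypothetical choices.

```python
from typing import Callable, Sequence, Tuple


def fit_linear_residual(xs: Sequence[float], ys: Sequence[float],
                        physics: Callable[[float], float]) -> Tuple[float, float]:
    """Least-squares fit of r(x) = a*x + b to the residuals y - physics(x)."""
    rs = [y - physics(x) for x, y in zip(xs, ys)]
    n = len(xs)
    mx, mr = sum(xs) / n, sum(rs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (r - mr) for x, r in zip(xs, rs)) / sxx
    return a, mr - a * mx


def hybrid_predict(x: float, physics: Callable[[float], float],
                   a: float, b: float) -> float:
    """First-principles prediction plus the learned data-driven correction."""
    return physics(x) + a * x + b
```

The data-driven part only has to learn what the physics gets wrong. This is usually an easier target than the full response, and the model stays anchored to the known physics outside the training range.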
# Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges ( http://arxiv.org/abs/2401.00031v1 ) License: see link	Xiaoqian Liu, Jianbin Jiao, Junge Zhang	Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from poor sample efficiency and limited generalization. In contrast, large-scale self-supervised pretraining has enabled fast adaptation through fine-tuning or few-shot learning in language and vision. We therefore argue for integrating knowledge acquired from generic large-scale self-supervised pretraining into downstream decision-making problems. We propose a Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation models with the help of generic and flexible self-supervised pretraining.	Translated: 2024-01-03 19:18:47 Published: 2023-12-29
# An Empirical Study of Scaling Law for OCR ( http://arxiv.org/abs/2401.00028v1 ) License: see link	Miao Rang, Zhenni Bi, Chuanjian Liu, Yunhe Wang, Kai Han	The laws relating model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP). However, scaling laws in Optical Character Recognition (OCR) have not yet been investigated. To address this, we conducted comprehensive studies of how performance in text recognition correlates with model scale, data volume and computation. The study demonstrates smooth power laws between performance and model size, as well as training data volume, when other influencing factors are held constant. Additionally, we have constructed a large-scale dataset called REBU-Syn, which comprises 6 million real samples and 18 million synthetic samples. Based on our scaling law and new dataset, we have successfully trained a scene text recognition model. It achieves a new state of the art on 6 common test benchmarks, with a top-1 average accuracy of 97.42%.	Translated: 2024-01-03 19:18:17 Published: 2023-12-29
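Power laws like those reported here are commonly fitted by linear regression in log-log space. The sketch below fits y = c·x^k this way. It assumes x might be model size or training-set size and y an error measure; this choice of variables is an assumption. Many scaling-law studies also add an irreducible-error offset, which this simple form omits.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = c * x**k by ordinary least squares on (log x, log y).

    Returns (c, k). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    k = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sxx  # slope = exponent
    c = math.exp(my - k * mx)                                    # intercept -> prefactor
    return c, k
```

A straight line on a log-log plot is the visual check that a power law is appropriate. Systematic curvature at large x usually means the offset term is needed.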
# Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring ( http://arxiv.org/abs/2401.00027v1 ) License: see link	Xin Gao, Tianheng Qiu, Xinyu Zhang, Hanlin Bai, Kang Liu, Xuan Huang, Hu Wei, Guoying Zhang, Huaping Liu	Coarse-to-fine schemes are widely used in traditional single-image motion deblurring. In the context of deep learning, however, existing multi-scale algorithms have two drawbacks: they require complex modules to fuse the features of low-scale RGB images with deep semantics, and they rely on manually generated low-resolution image pairs that lack sufficient confidence. In this work, we propose a multi-scale network for motion deblurring based on single-input and multiple-outputs (SIMO), which simplifies algorithms built on a coarse-to-fine scheme. A multi-scale architecture can introduce restoration defects that affect detail information. To alleviate them, we combine the characteristics of real-world blurring trajectories with a learnable wavelet transform module. This module focuses on the directional continuity and frequency features of the step-by-step transitions from blurred to sharp images. In summary, we propose a multi-scale network with a learnable discrete wavelet transform (MLWNet). It exhibits state-of-the-art performance on multiple real-world deblurring datasets in terms of both subjective and objective quality, as well as computational efficiency.	Translated: 2024-01-03 19:18:01 Published: 2023-12-29
# Reply to "Comment on `Multiparty quantum mutual information: An alternative definition'" ( http://arxiv.org/abs/2401.00026v1 ) License: see link	Asutosh Kumar	We reaffirm the claim of Lee et al. [preceding Comment, Phys. Rev. A 108, 066401 (2023)] that the expression for the quantum dual total correlation of a multipartite system in terms of quantum relative entropy, as proposed in previous work [A. Kumar, Phys. Rev. A 96, 012332 (2017)], is not correct. We provide alternative expressions for the quantum dual total correlation in terms of quantum relative entropy. We nevertheless recommend that the quantum dual total correlation be computed using its expression in terms of von Neumann entropy.	Translated: 2024-01-03 19:17:39 Published: 2023-12-29
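For reference, the classical dual total correlation is $D = H(X_1,\dots,X_n) - \sum_i H(X_i \mid X_{\bar i})$, where $\bar i$ denotes all parties except $i$. To get the von Neumann-entropy form, substitute $H(X_i \mid X_{\bar i}) = H(X_1,\dots,X_n) - H(X_{\bar i})$ and replace Shannon entropy with von Neumann entropy. The form below is, as we read the abstract, the expression the reply recommends; it is a standard derivation, not a quotation of the paper.

```latex
\[
\begin{aligned}
D(\rho_{1\cdots n}) &= S(\rho_{1\cdots n}) - \sum_{i=1}^{n}\Big[S(\rho_{1\cdots n}) - S(\rho_{\bar i})\Big] \\
&= \sum_{i=1}^{n} S(\rho_{\bar i}) - (n-1)\,S(\rho_{1\cdots n}),
\qquad S(\rho) = -\operatorname{Tr}\rho\log\rho .
\end{aligned}
\]
```

Here $\rho_{\bar i}$ is the reduced state with party $i$ traced out. As a sanity check, for $n = 2$ this reduces to $S(\rho_1) + S(\rho_2) - S(\rho_{12})$, the quantum mutual information.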
# モデル誤特定下における適応線形二次制御の非漸近的リグレット解析 Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification ( http://arxiv.org/abs/2401.00073v1 ) ライセンス: Link先を確認	Bruce D. Lee, Anders Rantzer, Nikolai Matni	(参考訳) 多様なデータセット上で大規模モデルを事前学習し、特定の用途向けにファインチューニングするという戦略は、コンピュータビジョン、自然言語処理、ロボット制御において目覚ましい成果を上げてきた。この戦略は、限られたデータで変化する条件に迅速に適応する必要がある適応制御において、大きな可能性を秘めている。適応制御における事前学習の利点を具体的に理解するため、学習者がダイナミクスの基底行列の集合について事前知識を持つ設定で、適応線形二次制御問題を考察する。この基底は、背後にあるデータ生成過程のダイナミクスを完全には表現できないという意味で誤特定されている。我々はこの事前知識を用いるアルゴリズムを提案し、システムと $T$ 回相互作用した後の期待リグレットの上界を証明する。$T$ が小さい領域では、上界は学習者が利用できる事前知識に応じて $\texttt{poly}(\log T)$ または $\sqrt{T}$ でスケールする項に支配される。$T$ が大きい場合、リグレットは $\delta T$ で増加する項に支配される。ここで $\delta$ は誤特定の度合いを定量化する。この線形項は、誤特定された基底を用いては背後のダイナミクスを完全に推定できないことに起因するため、基底行列もオンラインで適応させない限り避けられない。ただし、この項が支配的になるのは、基底行列の重みの推定誤差に起因する劣線形項が無視できるようになった後の、大きな $T$ に対してのみである。解析を検証するシミュレーションを示す。また、シミュレーションでは、関連するシステム群から得たオフラインデータを事前学習段階の一部として用いて誤特定されたダイナミクス基底を推定でき、それを適応制御器で利用できることも示す。 The strategy of pre-training a large model on a diverse dataset, then fine-tuning for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. 
In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.	翻訳日:2024-01-03 19:06:24 公開日:2023-12-29
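The $\delta T$ term comes from the part of the true dynamics that lies outside the span of the basis matrices. A minimal numpy sketch of that effect, assuming (for illustration only, not from the paper) a known input matrix B, noise-free rollouts and random exploratory inputs: it fits the basis weights by ordinary least squares and shows that a misspecified component leaves a model error that more data does not remove. It is not the authors' adaptive LQR algorithm, and the names `estimate_basis_weights` and `simulate` are hypothetical.

```python
import numpy as np

def estimate_basis_weights(X, U, basis, B):
    """Least-squares estimate of theta in x_{t+1} ~ (sum_i theta_i A_i) x_t + B u_t.

    X: (T+1, n) states, U: (T, m) inputs, basis: list of (n, n) matrices A_i.
    B: (n, m) input matrix, assumed known in this sketch.
    """
    targets = (X[1:] - U @ B.T).reshape(-1)            # x_{t+1} - B u_t, stacked over t
    # Column i holds A_i x_t for every t, flattened in the same order as targets.
    features = np.stack([(X[:-1] @ A.T).reshape(-1) for A in basis], axis=1)
    theta, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return theta

def simulate(A, B, T, rng):
    """Noise-free rollout driven by random exploratory inputs."""
    X = np.zeros((T + 1, A.shape[0]))
    U = rng.normal(size=(T, B.shape[1]))
    for t in range(T):
        X[t + 1] = A @ X[t] + B @ U[t]
    return X, U

rng = np.random.default_rng(0)
basis = [np.eye(2), np.array([[0.0, 1.0], [0.0, 0.0]])]
B = np.array([[0.0], [1.0]])
outside = np.array([[0.0, 0.0], [1.0, 0.0]])           # direction not spanned by the basis
for delta in (0.0, 0.1):
    A_true = 0.5 * basis[0] + 0.2 * basis[1] + delta * outside
    X, U = simulate(A_true, B, 200, rng)
    theta = estimate_basis_weights(X, U, basis, B)
    A_hat = sum(w * A for w, A in zip(theta, basis))
    # With delta > 0 the model error stays at least delta, however large T is.
    print(delta, np.round(theta, 3), round(float(np.linalg.norm(A_hat - A_true)), 3))
```

With `delta = 0` the weights (0.5, 0.2) are recovered; with `delta = 0.1` the best fit inside the span still misses the (2, 1) entry, which is the kind of irreducible error the abstract's linear regret term accounts for.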
# 磁気材料の機械学習モデル Machine-learned models for magnetic materials ( http://arxiv.org/abs/2401.00072v1 ) ライセンス: Link先を確認	Paweł Leszczyński, Kamil Kutorasiński, Marcin Szewczyk, and Jarosław Pawłowski	(参考訳) 本稿では,ディープニューラルネットワークを用いた材料モデリングのための汎用フレームワークを提案する。多次元特性(測定を模倣する)で表される材料は、教師なしの方法でニューラルオートエンコーダモデルを訓練するために使用される。エンコーダは理論モデルの材料パラメータを予測しようとし、その理論モデルがデコーダ部分で使用される。デコーダは予測パラメータを用いて入力特性を再構成する。ニューラルモデルは、幅広い材料の振る舞いをカバーできる合成的に生成された特性の集合を捉えるように訓練され、単一の測定に対してモデルパラメータを最適化するだけでなく、背後にある物理を一般化できるモデルが得られる。モデルを構築した後、周波数領域と(線形範囲外の)電流領域で磁性材料を同時にモデル化するという複雑な問題において、その有用性を証明する。 We present a general framework for modeling materials using deep neural networks. Material represented by multidimensional characteristics (that mimic measurements) is used to train the neural autoencoder model in an unsupervised manner. The encoder is trying to predict the material parameters of a theoretical model, which is then used in a decoder part. The decoder, using the predicted parameters, reconstructs the input characteristics. The neural model is trained to capture a synthetically generated set of characteristics that can cover a broad range of material behaviors, leading to a model that can generalize on the underlying physics rather than just optimize the model parameters for a single measurement. After setting up the model we prove its usefulness in the complex problem of modeling magnetic materials in the frequency and current (out-of-linear range) domains simultaneously.	翻訳日:2024-01-03 19:05:56 公開日:2023-12-29
# 可変相互作用によるボソニックダイナミクスの有限性決定 Deciding finiteness of bosonic dynamics with tunable interactions ( http://arxiv.org/abs/2401.00069v1 ) ライセンス: Link先を確認	David Edward Bruschi, André Xuereb and Robert Zeier	(参考訳) この研究では、ボソニック量子力学の分解に動機付けられ、対応するリー代数(無限次元かもしれない)を研究する。このような因子分解を特徴付けるために、これらのリー代数の条件を有限次元とする。各自由ハミルトン項がそれ自体が生成リー代数の元である場合を考える。提案手法では,スキュー・エルミートボソニック作用素を適切な部分空間に体系的に分割し,リー代数自体の次元を測るために用いられるスキュー・エルミート作用素の特定の列を構成する新しいツールを開発する。この結果の意義は、特定のハミルトニアンの独立制御生成子のみを制約する条件に依存するため、生成されたリー代数の有限性を検証する効果的なアルゴリズムを提供する。さらに、この結果は、生成および消滅作用素の多項式をワイル代数(weyl algebra)と呼ぶ数学的仕事と密接に結びついている。私たちの研究は、量子制御と量子技術に関連するボソニックダイナミクスの分解をよりよく理解するための道を開くものです。 In this work we are motivated by factorization of bosonic quantum dynamics and we study the corresponding Lie algebras, which can potentially be infinite dimensional. To characterize such factorization, we identify conditions for these Lie algebras to be finite dimensional. We consider cases where each free Hamiltonian term is itself an element of the generated Lie algebra. In our approach, we develop new tools to systematically divide skew-Hermitian bosonic operators into appropriate subspaces, and construct specific sequences of skew-Hermitian operators that are used to gauge the dimensionality of the Lie algebras themselves. The significance of our result relies on conditions that constrain only the independently controlled generators in a particular Hamiltonian, thereby providing an effective algorithm for verifying the finiteness of the generated Lie algebra. In addition, our results are tightly connected to mathematical work where the polynomials of creation and annihilation operators are known as the Weyl algebra. Our work paves the way for better understanding factorization of bosonic dynamics relevant to quantum control and quantum technology.	翻訳日:2024-01-03 19:05:42 公開日:2023-12-29
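The question of whether a set of bosonic generators closes into a finite-dimensional Lie algebra can be probed by brute force in the Weyl algebra. The sketch below is not the authors' criterion (which constrains only the tunable generators); it repeatedly takes commutators of normal-ordered polynomials in $a^\dagger, a$, using the standard reordering rule $a^n (a^\dagger)^p = \sum_k k!\binom{n}{k}\binom{p}{k}(a^\dagger)^{p-k}a^{n-k}$, and reports the dimension of the span. Assumptions: it works with the complex span of Hermitian-type generators (whose dimension matches the real dimension of the skew-Hermitian algebra generated by $iH_j$ when generators come in adjoint pairs), and the cap `max_dim` is a hypothetical cutoff, so `None` means "exceeded the cap", not a proof of infinite dimension.

```python
from fractions import Fraction
from math import comb, factorial

# A polynomial in a† and a is stored in normal order as {(m, n): coeff},
# meaning the sum of coeff * (a†)^m a^n.

def mul(x, y):
    """Product of two normal-ordered polynomials, re-normal-ordering a^n (a†)^p."""
    out = {}
    for (m, n), c1 in x.items():
        for (p, q), c2 in y.items():
            for k in range(min(n, p) + 1):
                key = (m + p - k, n + q - k)
                out[key] = out.get(key, 0) + c1 * c2 * factorial(k) * comb(n, k) * comb(p, k)
    return {k: c for k, c in out.items() if c != 0}

def commutator(x, y):
    out = mul(x, y)
    for k, c in mul(y, x).items():
        out[k] = out.get(k, 0) - c
    return {k: c for k, c in out.items() if c != 0}

def order(mono):
    # Total order on monomials: total degree first, then the power of a†.
    m, n = mono
    return (m + n, m, n)

def reduce_against(v, basis):
    """Gaussian elimination of v against basis vectors keyed by their leading monomial."""
    v = {k: Fraction(c) for k, c in v.items()}
    for pivot in sorted(basis, key=order, reverse=True):
        c = v.get(pivot, 0)
        if c:
            for k, b in basis[pivot].items():
                v[k] = v.get(k, 0) - c * b
            v = {k: x for k, x in v.items() if x != 0}
    return v

def lie_closure_dim(generators, max_dim=40):
    """Dimension of the span closed under commutators, or None once it exceeds max_dim."""
    basis, elements = {}, []

    def add(v):
        r = reduce_against(v, basis)
        if not r:
            return False                      # already in the span
        lead = max(r, key=order)
        basis[lead] = {k: c / r[lead] for k, c in r.items()}
        elements.append(basis[lead])
        return True

    for g in generators:
        add(g)
    i = 0
    while i < len(elements):                  # bracket every new element with all earlier ones
        for j in range(i):
            if add(commutator(elements[i], elements[j])) and len(elements) > max_dim:
                return None
        i += 1
    return len(elements)

ad_a, a2, ad2, kerr = {(1, 1): 1}, {(0, 2): 1}, {(2, 0): 1}, {(2, 2): 1}
print(lie_closure_dim([ad_a, a2, ad2]))              # quadratic generators close up
print(lie_closure_dim([a2, ad2, kerr], max_dim=12))  # adding a Kerr-type term does not
```

For the quadratic set the span is $\{a^\dagger a, a^2, (a^\dagger)^2, \mathbb{1}\}$, since $[a^2, (a^\dagger)^2] = 4a^\dagger a + 2$. With the quartic term, nested brackets produce $(a^\dagger)^{2k}$ for every $k$, so the loop can only stop at the cap.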
# 任意領域における粒子形状モデリング Particle-Based Shape Modeling for Arbitrary Regions-of-Interest ( http://arxiv.org/abs/2401.00067v1 ) ライセンス: Link先を確認	Hong Xu, Alan Morris, Shireen Y. Elhabian	(参考訳) 統計的形状モデリング (SSM) は解剖学的構造の形態変化を定量的に解析する手法である。これらの分析では、特定の形態学的特徴に焦点を当てるために、対象とする解剖学的関心領域についてモデルを構築する必要があることが多い。任意の関心領域の形状モデリングを可能にするために、広く使用されているSSMフレームワークである粒子ベース形状モデリング(PSM)の拡張を提案する。関心領域を定義する既存の方法は計算コストが高く、トポロジー上の制限がある。これらの欠点に対処するために、メッシュフィールドを用いて自由形式の制約を定義し、形状表面上の任意の関心領域を区切ることを可能にする。さらに、モデル最適化に二次ペナルティ法を追加し、切断面制約と自由形式制約の任意の組み合わせを計算効率よく課せるようにする。本手法の有効性を、難しい合成データセットと2つの医療データセットで示す。 Statistical Shape Modeling (SSM) is a quantitative method for analyzing morphological variations in anatomical structures. These analyses often necessitate building models on targeted anatomical regions of interest to focus on specific morphological features. We propose an extension to particle-based shape modeling (PSM), a widely used SSM framework, to allow shape modeling of arbitrary regions of interest. Existing methods to define regions of interest are computationally expensive and have topological limitations. To address these shortcomings, we use mesh fields to define free-form constraints, which allow for delimiting arbitrary regions of interest on shape surfaces. Furthermore, we add a quadratic penalty method to the model optimization to enable computationally efficient enforcement of any combination of cutting-plane and free-form constraints. We demonstrate the effectiveness of this method on a challenging synthetic dataset and two medical datasets.	翻訳日:2024-01-03 19:05:23 公開日:2023-12-29
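The quadratic penalty idea can be illustrated on a toy problem. This numpy sketch is not ShapeWorks or the authors' PSM optimizer; it assumes a unit circle as a stand-in for a shape surface, a Gaussian pairwise repulsion as a stand-in for the particle-spreading objective, and a single cutting-plane constraint $n \cdot p \le d$. Each step adds the gradient of $\mu \max(0, n \cdot p - d)^2$ to the repulsion gradient and then projects the particles back onto the surface. All parameter values are illustrative.

```python
import numpy as np

def optimize_particles(n_particles=20, normal=(1.0, 0.0), offset=0.5,
                       mu=100.0, sigma=0.5, lr=1e-3, steps=5000, seed=0):
    """Spread particles on the unit circle while a quadratic penalty keeps
    them in the allowed half-plane normal . p <= offset (toy setting)."""
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0, 2 * np.pi, n_particles)
    P = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    nvec = np.asarray(normal, dtype=float)
    for _ in range(steps):
        diff = P[:, None, :] - P[None, :, :]                 # (N, N, 2) pairwise p_i - p_j
        w = np.exp(-np.sum(diff**2, axis=-1) / sigma**2)     # Gaussian repulsion energy per pair
        np.fill_diagonal(w, 0.0)
        grad = -2.0 / sigma**2 * np.sum(w[:, :, None] * diff, axis=1)  # d(repulsion)/dp_i
        violation = np.maximum(0.0, P @ nvec - offset)       # only infeasible particles are penalised
        grad += 2.0 * mu * violation[:, None] * nvec         # gradient of mu * violation^2
        P = P - lr * grad
        P /= np.linalg.norm(P, axis=1, keepdims=True)        # project back onto the "surface"
    return P

P = optimize_particles()
print(np.max(P @ np.array([1.0, 0.0]) - 0.5))  # small residual violation, shrinking as mu grows
```

A quadratic penalty only enforces the constraint approximately: the residual violation is roughly the repulsive force divided by $2\mu$, which is why penalty weights are typically increased when tighter adherence to the region of interest is needed.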
# 新規金属合金の3Dプリントのためのプロセス開発の加速 Accelerating Process Development for 3D Printing of New Metal Alloys ( http://arxiv.org/abs/2401.00065v1 ) ライセンス: Link先を確認	David Guirguis, Conrad Tucker, Jack Beuth	(参考訳) 3Dプリントされた金属の品質の不確実性やばらつきに対処することで、この技術のさらなる普及が期待できる。新しい合金のプロセスマッピングは、許容できる印刷品質を一貫して生み出す最適なプロセスパラメータを決定するために不可欠である。プロセスマッピングは通常、従来の方法で行われ、実験計画や印刷部品のex situ評価に用いられる。一方、in situ手法は観測可能な特徴が限られ、精度を高めるための温度測定に複雑で高コストな装置を必要とするため、制約がある。我々の手法は、レーザと金属の相互作用中の溶融金属ダイナミクスの時間的特徴を、ビデオビジョントランスフォーマーと高速イメージングによって取り入れることで、これらの制約を緩和する。我々の手法は既存の商用機で利用でき、欠陥とばらつきを効率的に定量化するためのin situプロセスマップを提供できる。本手法の汎用性は、組成や固有の熱流体特性の異なる合金に対してデータセット横断評価を行うことで実証される。 Addressing the uncertainty and variability in the quality of 3D printed metals can further the widespread use of this technology. Process mapping for new alloys is crucial for determining optimal process parameters that consistently produce acceptable printing quality. Process mapping is typically performed by conventional methods and is used for the design of experiments and ex situ characterization of printed parts. On the other hand, in situ approaches are limited because their observable features are limited and they require complex high-cost setups to obtain temperature measurements to boost accuracy. Our method relaxes these limitations by incorporating the temporal features of molten metal dynamics during laser-metal interactions using video vision transformers and high-speed imaging. Our approach can be used in existing commercial machines and can provide in situ process maps for efficient defect and variability quantification. The generalizability of the approach is demonstrated by performing cross-dataset evaluations on alloys with different compositions and intrinsic thermofluid properties.	翻訳日:2024-01-03 19:05:10 公開日:2023-12-29
# 排他性グラフによるハイブリッド因果構造の特徴付け Characterizing Hybrid Causal Structures with the Exclusivity Graph Approach ( http://arxiv.org/abs/2401.00063v1 ) ライセンス: Link先を確認	Giovanni Rodari, Davide Poderini, Emanuele Polino, Alessia Suprano, Fabio Sciarrino, Rafael Chaves	(参考訳) 一般因果構造によって制約された相関集合の幾何解析は基礎的・量子的技術研究において最も重要なものである。この課題に対処することは一般的に困難であり、異なるシナリオのための多様な理論的手法の開発を促す。近年, 因果構造の異なる部分における異なる因果仮定を組み合わせた新たなハイブリッドシナリオが出現している。本研究では,古典的,量子的,非シグナリングな分布を,古典的因果制約や弱い非シグナリングが因果構造の異なるノードに使用されるハイブリッドシナリオにおいて探索するために,グラフ理論手法を拡張した。そのような因果関係を無向グラフにマッピングすることで、対応する分布の集合を特徴付け、それらの関係を分析することができる。特に本手法では,古典的,量子的,無信号的動作を同時に区別できるベル的不等式を最小化し,対応する境界を効率的に推定する方法を示す。この手法は量子ネットワークの研究や量子情報処理への応用のための強力なツールである。 Analyzing the geometry of correlation sets constrained by general causal structures is of paramount importance for foundational and quantum technology research. Addressing this task is generally challenging, prompting the development of diverse theoretical techniques for distinct scenarios. Recently, novel hybrid scenarios combining different causal assumptions within different parts of the causal structure have emerged. In this work, we extend a graph theoretical technique to explore classical, quantum, and no-signaling distributions in hybrid scenarios, where classical causal constraints and weaker no-signaling ones are used for different nodes of the causal structure. By mapping such causal relationships into an undirected graph we are able to characterize the associated sets of compatible distributions and analyze their relationships. In particular we show how with our method we can construct minimal Bell-like inequalities capable of simultaneously distinguishing classical, quantum, and no-signaling behaviors, and efficiently estimate the corresponding bounds. The demonstrated method will represent a powerful tool to study quantum networks and for applications in quantum information tasks.	翻訳日:2024-01-03 19:04:55 公開日:2023-12-29
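For orientation, the graph-theoretic technique extended here builds on the Cabello-Severini-Winter exclusivity-graph framework. In that single-scenario setting, the maximum of a sum of probabilities of pairwise exclusive events is bounded by the independence number of the exclusivity graph for classical models, by the Lovász theta number for quantum models, and by the fractional packing number for the most general models. The sketch below covers only that standard setting (not the paper's hybrid extension) for odd cycle graphs, where closed forms are known; the function names are hypothetical.

```python
import itertools
import math

def independence_number(n_vertices, edges):
    """Brute-force size of the largest set of pairwise non-adjacent vertices."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n_vertices, 0, -1):
        for subset in itertools.combinations(range(n_vertices), size):
            if all(frozenset(pair) not in adjacent
                   for pair in itertools.combinations(subset, 2)):
                return size
    return 0

def cycle_bounds(n):
    """Classical, quantum and general bounds for the odd n-cycle exclusivity graph (n >= 5)."""
    edges = [(i, (i + 1) % n) for i in range(n)]
    classical = independence_number(n, edges)                          # alpha(C_n)
    quantum = n * math.cos(math.pi / n) / (1 + math.cos(math.pi / n))  # Lovász theta of odd cycles
    general = n / 2                                                     # fractional packing number
    return classical, quantum, general

for n in (5, 7):
    print(n, cycle_bounds(n))
```

For the 5-cycle (the KCBS scenario) this gives the familiar hierarchy $2 < \sqrt{5} < 5/2$, which is the kind of classical/quantum/general separation that the minimal Bell-like inequalities in the abstract are built to detect.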
# 組織の有効性のためのセマンティックコンピューティング--意味論に基づくモデリングによる組織理論から実践へ Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling ( http://arxiv.org/abs/2401.00062v1 ) ライセンス: Link先を確認	Mena Rizk, Daniela Rosu, Mark Fox	(参考訳) 組織の重要な機能は、その目的を達成するために必要な統合のレベル(調整と協力)を育むことである。調整の必要性と協力への動機は、組織のメンバーとその作業との間の無数の依存関係から生じる。したがって、調整と協力の問題に対する解決策を推論するには、基礎となる依存関係を含む堅牢な表現が必要である。このような表現は形式的な組織モデルにはいまだ欠けていることがわかったため、我々はセマンティクスを活用してこのギャップを埋める。確立された組織研究と、北米最大級の自治体の一つでの広範なフィールドワークに基づき、(1) 成果依存、報酬依存、認識的依存といった概念と、それらと潜在的な統合リスクとの関連を操作化する、一階述語論理で形式化されたオントロジーを導入し、(2) 複雑な政府インフラプロジェクトにおける統合を分析・支援するための、このオントロジーの実世界での応用を示す。オントロジーはZ3とOWLの両方で実装・検証されている。モデルの主な特徴は、推論可能な依存関係、説明可能な調整・協力リスク、そしてリスクを軽減するために組織内の依存関係構造をどう変更できるかについての実用的な知見である。インセンティブの不整合、フリーライディング、サブゴール最適化といった実世界の課題を依存関係構造の観点から概念化することで、我々のセマンティクスに基づくアプローチは、調整と協力をモデル化し強化するための新しい手法となる。意思決定支援システムに統合すれば、このモデルは組織設計と組織の有効性を支える有力な助けとなりうる。より広くは、我々のアプローチは、既存の組織理論から具体的で実世界の価値を引き出すうえでのセマンティクスの変革的な可能性を強調するものである。 A critical function of an organization is to foster the level of integration (coordination and cooperation) necessary to achieve its objectives. The need to coordinate and motivation to cooperate emerges from the myriad dependencies between an organization's members and their work. Therefore, to reason about solutions to coordination and cooperation problems requires a robust representation that includes the underlying dependencies. We find that such a representation remains missing from formal organizational models, and we leverage semantics to bridge this gap. Drawing on well-established organizational research and our extensive fieldwork with one of North America's largest municipalities, (1) we introduce an ontology, formalized in first-order logic, that operationalizes concepts like outcome, reward, and epistemic dependence, and their links to potential integration risks; and (2) present real-world applications of this ontology to analyze and support integration in complex government infrastructure projects. 
Our ontology is implemented and validated in both Z3 and OWL. Key features of our model include inferable dependencies, explainable coordination and cooperation risks, and actionable insights on how dependency structures within an organization can be altered to mitigate the risks. Conceptualizing real-world challenges like incentive misalignment, free-riding, and subgoal optimization in terms of dependency structures, our semantics-based approach represents a novel method for modelling and enhancing coordination and cooperation. Integrated within a decision-support system, our model may serve as an impactful aid for organizational design and effectiveness. More broadly, our approach underscores the transformative potential of semantics in deriving tangible, real-world value from existing organization theory.	翻訳日:2024-01-03 19:04:37 公開日:2023-12-29
# 界面を透過する波動と粒子:可逆性とコヒーレンス Transmission of waves and particles through the interface: reversibility and coherence ( http://arxiv.org/abs/2401.00059v1 ) ライセンス: Link先を確認	A.P. Meilakhs	(参考訳) 本稿では、量子粒子(フォノン、電子、光子)の界面透過を検討し、様々な物理シナリオに共通する普遍的なパターンを同定する。古典的な波動方程式から出発してそれらを量子化し、運動論的方程式を導出する。これらは界面における粒子の分布関数のマッチング条件である。導出された運動論的方程式が時間的に不可逆であることに注目する。これは熱輸送のような不可逆過程を正確に記述するための本質的な特徴である。我々は、導出の過程で波動方程式の時間対称性が破れる箇所を特定する。それは入射波が非コヒーレントであるという仮定である。したがって、界面を通る非コヒーレントな透過は時間的不可逆性を示すと推論される。この仮説を検証する実験を提案する。 We examine the transmission of quantum particles (phonons, electrons, and photons) across interfaces, identifying universal patterns in diverse physical scenarios. Starting with classical wave equations, we quantize them and derive kinetic equations. Those are matching conditions for the distribution functions of particles at the interface. We note the time irreversibility of the derived kinetic equations -- an essential feature for accurately describing irreversible processes like heat transport. We identify the juncture in our derivation where the time symmetry of wave equations is disrupted; it is the assumption of the non-coherence of incident waves. Consequently, we infer that non-coherent transmission through the interface exhibits time irreversibility. We propose an experiment to validate this hypothesis.	翻訳日:2024-01-03 19:04:06 公開日:2023-12-29
# 共有経済の言語を探る:ドイツ語と英語のAirbnbにおける信頼の構築とプライバシー懸念の軽減 Exploring the language of the sharing economy: Building trust and reducing privacy concern on Airbnb in German and English ( http://arxiv.org/abs/2401.00058v1 ) ライセンス: Link先を確認	Alex Zarifis, Richard Ingham and Julia Kroenung	(参考訳) イングランドで英語により、ドイツでドイツ語により物件を提供しているホストのプロフィール文を比較し、信頼が同じように構築され、プライバシーへの懸念が同じように軽減されているかを調べる。ホストは信頼を構築するために6つの方法を用いている:(1)形式性の度合い、(2)距離と近さ、(3)感情表現とユーモア、(4)自己主張的であることと受動攻撃的であること、(5)プラットフォームの言語スタイルと用語への準拠、(6)境界の設定。プライバシーへの懸念は通常プラットフォームに委ねられるため、直接軽減されることはない。結果は、言語の影響は限定的であり、プラットフォームの規範や習慣が最も大きな影響を与えていることを示している。 The texts in the profiles of hosts offering their properties in England (in English) and in Germany (in German) are compared to explore whether trust is built and privacy concerns are reduced in the same way. Six methods of building trust are used by the landlords: (1) the level of formality, (2) distance and proximity, (3) emotiveness and humor, (4) being assertive and passive aggressive, (5) conformity to the platform language style and terminology and (6) setting boundaries. Privacy concerns are not usually reduced directly as this is left to the platform. The findings indicate that language has a limited influence, and the platform norms and habits are the biggest influence.	翻訳日:2024-01-03 19:03:55 公開日:2023-12-29
# コントラスト世界モデルの一般化特性 Generalization properties of contrastive world models ( http://arxiv.org/abs/2401.00057v1 ) ライセンス: Link先を確認	Kandan Ramakrishnan, R. James Cotton, Xaq Pitkow, Andreas S. Tolias	(参考訳) オブジェクト中心の世界モデルに関する最近の研究は、完全に教師なし、または自己教師ありの方法で、表現をオブジェクト単位に分解することを目指している。このような世界モデルは、一般化の問題に対処するための重要な要素であると考えられている。しかし、自己教師あり学習により性能は向上しているものの、分布外(OOD)一般化は体系的にも明示的にも検証されていない。本稿では、コントラスト世界モデルの一般化特性について広範な研究を行う。新しいオブジェクト属性への外挿や、新しい組み合わせ・新しい属性の導入など、様々なOOD一般化シナリオの下でモデルを体系的にテストする。実験の結果、コントラスト世界モデルは様々なOODテストにおいて一般化に失敗し、性能低下の程度はサンプルがどの程度OODであるかに依存することがわかった。遷移の更新と畳み込み特徴マップを可視化すると、オブジェクト属性のいかなる変化(未見の色、形状、色と形状の組み合わせなど)も、オブジェクト表現の分解を崩すことが観察される。全体として、本研究は一般化におけるオブジェクト中心表現の重要性を強調するとともに、現在のモデルは人間レベルの一般化に必要なそのような表現を学習する能力が限られていることを示している。 Recent work on object-centric world models aims to factorize representations in terms of objects in a completely unsupervised or self-supervised manner. Such world models are hypothesized to be a key component to address the generalization problem. However, while self-supervision has shown improved performance, out-of-distribution (OOD) generalization has not been systematically and explicitly tested. In this paper, we conduct an extensive study on the generalization properties of a contrastive world model. We systematically test the model under a number of different OOD generalization scenarios such as extrapolation to new object attributes, introducing new conjunctions or new attributes. Our experiments show that the contrastive world model fails to generalize under the different OOD tests and the drop in performance depends on the extent to which the samples are OOD. When visualizing the transition updates and convolutional feature maps, we observe that any changes in object attributes (such as previously unseen colors, shapes, or conjunctions of color and shape) break down the factorization of object representations. 
Overall, our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.	翻訳日:2024-01-03 19:03:42 公開日:2023-12-29
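The abstract does not spell out the training objective, so the following is a sketch under an assumption: that the contrastive world model is trained with a hinge loss in the spirit of C-SWM (contrastively trained structured world models), where the transition model's predicted update should land on the next latent state while latents of random negative observations are pushed at least a margin away. The function name and the per-slot latent layout are hypothetical.

```python
import numpy as np

def contrastive_world_model_loss(z_t, delta, z_next, z_neg, margin=1.0):
    """Hinge-style contrastive loss in the spirit of C-SWM (assumed, not the paper's code).

    z_t, z_next: latent states at t and t+1, shape (batch, K, D) for K object slots.
    delta: the transition model's predicted update, same shape as z_t.
    z_neg: latent states of randomly drawn negative observations, same shape.
    """
    def dist(a, b):
        return 0.5 * np.sum((a - b) ** 2, axis=(1, 2))        # squared distance over all slots
    positive = dist(z_t + delta, z_next)                       # prediction should hit the next state
    negative = np.maximum(0.0, margin - dist(z_neg, z_next))  # negatives kept at least `margin` away
    return float(np.mean(positive + negative))
```

Because the distance is summed over object slots, a change in an unseen attribute that leaks into several slots at once raises the positive term for every affected slot, one way to see why the broken factorization reported above hurts prediction.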
# 集団行動によるオンラインアルゴリズムリコース Online Algorithmic Recourse by Collective Action ( http://arxiv.org/abs/2401.00055v1 ) ライセンス: Link先を確認	Elliot Creager and Richard Zemel	(参考訳) アルゴリズム的リコースに関する研究は通常、固定された意思決定システムと対話する際に、個人が不利な自動決定を合理的に変えるにはどうすればよいかを考える。本稿では、データ主体とのインタラクションに応じてシステムパラメータが動的に更新されるオンライン設定に焦点を当てる。一般的な個人レベルのリコースを超えて、オンライン設定では、パラメータ更新ルールを活用することで、グループがシステムの決定を形作る新しい方法が開かれる。ユーザが特徴量の摂動を共同で計算して協調するとリコースが改善されることを実験的に示し、不利な自動決定を緩和するうえで集団行動が重要であることを強調する。 Research on algorithmic recourse typically considers how an individual can reasonably change an unfavorable automated decision when interacting with a fixed decision-making system. This paper focuses instead on the online setting, where system parameters are updated dynamically according to interactions with data subjects. Beyond the typical individual-level recourse, the online setting opens up new ways for groups to shape system decisions by leveraging the parameter update rule. We show empirically that recourse can be improved when users coordinate by jointly computing their feature perturbations, underscoring the importance of collective action in mitigating adverse automated decisions.	翻訳日:2024-01-03 19:03:22 公開日:2023-12-29
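As a baseline for the "typical individual-level recourse" the abstract contrasts with, here is the textbook minimal-L2 recourse for a fixed linear classifier $w \cdot x + b \ge 0$: move $x$ along $w$ just far enough to cross the boundary. This is a sketch for a fixed model only; in the paper's online setting $w$ and $b$ change after each update, which is what the collective strategies exploit (not shown here). The function name and the small `margin` are illustrative assumptions.

```python
import numpy as np

def minimal_linear_recourse(x, w, b, margin=1e-6):
    """Smallest L2 change to x that makes w . x + b >= 0 (fixed linear model)."""
    score = w @ x + b
    if score >= 0:
        return np.zeros_like(x)            # already favourable: no recourse needed
    step = (margin - score) / (w @ w)      # distance to the boundary, plus a small margin
    return step * w

x = np.array([0.0, 0.0])
w = np.array([1.0, 1.0])
delta = minimal_linear_recourse(x, w, b=-1.0)
print(delta, w @ (x + delta) - 1.0)        # about [0.5, 0.5], new score just above 0
```

Once the decision boundary is updated from the data that users submit, such a step can stop working, which is why the online setting calls for coordinated rather than independent perturbations.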
# ChatEd:ChatGPTを活用した高等教育用チャットボット ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education ( http://arxiv.org/abs/2401.00052v1 ) ライセンス: Link先を確認	Kevin Wang, Jason Ramos, Ramon Lawrence	(参考訳) 自然言語処理(NLP)の急速な進化に伴い、ChatGPTのような大規模言語モデル(LLM)は、様々な分野を変革できる強力なツールとして登場した。その膨大な知識ベースと動的な対話能力は、パーソナライズされたアシスタントとして機能することで教育を改善する大きな可能性を示している。しかし、LLMを教育現場に展開する際には、誤った、偏った、あるいは役に立たない回答が生成される可能性が解決すべき重要な課題である。本研究は、ChatGPTの強みと従来の情報検索ベースのチャットボットフレームワークを組み合わせて、高等教育における学生支援を強化する革新的なアーキテクチャを導入する。我々の実験的評価は、このアプローチの高い有望性を裏付けている。 With the rapid evolution of Natural Language Processing (NLP), Large Language Models (LLMs) like ChatGPT have emerged as powerful tools capable of transforming various sectors. Their vast knowledge base and dynamic interaction capabilities represent significant potential in improving education by operating as a personalized assistant. However, the possibility of generating incorrect, biased, or unhelpful answers is a key challenge to resolve when deploying LLMs in an education context. This work introduces an innovative architecture that combines the strengths of ChatGPT with a traditional information retrieval based chatbot framework to offer enhanced student support in higher education. Our empirical evaluations underscore the high promise of this approach.	翻訳日:2024-01-03 19:03:11 公開日:2023-12-29
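The abstract combines an LLM with a traditional information-retrieval chatbot but does not describe the retrieval component, so this is a hedged sketch of the general retrieve-then-prompt pattern only. It assumes a plain bag-of-words cosine similarity over course snippets (a stand-in for whatever retriever ChatEd uses) and stops at building the prompt; no ChatGPT API call is made, and `build_prompt` is a hypothetical name.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)               # Counter returns 0 for missing tokens
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, course_docs, top_k=2):
    """Retrieve the most similar course snippets and ground the LLM prompt in them."""
    q = Counter(tokenize(question))
    ranked = sorted(course_docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    context = "\n".join(f"- {d}" for d in ranked[:top_k])
    return ("Answer using only the course material below; say so if it is not covered.\n"
            f"Course material:\n{context}\n\nQuestion: {question}")

docs = ["Office hours are Tuesday 3pm in room 101.",
        "Assignment 2 is due Friday at midnight.",
        "SQL joins combine rows from two tables."]
print(build_prompt("When is assignment 2 due?", docs, top_k=1))
```

Restricting the model to retrieved course material is one common way to reduce the incorrect or unhelpful answers the abstract names as the key risk; a production system would use a stronger retriever than word overlap.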
# 期待分配関数と連続最適化によるメッセンジャーRNAと非コーディングRNAの設計 Messenger and Non-Coding RNA Design via Expected Partition Function and Continuous Optimization ( http://arxiv.org/abs/2401.00037v1 ) ライセンス: Link先を確認	Ning Dai, Wei Yu Tang, Tianshuo Zhou, David H. Mathews, Liang Huang	(参考訳) メッセンジャーRNAと非コーディングRNAを設計するタスクは離散最適化問題であり、これらの問題のいくつかのバージョンはNP困難である。一般的に用いられる局所探索法に代わるものとして、これらの問題を連続最適化として定式化し、「期待分配関数」という新しい概念に基づく最適化のための汎用フレームワークを開発する。基本的な考え方は、可能な全ての候補配列にわたる分布から出発し、目的関数を配列から分布へと拡張することである。次に、勾配降下法に基づく最適化手法を用いて拡張された目的関数を改善し、分布は徐々にワンホット配列(すなわち1つの配列)へと収束していく。この枠組みにおける2つの重要なケーススタディとして、分配関数(すなわちアンサンブル自由エネルギー)を最適化するmRNA設計問題と、条件付き(すなわちボルツマン)確率を最適化する非コーディングRNA設計問題を考える。いずれの場合も、本手法は有望な予備結果を示す。コードは https://github.com/KuNyaa/RNA_Design_codebase で公開している。 The tasks of designing messenger RNAs and non-coding RNAs are discrete optimization problems, and several versions of these problems are NP-hard. As an alternative to commonly used local search methods, we formulate these problems as continuous optimization and develop a general framework for this optimization based on a new concept of "expected partition function". The basic idea is to start with a distribution over all possible candidate sequences, and extend the objective function from a sequence to a distribution. We then use gradient descent-based optimization methods to improve the extended objective function, and the distribution will gradually shrink towards a one-hot sequence (i.e., a single sequence). We consider two important case studies within this framework, the mRNA design problem optimizing for partition function (i.e., ensemble free energy) and the non-coding RNA design problem optimizing for conditional (i.e., Boltzmann) probability. In both cases, our approach demonstrates promising preliminary results. We make our code available at https://github.com/KuNyaa/RNA_Design_codebase.	翻訳日:2024-01-03 19:03:00 公開日:2023-12-29
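The relaxation idea (optimize a distribution over sequences instead of a single sequence, then let it collapse to a one-hot sequence) can be shown with a deliberately simple objective. This is a toy sketch, not the authors' method: the expected partition function and Boltzmann probability require dynamic programming and are not computed here. Instead it assumes a hypothetical surrogate objective, the expected number of target base pairs $(i, j)$ whose bases can pair, $E = \sum_{(i,j)} p_i^\top M p_j$, where $p_i$ is a per-position softmax distribution over A/C/G/U and $M$ marks Watson-Crick and G-U pairs.

```python
import numpy as np

BASES = "ACGU"
# PAIR[x, y] = 1 if bases x and y can pair (Watson-Crick or G-U wobble).
PAIR = np.zeros((4, 4))
for x, y in [("A", "U"), ("C", "G"), ("G", "U")]:
    i, j = BASES.index(x), BASES.index(y)
    PAIR[i, j] = PAIR[j, i] = 1.0

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def design(length, pairs, steps=2000, lr=1.0, seed=0):
    """Gradient ascent on per-position logits of a distribution over A/C/G/U.

    Toy surrogate objective E = sum over target pairs (i, j) of p_i^T PAIR p_j,
    standing in for the thermodynamic objectives used in the paper.
    """
    rng = np.random.default_rng(seed)
    logits = rng.normal(scale=0.5, size=(length, 4))    # random init breaks symmetry
    for _ in range(steps):
        p = softmax(logits)
        grad_p = np.zeros_like(p)
        for i, j in pairs:
            grad_p[i] += PAIR @ p[j]                     # dE/dp_i
            grad_p[j] += PAIR @ p[i]                     # dE/dp_j
        # Chain rule through each position's softmax: p * (g - <p, g>).
        grad_logits = p * (grad_p - np.sum(p * grad_p, axis=1, keepdims=True))
        logits += lr * grad_logits
    return "".join(BASES[k] for k in softmax(logits).argmax(axis=1))

print(design(8, [(0, 7), (1, 6), (2, 5)]))  # paired positions end up complementary
```

As in the abstract, the distribution sharpens toward a single sequence during optimization; taking the argmax at the end reads off that sequence. Swapping the surrogate for a differentiable expected partition function is the step this sketch leaves out.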
# 離散分布ネットワーク Discrete Distribution Networks ( http://arxiv.org/abs/2401.00036v1 ) ライセンス: Link先を確認	Lei Yang	(参考訳) 本稿では、階層的な離散分布を用いてデータ分布を近似する新しい生成モデルである離散分布ネットワーク(DDN)を提案する。ネットワーク内の特徴量は本質的に分布情報を含むため、ネットワークを単一出力から解放して複数のサンプルを同時に生成させることが非常に有効であると我々は考える。したがって、DDNは複数の離散的なサンプル点を生成することで、連続分布を含む目標分布に適合する。DDNは、ターゲットデータのより細かな情報を捉えるために、第1層で生成された粗い結果の中から正解データ(Ground Truth, GT)に最も近い出力を選択する。この選択された出力は第2層の条件としてネットワークにフィードバックされ、GTにより近い新しい出力を生成する。DDNの層数が増えるにつれて、出力の表現空間は指数関数的に拡大し、生成されるサンプルはGTにますます近づく。この離散分布の階層的な出力パターンは、DDNに2つの興味深い性質を与える:高度に圧縮された表現と、より一般的なゼロショット条件付き生成である。CIFAR-10およびFFHQでの実験により、DDNの有効性とこれらの興味深い性質を実証する。 We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates the data distribution using hierarchical discrete distributions. We posit that since the features within a network inherently contain distributional information, liberating the network from a single output to concurrently generate multiple samples proves to be highly effective. Therefore, DDN fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects the output that is closest to the Ground Truth (GT) from the coarse results generated in the first layer. This selected output is then fed back into the network as a condition for the second layer, thereby generating new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with two intriguing properties: highly compressed representation and more general zero-shot conditional generation. We demonstrate the efficacy of DDN and these intriguing properties through experiments on CIFAR-10 and FFHQ.	翻訳日:2024-01-03 19:02:42 公開日:2023-12-29
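The select-the-closest-and-feed-back loop described above can be illustrated without a neural network. In this sketch the candidate generator is a fixed, hypothetical rule (k evenly spaced offsets around the previous choice that shrink by a factor k per layer), standing in for DDN's trained layers, and the data are scalars in [0, 1]. It shows the two properties the abstract mentions: the reachable outputs grow as $k^L$ with depth, and the chosen indices alone form a compact code for the sample.

```python
def ddn_style_encode(target, num_layers=6, k=4):
    """Toy coarse-to-fine selection: each layer proposes k candidates around the
    previous choice, and the one closest to the target (the GT) is kept."""
    condition = 0.5                              # layer-0 condition: centre of [0, 1]
    indices = []
    for layer in range(num_layers):
        step = k ** -(layer + 1)                 # candidates get finer at each layer
        candidates = [condition + (i + 0.5 - k / 2) * step for i in range(k)]
        best = min(range(k), key=lambda i: abs(candidates[i] - target))
        indices.append(best)                     # the chosen index is all that is stored
        condition = candidates[best]             # fed back as the next layer's condition
    return indices, condition

def ddn_style_decode(indices, k=4):
    """Replay the stored choices to regenerate the sample."""
    condition = 0.5
    for layer, best in enumerate(indices):
        condition += (best + 0.5 - k / 2) * k ** -(layer + 1)
    return condition

idx, approx = ddn_style_encode(0.3141)
print(idx, approx, ddn_style_decode(idx))
```

With k = 4 and 6 layers, the 6 indices (12 bits) pin the sample down to within $0.5 \cdot 4^{-6}$ of the target; in DDN proper the candidates come from learned layers, so the same mechanism applies to images rather than scalars.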
# 複雑力学系のモデルにおける構造誤差の学習 Learning About Structural Errors in Models of Complex Dynamical Systems ( http://arxiv.org/abs/2401.00035v1 ) ライセンス: Link先を確認	Jin-Long Wu, Matthew E. Levine, Tapio Schneider, Andrew Stuart	(参考訳) 複雑な力学系は、一部の自由度(例えば小さなスケール)が計算上解像できない、あるいは十分に理解されていないにもかかわらず力学的には重要であるため、モデル化が難しいことが知られている。例えば、雲の力学と液滴形成の小さなスケールは気候の制御に不可欠であるが、全球気候モデルでは解像できない。解像されない自由度の影響に対する半経験的クロージャモデルはしばしば存在し、重要なドメイン固有の知識を符号化している。このようなクロージャモデルを土台とし、構造誤差を学習することでそれらを修正することは、データとドメイン知識を融合する効果的な方法になりうる。本稿では、構造誤差について学習するための一般的なアプローチ、原理、アルゴリズムについて述べる。このアプローチの鍵は、例えば解像されないスケールのクロージャモデルの中など、複雑系のモデルの内部に構造誤差モデルを組み込むことである。構造誤差は、通常は非線形に、観測可能なデータへと写像される。しかしその結果、構造誤差モデルの入力と出力のラベル付きペアが存在しないため、モデル出力とデータの不一致は構造誤差について間接的な情報しか与えない。さらに、モデルの微分は存在しない、あるいは容易には得られない場合がある。微分不要なカルマン反転アルゴリズムとその変種を用いて間接データから構造誤差モデルを学習する方法、スパース性制約によって「害をなさない」原則を課す方法、そして構造誤差をモデル化する様々な方法について論じる。また、非局所的および/または確率的な誤差モデルを用いる利点についても論じる。さらに、データ同化手法が非エルゴード系における構造誤差の学習をどのように支援できるかを示す。概念とアルゴリズムは、Lorenz-96系とヒトのグルコース-インスリンモデルに基づく2つの数値例で示される。 Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them through learning the structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. 
As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist the learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model.	翻訳日:2024-01-03 19:02:23 公開日:2023-12-29
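The derivative-free Kalman inversion mentioned above needs only forward-model evaluations, which is why it suits models whose derivatives are unavailable. Below is a minimal numpy sketch of the basic ensemble Kalman inversion update in its perturbed-observation form; it does not include the paper's sparsity constraints, data assimilation for non-ergodic systems, or specific structural-error parametrizations. In their setting the parameter vector would describe the structural error model inside a closure; here, as an illustrative assumption, the forward model is a small linear map.

```python
import numpy as np

def ensemble_kalman_inversion(forward, y, noise_cov, ensemble, iterations=10, seed=0):
    """Derivative-free calibration: only evaluations of `forward` are needed.

    forward: maps a parameter vector (p,) to predicted data (d,).
    y: observed data (d,); noise_cov: (d, d) observation-noise covariance Gamma.
    ensemble: initial parameter ensemble, shape (J, p).
    """
    rng = np.random.default_rng(seed)
    U = np.array(ensemble, dtype=float)
    for _ in range(iterations):
        G = np.array([forward(u) for u in U])          # (J, d) forward-model outputs
        du, dg = U - U.mean(axis=0), G - G.mean(axis=0)
        C_ug = du.T @ dg / len(U)                      # parameter-output cross-covariance (p, d)
        C_gg = dg.T @ dg / len(U)                      # output covariance (d, d)
        # Perturbed observations keep the ensemble spread statistically consistent.
        Y = y + rng.multivariate_normal(np.zeros(len(y)), noise_cov, size=len(U))
        K = np.linalg.solve(C_gg + noise_cov, C_ug.T).T  # Kalman-type gain C_ug (C_gg + Gamma)^-1
        U = U + (Y - G) @ K.T
    return U.mean(axis=0)

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
u_true = np.array([1.0, -2.0])
init = np.random.default_rng(3).normal(size=(50, 2))
print(ensemble_kalman_inversion(lambda u: A @ u, A @ u_true, 1e-4 * np.eye(3), init))
```

The gain is built entirely from ensemble covariances of parameters and outputs, so replacing the linear map with a full simulator (for example a Lorenz-96 run with a parametrized closure) changes only the `forward` function.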
# バンド内遷移による強い超高速消磁 Strong ultrafast demagnetization due to the intraband transitions ( http://arxiv.org/abs/2401.00099v1 ) ライセンス: Link先を確認	Mitsuko Murakami and G. P. Zhang	(参考訳) フェムト秒レーザーパルスによって駆動される強磁性遷移金属の消磁は固体物理学における根本的な問題であり、その理解はスピントロニクスデバイスの開発に不可欠である。速度ゲージにおける時間依存磁気モーメントの第一原理計算は、これまで実験で観測された大きな消磁量を再現することに成功していない。本研究では、結晶運動量空間における対流微分を通じて、速度ゲージ内にバンド内遷移を組み込む手法を提案する。時間依存量子Liouville方程式に基づく遷移金属バルク結晶(bcc Fe、hcp Co、fcc Ni)の計算結果は、バンド内項を含めることで消磁量が劇的に増大することを示し、実験と一致する。また、各強磁性材料に対するバンド内遷移の効果は、バンド構造とスピン特性の違いにより明確に異なることがわかった。我々の発見は超高速消磁の理解に広範な影響を与える。 Demagnetization in ferromagnetic transition metals driven by a femtosecond laser pulse is a fundamental problem in solid state physics, and its understanding is essential to the development of spintronics devices. Ab initio calculation of time-dependent magnetic moment in the velocity gauge so far has not been successful in reproducing the large amount of demagnetization observed in experiments. In this work, we propose a method to incorporate intraband transitions within the velocity gauge through a convective derivative in the crystal momentum space. Our results for transition-element bulk crystals (bcc Fe, hcp Co and fcc Ni) based on the time-dependent quantum Liouville equation show a dramatic enhancement in the amount of demagnetization after the inclusion of an intraband term, in agreement with experiments. We also find that the effect of intraband transitions to each ferromagnetic material is distinctly different because of their band structure and spin property differences. Our finding has a far-reaching impact on understanding of ultrafast demagnetization.	翻訳日:2024-01-03 18:52:53 公開日:2023-12-29
# ブラジルのシナリオにおける自動エッセイ採点 Automatic Essay Scoring in a Brazilian Scenario ( http://arxiv.org/abs/2401.00095v1 ) ライセンス: Link先を確認	Felipe Akio Matsuoka	(参考訳) 本稿では、ブラジルのExame Nacional do Ensino Médio(ENEM)のポルトガル語エッセイに特化した新しい自動エッセイ採点(AES)アルゴリズムを提案し、従来の人手による採点システムの課題に取り組む。提案手法は高度な深層学習技術を活用して人間の採点基準に密接に整合させ、大量の学生エッセイを評価する際の効率性とスケーラビリティを目指す。本研究は、ブラジルの教育評価における手作業採点の物流的・財政的制約に応えるだけでなく、採点の公平性と一貫性を高めることが期待され、大規模な学術環境におけるAESの応用において大きな前進となる。 This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored for the Portuguese-language essays of Brazil's Exame Nacional do Ensino Médio (ENEM), addressing the challenges in traditional human grading systems. Our approach leverages advanced deep learning techniques to align closely with human grading criteria, targeting efficiency and scalability in evaluating large volumes of student essays. This research not only responds to the logistical and financial constraints of manual grading in Brazilian educational assessments but also promises to enhance fairness and consistency in scoring, marking a significant step forward in the application of AES in large-scale academic settings.	翻訳日:2024-01-03 18:52:35 公開日:2023-12-29
# 言語に基づく物体検出訓練のための強化否定値の生成 Generating Enhanced Negatives for Training Language-Based Object Detectors ( http://arxiv.org/abs/2401.00094v1 ) ライセンス: Link先を確認	Shiyu Zhao, Long Zhao, Vijay Kumar B.G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter	(参考訳) 言語ベースのopen-vocabulary object detectionの最近の進歩は、フリーフォームのテキストアノテーションで大規模データを活用するより良い方法を見つけることに起因する。このようなモデルを識別的目的関数で訓練することは成功したが、良い正と負のサンプルを必要とする。しかし、自由形式の性質と対象記述の開語彙は、負の空間を極端に大きくする。事前の作業はランダムに負をサンプリングするか、ルールベースのテクニックを使って構築する。対照的に、我々は、現代の生成モデルに組み込まれた膨大な知識を活用して、元のデータにより関連性のあるネガティブを自動構築することを提案する。具体的には,大きな言語モデルを用いて負のテキスト記述を生成し,テキストから画像への拡散モデルを用いて対応する負のイメージを生成する。実験分析により,生成した負データとの関連性が確認され,言語ベースの検出器での利用により,2つの複雑なベンチマークの性能が向上した。 The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make the space of negatives extremely large. Prior works randomly sample negatives or use rule-based techniques to build them. In contrast, we propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data. Specifically, we use large-language-models to generate negative text descriptions, and text-to-image diffusion models to also generate corresponding negative images. Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.	翻訳日:2024-01-03 18:52:19 公開日:2023-12-29
# Fairness-Enhancing Vehicle Rebalancing in the Ride-hailing System ( http://arxiv.org/abs/2401.00093v1 ) License: see link	Xiaotong Guo, Hanyong Xu, Dingyi Zhuang, Yunhan Zheng, Jinhua Zhao	The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study effectively reduces the standard deviation and average customer wait times by 6.48% and 0.49%, respectively. This achievement signifies a beneficial outcome for ride-hailing platforms, striking a balance between operational efficiency and fairness.	Translated: 2024-01-03 18:52:03 Published: 2023-12-29
# Quantifying Policy Administration Cost in an Active Learning Framework ( http://arxiv.org/abs/2401.00086v1 ) License: see link	Si Zhang and Philip W. L. Fong	This paper proposes a computational model for policy administration. As an organization evolves, new users and resources are gradually placed under the mediation of the access control model. Each time such new entities are added, the policy administrator must deliberate on how the access control policy shall be revised to reflect the new reality. A well-designed access control model must anticipate such changes so that the administration cost does not become prohibitive when the organization scales up. Unfortunately, past access control research does not offer a formal way to quantify the cost of policy administration. In this work, we propose to model ongoing policy administration in an active learning framework. Administration cost can be quantified in terms of query complexity. We demonstrate the utility of this approach by applying it to the evolution of protection domains. We also modelled different policy administration strategies in our framework. This allowed us to formally demonstrate that domain-based policies have a cost advantage over access control matrices because of the use of heuristic reasoning when the policy evolves. To the best of our knowledge, this is the first work to employ an active learning framework to study the cost of policy deliberation and demonstrate the cost advantage of heuristic policy administration.	Translated: 2024-01-03 18:51:34 Published: 2023-12-29
# Gouy phase and quantum interference with cross-Wigner functions for matter-waves ( http://arxiv.org/abs/2401.00083v1 ) License: see link	Lucas S. Marinho, Pedro R. Dieguez, Carlos H. S. Vieira and Irismar G. da Paz	The Gouy phase is essential for accurately describing various wave phenomena, ranging from classical electromagnetic waves to matter waves and quantum optics. In this work, we employ phase-space methods based on the cross-Wigner transformation to analyze spatial and temporal interference in the evolution of matter waves characterized initially by a correlated Gaussian wave packet. First, we consider the cross-Wigner of the initial function with its free evolution, and second, with its evolution through a double-slit arrangement. Unlike the wave function, which acquires a global Gouy phase, we find that the cross-Wigner acquires a Gouy phase difference due to different evolution times. The results suggest that temporal Gouy-like phases are important for an accurate description of temporal interference. Furthermore, we propose a technique based on the Wigner function to reconstruct the cross-Wigner from the spatial intensity interference term in a double-slit experiment with matter waves.	Translated: 2024-01-03 18:51:17 Published: 2023-12-29
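The global Gouy phase that this abstract contrasts with the cross-Wigner phase difference is easiest to see in the simplest case: a free, uncorrelated 1D Gaussian packet. This is only a sketch under stated assumptions. It assumes the initial state $\psi(x,0) = (2\pi\sigma_0^2)^{-1/4} e^{-x^2/(4\sigma_0^2)}$, where $\sigma_0$ is the initial position spread. The correlated packets studied in the paper carry an extra correlation parameter that is omitted here, and sign conventions for the Gouy phase vary between references.

$$
\psi(x,t) = \frac{(2\pi\sigma_0^2)^{-1/4}}{\sqrt{1 + i\tau(t)}}\,\exp\!\left[-\frac{x^2}{4\sigma_0^2\,\bigl(1 + i\tau(t)\bigr)}\right],
\qquad \tau(t) = \frac{\hbar t}{2 m \sigma_0^2},
$$

$$
\frac{1}{\sqrt{1 + i\tau}} = (1+\tau^2)^{-1/4}\, e^{-\frac{i}{2}\arctan\tau}
\quad\Longrightarrow\quad
\mu(t) = -\frac{1}{2}\arctan\!\left(\frac{\hbar t}{2 m \sigma_0^2}\right).
$$

The x-dependent phase inside the exponential is separate. $\mu(t)$ is the x-independent (global) part: it tends to $-\pi/4$ as $t \to \infty$, while the width grows as $\sigma(t) = \sigma_0\sqrt{1+\tau^2}$.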
# Synthetic Data Applications in Finance ( http://arxiv.org/abs/2401.00081v1 ) License: see link	Vamsi K. Potluru, Daniel Borrajo, Andrea Coletta, Niccolò Dalmasso, Yousef El-Laham, Elizabeth Fons, Mohsen Ghassemi, Sriram Gopalakrishnan, Vikesh Gosai, Eleonora Kreačić, Ganapathy Mani, Saheed Obitayo, Deepak Paramanand, Natraj Raman, Mikhail Solonin, Srijan Sood, Svitlana Vyetrenko, Haibei Zhu, Manuela Veloso, Tucker Balch	Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and in particular provide richer details for a few select ones. These cover a wide variety of data modalities including tabular, time-series, event-series, and unstructured arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are utilized in evaluating the quality and effectiveness of our approaches in these applications. We conclude with open directions in synthetic data in the context of the financial domain.	Translated: 2024-01-03 18:51:03 Published: 2023-12-29
# A Large-Scale Re-identification Analysis in Sporting Scenarios: the Betrayal of Reaching a Critical Point ( http://arxiv.org/abs/2401.00080v1 ) License: see link	David Freire-Obregón, Javier Lorenzo-Navarro, Oliverio J. Santana, Daniel Hernández-Sosa, Modesto Castrillón-Santana	Re-identifying participants in ultra-distance running competitions can be daunting due to the extensive distances and constantly changing terrain. To overcome these challenges, computer vision techniques have been developed to analyze runners' faces, numbers on their bibs, and clothing. However, our study presents a novel gait-based approach for runners' re-identification (re-ID) by leveraging various pre-trained human action recognition (HAR) models and loss functions. Our results show that this approach provides promising results for re-identifying runners in ultra-distance competitions. Furthermore, we investigate the significance of distinct human body movements when athletes are approaching their endurance limits and their potential impact on re-ID accuracy. Our study examines how the recognition of a runner's gait is affected by a competition's critical point (CP), defined as a moment of severe fatigue and the point where the finish line comes into view, just a few kilometers away from this location. We aim to determine how this CP can improve the accuracy of athlete re-ID. Our experimental results demonstrate that gait recognition can be significantly enhanced (up to a 9% increase in mAP) as athletes approach this point. This highlights the potential of utilizing gait recognition in real-world scenarios, such as ultra-distance competitions or long-duration surveillance tasks.	Translated: 2024-01-03 18:50:51 Published: 2023-12-29
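The re-ID gains above are reported in mAP (mean average precision). Below is a minimal sketch of how mAP is commonly computed over ranked gallery lists, using only the Python standard library. It rests on two assumptions. First, each query's ranking covers the whole gallery, so every true match appears somewhere in the list. Second, queries with no match score AP = 0; some evaluation protocols skip such queries instead. The function names are illustrative, not taken from the paper.

```python
# Mean average precision (mAP) for ranked retrieval, as commonly used in re-ID.
# Assumption: each ranking lists the whole gallery, so all true matches appear in it.


def average_precision(relevance):
    """AP for one query; `relevance` is a ranked list of 0/1 flags (1 = same identity)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(relevance, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """mAP over queries; a query with no true match contributes AP = 0 here."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    queries = [[1, 0, 1], [0, 1, 0, 0]]  # toy relevance flags for two queries
    print(round(mean_average_precision(queries), 4))
```

A match ranked lower contributes a smaller precision term. A gait model that pushes true matches toward the top of the ranking therefore raises mAP even when top-1 accuracy is unchanged.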
# A Maturity Model for Operations in Neuroscience Research ( http://arxiv.org/abs/2401.00077v1 ) License: see link	Erik C. Johnson, Thinh T. Nguyen, Benjamin K. Dichter, Frank Zappulla, Montgomery Kosma, Kabilar Gunalan, Yaroslav O. Halchenko, Shay Q. Neufeld, Michael Schirner, Petra Ritter, Maryann E. Martone, Brock Wester, Franco Pestilli, Dimitri Yatsenko	Scientists are adopting new approaches to scale up their activities and goals. Progress in neurotechnologies, artificial intelligence, automation, and tools for collaboration promises new bursts of discoveries. However, compared to other disciplines and the industry, neuroscience laboratories have been slow to adopt key technologies to support collaboration, reproducibility, and automation. Drawing on progress in other fields, we define a roadmap for implementing automated research workflows for diverse research teams. We propose establishing a five-level capability maturity model for operations in neuroscience research. Achieving higher levels of operational maturity requires new technology-enabled methodologies, which we describe as "SciOps". The maturity model provides guidelines for evaluating and upgrading operations in multidisciplinary neuroscience teams.	Translated: 2024-01-03 18:50:23 Published: 2023-12-29
# Tensor Networks for Explainable Machine Learning in Cybersecurity ( http://arxiv.org/abs/2401.00867v1 ) License: see link	Borja Aizpurua, Roman Orus	In this paper we show how tensor networks help in developing explainability of machine learning algorithms. Specifically, we develop an unsupervised clustering algorithm based on Matrix Product States (MPS) and apply it in the context of a real use-case of adversary-generated threat intelligence. Our investigation proves that MPS rival traditional deep learning models such as autoencoders and GANs in terms of performance, while providing much richer model interpretability. Our approach naturally facilitates the extraction of feature-wise probabilities, von Neumann entropy, and mutual information, offering a compelling narrative for classification of anomalies and fostering an unprecedented level of transparency and interpretability, something fundamental to understand the rationale behind artificial intelligence decisions.	Translated: 2024-01-03 15:36:38 Published: 2023-12-29
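In an MPS, the von Neumann entropy across a bond is computed from the Schmidt (singular) values at that cut. The toy sketch below computes the same quantity for the smallest possible case, a two-qubit pure state, in plain Python. It is an assumption-level illustration, not the paper's clustering pipeline. A real MPS code would read the Schmidt values from an SVD at each bond rather than using the closed-form 2x2 eigenvalues used here.

```python
import math


def reduced_density_eigenvalues(amplitudes):
    """Eigenvalues of rho_A for a normalized two-qubit pure state.

    `amplitudes` = [c00, c01, c10, c11] (real or complex). Viewed as a 2x2
    matrix M (row index = qubit A, column index = qubit B), rho_A = M M^dagger.
    These eigenvalues are the squared Schmidt coefficients of the A|B cut.
    """
    m00, m01, m10, m11 = amplitudes
    a = abs(m00) ** 2 + abs(m01) ** 2                  # rho_A[0][0]
    d = abs(m10) ** 2 + abs(m11) ** 2                  # rho_A[1][1]
    b = m00 * m10.conjugate() + m01 * m11.conjugate()  # rho_A[0][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    # Clamp tiny negative values caused by floating-point rounding.
    return [max(0.0, mean + radius), max(0.0, mean - radius)]


def von_neumann_entropy(amplitudes):
    """S = -sum(lam * log2(lam)) over nonzero eigenvalues of rho_A, in bits."""
    eigenvalues = reduced_density_eigenvalues(amplitudes)
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-15)


if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    print(von_neumann_entropy([s, 0.0, 0.0, s]))      # Bell state: maximally entangled
    print(von_neumann_entropy([1.0, 0.0, 0.0, 0.0]))  # product state |00>
```

A product state gives zero entropy, while a Bell state gives one bit. In an MPS-based model, bonds with high entropy flag strongly correlated groups of features, which is one way such quantities can feed an explanation.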
# Boosting Pruned Networks with Linear Over-parameterization ( http://arxiv.org/abs/2204.11444v3 ) License: see link	Yu Qian, Jian Cao, Xiaoshuang Li, Jie Zhang, Hufei Li, Jue Chen	Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time. To restore accuracy after pruning, fine-tuning is usually applied to pruned networks. However, too few remaining parameters in pruned networks inevitably bring a great challenge to fine-tuning to restore accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters and then re-parameterizes them to the original layers after fine-tuning. Specifically, we equivalently expand the convolution/linear layer with several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation that encourages the over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer to maintain its feature learning ability. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, where it significantly outperforms the vanilla fine-tuning strategy, especially for large pruning ratios.	Translated: 2024-01-03 03:35:10 Published: 2023-12-29
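The re-parameterization step works because consecutive linear layers with no nonlinearity between them compose into a single linear layer: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below shows only this collapse step for fully connected layers, in plain Python. It assumes the layers are purely linear, with no activation or normalization between them. The paper also expands convolutions, where the analogous merge composes kernels; that case is not shown. The shapes and values are arbitrary illustrations.

```python
# Collapse y = W2 (W1 x + b1) + b2 into y = W x + b with W = W2 W1, b = W2 b1 + b2.
# Valid only because no nonlinearity sits between the two layers.


def matmul(A, B):
    """Product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vadd(u, v):
    return [a + b for a, b in zip(u, v)]


def merge_linear(W1, b1, W2, b2):
    """Return (W, b) of the single layer equivalent to layer 1 followed by layer 2."""
    return matmul(W2, W1), vadd(matvec(W2, b1), b2)


if __name__ == "__main__":
    W1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # 2 -> 3 (expanded, over-parameterized width)
    b1 = [0.5, -1.0, 2.0]
    W2 = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]     # 3 -> 2 (back to the original width)
    b2 = [0.0, 1.0]
    x = [1.0, -2.0]
    two_layer = vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)
    W, b = merge_linear(W1, b1, W2, b2)
    print(two_layer, vadd(matvec(W, x), b))  # the two outputs should agree
```

During fine-tuning, the expanded pair has more trainable parameters than the compact layer. At inference time, however, the merged layer costs exactly as much as the original pruned layer.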
# Dimension Reduction with Prior Information for Knowledge Discovery ( http://arxiv.org/abs/2111.13646v4 ) License: see link	Anh Tuan Bui	This paper addresses the problem of mapping high-dimensional data to a low-dimensional space, in the presence of other known features. This problem is ubiquitous in science and engineering as there are often controllable/measurable features in most applications. To solve this problem, this paper proposes a broad class of methods, which is referred to as conditional multidimensional scaling (MDS). An algorithm for optimizing the objective function of conditional MDS is also developed. The convergence of this algorithm is proven under mild assumptions. Conditional MDS is illustrated with kinship terms, facial expressions, textile fabrics, car-brand perception, and cylinder machining examples. These examples demonstrate the advantages of conditional MDS over conventional dimension reduction in improving the estimation quality of the reduced-dimension space and simplifying visualization and knowledge discovery tasks. Computer code for this work is available in the open-source cml R package.	Translated: 2024-01-03 03:32:21 Published: 2023-12-29
# Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming ( http://arxiv.org/abs/2111.05530v3 ) License: see link	Haihao Lu, Jinwen Yang	There is recent interest in first-order methods for linear programming (LP). In this paper, we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for solving sharp instances with a high probability. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal O(1)$ per iteration cost and improves the complexity of the existing deterministic and stochastic algorithms. Finally, we show that the obtained linear convergence rate is nearly optimal (up to $\log$ terms) for a wide class of stochastic primal-dual methods.	Translated: 2024-01-03 03:32:05 Published: 2023-12-29
# To Charge or to Sell? EV Pack Useful Life Estimation via LSTMs, CNNs, and Autoencoders ( http://arxiv.org/abs/2110.03585v2 ) License: see link	Michael Bosello, Carlo Falcomer, Claudio Rossi, Giovanni Pau	Electric vehicles (EVs) are spreading fast as they promise to provide better performance and comfort, but above all, to help face climate change. Despite their success, their cost is still a challenge. Lithium-ion batteries are one of the most expensive EV components, and have become the standard for energy storage in various applications. Precisely estimating the remaining useful life (RUL) of battery packs can encourage their reuse and thus help to reduce the cost of EVs and improve sustainability. A correct RUL estimation can be used to quantify the residual market value of the battery pack. The customer can then decide to sell the battery when it still has a value, i.e., before it exceeds the end of life of the target application, so it can still be reused in a second domain without compromising safety and reliability. This paper proposes and compares two deep learning approaches to estimate the RUL of Li-ion batteries: LSTM and autoencoders vs. CNN and autoencoders. The autoencoders are used to extract useful features, while the subsequent network is then used to estimate the RUL. Compared to what has been proposed so far in the literature, we employ measures to ensure the method's applicability in the actual deployed application. Such measures include (1) avoiding using non-measurable variables as input, (2) employing appropriate datasets with wide variability and different conditions, and (3) predicting the remaining ampere-hours instead of the number of cycles. The results show that the proposed methods can generalize on datasets consisting of numerous batteries with high variance.	Translated: 2024-01-03 03:31:29 Published: 2023-12-29
# Feature Space Exploration For Planning Initial Benthic AUV Surveys ( http://arxiv.org/abs/2105.11598v2 ) License: see link	Jackson Shields, Oscar Pizarro, Stefan B. Williams	Special-purpose Autonomous Underwater Vehicles (AUVs) are utilised for benthic (seafloor) surveys, where the vehicle collects optical imagery of the seafloor. Due to the small sensor footprint of the cameras and the vast areas to be surveyed, these AUVs cannot feasibly collect full-coverage imagery of areas larger than a few tens of thousands of square meters. Therefore it is necessary for AUV paths to sample the survey areas sparsely, yet effectively. Broad-scale acoustic bathymetric data is readily available over large areas, and is often a useful prior of seafloor cover. As such, prior bathymetry can be used to guide AUV data collection. This research proposes methods for planning initial AUV surveys that efficiently explore a feature space representation of the bathymetry, in order to sample from a diverse set of bathymetric terrain. This will enable the AUV to visit areas that likely contain unique habitats and are representative of the entire survey site. We propose several information gathering planners that utilise a feature space exploration reward, to plan freeform paths or to optimise the placement of a survey template. The suitability of these methods to plan AUV surveys is evaluated based on the coverage of the feature space and also the ability to visit all classes of benthic habitat on the initial dive. Informative planners based on Rapidly-exploring Random Trees (RRT) and Monte-Carlo Tree Search (MCTS) were found to be the most effective. This is a valuable tool for AUV surveys as it increases the utility of initial dives. It also delivers a comprehensive training set to learn a relationship between acoustic bathymetry and visually-derived seafloor classifications.	Translated: 2024-01-03 03:30:59 Published: 2023-12-29
# Matrices with Gaussian noise: optimal estimates for singular subspace perturbation ( http://arxiv.org/abs/1803.00679v3 ) License: see link	Sean O'Rourke and Van Vu and Ke Wang	The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular subspaces of a matrix change when subjected to a small perturbation. This classic result is sharp in the worst case scenario. In this paper, we prove a stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the perturbation is a Gaussian random matrix. Under certain structural assumptions, we obtain an optimal bound that significantly improves upon the classic Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new perturbation bound for the singular values, which may be of independent interest.	Translated: 2024-01-03 03:30:18 Published: 2023-12-29
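The quantity that a $\sin \Theta$ theorem bounds can be computed directly in the rank-one case. It is the sine of the angle between the leading right singular vector of A and that of A + E. The plain-Python sketch below computes this angle; it does not evaluate any bound, classical or from the paper. It assumes a clear gap between the top two singular values, because power iteration needs one to converge. The small Gaussian E is purely illustrative.

```python
import math
import random


def at_a_times(A, v):
    """Compute A^T (A v) without forming A^T A explicitly."""
    Av = [sum(a * vi for a, vi in zip(row, v)) for row in A]
    return [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(len(A[0]))]


def top_right_singular_vector(A, iters=500):
    """Power iteration on A^T A; assumes a gap between the top two singular values."""
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n  # deterministic start; must not be orthogonal to the answer
    for _ in range(iters):
        w = at_a_times(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def sin_theta(u, v):
    """Sine of the angle between the lines spanned by unit vectors u and v (sign-invariant)."""
    c = abs(sum(a * b for a, b in zip(u, v)))
    return math.sqrt(max(0.0, 1.0 - c * c))


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[3.0, 0.0], [0.0, 1.0]]
    sigma = 0.05  # illustrative noise level
    E = [[rng.gauss(0.0, sigma) for _ in range(2)] for _ in range(2)]
    A_noisy = [[a + e for a, e in zip(row_a, row_e)] for row_a, row_e in zip(A, E)]
    print(sin_theta(top_right_singular_vector(A), top_right_singular_vector(A_noisy)))
```

For higher-dimensional subspaces, the same idea generalizes to principal angles, usually computed from an SVD of $U^\top \tilde U$ rather than with this single dot product.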
# FlowX: Towards Explainable Graph Neural Networks via Message Flows ( http://arxiv.org/abs/2206.12987v3 ) License: see link	Shurui Gui, Hao Yuan, Jie Wang, Qicheng Lao, Kang Li, Shuiwang Ji	We investigate the explainability of graph neural networks (GNNs) as a step toward elucidating their working mechanisms. While most current methods focus on explaining graph nodes, edges, or features, we argue that, as the inherent functional mechanism of GNNs, message flows are more natural for performing explainability. To this end, we propose a novel method here, known as FlowX, to explain GNNs by identifying important message flows. To quantify the importance of flows, we propose to follow the philosophy of Shapley values from cooperative game theory. To tackle the complexity of computing all coalitions' marginal contributions, we propose a flow sampling scheme to compute Shapley value approximations as initial assessments of further training. We then propose an information-controlled learning algorithm to train flow scores toward diverse explanation targets: necessary or sufficient explanations. Experimental studies on both synthetic and real-world datasets demonstrate that our proposed FlowX and its variants lead to improved explainability of GNNs. The code is available at https://github.com/divelab/DIG.	Translated: 2024-01-03 03:19:35 Published: 2023-12-29
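FlowX approximates Shapley values by sampling rather than enumerating every coalition. The generic idea behind such approximations fits in a few lines. The sketch below is the textbook permutation-sampling Shapley estimator over abstract "players" with a user-supplied value function, alongside an exact version for small games. It is not FlowX's flow-sampling scheme, and the example games are hypothetical.

```python
import itertools
import random


def exact_shapley(players, value):
    """Exact Shapley values: average marginal contributions over all orderings.

    `value` maps a frozenset of players to a number; cost grows as n!.
    """
    totals = {p: 0.0 for p in players}
    perms = list(itertools.permutations(players))
    for order in perms:
        coalition = frozenset()
        for p in order:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: t / len(perms) for p, t in totals.items()}


def sampled_shapley(players, value, num_samples=1000, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_samples):
        rng.shuffle(order)
        coalition = frozenset()
        for p in order:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: t / num_samples for p, t in totals.items()}


if __name__ == "__main__":
    players = ["a", "b", "c"]

    def majority(coalition):
        # Hypothetical game: a coalition "wins" with at least 2 members.
        return 1.0 if len(coalition) >= 2 else 0.0

    print(exact_shapley(players, majority))
    print(sampled_shapley(players, majority, num_samples=3000))
```

Exact enumeration scales as n!, which is why sampling matters when the players are the many message flows of a GNN. For an additive game, every ordering gives the same marginal contributions, so the sampled estimate is exact.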
# Cross-lingual Lifelong Learning ( http://arxiv.org/abs/2205.11152v2 ) License: see link	Meryem M'hamdi, Xiang Ren, and Jonathan May	The longstanding goal of multi-lingual learning has been to develop a universal cross-lingual model that can withstand the changes in multi-lingual data distributions. There has been a large amount of work to adapt such multi-lingual models to unseen target languages. However, the majority of work in this direction focuses on the standard one-hop transfer learning pipeline from source to target languages, whereas in realistic scenarios, new languages can be incorporated at any time in a sequential manner. In this paper, we present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm, where we analyze different categories of approaches used to continually adapt to emerging data from different languages. We provide insights into what makes multilingual sequential learning particularly challenging. To surmount such challenges, we benchmark a representative set of cross-lingual continual learning algorithms and analyze their knowledge preservation, accumulation, and generalization capabilities compared to baselines on carefully curated datastreams. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata, which go beyond conventional transfer learning.	Translated: 2024-01-03 03:18:55 Published: 2023-12-29
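Measuring knowledge preservation in a sequential multilingual setup usually starts from an accuracy matrix indexed by (training step, language). The sketch below computes one widely used forgetting measure over such a matrix: the best earlier accuracy on a language minus its final accuracy, averaged over all but the last language. It is an assumed illustration of this kind of metric, not necessarily the exact definition used in the paper. The accuracy values in the usage example are made up.

```python
def average_forgetting(acc):
    """Average forgetting after training on the last language in the sequence.

    acc[k][j] = accuracy on language j measured after training step k, with
    languages introduced in order 0, 1, ..., T-1 (entries with j > k are ignored).
    Forgetting of language j = best accuracy it reached before the final step
    minus its accuracy after the final step; averaged over languages 0..T-2.
    """
    T = len(acc)
    if T < 2:
        return 0.0
    drops = []
    for j in range(T - 1):
        best_before = max(acc[k][j] for k in range(j, T - 1))
        drops.append(best_before - acc[T - 1][j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Hypothetical accuracies for a 3-language sequence (rows = after step k).
    acc = [
        [0.90, 0.10, 0.10],
        [0.70, 0.80, 0.20],
        [0.60, 0.70, 0.85],
    ]
    print(round(average_forgetting(acc), 4))
```

Related quantities, such as final average accuracy (accumulation) or zero-shot accuracy on languages not yet seen (generalization), can be read off the same matrix. Balancing them is the kind of trade-off the abstract refers to.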
# Translating Hanja Historical Documents to Contemporary Korean and English ( http://arxiv.org/abs/2205.10019v5 ) License: see link	Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh	The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, `Hanja', and were translated into Korean from 1968 to 1993. The resulting translation was however too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. Since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on English translation, also at a slow pace and produced only one king's records in English so far. Thus, we propose H2KE, a neural machine translation model, that translates historical documents in Hanja to more easily understandable Korean and to English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja, from both a full dataset of outdated Korean translation and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical document and a Transformer based model trained only on newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct extensive human evaluation which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.	Translated: 2024-01-03 03:18:35 Published: 2023-12-29
# Solving Math Word Problems via Cooperative Reasoning induced Language Models ( http://arxiv.org/abs/2210.16257v5 ) License: see link	Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang	Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that need high-level intelligence, such as the math word problem (MWPs). However, directly applying existing PLMs to MWPs can fail as the generation process lacks sufficient supervision and thus lacks the fast adaptivity of humans. We notice that human reasoning has a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvement over state-of-the-art methods, up to a 9.6% increase over the best baselines. Our code is available at https://github.com/TianHongZXY/CoRe.	Translated: 2024-01-03 03:11:03 Published: 2023-12-29
# 未学習ニューラルネットワークによる残差バックプロジェクション Residual Back Projection With Untrained Neural Networks ( http://arxiv.org/abs/2210.14416v2 ) ライセンス: Link先を確認	Ziyu Shu and Alireza Entezari	(参考訳) 背景と目的: 画像処理タスクにおけるニューラルネットワークの成功は、CT(Computed Tomography)における画像再構成問題への応用を動機付けている。この分野では進歩が見られるものの、安定性および精度の理論的保証の欠如と、特定の画像領域に対する高品質なトレーニングデータの不足は、多くのCTアプリケーションに課題をもたらしている。本稿では、トレーニングを必要とせずにニューラルネットワークの階層構造を利用する、CTにおける反復的再構成(IR)の枠組みを提案する。本フレームワークでは、この構造情報をDIP(Deep Image Prior)として組み込み、反復の基礎となる新しい残差バックプロジェクション(RBP)接続を用いる。方法: 目的関数を最小化して高精度な再構成を実現するために、未訓練のU-netと新しい残差バックプロジェクションを併用することを提案する。各反復において未訓練のU-netの重みを最適化し、現在の反復におけるU-netの出力を用いて、上記RBP接続を介して次の反復におけるU-netの入力を更新する。結果: 実験の結果、RBP-DIPフレームワークは、他の最先端の従来型IR法、および類似のネットワーク構造を持つ事前学習モデル・未学習モデルに対して、複数の条件下で改善をもたらすことがわかった。これらの改善は、少数ビュー、限定角度、低線量の撮像構成において特に顕著である。結論: パラレルビームとファンビームの両方のX線撮影に適用した結果、本フレームワークは複数の条件下で大きな改善を示した。さらに、提案フレームワークはトレーニングデータを必要とせず、異なる条件(ノイズレベル、幾何学、撮像対象など)に適応するためにオンデマンドで調整できる。 Background and Objective: The success of neural networks in a number of image processing tasks has motivated their application in image reconstruction problems in computed tomography (CT). While progress has been made in this area, the lack of stability and theoretical guarantees for accuracy, together with the scarcity of high-quality training data for specific imaging domains, poses challenges for many CT applications. In this paper, we present a framework for iterative reconstruction (IR) in CT that leverages the hierarchical structure of neural networks, without the need for training. Our framework incorporates this structural information as a deep image prior (DIP), and uses a novel residual back projection (RBP) connection that forms the basis for our iterations. Methods: We propose using an untrained U-net in conjunction with a novel residual back projection to minimize an objective function and achieve high-accuracy reconstruction. In each iteration, the weights of the untrained U-net are optimized, and the output of the U-net in the current iteration is used to update the input of the U-net in the next iteration through the aforementioned RBP connection. Results: Experimental results demonstrate that the RBP-DIP framework offers improvements over other state-of-the-art conventional IR methods, as well as pre-trained and untrained models with similar network structures under multiple conditions. These improvements are particularly significant in the few-view, limited-angle, and low-dose imaging configurations. Conclusions: Applied to both parallel-beam and fan-beam X-ray imaging, our framework shows significant improvement under multiple conditions. Furthermore, the proposed framework requires no training data and can be adjusted on demand to adapt to different conditions (e.g., noise level, geometry, and imaged object).	翻訳日:2024-01-03 03:10:06 公開日:2023-12-29
# 指定された欠測データ機構をもつガウス混合モデルにおけるベイズ則推定の解析 Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism ( http://arxiv.org/abs/2210.13785v2 ) ライセンス: Link先を確認	Ziyang Lyu	(参考訳) 半教師付き学習(SSL)アプローチは、幅広い工学と科学の分野でうまく適用されている。本稿では、Ahfock と McLachlan (2020) が導入した、未分類観測に対する欠測機構を持つ生成モデルの枠組みについて検討する。一部が分類されたサンプルにおいて、欠測データ機構を伴うベイズ配分則を用いた分類器は、2クラス正規等分散モデルにおいて完全教師付き分類器を上回りうることを示す。これは特に、重なりと欠測クラスラベルの割合が中程度から低い場合、あるいは重なりは大きいが欠測ラベルが少ない場合に当てはまる。また、この分類器は重なり領域や欠測クラスラベルの割合に関わらず、欠測データ機構を持たない分類器を上回る。不等共分散を持つ2成分および3成分の正規混合モデルをシミュレーションにより探索した結果も、以上の知見を裏付ける。最後に、欠測データ機構を伴う提案分類器を介在ニューロンデータセットおよび皮膚病変データセットに適用した例を示す。 Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that in a partially classified sample, a classifier using the Bayes rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially with moderate to low overlap and proportion of missing class labels, or with large overlap but few missing labels. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.	翻訳日:2024-01-03 03:09:02 公開日:2023-12-29
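The sketch below shows the Bayes rule of allocation in a two-class normal homoscedastic model. It is a minimal univariate illustration and assumes the class parameters are fully known. It does not implement the missing-data mechanism of Ahfock and McLachlan (2020) or the paper's estimator. The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd**2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_allocate(x: float, pi1: float, mu1: float, mu2: float, sd: float):
    """Bayes rule of allocation for two normal classes sharing one sd.

    Returns (assigned class, posterior probability of class 1).
    With a common sd the log posterior odds are linear in x, so the
    rule is a linear discriminant.
    """
    p1 = pi1 * normal_pdf(x, mu1, sd)
    p2 = (1.0 - pi1) * normal_pdf(x, mu2, sd)
    post1 = p1 / (p1 + p2)
    return (1 if post1 >= 0.5 else 2), post1


def optimal_error_equal_priors(mu1: float, mu2: float, sd: float) -> float:
    """Bayes error Phi(-Delta/2), Delta = |mu1 - mu2| / sd, for equal priors.

    A smaller Delta means more class overlap and a higher optimal error.
    """
    delta = abs(mu1 - mu2) / sd
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))


print(bayes_allocate(-0.5, 0.5, -1.0, 1.0, 1.0))
print(round(optimal_error_equal_priors(-1.0, 1.0, 1.0), 4))  # 0.1587
```

The error function links the "overlap" discussed in the abstract to a single number, the Mahalanobis separation Delta.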
# フェルミオン量子シミュレーションのための誤り訂正符号 Error-correcting codes for fermionic quantum simulation ( http://arxiv.org/abs/2210.08411v5 ) ライセンス: Link先を確認	Yu-An Chen, Alexey V. Gorshkov, and Yijia Xu	(参考訳) パウリ安定化符号の文脈における$\mathbb{Z}_2$格子ゲージ理論の枠組みを利用して、2次元正方格子上の量子ビット系によるフェルミオンをシミュレートする手法を提案する。ローラン多項式環上のパウリ加群のシンプレクティック自己同型について検討する。これにより、エンコードされた論理フェルミオンと物理キュービットの比率を固定しながら、安定化符号の符号距離を体系的に増加させることができる。フェルミオンシミュレーションに適した安定化符号群を同定し、$d=2,3,4,5,6,7$の符号距離を達成し、任意の$\lfloor \frac{d-1}{2} \rfloor$-qubitエラーの訂正を可能にする。従来のコード連結手法とは対照的に、この手法は(フェルミオン)符号率を低下させることなくコード距離を増大させることができる。特に、コード距離が$d=3,4,5$のコードに対して、すべての安定化子と論理演算子を明示的に示す。すべてのPauliエラーに対するシンドロームを提供し、コード距離を数値的に計算するシンドロームマッチングアルゴリズムを考案する。 Utilizing the framework of $\mathbb{Z}_2$ lattice gauge theories in the context of Pauli stabilizer codes, we present methodologies for simulating fermions via qubit systems on a two-dimensional square lattice. We investigate the symplectic automorphisms of the Pauli module over the Laurent polynomial ring. This enables us to systematically increase the code distances of stabilizer codes while fixing the rate between encoded logical fermions and physical qubits. We identify a family of stabilizer codes suitable for fermion simulation, achieving code distances of $d=2,3,4,5,6,7$, allowing correction of any $\lfloor \frac{d-1}{2} \rfloor$-qubit error. In contrast to the traditional code concatenation approach, our method can increase the code distances without decreasing the (fermionic) code rate. In particular, we explicitly show all stabilizers and logical operators for codes with code distances of $d=3,4,5$. We provide syndromes for all Pauli errors and devise a syndrome-matching algorithm to compute code distances numerically.	翻訳日:2024-01-03 03:08:10 公開日:2023-12-29
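The correction capability quoted above follows from the standard relation between code distance and correctable error weight. This generic coding-theory sketch tabulates it for the distances d = 2, ..., 7 reported in the abstract. It is not specific to the paper's codes.

```python
def correctable_weight(d: int) -> int:
    """Largest t such that every error on at most t qubits is correctable: floor((d - 1) / 2)."""
    if d < 1:
        raise ValueError("code distance must be a positive integer")
    return (d - 1) // 2


def detectable_weight(d: int) -> int:
    """A distance-d code detects (but need not correct) any error of weight up to d - 1."""
    if d < 1:
        raise ValueError("code distance must be a positive integer")
    return d - 1


for d in range(2, 8):
    print(f"d={d}: corrects any {correctable_weight(d)}-qubit error, detects up to {detectable_weight(d)}")
```

The d = 2 code corrects no arbitrary error and only detects single-qubit errors. Each step from odd d to even d + 1 adds detection power without adding correction power.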
# CoreDeep: 幅の確率性を用いたき裂検出アルゴリズムの改善 CoreDeep: Improving Crack Detection Algorithms Using Width Stochasticity ( http://arxiv.org/abs/2209.04648v2 ) ライセンス: Link先を確認	Ram Krishna Pandey, Akshit Achara	(参考訳) 画像中の亀裂の自動検出やセグメンテーションは、メンテナンスや運用のコスト削減に役立つ。背景から亀裂を分離する明確な境界が存在しないため、困難な背景シナリオにおいて損傷分析のために亀裂を検出、測定、定量化することは難しい課題である。開発されたアルゴリズムは、データに関連する固有の課題を扱う必要がある。知覚的に注目すべき課題は、色、強度、深さ、ぼやけ、動きぼやけ、方向、欠陥に対する異なる関心領域(ROI)、スケール、照明、複雑で困難な背景などである。これらの変動は画像間(亀裂のクラス間変動)と画像内(亀裂のクラス内変動)の両方で生じる。全体として、背景(クラス間)と前景(クラス内)に大きなばらつきがある。本研究では、困難な背景シナリオにおけるこれらの変動の影響を低減することを試みた。そのために確率的幅(SW)アプローチを提案する。提案手法は検出性を向上させ、偽陽性と偽陰性を大幅に低減する。アルゴリズムの性能は、平均IoU、偽陽性、偽陰性の観点から客観的に、また知覚品質の観点から主観的に測定した。 Automatically detecting or segmenting cracks in images can help reduce the cost of maintenance or operations. Detecting, measuring and quantifying cracks for distress analysis in challenging background scenarios is difficult because there is no clear boundary that separates cracks from the background. Developed algorithms should handle the inherent challenges associated with the data. Some of the perceptually noted challenges are color, intensity, depth, blur, motion blur, orientation, different regions of interest (ROIs) for the defect, scale, illumination, and complex, challenging backgrounds. These variations occur across images (crack inter-class variability) and within images (crack intra-class variability). Overall, there is significant background (inter-class) and foreground (intra-class) variability. In this work, we have attempted to reduce the effect of these variations in challenging background scenarios. We have proposed a stochastic width (SW) approach to reduce the effect of these variations. Our proposed approach improves detectability and significantly reduces false positives and false negatives. We have measured the performance of our algorithm objectively in terms of mean IoU, false positives and false negatives, and subjectively in terms of perceptual quality.	翻訳日:2024-01-03 03:06:07 公開日:2023-12-29
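The abstract reports mean IoU together with false positives and false negatives. The sketch below shows one common way to compute these for binary crack masks. It assumes that mean IoU averages the crack-class and background-class IoU over a single image. The paper's exact averaging (per image or over the whole dataset) is not stated here, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = crack pixel, 0 = background


def confusion(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Pixel counts (TP, FP, FN, TN) for the crack class."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1  # predicted crack on a background pixel
            elif t:
                fn += 1  # missed crack pixel
            else:
                tn += 1
    return tp, fp, fn, tn


def mean_iou(pred: Mask, truth: Mask) -> float:
    """Average of the crack-class IoU and the background-class IoU."""
    tp, fp, fn, tn = confusion(pred, truth)
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 1.0
    return (iou_crack + iou_background) / 2


pred = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [0, 1, 0]]
print(confusion(pred, truth))  # (1, 1, 1, 3)
```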
# 予測誤差保証付き分布型オフラインポリシー評価 Distributional Offline Policy Evaluation with Predictive Error Guarantees ( http://arxiv.org/abs/2302.09456v3 ) ライセンス: Link先を確認	Runzhe Wu, Masatoshi Uehara, Wen Sun	(参考訳) 本研究では、ポリシーから生成されていないオフラインデータセットを用いてポリシーの収益の分布を推定する問題、すなわち分布型オフラインポリシー評価(OPE)について検討する。本稿では、最尤推定(MLE)の系列を実行し、MLEで訓練できる限り任意の最先端の確率的生成モデルを統合できる柔軟性を持つ、適合尤度推定(Fitted Likelihood Estimation, FLE)と呼ばれるアルゴリズムを提案する。FLEは、報酬が多次元ベクトルとなりうる有限ホライズンと無限ホライズン割引の両方の設定で使用できる。我々の理論的結果は、有限ホライズンと無限ホライズン割引の両方の設定において、FLEがそれぞれ全変動距離とワッサースタイン距離の意味で真の分布に近い分布を学習できることを示している。この理論的結果は、オフラインデータがテストポリシーの軌跡をカバーし、教師あり学習のMLE手続きが成功するという条件下で成り立つ。実験では、ガウス混合モデルと拡散モデルの2つの生成モデルを用いてFLEの性能を示す。多次元報酬の設定では、拡散モデルを用いたFLEはテストポリシーの収益の複雑な分布を推定できる。 We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE), which conducts a sequence of Maximum Likelihood Estimation (MLE) procedures and has the flexibility of integrating any state-of-the-art probabilistic generative models as long as they can be trained via MLE. FLE can be used for both finite-horizon and infinite-horizon discounted settings where rewards can be multi-dimensional vectors. Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively. Our theoretical results hold under the conditions that the offline data covers the test policy's traces and that the supervised learning MLE procedures succeed. Experimentally, we demonstrate the performance of FLE with two generative models, Gaussian mixture models and diffusion models. For the multi-dimensional reward setting, FLE with diffusion models is capable of estimating the complicated distribution of the return of a test policy.	翻訳日:2024-01-03 02:57:08 公開日:2023-12-29
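The "sequence of MLE" structure of FLE can be seen in a toy finite-horizon version. The sketch works backwards from the last step. At each step it samples a return-to-go from the model fitted one step later, adds the observed reward, and fits a new model to those targets by maximum likelihood. The example makes several simplifying assumptions: tabular states, a deterministic test policy, and a one-dimensional Gaussian per (state, action) pair. The Gaussian is only the simplest stand-in for the generative models (Gaussian mixtures, diffusion models) that FLE allows. All names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def fit_gaussian(samples):
    """Maximum likelihood fit of a 1-D Gaussian: sample mean and population sd."""
    mu = statistics.fmean(samples)
    return mu, statistics.pstdev(samples, mu)


def fitted_likelihood_estimation(data, policy, horizon, n_samples=20, seed=0):
    """Toy finite-horizon FLE with tabular states and a Gaussian model per (s, a).

    data[h] holds offline tuples (s, a, r, s_next) for steps h = 0..horizon-1.
    policy(s) returns the test policy's (deterministic) action in state s.
    Returns models with models[h][(s, a)] = (mu, sd) approximating the
    distribution of the return-to-go from step h under the test policy.
    """
    rng = random.Random(seed)
    models = [dict() for _ in range(horizon)]
    for h in reversed(range(horizon)):
        targets = defaultdict(list)
        for s, a, r, s_next in data[h]:
            for _ in range(n_samples):
                if h == horizon - 1:
                    z = 0.0  # no reward after the final step
                else:
                    # Coverage assumption: (s_next, policy(s_next)) must have been
                    # fitted at step h + 1, otherwise this lookup raises KeyError.
                    mu, sd = models[h + 1][(s_next, policy(s_next))]
                    z = rng.gauss(mu, sd)
                targets[(s, a)].append(r + z)
        # One MLE problem per step: fit each (s, a) target distribution.
        for key, ys in targets.items():
            models[h][key] = fit_gaussian(ys)
    return models


# One state, one action, three steps; rewards are 0 or 2 with equal frequency.
data = [[(0, 0, float(r), 0) for r in [0, 2] * 50] for _ in range(3)]
models = fitted_likelihood_estimation(data, policy=lambda s: 0, horizon=3)
print(models[0][(0, 0)])  # roughly mean 3 and sd sqrt(3), matching the true return
```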
# 敵対的オンライン協調フィルタリング Adversarial Online Collaborative Filtering ( http://arxiv.org/abs/2302.05765v2 ) ライセンス: Link先を確認	Stephen Pasteris, Fabio Vitale, Mark Herbster, Claudio Gentile, Andre' Panisson	(参考訳) 本研究では、非反復制約下でのオンライン協調フィルタリングの問題を考察する。この設定では、ユーザにオンライン形式でコンテンツを提供する必要があり、あるユーザに同じコンテンツアイテムを2回以上推薦することはできない。まず、ユーザ・アイテム選好行列に対する双クラスタリングの仮定の下で動作するアルゴリズムを設計・解析し、このアルゴリズムが最適なリグレット保証を示すことを示す。同時にこのアルゴリズムは、ユーザの系列、アイテム全体、選好行列の双クラスタリングパラメータに関する事前知識を一切必要としないという意味で、完全に適応的である。次に、一般の行列で動作する、このアルゴリズムのより頑健なバージョンを提案する。このアルゴリズムもパラメータフリーであり、選好行列が双クラスタ構造から逸脱する量に応じてスケールするリグレット保証を証明する。我々の知る限り、これらは非反復制約下でこのレベルの一般性と適応性を持つオンライン協調フィルタリングの最初の結果である。最後に、理論の検証と標準ベースラインとの経験的比較を目的とした実世界データセット上の簡単な実験により、理論的知見を補完する。この比較は、これらのベースラインに対する我々のアプローチの競争上の優位性を示している。 We investigate the problem of online collaborative filtering under no-repetition constraints, whereby users need to be served content in an online fashion and a given user cannot be recommended the same content item more than once. We start by designing and analyzing an algorithm that works under biclustering assumptions on the user-item preference matrix, and show that this algorithm exhibits an optimal regret guarantee, while being fully adaptive, in that it is oblivious to any prior knowledge about the sequence of users, the universe of items, as well as the biclustering parameters of the preference matrix. We then propose a more robust version of this algorithm which operates with general matrices. This algorithm is also parameter-free, and we prove regret guarantees that scale with the amount by which the preference matrix deviates from a biclustered structure. To our knowledge, these are the first results on online collaborative filtering that hold at this level of generality and adaptivity under no-repetition constraints. Finally, we complement our theoretical findings with simple experiments on real-world datasets aimed at both validating the theory and empirically comparing to standard baselines. This comparison shows the competitive advantage of our approach over these baselines.	翻訳日:2024-01-03 02:56:47 公開日:2023-12-29
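The sketch below only makes the no-repetition constraint concrete. It is a hypothetical greedy recommender that never serves a user the same item twice, while feedback from one user influences what other users see. It is not the paper's algorithm and has no regret guarantee. The single shared item score is a crude stand-in for the far richer preference estimates a real learner would maintain.

```python
from collections import defaultdict


class NoRepeatRecommender:
    """Toy recommender that enforces the no-repetition constraint.

    Item scores are shared across users (a crude collaborative signal).
    """

    def __init__(self, items):
        self.items = list(items)
        self.served = defaultdict(set)        # user -> items already recommended
        self.item_score = defaultdict(float)  # item -> aggregated feedback

    def recommend(self, user):
        candidates = [i for i in self.items if i not in self.served[user]]
        if not candidates:
            return None  # this user has already seen every item
        # max() keeps the first item among ties, so the order of self.items breaks ties.
        best = max(candidates, key=lambda i: self.item_score[i])
        self.served[user].add(best)
        return best

    def feedback(self, user, item, liked):
        # The user argument is unused here; feedback from one user only
        # changes what *other* users are shown next.
        self.item_score[item] += 1.0 if liked else -1.0


rec = NoRepeatRecommender(["a", "b", "c"])
print(rec.recommend("u1"))  # a (all scores tied)
```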
# テキストから画像へのプロンプトの最適化 Optimizing Prompts for Text-to-Image Generation ( http://arxiv.org/abs/2212.09611v2 ) ライセンス: Link先を確認	Yaru Hao, Zewen Chi, Li Dong, Furu Wei	(参考訳) よく設計されたプロンプトは、テキストから画像へのモデルを導き、素晴らしい画像を生成させる。しかしながら、高性能なプロンプトはモデル固有であることが多く、ユーザ入力とずれている。本稿では、手間のかかる人手によるエンジニアリングの代わりに、元のユーザ入力をモデルが好むプロンプトに自動的に適応させる汎用フレームワークであるプロンプト適応を提案する。具体的には、まず手作業で設計したプロンプトの小さなコレクション上で、事前訓練された言語モデルを用いて教師付き微調整を行う。その後、強化学習を用いてより良いプロンプトを探索する。元のユーザ意図を維持しつつ、より美的なイメージを生成するようポリシーを促す報酬関数を定義する。Stable Diffusionでの実験の結果、本手法は自動評価指標と人間の選好評価の両方で手動のプロンプトエンジニアリングを上回った。さらに、強化学習は特にドメイン外のプロンプトにおいて性能をさらに向上させる。事前学習済みチェックポイントはhttps://aka.ms/promptistで入手できる。デモはhttps://aka.ms/promptist-demoで見ることができる。 Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.	翻訳日:2024-01-03 02:54:47 公開日:2023-12-29
# 高エネルギー物理におけるFAIR AIモデル FAIR AI Models in High Energy Physics ( http://arxiv.org/abs/2212.05081v3 ) ライセンス: Link先を確認	Javier Duarte and Haoyang Li and Avik Roy and Ruike Zhu and E. A. Huerta and Daniel Diaz and Philip Harris and Raghav Kansal and Daniel S. Katz and Ishaan H. Kavoori and Volodymyr V. Kindratenko and Farouk Mokhtar and Mark S. Neubauer and Sang Eon Park and Melissa Quinnan and Roger Rusack and Zhizhen Zhao	(参考訳) findable, accessible, interoperable, and reusable (FAIR) データ原則は、科学的発見を促進するためにデータの共有方法を調査し、評価し、改善するための枠組みを提供する。これらの原則を研究ソフトウェアや他のデジタル成果物に一般化することは、活発な研究分野である。機械学習(ML)モデル -- 明示的にプログラムされることなくデータで訓練されたアルゴリズム -- や、より一般的には人工知能(AI)モデルは、AIが実験高エネルギー物理学(HEP)などの科学領域を変革するペースがますます速まっていることから、その重要な対象である。本稿では、HEPにおけるAIモデルに対するFAIR原則の実践的定義を提案し、これらの原則を適用するためのテンプレートを記述する。グラフニューラルネットワークを用いて2つのボトムクォークに崩壊するヒッグス粒子を識別する、HEPに適用したAIモデルの例を用いて、テンプレートの使用法を実証する。このFAIR AIモデルの頑健性、ハードウェアアーキテクチャとソフトウェアフレームワーク間の可搬性、および解釈可能性について報告する。 The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.	翻訳日:2024-01-03 02:54:19 公開日:2023-12-29
# 統計的推論としての説明可能性 Explainability as statistical inference ( http://arxiv.org/abs/2212.03131v3 ) ライセンス: Link先を確認	Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei	(参考訳) 近年、様々なモデル説明手法が提案されているが、いずれも非常に異なる理論的根拠とヒューリスティクスに基づいている。本稿では、新たな道筋をとり、解釈可能性を統計的推論問題として定式化する。解釈可能な予測を生成するために設計された一般的な深層確率モデルを提案する。モデルパラメータは最尤法で学習でき、この手法は任意の予測器ネットワークアーキテクチャと任意の種類の予測問題に適用できる。本手法は償却型(amortized)解釈可能性モデルの一例であり、推論時に高速な解釈を可能にするためにニューラルネットワークをセレクタとして用いる。いくつかの一般的な解釈可能性手法は、我々の一般モデルに対する正則化最尤推定の特別な場合であることが示される。また、特徴重要度マップの評価を可能にする、真の選択(ground truth selection)を備えた新しいデータセットを提案する。これらのデータセットを用いて、多重代入(multiple imputation)を用いることでより妥当な解釈が得られることを実験的に示す。 A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground truth selection which allow for the evaluation of feature importance maps. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.	翻訳日:2024-01-03 02:54:00 公開日:2023-12-29
# 連続量子場理論のための変分ニューラルネットワークアンサッツ Variational Neural-Network Ansatz for Continuum Quantum Field Theory ( http://arxiv.org/abs/2212.00782v4 ) ライセンス: Link先を確認	John M. Martyn, Khadijeh Najafi, Di Luo	(参考訳) ファインマンにさかのぼる物理学者たちは、量子場理論に変分原理を適用することの難しさを嘆いてきた。非相対論的場の量子論では、状態のフォック空間表現を構成する無限に多くの$n$粒子波動関数をパラメータ化し、最適化することが課題である。ここでは、連続体における非相対論的量子場理論への変分原理の適用を可能にする深層学習アンサッツであるニューラルネットワーク量子場状態を導入することにより、この問題にアプローチする。我々のアンサッツは、Deep Setsニューラルネットワークアーキテクチャを用いて、量子場状態を構成するすべての$n$粒子波動関数を同時にパラメータ化する。このアンサッツを用いて、不均一系や長距離相互作用を持つ系を含む様々な場の理論の基底状態を近似し、量子場理論を探索する強力な新しいツールであることを示す。 Physicists dating back to Feynman have lamented the difficulties of applying the variational principle to quantum field theories. In non-relativistic quantum field theories, the challenge is to parameterize and optimize over the infinitely many $n$-particle wave functions comprising the state's Fock space representation. Here we approach this problem by introducing neural-network quantum field states, a deep learning ansatz that enables application of the variational principle to non-relativistic quantum field theories in the continuum. Our ansatz uses the Deep Sets neural network architecture to simultaneously parameterize all of the $n$-particle wave functions comprising a quantum field state. We employ our ansatz to approximate ground states of various field theories, including an inhomogeneous system and a system with long-range interactions, thus demonstrating a powerful new tool for probing quantum field theories.	翻訳日:2024-01-03 02:53:47 公開日:2023-12-29
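The key property of the Deep Sets architecture is that a sum-pooled function rho(sum_i phi(x_i)) is invariant under particle permutations and accepts sets of any size n. This is what lets a single network cover every n-particle sector. The toy sketch below makes several assumptions: 1-D positions, a real scalar output, and fixed hypothetical weights in place of trained networks. Its output is symmetric, as needed for bosonic wave functions. It does not reproduce the paper's ansatz, including how normalization across sectors is handled.

```python
import math
from itertools import permutations

# Hypothetical fixed weights standing in for trained networks.
PHI_DIM = 3
RHO_WEIGHTS = [0.3, -0.1, 0.5]


def phi(x: float) -> list:
    """Per-particle feature map (toy stand-in for a learned network)."""
    return [x, x * x, math.cos(x)]


def rho(pooled: list) -> float:
    """Readout applied to the pooled features (toy stand-in for a learned network)."""
    return math.tanh(sum(w * f for w, f in zip(RHO_WEIGHTS, pooled)))


def deep_sets(positions: list) -> float:
    """rho(sum_i phi(x_i)): symmetric in the particle positions and defined for any n."""
    pooled = [0.0] * PHI_DIM
    for x in positions:
        pooled = [p + f for p, f in zip(pooled, phi(x))]
    return rho(pooled)


values = [deep_sets(list(p)) for p in permutations([0.1, -0.7, 1.3])]
print(max(values) - min(values) < 1e-12)  # True: reordering particles does not change the output
```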
# 医用SAMアダプタ: 医用画像セグメンテーションのためのSegment Anything Modelの適応 Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v7 ) ライセンス: Link先を確認	Junde Wu and Wei Ji and Yuanpei Liu and Huazhu Fu and Min Xu and Yanwu Xu and Yueming Jin	(参考訳) Segment Anything Model (SAM) は、様々なセグメンテーションタスクにおける印象的な能力とプロンプトベースのインタフェースにより、画像セグメンテーションの分野で最近人気を集めている。しかし、最近の研究や個別の実験により、SAMは医学に特有の知識を欠いているため、医用画像のセグメンテーションでは性能が劣ることが示されている。これにより、医用画像に対するSAMのセグメンテーション能力をいかに強化するかという問題が提起される。本稿では、SAMモデルを微調整する代わりに、軽量かつ効果的な適応手法を用いて領域固有の医学知識をセグメンテーションモデルに組み込む医用SAMアダプタ(Med-SA)を提案する。Med-SAでは、2次元SAMを3次元医用画像に適応させる空間深度転置(SD-Trans)と、プロンプト条件付き適応を実現するハイパープロンプティングアダプタ(HyP-Adpt)を提案する。様々な画像モダリティにわたる17の医用画像セグメンテーションタスクについて包括的な評価実験を行う。Med-SAは、パラメータのわずか2%のみを更新しながら、いくつかの最先端(SOTA)医用画像セグメンテーション手法を上回る。私たちのコードはhttps://github.com/KidsWithTokens/Medical-SAM-Adapterで公開されています。 The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation due to its impressive capabilities in various segmentation tasks and its prompt-based interface. However, recent studies and individual experiments have shown that SAM underperforms in medical image segmentation due to its lack of medical-specific knowledge. This raises the question of how to enhance SAM's segmentation capability for medical images. In this paper, instead of fine-tuning the SAM model, we propose the Medical SAM Adapter (Med-SA), which incorporates domain-specific medical knowledge into the segmentation model using a light yet effective adaptation technique. In Med-SA, we propose Space-Depth Transpose (SD-Trans) to adapt 2D SAM to 3D medical images and Hyper-Prompting Adapter (HyP-Adpt) to achieve prompt-conditioned adaptation. We conduct comprehensive evaluation experiments on 17 medical image segmentation tasks across various image modalities. Med-SA outperforms several state-of-the-art (SOTA) medical image segmentation methods, while updating only 2% of the parameters. Our code is released at https://github.com/KidsWithTokens/Medical-SAM-Adapter.	翻訳日:2024-01-03 02:46:50 公開日:2023-12-29
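Adapter-based tuning keeps the trainable fraction small by freezing the backbone and training only small inserted modules. The sketch below is a generic illustration: a standard residual bottleneck adapter with hypothetical sizes, written in pure Python. It is not Med-SA's SD-Trans or HyP-Adpt design, and its parameter count is not meant to reproduce the 2% figure.

```python
import random


def matvec(W, x):
    """Dense matrix-vector product for nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class BottleneckAdapter:
    """Residual bottleneck adapter: x + W_up relu(W_down x).

    Only W_down and W_up would be trained; the backbone weights stay frozen.
    """

    def __init__(self, dim: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so the pretrained model's behaviour is unchanged before fine-tuning.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        delta = matvec(self.up, relu(matvec(self.down, x)))
        return [xi + di for xi, di in zip(x, delta)]

    def num_params(self) -> int:
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)


adapter = BottleneckAdapter(dim=8, bottleneck=2)
print(adapter.num_params())  # 32 = 2 * 8 * 2
```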
# ナップサック最適化によるニューラルネットワークのセキュア化 Securing Neural Networks with Knapsack Optimization ( http://arxiv.org/abs/2304.10442v2 ) ライセンス: Link先を確認	Yakir Gorski, Amir Jevnisek, Shai Avidan	(参考訳) ニューラルネットワークを保有するMLaaSサービスプロバイダ(SP)は、ニューラルネットワークの重みを秘密にしておきたい。一方で、ユーザはデータを明かすことなく、SPのニューラルネットワークを推論に利用したい。マルチパーティ計算(MPC)は、これを実現するための解決策を提供する。MPCにおける計算では当事者間でデータをやり取りするため、通信が発生する。非線形演算は通常、通信帯域幅の大部分を必要とする主要なボトルネックである。本稿では、多くのコンピュータビジョンタスクのバックボーンとして機能するResNetに着目し、その非線形成分、具体的にはReLUの数を削減することを目指す。我々の重要な洞察は、空間的に近い画素は相関したReLU応答を示すということである。この洞察に基づき、画素ごとのReLU演算をパッチごとのReLU演算に置き換える。このアプローチを「Block-ReLU」と呼ぶ。ニューラルネットワークの異なる層は異なる特徴階層に対応するため、層ごとにパッチサイズの柔軟性を持たせることは理にかなっている。そこで、この問題をナップサック問題へ新たに帰着させることで、最適なパッチサイズの組を選択するアルゴリズムを考案した。半正直(semi-honest)な安全な3者間設定において、4つの問題、すなわちResNet50バックボーンを用いたImageNetの分類、ResNet18バックボーンを用いたCIFAR100の分類、MobileNetV2バックボーンを用いたADE20Kのセマンティックセグメンテーション、ResNet50バックボーンを用いたPascal VOC 2012のセマンティックセグメンテーションで本手法を実証する。本手法は、少数の競合手法と比較して競争力のある性能を達成する。ソースコードはhttps://github.com/yg320/secure_inferenceで公開している。 MLaaS Service Providers (SPs) holding a Neural Network would like to keep the Neural Network weights secret. On the other hand, users wish to utilize the SPs' Neural Network for inference without revealing their data. Multi-Party Computation (MPC) offers a solution to achieve this. Computations in MPC involve communication, as the parties send data back and forth. Non-linear operations are usually the main bottleneck, requiring the bulk of the communication bandwidth. In this paper, we focus on ResNets, which serve as the backbone for many Computer Vision tasks, and we aim to reduce their non-linear components, specifically, the number of ReLUs. Our key insight is that spatially close pixels exhibit correlated ReLU responses. Building on this insight, we replace the per-pixel ReLU operation with a ReLU operation per patch. We term this approach 'Block-ReLU'. Since different layers in a Neural Network correspond to different feature hierarchies, it makes sense to allow patch-size flexibility for the various layers of the Neural Network. We devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to the Knapsack Problem. We demonstrate our approach in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, Semantic Segmentation of ADE20K using MobileNetV2 backbone, and Semantic Segmentation of Pascal VOC 2012 using ResNet50 backbone. Our approach achieves competitive performance compared to a handful of competitors. Our source code is publicly available: https://github.com/yg320/secure_inference.	翻訳日:2024-01-03 02:46:27 公開日:2023-12-29
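The patch-size selection can be pictured as a multiple-choice knapsack problem. Each layer must take exactly one patch size. Each option costs a number of ReLUs (larger patches cost fewer) and carries a utility, for example a negative accuracy drop. The total ReLU count is capped. The sketch below is a generic dynamic program under those assumptions, with made-up costs and utilities. It is not the authors' released implementation, and it does not show how they measure per-layer utility.

```python
from typing import List, Optional, Sequence, Tuple

Option = Tuple[str, int, float]  # (patch size label, ReLU count, utility)


def choose_patch_sizes(
    layers: Sequence[Sequence[Option]], budget: int
) -> Optional[Tuple[float, List[str]]]:
    """Multiple-choice knapsack: pick exactly one patch-size option per layer.

    Maximises the summed utility subject to total ReLU count <= budget.
    Returns (best utility, chosen patch size per layer) or None if infeasible.
    """
    neg = float("-inf")
    dp = [0.0] * (budget + 1)  # dp[b]: best utility so far with total cost <= b
    picks = []  # picks[l][b]: option index used for layer l at budget b
    for options in layers:
        new = [neg] * (budget + 1)
        pick = [None] * (budget + 1)
        for b in range(budget + 1):
            for k, (_, cost, util) in enumerate(options):
                if cost <= b and dp[b - cost] + util > new[b]:
                    new[b] = dp[b - cost] + util
                    pick[b] = k
        dp = new
        picks.append(pick)
    if dp[budget] == neg:
        return None
    # Walk back through the layers to recover the chosen option per layer.
    chosen, b = [], budget
    for l in reversed(range(len(layers))):
        size, cost, _ = layers[l][picks[l][b]]
        chosen.append(size)
        b -= cost
    return dp[budget], chosen[::-1]


layers = [
    [("1x1", 8, 0.0), ("2x2", 2, -0.1), ("4x4", 1, -0.6)],  # layer 0
    [("1x1", 4, 0.0), ("2x2", 1, -0.4)],                    # layer 1
]
print(choose_patch_sizes(layers, budget=6))  # (-0.1, ['2x2', '1x1'])
```

Plain 0/1 knapsack would allow a layer to take zero or several options. The "exactly one option per layer" structure is why the per-layer loop replaces the whole dp row at each step.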
# 地震量子化 Earthquake Quantization ( http://arxiv.org/abs/2303.06158v4 ) ライセンス: Link先を確認	Benjamin Koch and Enrique Muñoz	(参考訳) アインシュタインの144歳の誕生日の記念として、経路積分の経路がランダムではなく、ランダムな背景における測地線方程式の解となるような新しい量子化処方を提案する。この視点の変更は、非相対論的量子力学の通常の定式化と数学的に等価にできることを示す。結論として、物質に結合した量子重力や量子同値原理のような概念的問題について述べる。 In this homage to Einstein's 144th birthday we propose a novel quantization prescription, where the paths of a path integral are not random, but rather solutions of a geodesic equation in a random background. We show that this change of perspective can be made mathematically equivalent to the usual formulations of non-relativistic quantum mechanics. To conclude, we comment on conceptual issues, such as quantum gravity coupled to matter and the quantum equivalence principle.	翻訳日:2024-01-03 02:41:46 公開日:2023-12-29
# Image2SSM: 放射基底関数を用いた画像からの統計的形状モデルの再構想 Image2SSM: Reimagining Statistical Shape Models from Images with Radial Basis Functions ( http://arxiv.org/abs/2305.11946v2 ) ライセンス: Link先を確認	Hong Xu and Shireen Y. Elhabian	(参考訳) 統計的形状モデリング(SSM)は解剖学的形態の変動を解析するための重要なツールである。典型的なSSMパイプラインでは、セグメンテーションと剛体位置合わせを経た3次元解剖画像は、統計的解析が可能な低次元の形状特徴を用いて表現される。コンパクトな形状表現を構築するための様々な方法が提案されているが、それらは手間とコストのかかるステップを伴う。本研究では、画像とセグメンテーションのペアを利用して、放射基底関数(RBF)に基づく形状表現を画像から直接学習する、新しい深層学習手法であるImage2SSMを提案する。このRBFベースの形状表現は、複雑なジオメトリにデータ駆動的に適応できる、基礎となる曲面の連続的かつコンパクトな表現を推定するための豊富な自己教師信号をネットワークに提供する。Image2SSMは、最小限のパラメータ調整のみで、ユーザ支援を必要とせずに、解剖学的形状のアンサンブルに対する統計的ランドマークベースの形状モデルを構築し、関心のある生物学的構造の集団を特徴付けることができる。訓練後、Image2SSMは新しい未セグメント化画像から低次元の形状表現を推論するために使用でき、特に大規模なコホートを扱う場合に、SSMのスケーラブルなアプローチへの道を開く。合成データと実データを用いた実験により、SSMのための最先端の対応点ベース手法と比較して提案手法の有効性が示された。 Statistical shape modeling (SSM) is an essential tool for analyzing variations in anatomical morphology. In a typical SSM pipeline, 3D anatomical images, after undergoing segmentation and rigid registration, are represented using lower-dimensional shape features, on which statistical analysis can be performed. Various methods for constructing compact shape representations have been proposed, but they involve laborious and costly steps. We propose Image2SSM, a novel deep-learning-based approach for SSM that leverages image-segmentation pairs to learn a radial-basis-function (RBF)-based representation of shapes directly from images. This RBF-based shape representation offers a rich self-supervised signal for the network to estimate a continuous, yet compact representation of the underlying surface that can adapt to complex geometries in a data-driven manner. Image2SSM can characterize populations of biological structures of interest by constructing statistical landmark-based shape models of ensembles of anatomical shapes while requiring minimal parameter tuning and no user assistance. Once trained, Image2SSM can be used to infer low-dimensional shape representations from new unsegmented images, paving the way toward scalable approaches for SSM, especially when dealing with large cohorts. Experiments on synthetic and real datasets show the efficacy of the proposed method compared to the state-of-the-art correspondence-based method for SSM.	翻訳日:2024-01-03 02:31:56 公開日:2023-12-29
# quditを用いたSO(5)多フェルミオン系の量子シミュレーション Quantum Simulations of SO(5) Many-Fermion Systems using Qudits ( http://arxiv.org/abs/2305.11941v2 ) ライセンス: Link先を確認	Marc Illa, Caroline E. P. Robin and Martin J. Savage	(参考訳) 量子多体系の構造とダイナミクスは、基礎となる相互作用間の微妙な相互作用の結果であり、複雑な絡み合い構造をもたらす。この見かけの複雑さにもかかわらず対称性が現れ、それは関連する自由度を決定し、これらの系の古典的記述を単純化するために長く用いられてきた。本研究では、quditの配列を持つ量子コンピュータが、quditがこれらの関連する自由度を自然に写像できる場合に、相互作用するフェルミオン系のシミュレーションにおいて持ちうる有用性を探る。フェルミオンのAgassiモデルは基礎となる$so(5)$代数に基づいており、それが記述する系は5つの基底状態を持つモードのペアに分割でき、それらは$d=5$のqudit(qu5it)の配列に自然に埋め込まれる。最大12個のqu5itに埋め込まれたフェルミオン系の時間発展の古典的なノイズレスシミュレーションを、Googleのcirqソフトウェアを用いて行う。qu5it回路の資源要件を解析し、量子ビット系への2つの異なるマッピング、すなわち物理を考慮したJordan-Wignerマッピングおよび状態対状態マッピングと比較する。特に、必要な量子資源の削減と、シミュレーションを物理空間の外に出してしまう誤差の予想される低減において、quditを用いることの利点を見出した。これまで認識されていなかった符号問題が、高エネルギー励起の時間発展におけるトロッター化誤差から特定された。これは高エネルギー物理および原子核物理における量子シミュレーション、特に破砕(fragmentation)および高度に非弾性な多チャネル過程に影響を与える。 The structure and dynamics of quantum many-body systems are the result of a delicate interplay between underlying interactions, which leads to intricate entanglement structures. Despite this apparent complexity, symmetries emerge and have long been used to determine the relevant degrees of freedom and simplify classical descriptions of these systems. In this work, we explore the potential utility of quantum computers with arrays of qudits in simulating interacting fermionic systems, when the qudits can naturally map these relevant degrees of freedom. The Agassi model of fermions is based on an underlying $so(5)$ algebra, and the systems it describes can be partitioned into pairs of modes with five basis states, which naturally embed in arrays of $d=5$ qudits (qu5its). Classical noiseless simulations of the time evolution of systems of fermions embedded in up to twelve qu5its are performed using Google's cirq software. The resource requirements of the qu5it circuits are analyzed and compared with two different mappings to qubit systems, a physics-aware Jordan-Wigner mapping and a state-to-state mapping. We find advantages in using qudits, specifically in lowering the required quantum resources and reducing anticipated errors that take the simulation out of the physical space. A previously unrecognized sign problem has been identified from Trotterization errors in time evolving high-energy excitations. This has implications for quantum simulations in high-energy and nuclear physics, specifically of fragmentation and highly inelastic, multi-channel processes.	翻訳日:2024-01-03 02:31:33 公開日:2023-12-29
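Counting states shows why qudits avoid leakage out of the physical space. A pair of modes with five basis states maps directly onto one qu5it. A dense binary encoding instead needs ceil(log2 5) = 3 qubits per pair, which leaves 3 of the 8 basis states unphysical. The sketch below does only this counting, assuming such a dense encoding. It does not reproduce the specific Jordan-Wigner or state-to-state qubit mappings analyzed in the paper, and the function name is hypothetical.

```python
import math


def binary_encoding_cost(n_sites: int, d: int = 5) -> dict:
    """Resources for storing n_sites d-level systems (qudits) in qubits,
    assuming a dense binary encoding of ceil(log2 d) qubits per site."""
    qubits_per_site = math.ceil(math.log2(d))
    return {
        "qudits": n_sites,
        "qubits": qubits_per_site * n_sites,
        # Basis states per site that correspond to no physical configuration.
        "unphysical_states_per_site": 2 ** qubits_per_site - d,
        # Fraction of the qubit Hilbert space that is physical.
        "physical_fraction": d ** n_sites / 2 ** (qubits_per_site * n_sites),
    }


print(binary_encoding_cost(12))  # twelve qu5its, the largest size simulated in the abstract
```

For twelve sites, only (5/8)^12, roughly 0.36%, of the 36-qubit Hilbert space is physical. Any error that rotates amplitude into the rest of it leaves the model's space, and a qu5it register cannot make this kind of error.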
# 三フレーバーニュートリノ振動における量子性の定量化 Quantifying quantumness in three-flavor neutrino oscillations ( http://arxiv.org/abs/2305.06095v3 ) ライセンス: Link先を確認	Victor Bittencourt, Massimo Blasone, Silvio De Siena, Cristina Matrella	(参考訳) 平面波と波束の両方のアプローチを用いて、3フレーバーの振動ニュートリノ系に符号化された量子相関を特徴付ける。完全相補性関係(CCR)を用いて、最近のニュートリノ実験から選んだ関連パラメータの観点から、予測可能性、局所コヒーレンス、非局所相関の間のトレードオフを調べる。CCRは2体相関に関する寄与をよく記述するが、純粋状態の場合に真の3体寄与を含めるようにこれらの関係を拡張する試みは、完全には意味のある結果にならない。しかし、CCRとは独立に、純粋状態と混合状態の両方について真の3体寄与の解析を与える。 We characterize quantum correlations encoded in a three-flavor oscillating neutrino system by using both plane-wave and wave-packet approaches. By means of the Complete Complementarity Relations (CCR), we study the trade-off between predictability, local coherence and non-local correlations in terms of the relevant parameters, chosen from recent neutrino experiments. Although the CCR describe very well the contributions associated with bipartite correlations, an attempt to extend these relations to include the genuine tripartite contributions in the pure-state case leads to a result that is not entirely meaningful. However, we provide an analysis of the genuine tripartite contributions for both the pure and the mixed case, independently of the CCR.	翻訳日:2024-01-03 02:30:18 公開日:2023-12-29
# WikiSQE: Wikipediaにおける文品質推定のための大規模データセット WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia ( http://arxiv.org/abs/2305.05928v2 ) ライセンス: Link先を確認	Kenichiro Ando, Satoshi Sekine, Mamoru Komachi	(参考訳) Wikipediaは誰でも編集できるため、様々な品質の文が含まれている。そのためWikipediaには質の悪い編集も含まれており、それらはしばしば他の編集者によってマークアップされる。編集者のレビューはWikipediaの信頼性を高めるが、編集されたすべてのテキストを確認するのは難しい。このプロセスを支援することは非常に重要であるが、それを研究するための大規模で包括的なデータセットは現在存在しない。本稿では、Wikipediaにおける文品質推定のための初の大規模データセットであるWikiSQEを提案する。各文は英語版Wikipediaの改訂履歴全体から抽出され、対象とする品質ラベルは慎重に調査・選択された。WikiSQEには約340万の文と153の品質ラベルが含まれる。競争力のある機械学習モデルを用いた自動分類実験では、引用、構文・意味、または命題に問題がある文はより検出が難しいことが判明した。また、人手によるアノテーションを行った結果、我々が開発したモデルはクラウドソーシング作業者よりも優れた性能を示した。WikiSQEはNLPの他のタスクにとっても貴重なリソースとなることが期待される。 Wikipedia can be edited by anyone and thus contains sentences of varying quality. Therefore, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist. Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence is extracted from the entire revision history of English Wikipedia, and the target quality labels were carefully investigated and selected. WikiSQE has about 3.4M sentences with 153 quality labels. In the experiment with automatic classification using competitive machine learning models, sentences that had problems with citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, by performing human annotation, we found that the model we developed performed better than the crowdsourced workers. WikiSQE is expected to be a valuable resource for other tasks in NLP.	翻訳日:2024-01-03 02:30:06 公開日:2023-12-29
# Yes, we CANN: 局所特徴量に基づく視覚的位置推定のための制約付き近似最近傍探索 Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization ( http://arxiv.org/abs/2306.09012v3 ) ライセンス: Link先を確認	Dror Aiger, André Araujo, Simon Lynen	(参考訳) 大規模な視覚的位置推定システムは、依然としてstructure-from-motionにより画像コレクションから構築された3D点群に依存している。これらのモデルの3D点は局所画像特徴量を用いて表現される。しかし、クエリ画像の局所特徴量を点群と直接マッチングすることは、最近傍探索問題の規模が大きいため困難である。そのため、視覚的位置推定に関する最近の多くの手法ではハイブリッド手法が提案されている。これは、まずグローバルな（画像ごとの）埋め込みを用いてデータベース画像の小さなサブセットを検索し、クエリの局所特徴量をそれらとのみマッチングするものである。各クエリ画像について2種類の特徴量を計算しなければならないという大きな欠点があるにもかかわらず、視覚的位置推定における画像検索にはグローバル埋め込みが不可欠であるという考えが一般的になっているようである。本稿では、この前提から一歩引いて、制約付き近似最近傍探索（CANN）を提案する。これは、局所特徴量のみを用いて、幾何空間と外観空間の両方にわたるk近傍探索を統合的に解く手法である。まず、複数の距離尺度にまたがるk近傍検索の理論的基礎を導出し、次にCANNが視覚的位置推定をどのように改善するかを示す。公開の位置推定ベンチマークを用いた実験により、本手法が最先端のグローバル特徴量ベースの検索手法と、局所特徴量集約方式を用いる手法の両方を大幅に上回ることを示した。さらに、これらのデータセットにおいて、特徴量集約方式よりもインデックス作成時間とクエリ時間の両方で1桁高速である。コード: https://github.com/google-research/google-research/tree/master/cann Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become a common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. 
In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code: https://github.com/google-research/google-research/tree/master/cann	翻訳日:2024-01-03 02:23:46 公開日:2023-12-29
# 契約を用いたコンテキスト依存ロボットミッションのcorrect-by-construction設計 Correct-by-Construction Design of Contextual Robotic Missions Using Contracts ( http://arxiv.org/abs/2306.08144v3 ) ライセンス: Link先を確認	Piergiuseppe Mallozzi, Pierluigi Nuzzo, Nir Piterman, Gerardo Schneider, Patrizio Pelliccione	(参考訳) ロボットミッションを効果的に仕様化し実装することは、ロボットシステムのソフトウェア工学にいくつかの課題をもたらす。これらの課題は、現実の運用環境におけるさまざまな適用シナリオや状況（コンテキストとも呼ばれる）を考慮しながら、ロボットの高レベルなタスクを形式化し実行する必要があることに起因する。複数のコンテキストを明示的に考慮した正しいミッション仕様を書くことは、手間がかかり誤りを生じやすい。さらに、コンテキストの数、ひいては仕様の複雑さが増すにつれて、（例えば合成手法を用いて）correct-by-constructionな実装を生成することは計算的に困難になりうる。これらの問題に対する実用的なアプローチは、ミッション仕様をより小さく扱いやすいサブミッションに分解し、各サブミッションを特定のコンテキストに合わせて調整することである。しかし、この構成的アプローチは、ミッション全体の正しさを保証するうえで独自の課題を生じさせる。本稿では、仮定-保証契約（assume-guarantee contract）を用いて、コンテキストに依存するロボットミッションを仕様化し実装するための新しい構成的フレームワークを提案する。ミッション仕様は階層的かつモジュール的に構成され、各サブミッションを独立したロボットコントローラとして合成できる。また、事前に定義された条件の下で正しさを保証しつつ、サブミッションコントローラ間を動的に切り替える問題に取り組む。 Effectively specifying and implementing robotic missions poses a set of challenges to software engineering for robotic systems. These challenges stem from the need to formalize and execute a robot's high-level tasks while considering various application scenarios and conditions, also known as contexts, in real-world operational environments. Writing correct mission specifications that explicitly account for multiple contexts can be tedious and error-prone. Furthermore, as the number of contexts, and consequently the complexity of the specification, increases, generating a correct-by-construction implementation (e.g., by using synthesis methods) can become intractable. A viable approach to address these issues is to decompose the mission specification into smaller, manageable sub-missions, with each sub-mission tailored to a specific context. Nevertheless, this compositional approach introduces its own set of challenges in ensuring the overall mission's correctness. In this paper, we propose a novel compositional framework for specifying and implementing contextual robotic missions using assume-guarantee contracts. 
The mission specification is structured in a hierarchical and modular fashion, allowing for each sub-mission to be synthesized as an independent robot controller. We address the problem of dynamically switching between sub-mission controllers while ensuring correctness under predefined conditions.	翻訳日:2024-01-03 02:22:31 公開日:2023-12-29
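The abstract builds on assume-guarantee contracts. As a minimal sketch, here are the standard set-theoretic contract operations from the contract-based design literature: saturation, refinement and parallel composition. Behaviours are modelled as abstract string labels over a finite universe, which is an assumption made purely for illustration. The class and function names are hypothetical. This is not the paper's framework and does not cover its controller synthesis or switching.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Contract:
    """Assume-guarantee contract over a finite universe of abstract behaviours (hypothetical model)."""
    universe: FrozenSet[str]
    assumptions: FrozenSet[str]
    guarantees: FrozenSet[str]

    def saturate(self) -> "Contract":
        # Saturation: G := G ∪ ¬A, so the guarantee only constrains behaviours where A holds.
        return Contract(self.universe, self.assumptions,
                        self.guarantees | (self.universe - self.assumptions))


def refines(c1: Contract, c2: Contract) -> bool:
    """c1 refines c2 if it accepts at least c2's environments and promises at least as much."""
    a, b = c1.saturate(), c2.saturate()
    return b.assumptions <= a.assumptions and a.guarantees <= b.guarantees


def compose(c1: Contract, c2: Contract) -> Contract:
    """Parallel composition: guarantees intersect, assumptions are relaxed by ¬G."""
    a, b = c1.saturate(), c2.saturate()
    g = a.guarantees & b.guarantees
    assumptions = (a.assumptions & b.assumptions) | (a.universe - g)
    return Contract(a.universe, assumptions, g)
```

Refinement is the property that lets a sub-mission controller be replaced by one satisfying a stronger contract without breaking the composed mission.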
# VillanDiffusion: 拡散モデルのための統一バックドア攻撃フレームワーク VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models ( http://arxiv.org/abs/2306.06874v5 ) ライセンス: Link先を確認	Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho	(参考訳) 拡散モデル（DM）は、反復的なノイズ付加とノイズ除去から可逆的な破壊過程を学習する最先端の生成モデルである。DMは、テキストから画像への条件付き生成など、多くの生成AIアプリケーションの基盤となっている。しかし最近の研究では、基本的な無条件DM（DDPMやDDIMなど）がバックドア注入に対して脆弱であることが示されている。バックドア注入とは、モデル入力に悪意を持って埋め込まれたパターンをトリガーとする出力操作攻撃の一種である。本稿では、DMに対するバックドア解析の現在の範囲を拡大するための統一バックドア攻撃フレームワーク（VillanDiffusion）を提案する。本フレームワークは、主流の無条件および条件付きDM（ノイズ除去ベースおよびスコアベース）と、総合的な評価のための各種の学習不要なサンプラーを対象とする。実験により、本統一フレームワークがさまざまなDM構成のバックドア解析を容易にし、DMに対するキャプションベースのバックドア攻撃について新たな知見を提供することが示された。コードはGitHubで公開されている: https://github.com/IBM/villandiffusion Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs. Our code is available on GitHub: https://github.com/IBM/villandiffusion	翻訳日:2024-01-03 02:21:47 公開日:2023-12-29
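For DDPM-style models, the "iterative noise addition" in the abstract has a closed form: q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), where ᾱ_t is the running product of (1 − β_s). The sketch below assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps that DDPM popularized. Plain Python lists stand in for image tensors. It illustrates only the clean forward process. It does not cover the backdoor triggers or any part of VillanDiffusion itself.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (DDPM-style defaults assumed)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= (1.0 - b)
        out.append(prod)
    return out


def q_sample(x0, t, abar, noise=None, rng=random):
    """Sample x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot instead of t noisy steps."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, noise)]
```

Jumping straight to any t is what makes training cheap. A denoising network only ever sees (x_t, t) pairs drawn this way.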
# 高輝度LHCにおけるデータ圧縮のための微分可能なEarth Mover's Distance Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC ( http://arxiv.org/abs/2306.04712v3 ) ライセンス: Link先を確認	Rohan Shenoy and Javier Duarte and Christian Herwig and James Hirschauer and Daniel Noonan and Maurizio Pierini and Nhan Tran and Cristina Mantilla Suarez	(参考訳) Earth Mover's Distance（EMD）は画像認識や分類に有用な指標である。しかし、通常の実装は微分可能でないか、勾配降下法で他のアルゴリズムを訓練するための損失関数として使うには遅すぎる。本稿では、畳み込みニューラルネットワーク（CNN）を訓練してEMDの微分可能かつ高速な近似を学習させ、それを計算負荷の高いEMD実装の代替として使用できることを示す。この微分可能な近似を、CERNの高輝度LHCにおけるデータ圧縮のための、オートエンコーダに着想を得たニューラルネットワーク（エンコーダNN）の訓練に適用する。このエンコーダNNの目標は、粒子検出器内のエネルギー付与の分布に関する情報を保持しながらデータを圧縮することである。微分可能なEMD CNNを用いて訓練したエンコーダNNの性能が、平均二乗誤差に基づく損失関数で訓練した場合を上回ることを示す。 The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.	翻訳日:2024-01-03 02:20:20 公開日:2023-12-29
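On multi-dimensional detector data, computing the EMD generally requires solving an optimal-transport problem. That cost is what motivates learning a fast CNN surrogate. The one-dimensional case, by contrast, has a closed form, which makes it a handy illustration of what is being approximated. With equal total mass on unit-spaced bins, the EMD equals the L1 distance between the two cumulative distributions. The helper name `emd_1d` is hypothetical, and this is not the paper's CNN.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Exact EMD between two 1-D histograms with equal total mass on unit-spaced bins.

    In 1-D the optimal transport cost equals the L1 distance between the
    cumulative distributions, so no linear program is needed.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must carry the same total mass")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
```

For example, moving a unit of mass from the first to the third of three bins costs 2, one unit of mass times two bins of distance.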
# CERT: カーディナリティ推定の観点からデータベースシステムの性能問題を発見する CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation ( http://arxiv.org/abs/2306.00355v2 ) ライセンス: Link先を確認	Jinsheng Ba, Manuel Rigger	(参考訳) データベース管理システム（DBMS）は、与えられたクエリに対してクエリプランを作成し、それを実行してクエリの結果を計算する。効率的なクエリプランを導出することは難しく、学術界と産業界の双方がクエリ最適化の研究に何十年も投資してきた。それにもかかわらず、DBMSは性能問題を起こしやすく、予期せず非効率なクエリプランを生成してクエリの実行が遅くなることがある。このような問題の発見は長年の課題であり、期待される実行時間に関する正解情報が存在しないため、本質的に困難である。本研究では、カーディナリティ推定の観点から性能問題を発見する新しい手法であるカーディナリティ推定制限テスト（CERT）を提案する。データベース上のクエリが与えられると、CERTはより制限的なクエリ（例えばLEFT JOINをINNER JOINに置き換えたもの）を導出する。このクエリの推定行数は、元のクエリの推定行数を超えてはならない。CERTがカーディナリティ推定を重点的にテストするのは、それがクエリ最適化において最も重要な要素であることが示されているためである。このような問題を発見・修正することで、最大の性能向上が得られると期待される。さらに、他の種類のクエリ最適化の問題も予期しない推定カーディナリティとして表面化することがあり、それらもCERTで発見できることがわかった。CERTはソースコードへのアクセスを必要としないブラックボックス手法であり、クエリプランはDBMSがEXPLAIN文を通じて公開するものを用いる。CERTは、コストが高く性能変動の影響を受けやすいクエリ実行を行わない。広く使われている成熟した3つのDBMS、MySQL、TiDB、CockroachDBでCERTを評価した。CERTは13件の固有の問題を発見し、そのうち2件が修正され、9件が開発者によって確認された。性能バグを発見するためのこの新たな視点が、DBMS開発者によるDBMSの性能改善に役立つことを期待している。 Database Management Systems (DBMSs) process a given query by creating a query plan, which is subsequently executed, to compute the query's result. Deriving an efficient query plan is challenging, and both academia and industry have invested decades into researching query optimization. Despite this, DBMSs are prone to performance issues, where a DBMS produces an unexpectedly inefficient query plan that might lead to the slow execution of a query. Finding such issues is a longstanding problem and inherently difficult, because no ground truth information on an expected execution time exists. In this work, we propose Cardinality Estimation Restriction Testing (CERT), a novel technique that finds performance issues through the lens of cardinality estimation. 
Given a query on a database, CERT derives a more restrictive query (e.g., by replacing a LEFT JOIN with an INNER JOIN), whose estimated number of rows should not exceed the estimated number of rows for the original query. CERT tests cardinality estimation specifically, because it was shown to be the most important part of query optimization; thus, we expect that finding and fixing such issues might result in the highest performance gains. In addition, we found that other kinds of query optimization issues can be exposed by unexpected estimated cardinalities, which can also be found by CERT. CERT is a black-box technique that does not require access to the source code; DBMSs expose query plans via the EXPLAIN statement. CERT eschews executing queries, which is costly and prone to performance fluctuations. We evaluated CERT on three widely used and mature DBMSs, MySQL, TiDB, and CockroachDB. CERT found 13 unique issues, of which 2 issues were fixed and 9 confirmed by the developers. We expect that this new angle on finding performance bugs will help DBMS developers in improving DBMSs' performance.	翻訳日:2024-01-03 02:20:05 公開日:2023-12-29
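The test oracle described above is a monotonicity check. A more restrictive query must not receive a larger row estimate than the original. The sketch below shows that check for the LEFT JOIN → INNER JOIN restriction only. `restrict_join` and `cert_check` are hypothetical names. The estimator is passed in as a plain callable because, in a real setup, it would have to be backed by parsing the target DBMS's EXPLAIN output, and that DBMS-specific part is deliberately left abstract here. This is not CERT's actual implementation or its full set of restriction rules.

```python
import re
from typing import Callable


def restrict_join(query: str) -> str:
    """Derive a more restrictive query by turning every LEFT [OUTER] JOIN into an INNER JOIN."""
    return re.sub(r"\bLEFT\s+(OUTER\s+)?JOIN\b", "INNER JOIN", query, flags=re.IGNORECASE)


def cert_check(query: str, estimate_rows: Callable[[str], float]) -> bool:
    """Return True if the estimates respect the restriction; False flags a potential bug."""
    restricted = restrict_join(query)
    if restricted == query:
        return True  # no applicable restriction, nothing to compare
    return estimate_rows(restricted) <= estimate_rows(query)
```

No query is ever executed. Only two estimates are compared, which matches the abstract's point that CERT avoids costly and noisy execution.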
# BotArtist: Twitterのアカウント凍結に基づくTwitterボット検出機械学習モデル BotArtist: Twitter bot detection Machine Learning model based on Twitter suspension ( http://arxiv.org/abs/2306.00037v3 ) ライセンス: Link先を確認	Alexander Shevtsov, Despoina Antonakaki, Ioannis Lamprou, Polyvios Pratikakis, Sotiris Ioannidis	(参考訳) Twitterは最も人気のあるソーシャルネットワークの1つで、コミュニケーションとオンライン上の議論の手段を提供している。しかし残念ながらボットや偽アカウントの標的となっており、情報操作や偽情報の拡散を招いている。この問題に対処するため、最近のロシア・ウクライナ戦争に関する900万人のユーザーによるTwitter上の議論からなる、困難かつ多言語のデータセットを収集し、ボットアカウントとそれらが関与する会話を検出する。データセットの正解ラベルは、Twitter APIの凍結アカウント情報を通じて収集しており、約34.3万件のボットアカウントと800万件の通常ユーザーが含まれる。さらに、Botometer-V3が提供するデータセット（Varol 1,777件、ドイツのアカウント483件、米国のアカウント1,321件）も使用する。公開データセットに加えて、2022年のエネルギー危機と2022年の陰謀論に関する議論という一般的な話題をめぐる2つの独立したデータセットも収集した。どちらのデータセットも、Twitterの凍結メカニズムに従ってラベル付けした。最先端のXGBoostモデルを用いて、ボット検出のための新しい機械学習モデルを構築した。このモデルを、Twitterの凍結メカニズムに基づく正解に従ってラベル付けされた大量のツイートと組み合わせる。本手法は限られたプロファイル特徴量のみを必要とし、Twitter APIに依存しないため、収集時期とは異なる期間のデータセットにもラベル付けできる。Botometerと比較して、本手法は2つの実シナリオのデータセットにおいて平均11%高いROC-AUCスコアを達成した。 Twitter, as one of the most popular social networks, offers a means for communication and online discourse, which unfortunately has been the target of bots and fake accounts, leading to the manipulation and spreading of false information. Towards this end, we gather a challenging, multilingual dataset of social discourse on Twitter, originating from 9M users regarding the recent Russo-Ukrainian war, in order to detect the bot accounts and the conversation involving them. We collect the ground truth for our dataset through the Twitter API suspended accounts collection, containing approximately 343K bot accounts and 8M normal users. Additionally, we use a dataset provided by Botometer-V3 with 1,777 Varol, 483 German accounts, and 1,321 US accounts. Besides the publicly available datasets, we also manage to collect 2 independent datasets around popular discussion topics of the 2022 energy crisis and the 2022 conspiracy discussions. 
Both of the datasets were labeled according to the Twitter suspension mechanism. We build a novel ML model for bot detection using the state-of-the-art XGBoost model. We combine the model with a high volume of labeled tweets according to the Twitter suspension mechanism ground truth. This requires a limited set of profile features allowing labeling of the dataset in different time periods from the collection, as it is independent of the Twitter API. In comparison with Botometer, our methodology achieves an average 11% higher ROC-AUC score over two real-case scenario datasets.	翻訳日:2024-01-03 02:19:08 公開日:2023-12-29
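The comparison above is reported in ROC-AUC. A convenient way to read that metric is the probability that a randomly chosen positive (bot) receives a higher score than a randomly chosen negative (human), with ties counting as one half. The minimal sketch below computes it directly from that definition. The function name `roc_auc` is hypothetical, and the quadratic pairwise loop is only suitable for small illustrative inputs. Library implementations use sorting instead.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(score of random positive > score of random negative), ties = 1/2.

    labels are 1 (positive, e.g. bot) or 0 (negative); O(P * N) for clarity.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0, and constant scores give 0.5.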
# 合成画像のオープンセット・アーキテクチャ帰属のためのシャムネットワークに基づく検証システム A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images ( http://arxiv.org/abs/2307.09822v2 ) ライセンス: Link先を確認	Lydia Abady, Jun Wang, Benedetta Tondi, Mauro Barni	(参考訳) 合成画像の帰属推定のためにさまざまな手法が開発されているが、その大半は学習セットに含まれるモデルやアーキテクチャが生成した画像しか帰属させることができない。未知のアーキテクチャには対応できないため、現実のシナリオへの適用が妨げられている。本稿では、合成画像をそれを生成したアーキテクチャに帰属させるオープンセット問題に対処するため、シャムネットワークに基づく検証フレームワークを提案する。2つの異なる設定を考える。第1の設定では、2枚の画像が同じ生成アーキテクチャによって作成されたか否かを判定する。第2の設定では、主張されたアーキテクチャが生成した1枚または複数の参照画像を用いて、合成画像の生成に使われたアーキテクチャに関する主張を検証する。提案システムの主な強みは、クローズドセットとオープンセットの両方のシナリオで動作できることである。入力画像（クエリ画像と参照画像の両方）は、学習時に考慮したアーキテクチャに属していても属していなくてもよい。GAN、拡散モデル、Transformerなどの多様な生成アーキテクチャを含み、合成顔画像の生成に焦点を当てた実験評価を行った。その結果、クローズドセットとオープンセットの両方の設定における本手法の優れた性能と、高い汎化能力が確認された。 Despite the wide variety of methods developed for synthetic image attribution, most of them can only attribute images generated by models or architectures included in the training set and do not work with unknown architectures, hindering their applicability in real-world scenarios. In this paper, we propose a verification framework that relies on a Siamese Network to address the problem of open-set attribution of synthetic images to the architecture that generated them. We consider two different settings. In the first setting, the system determines whether two images have been produced by the same generative architecture or not. In the second setting, the system verifies a claim about the architecture used to generate a synthetic image, utilizing one or multiple reference images generated by the claimed architecture. The main strength of the proposed system is its ability to operate in both closed and open-set scenarios so that the input images, both the query and reference images, can belong to the architectures considered during training or not. 
Experimental evaluations encompassing various generative architectures such as GANs, diffusion models, and transformers, focusing on synthetic face image generation, confirm the excellent performance of our method in both closed and open-set settings, as well as its strong generalization capabilities.	翻訳日:2024-01-03 02:11:13 公開日:2023-12-29
# Vision-Language Modelsは良い推測者になれるか？ 時間と場所の推論のためのVLMの探究 Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning ( http://arxiv.org/abs/2307.06166v2 ) ライセンス: Link先を確認	Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp	(参考訳) 視覚言語モデル（VLM）は、人間と同様に常識的な知識を用いて推論できることが期待されている。その一例として、人間は自らの知識に基づいて、画像がいつどこで撮影されたかを推論できる。このことから、大規模な画像テキストリソースで事前学習されたVLMが、視覚的な手がかりに基づいて、時間と場所の推論において人間の能力に匹敵し、さらには上回ることができるのかという疑問が生じる。この疑問に答えるため、識別型および生成型のVLMに適用する2段階の「認識」および「推論」のプロービングタスクを提案する。これにより、VLMが時間や場所に関連する特徴を認識し、さらにそれについて推論できるかを明らかにする。この調査を容易にするため、WikiTiLoを導入する。これは、豊富な社会文化的手がかりを含む画像からなる、丁寧にキュレーションされた画像データセットである。広範な実験の結果、VLMは視覚エンコーダにおいて関連する特徴を効果的に保持できるものの、依然として完全な推論はできないことがわかった。今後の研究を促進するため、データセットとコードを公開する予定である。 Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings. One example is that humans can reason where and when an image is taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even outperform human capability in reasoning times and location. To address this question, we propose a two-stage recognition and reasoning probing task, applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about them. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In the extensive experimental studies, we find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. We will release our dataset and codes to facilitate future studies.	翻訳日:2024-01-03 02:10:01 公開日:2023-12-29
# 大規模言語モデルの評価に関する調査 A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v9 ) ライセンス: Link先を確認	Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie	(参考訳) 大規模言語モデル（LLM）は、さまざまな応用における前例のない性能により、学術界と産業界の両方で人気が高まっている。LLMが研究と日常利用の両方で重要な役割を果たし続けるにつれ、その評価はますます重要になっている。これはタスクレベルだけでなく、潜在的なリスクをよりよく理解するための社会レベルにおいても同様である。過去数年間、さまざまな観点からLLMを調べるための多大な努力が払われてきた。本稿では、LLMのこれらの評価手法を包括的にレビューし、何を評価するか、どこで評価するか、どのように評価するかという3つの重要な側面に焦点を当てる。まず、評価タスクの観点から概観を示す。対象は、一般的な自然言語処理タスク、推論、医療利用、倫理、教育、自然科学と社会科学、エージェント応用などの分野である。次に、LLMの性能評価において重要な要素である評価手法とベンチマークを掘り下げることで、「どこで」と「どのように」という問いに答える。続いて、さまざまなタスクにおけるLLMの成功事例と失敗事例をまとめる。最後に、LLM評価において今後待ち受けるいくつかの課題を明らかにする。我々の目的は、LLM評価の分野の研究者に貴重な知見を提供し、それによってより高性能なLLMの開発に貢献することである。我々の主張は、LLMの開発をよりよく支援するために、評価を不可欠な学問分野として扱うべきだということである。関連するオープンソース資料は https://github.com/MLGroupJLU/LLM-eval-survey で継続的に管理している。 Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. 
Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.	翻訳日:2024-01-03 02:09:41 公開日:2023-12-29
# 核多体系における多体絡み合いと情報の再配置 Multi-Body Entanglement and Information Rearrangement in Nuclear Many-Body Systems ( http://arxiv.org/abs/2306.16535v2 ) ライセンス: Link先を確認	S. Momme Hengstenberg, Caroline E. P. Robin, Martin J. Savage	(参考訳) 核多体系の有効モデル空間（EMS）計算が、多粒子の絡み合いをどのように再配置し収束させるかを調べる。一般化リプキン・メシュコフ・グリック（LMG）モデルを用いて、原子核の絡み合いに基づく記述の今後の発展に向けた動機付けと洞察を与える。この有効アプローチは、ヒルベルト空間の切断と、関連する基本自由度を構成する量子ビット（スピン）の変分回転とに基づいている。回転と切断が非可換であることにより、モデル空間の大部分でエネルギー収束を指数関数的に改善できる。本解析では相関と絡み合いの尺度を調べ、カットオフの増加に伴うそれらの収束を定量化する。多体の絡み合いを評価するため、1スピンおよび2スピンの絡み合いエントロピー、相互情報量、$n=2,4$ の $n$-tangle に着目する。有効記述は回転したスピンのエントロピーと相互情報量を強く抑制する一方、低いカットオフでも厳密な結果を大部分再現できる。これに対して、素のハミルトニアンを単純に切断すると、これらの尺度は人為的に過小評価される。本モデルにおける $n$-tangle は、$n$ 粒子の絡み合いの基底に依存しない尺度を与える。これらはEMSによる記述では捉えにくいが、素のハミルトニアンの切断と比べた収束の改善ははるかに劇的である。低エネルギーEMS手法は、多体系の低励起観測量に対して予測能力を示してきた。本研究では、この手法がLMGモデルにおける量子相関と多体絡み合いに対しても同様の有効性を示すと結論づける。この結果は、核多体系や、高エネルギー物理学・原子核物理学に関連する有効場理論における今後の研究の動機となる。 We examine how effective-model-space (EMS) calculations of nuclear many-body systems rearrange and converge multi-particle entanglement. The generalized Lipkin-Meshkov-Glick (LMG) model is used to motivate and provide insight for future developments of entanglement-driven descriptions of nuclei. The effective approach is based on a truncation of the Hilbert space together with a variational rotation of the qubits (spins), which constitute the relevant elementary degrees of freedom. The non-commutativity of the rotation and truncation allows for an exponential improvement of the energy convergence throughout much of the model space. Our analysis examines measures of correlations and entanglement, and quantifies their convergence with increasing cut-off. We focus on one- and two-spin entanglement entropies, mutual information, and $n$-tangles for $n=2,4$ to estimate multi-body entanglement. The effective description strongly suppresses entropies and mutual information of the rotated spins, while being able to recover the exact results to a large extent with low cut-offs. 
Naive truncations of the bare Hamiltonian, on the other hand, artificially underestimate these measures. The $n$-tangles in the present model provide basis-independent measures of $n$-particle entanglement. While these are more difficult to capture with the EMS description, the improvement in convergence, compared to truncations of the bare Hamiltonian, is significantly more dramatic. We conclude that the low-energy EMS techniques, which successfully provide predictive capabilities for low-lying observables in many-body systems, exhibit analogous efficacy for quantum correlations and multi-body entanglement in the LMG model, motivating future studies in nuclear many-body systems and effective field theories relevant to high-energy physics and nuclear physics.	翻訳日:2024-01-03 02:08:51 公開日:2023-12-29
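Two of the measures named above are easy to compute for a small pure state: the one-spin entanglement entropy and the $n$-tangle. This sketch assumes the common definition $\tau_n = |\langle\psi|\sigma_y^{\otimes n}|\psi^*\rangle|^2$ for even $n$ (due to Wong and Christensen). It also assumes amplitudes in computational-basis order with qubit 0 as the most significant bit. Both function names are hypothetical, and nothing here reproduces the paper's EMS truncation or rotation.

```python
import math


def single_spin_entropy(psi, n, j):
    """Von Neumann entropy (nats) of qubit j in an n-qubit pure state.

    psi holds 2**n amplitudes in computational-basis order, qubit 0 being the
    most significant bit of the index.
    """
    shift = n - 1 - j
    r00, r11, r01 = 0.0, 0.0, 0j
    for idx, amp in enumerate(psi):
        if (idx >> shift) & 1 == 0:
            partner = idx | (1 << shift)  # same other qubits, qubit j flipped to 1
            r00 += abs(amp) ** 2
            r11 += abs(psi[partner]) ** 2
            r01 += amp * psi[partner].conjugate()
    # Eigenvalues of the 2x2 Hermitian reduced density matrix.
    tr = r00 + r11
    det = r00 * r11 - abs(r01) ** 2
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    eigs = ((tr + disc) / 2.0, (tr - disc) / 2.0)
    return -sum(lam * math.log(lam) for lam in eigs if lam > 1e-15)


def n_tangle(psi, n):
    """n-tangle |<psi| sigma_y^(x)n |psi*>|^2 for an even number n of qubits."""
    if n % 2:
        raise ValueError("this sketch only handles even n")
    mask = (1 << n) - 1
    total = 0j
    for idx, amp in enumerate(psi):
        ones = bin(idx).count("1")
        # sigma_y|0> = i|1> and sigma_y|1> = -i|0>, applied on every qubit.
        phase = (1j) ** (n - ones) * (-1j) ** ones
        total += psi[idx ^ mask].conjugate() * amp.conjugate() * phase
    return abs(total) ** 2
```

As sanity checks, a Bell pair has one-spin entropy ln 2 and 2-tangle 1, a four-qubit GHZ state has 4-tangle 1, and a product state gives zero for both.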
# NeuroCLIP: CLIPとSNNによるニューロモルフィックデータ理解 NeuroCLIP: Neuromorphic Data Understanding by CLIP and SNN ( http://arxiv.org/abs/2306.12073v2 ) ライセンス: Link先を確認	Yufei Guo and Yuanpei Chen and Zhe Ma	(参考訳) 近年、ニューロモルフィック視覚センサがますます注目を集めている。しかし、ニューロモルフィックデータは非同期のイベントスパイクからなるため、強力な汎用ニューラルネットワークモデルを学習するための大規模ベンチマークを構築することが難しい。その結果、深層学習による「未知」の物体に対するニューロモルフィックデータの理解が制限されている。一方、フレーム画像では学習データを容易に入手できる。そのため、2Dの大規模な画像テキストペアで事前学習された大規模な対照的視覚言語事前学習（CLIP）モデルを用いた、「未知」タスクに対するゼロショット学習や少数ショット学習が目覚ましい性能を示している。そこで、CLIPをニューロモルフィックデータの認識に転用して「未知」の問題に対処できないかを検討した。この目的のため、本論文ではこのアイデアをNeuroCLIPとして具体化する。NeuroCLIPは、2D CLIPと、ニューロモルフィックデータ理解のために特別に設計された2つのモジュールから構成される。1つ目は、単純な識別戦略によってイベントスパイクを時系列のフレーム画像に変換するイベントフレームモジュールである。2つ目はタイムステップ間アダプタである。これは、CLIPの視覚エンコーダから得られる時系列特徴量に対してスパイキングニューラルネットワーク（SNN）に基づく単純な微調整アダプタを適用し、少数ショット性能を向上させるものである。N-MNIST、CIFAR10-DVS、ES-ImageNetなどのニューロモルフィックデータセットを用いたさまざまな実験により、NeuroCLIPの有効性が示された。コードは https://github.com/yfguo91/NeuroCLIP.git で公開されている。 Recently, the neuromorphic vision sensor has received more and more interest. However, the neuromorphic data consists of asynchronous event spikes, which makes it difficult to construct a big benchmark to train a powerful general neural network model, thus limiting the neuromorphic data understanding for "unseen" objects by deep learning. For frame images, by contrast, since the training data can be obtained easily, zero-shot and few-shot learning for "unseen" tasks via the large Contrastive Vision-Language Pre-training (CLIP) model, which is pre-trained by large-scale image-text pairs in 2D, have shown inspiring performance. We wonder whether the CLIP could be transferred to neuromorphic data recognition to handle the "unseen" problem. 
To this end, we materialize this idea with NeuroCLIP in the paper. The NeuroCLIP consists of 2D CLIP and two specially designed modules for neuromorphic data understanding. First, an event-frame module that could convert the event spikes to the sequential frame image with a simple discrimination strategy. Second, an inter-timestep adapter, which is a simple fine-tuned adapter based on a spiking neural network (SNN) for the sequential features coming from the visual encoder of CLIP to improve the few-shot performance. Various experiments on neuromorphic datasets including N-MNIST, CIFAR10-DVS, and ES-ImageNet demonstrate the effectiveness of NeuroCLIP. Our code is open-sourced at https://github.com/yfguo91/NeuroCLIP.git.	翻訳日:2024-01-03 02:06:17 公開日:2023-12-29
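NeuroCLIP's event-frame module turns asynchronous event spikes into a sequence of frames that a 2D CLIP image encoder can consume. The abstract does not spell out the "simple discrimination strategy", so the sketch below substitutes plain time binning with per-pixel event counts. That substitution is an assumption, not the paper's method. Events are hypothetical `(x, y, t, polarity)` tuples, and polarity is ignored for simplicity.

```python
def events_to_frames(events, width, height, num_frames, t_start, t_end):
    """Accumulate (x, y, t, polarity) events into num_frames count images.

    Each frame is a height x width grid of event counts for one equal-length
    time bin; events outside [t_start, t_end] are dropped, polarity is ignored.
    """
    frames = [[[0] * width for _ in range(height)] for _ in range(num_frames)]
    span = (t_end - t_start) / num_frames
    for x, y, t, _polarity in events:
        if not (t_start <= t <= t_end):
            continue
        k = min(int((t - t_start) / span), num_frames - 1)  # t == t_end goes to the last bin
        frames[k][y][x] += 1
    return frames
```

Each resulting frame can then be normalised and passed through the image encoder, giving one feature per time step for a temporal adapter to combine.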
# テキスト認識のための自己蒸留で正則化したコネクショニスト時間分類損失：シンプルかつ効果的なアプローチ Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach ( http://arxiv.org/abs/2308.08806v4 ) ライセンス: Link先を確認	Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang and Wei Peng	(参考訳) テキスト認識手法は急速な発展を遂げている。強力なモジュール、言語モデル、教師なし・半教師あり学習スキームといった高度な技術が、公開ベンチマークでの性能を次々と押し上げている。しかし、損失関数の観点からテキスト認識モデルをいかにうまく最適化するかという問題は、ほとんど見過ごされてきた。CTCに基づく手法は、性能と推論速度のバランスが良いため実用上広く用いられているが、依然として精度の低下という問題を抱えている。これは、CTC損失がシーケンス全体の最適化を重視する一方で、個々の文字の学習を軽視しているためである。この問題に対処するため、CTCベースのモデル向けの自己蒸留方式を提案する。本方式は、CTC損失にフレーム単位の正則化項を組み込んで個々の文字に対する教師信号を強調する。さらに、潜在アライメントの最大事後確率（MAP）推定を利用して、CTCベースのモデル間の蒸留で生じる不整合の問題を解決する。この正則化されたCTC損失を、蒸留コネクショニスト時間分類（DCTC）損失と呼ぶ。DCTC損失はモジュール不要で、追加のパラメータ、推論遅延の増加、追加の学習データや学習段階を一切必要としない。公開ベンチマークでの広範な実験により、DCTCはこれらの欠点を一切伴わずに、テキスト認識モデルの精度を最大2.6%向上させられることが示された。 Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term in CTC loss to emphasize individual supervision, and leverages the maximum-a-posteriori estimate of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. 
DCTC loss is module-free, requiring no extra parameters, longer inference lag, or additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks.	翻訳日:2024-01-03 01:59:26 公開日:2023-12-29
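The tension between sequence-level and framewise supervision comes from CTC's many-to-one mapping. Many framewise paths (one symbol per frame, including a blank) collapse to the same label sequence by merging repeats and then dropping blanks. The sketch below shows that standard mapping and best-path (greedy) decoding. The blank symbol `-` and the function names are assumptions for illustration. The DCTC regularizer and its MAP alignment are not reproduced here.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a framewise CTC path to its label sequence: merge repeats, then drop blanks."""
    return [label for label, _ in groupby(frame_labels) if label != blank]


def greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Best-path decoding: pick the arg-max symbol in every frame, then collapse."""
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return "".join(ctc_collapse(best, blank))
```

For instance, the path `aa-ab-bb` collapses to `aabb`. A blank between two identical symbols is the only way CTC can emit a doubled character.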
# 量子測定理論における正準占有（マクロ）状態のエントロピー Entropy of the Canonical Occupancy (Macro) State in the Quantum Measurement Theory ( http://arxiv.org/abs/2308.04472v6 ) ライセンス: Link先を確認	Arnaldo Spalvieri	(参考訳) 本稿では、任意の数の非相互作用ボソンからなる平衡状態の系について、占有数の確率分布とエントロピーを解析する。確率分布は2つの方法で導出される。1つは、環境と着目系を合わせた全体のボソン固有状態から環境をトレースアウトする方法（経験的アプローチ）である。もう1つは、環境と着目系を合わせた全体の混合状態から環境をトレースアウトする方法（ベイズ的アプローチ）である。熱力学的極限では両者は一致し、多項分布に等しくなる。さらに、ボソン系の物理的エントロピーを占有数のシャノンエントロピーと同一視することを提案し、熱力学的エントロピーの古典的解析で生じるいくつかの矛盾を解消する。最後に、多項分布のエントロピーと多変量超幾何分布のエントロピーの間の情報理論的不等式を利用して、ベイズ主義と経験主義を共通の「情報力学（infomechanical）」の枠組みに統合する。 The paper analyzes the probability distribution of the occupancy numbers and the entropy of a system at equilibrium composed of an arbitrary number of non-interacting bosons. The probability distribution is derived both by tracing out the environment from a bosonic eigenstate of the union of environment and system of interest (the empirical approach) and by tracing out the environment from the mixed state of the union of environment and system of interest (the Bayesian approach). In the thermodynamic limit, the two coincide and are equal to the multinomial distribution. Furthermore, the paper proposes to identify the physical entropy of the bosonic system with the Shannon entropy of the occupancy numbers, fixing certain contradictions that arise in the classical analysis of thermodynamic entropy. Finally, by leveraging an information-theoretic inequality between the entropy of the multinomial distribution and the entropy of the multivariate hypergeometric distribution, Bayesianism and empiricism are integrated into a common "infomechanical" framework.	翻訳日:2024-01-03 01:57:10 公開日:2023-12-29
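The entropy the abstract proposes is the Shannon entropy of the occupancy numbers, which in the thermodynamic limit follow a multinomial distribution. For small systems that entropy can be computed by brute-force enumeration of all occupancy vectors. The sketch below does this in nats. It assumes the single-particle state probabilities `probs` are given as inputs, chosen purely for illustration. The helper names are hypothetical, and the hypergeometric side of the paper's inequality is not computed.

```python
import math


def compositions(n, k):
    """All k-tuples of non-negative integers summing to n (occupancy vectors)."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, probs):
    """Multinomial probability of an occupancy vector given per-state probabilities."""
    n = sum(counts)
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)  # stays an exact integer at every step
    return coeff * math.prod(p ** c for p, c in zip(probs, counts))


def multinomial_entropy(n, probs):
    """Shannon entropy (nats) of the occupancy numbers of n particles over len(probs) states."""
    h = 0.0
    for counts in compositions(n, len(probs)):
        pr = multinomial_pmf(counts, probs)
        if pr > 0:
            h -= pr * math.log(pr)
    return h
```

Enumeration grows combinatorially with n and the number of states, so this is only a check for toy cases. For two equiprobable states and two particles it gives 1.5 ln 2.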
# 根底にあるパターンを明らかにする：データセットの類似性、性能、汎化の調査 Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization ( http://arxiv.org/abs/2308.03580v3 ) ライセンス: Link先を確認	Akshit Achara, Ram Krishna Pandey	(参考訳) 教師あり深層学習モデルが特定のタスクで許容できる性能を達成するには、大量のラベル付きデータが必要である。しかし、未知のデータでテストすると、モデルはうまく機能しない場合がある。そのため、汎化性能を向上させるには、追加の多様なラベル付きデータでモデルを学習させる必要がある。本研究の目的は、モデルとその性能および汎化を理解することである。モデルの挙動についての洞察を得るため、画像間、データセット間、画像-データセット間の距離を定義する。提案する距離尺度をモデル性能と組み合わせることで、候補アーキテクチャの中から適切なモデル／アーキテクチャを選択するのに役立つ。学習セットに少数（例えば1、3、7枚）の未見画像を追加するだけで、これらのモデルの汎化を改善できることを示した。提案手法は、動的な環境において未知のデータに対するモデル性能の推定値を提供しつつ、学習とアノテーションのコストを削減する。 Supervised deep learning models require a significant amount of labeled data to achieve an acceptable performance on a specific task. However, when tested on unseen data, the models may not perform well. Therefore, the models need to be trained with additional and varying labeled data to improve the generalization. In this work, our goal is to understand the models, their performance and generalization. We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior. Our proposed distance metric when combined with model performance can help in selecting an appropriate model/architecture from a pool of candidate architectures. We have shown that the generalization of these models can be improved by only adding a small number of unseen images (say 1, 3 or 7) into the training set. Our proposed approach reduces training and annotation costs while providing an estimate of model performance on unseen data in dynamic environments.	翻訳日:2024-01-03 01:56:15 公開日:2023-12-29
# スレート方策の高速最適化：Plackett-Luceを超えて Fast Slate Policy Optimization: Going Beyond Plackett-Luce ( http://arxiv.org/abs/2308.01566v2 ) ライセンス: Link先を確認	Otmane Sakhi, David Rohde, Nicolas Chopin	(参考訳) 大規模機械学習システムにおいてますます重要になっている構成要素の1つは、スレートを返すことである。スレートとは、クエリに対して順序付けられたアイテムのリストである。この技術の応用例には、検索、情報検索、推薦システムなどがある。行動空間が大きい場合、オンラインのクエリを迅速に処理するため、意思決定システムは特定の構造に制限される。本稿では、任意の報酬関数が与えられたときの、こうした大規模意思決定システムの最適化を扱う。この学習問題を方策最適化の枠組みで定式化し、決定関数の新しい緩和から生まれた新しいクラスの方策を提案する。これにより、巨大な行動空間にもスケールする、単純かつ効率的な学習アルゴリズムが得られる。提案手法を一般に採用されているPlackett-Luce方策クラスと比較し、行動空間のサイズが数百万規模の問題に対する本アプローチの有効性を示す。 An increasingly important building block of large scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include: search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions.	翻訳日:2024-01-03 01:55:28 公開日:2023-12-29
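The Plackett-Luce baseline mentioned above can be sketched briefly. A slate is built by repeatedly sampling an item without replacement, with probability proportional to exp(logit). The same distribution can be drawn in one shot by adding independent Gumbel(0, 1) noise to every logit and keeping the top k (the Gumbel-top-k trick). The sketch below assumes plain Python lists of logits. The function names are hypothetical, and the paper's own relaxed policy class is not reproduced.

```python
import math
import random


def sample_slate_plackett_luce(logits, k, rng=random):
    """Draw an ordered slate of k distinct items from a Plackett-Luce model.

    Gumbel-top-k: perturb each logit with independent Gumbel(0, 1) noise and
    keep the k largest, which matches sequential sampling without replacement.
    """
    keys = []
    for item, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        keys.append((logit - math.log(-math.log(u)), item))
    keys.sort(reverse=True)
    return [item for _, item in keys[:k]]


def plackett_luce_log_prob(logits, slate):
    """Log-probability of an ordered slate under Plackett-Luce (O(k * n) here)."""
    remaining = set(range(len(logits)))
    logp = 0.0
    for item in slate:
        denom = math.log(sum(math.exp(logits[j]) for j in remaining))
        logp += logits[item] - denom
        remaining.remove(item)
    return logp
```

The per-position normalisation over all remaining items in `plackett_luce_log_prob` is the part that becomes expensive when the action space holds millions of items. That cost is the setting the paper targets.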
# 生成AIのセキュリティリスクの特定と軽減 Identifying and Mitigating the Security Risks of Generative AI ( http://arxiv.org/abs/2308.14840v4 ) ライセンス: Link先を確認	Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang	(参考訳) あらゆる主要な技術発明は両用(デュアルユース)ジレンマを再浮上させる。すなわち、新しい技術は善にも害にも使われうる。大規模言語モデル(LLM)や拡散モデルのような生成AI(GenAI)技術は、顕著な能力(例えば、インコンテキスト学習、コード補完、テキストからの画像生成と編集)を示している。しかし、GenAIは攻撃者にも同様に利用され、新しい攻撃の生成や、既存の攻撃の速度と効果の向上に使われうる。本稿は、Googleで開催された(スタンフォード大学とウィスコンシン大学マディソン校の共催)、GenAIがもたらす両用ジレンマに関するワークショップの成果を報告する。本論文は包括的なものではなく,ワークショップで得られた興味深い知見のいくつかを統合する試みである。この話題について,コミュニティの短期的・長期的目標について論じる。この論文が、この重要なトピックに関する議論の出発点と、研究コミュニティが取り組むべき興味深い問題の両方を提供することを期待している。 Every major technical invention resurfaces the dual-use dilemma: the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. This paper is not meant to be comprehensive, but is rather an attempt to synthesize some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. We hope this paper provides both a launching point for a discussion on this important topic and interesting problems that the research community can work to address.	翻訳日:2024-01-03 01:44:21 公開日:2023-12-29
# CNN分類性能向上のための多段階特徴非相関制約 Multi-stage feature decorrelation constraints for improving CNN classification performance ( http://arxiv.org/abs/2308.12880v2 ) ライセンス: Link先を確認	Qiuyu Zhu and Hao Wang and Xuewen Zu and Chengfei Liu	(参考訳) パターン分類に使用される畳み込みニューラルネットワーク(CNN)では、ネットワークパラメータに対する一部の正則化制約を除き、トレーニング損失関数は通常ネットワークの最終出力に適用される。しかし,ネットワークの層数が増えるにつれて,前段の層に対する損失関数の影響は徐々に弱まり,ネットワークパラメータは局所最適に陥りやすくなる。同時に、訓練されたネットワークは特徴のすべての段階で顕著な情報冗長性を持つことがわかった。これは各段階における特徴写像の有効性を低下させ、その後のネットワークパラメータが最適な方向へ変化することを妨げる。したがって、前段の特徴を制約し、その情報冗長性を排除する損失関数を設計することで、ネットワークのより最適化された解を得て、分類精度をさらに向上させることができる。本稿では,CNNに対して,全段階の特徴間の相関を制約することで有効な特徴を洗練し,情報冗長性を排除する多段階特徴非相関損失(MFD Loss)を提案する。CNNには多数の層があることを考慮し、実験的な比較と分析を通じて、MFD LossをCNNの複数の前段の層に適用し、各層・各チャネルの出力特徴を制約するとともに、ネットワーク訓練中に分類損失関数と共同で教師あり学習を行う。単一のSoftmax Lossによる教師あり学習と比較して、いくつかの代表的なCNNとよく使われるデータセットでの実験により、Softmax Loss+MFD Lossの分類性能が著しく優れていることが示された。また、MFD Lossを他の代表的な損失関数と組み合わせる前後の比較実験により、その高い汎用性が検証された。 For the convolutional neural network (CNN) used for pattern classification, the training loss function is usually applied to the final output of the network, except for some regularization constraints on the network parameters. However, as the number of network layers increases, the influence of the loss function on the front layers of the network gradually decreases, and the network parameters tend to fall into local optima. At the same time, it is found that the trained network has significant information redundancy at all stages of features, which reduces the effectiveness of feature mapping at all stages and hinders the subsequent parameters of the network from changing in the direction of optimality.
Therefore, by designing a loss function that constrains the front-stage features and eliminates their information redundancy, it is possible to obtain a better-optimized solution for the network and further improve its classification accuracy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Because a CNN has many layers, MFD Loss, based on experimental comparison and analysis, acts on multiple front layers of the CNN, constrains the output features of each layer and each channel, and performs supervised training jointly with the classification loss function during network training. Compared with supervised learning using Softmax Loss alone, experiments with several typical CNNs on several commonly used datasets show that the classification performance of Softmax Loss+MFD Loss is significantly better. Meanwhile, comparison experiments before and after combining MFD Loss with some other typical loss functions verify its good generality.	翻訳日:2024-01-03 01:43:37 公開日:2023-12-29
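The idea of penalizing correlated features can be made concrete with a generic decorrelation penalty. The sketch below assumes each layer's output has already been reduced to one activation per channel per sample (for example by spatial average pooling). It sums the squared off-diagonal Pearson correlations between channels. The pooling, the squared-correlation form, and the `weight` in the hypothetical `total_loss` are illustrative assumptions, not the exact MFD Loss defined in the paper.

```python
import math


def decorrelation_penalty(features, eps=1e-12):
    """Sum of squared off-diagonal channel correlations for one layer.

    features: list of N samples, each a list of C channel activations
    (e.g. spatially pooled feature maps). Returns ~0 when channels are
    uncorrelated and grows as channels become redundant.
    """
    n, c = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(c)]
    centered = [[row[j] - means[j] for j in range(c)] for row in features]
    stds = [math.sqrt(sum(r[j] ** 2 for r in centered) / n) for j in range(c)]
    penalty = 0.0
    for a in range(c):
        for b in range(a + 1, c):
            cov = sum(r[a] * r[b] for r in centered) / n
            corr = cov / (stds[a] * stds[b] + eps)  # eps avoids division by zero
            penalty += corr ** 2
    return penalty


def total_loss(classification_loss, layer_features, weight=0.1):
    """Hypothetical joint objective: classification loss plus weighted
    decorrelation penalties summed over several front layers."""
    return classification_loss + weight * sum(
        decorrelation_penalty(f) for f in layer_features
    )
```

In a real training loop the same computation would be written with an autodiff framework so that the penalty's gradient reaches the front layers.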
# 生涯マルチエージェント経路探索のための交通流最適化 Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding ( http://arxiv.org/abs/2308.11234v4 ) ライセンス: Link先を確認	Zhe Chen, Daniel Harabor, Jiaoyang Li, Peter J. Stuckey	(参考訳) マルチエージェント経路探索(MAPF)はロボット工学の基本的な問題であり、共有マップ上を移動するエージェントのチームに対して、衝突のない経路を計算することを求める。この話題には多くの研究があるが、現在のアルゴリズムはすべて、エージェント数の増加に伴って苦戦する。主な理由は、既存のアプローチが通常、渋滞を引き起こす自由フロー最適経路を計画することにある。この問題に取り組むため,我々は,エージェントが渋滞を回避する経路をたどって目的地へ誘導される新しいMAPF手法を提案する。このアイデアを、各エージェントが1つの目的地を持つワンショットMAPFと、エージェントに新しいタスクが継続的に割り当てられる生涯MAPFの2つの大規模設定で評価する。ワンショットMAPFでは、我々のアプローチが解の品質を大幅に改善することを示す。生涯MAPFでは、全体のスループットの大幅な改善を報告する。 Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics that asks us to compute collision-free paths for a team of agents, all moving across a shared map. Although many works appear on this topic, all current algorithms struggle as the number of agents grows. The principal reason is that existing approaches typically plan free-flow optimal paths, which creates congestion. To tackle this issue we propose a new approach for MAPF where agents are guided to their destination by following congestion-avoiding paths. We evaluate the idea in two large-scale settings: one-shot MAPF, where each agent has a single destination, and lifelong MAPF, where agents are continuously assigned new tasks. For one-shot MAPF we show that our approach substantially improves solution quality. For lifelong MAPF we report large improvements in overall throughput.	翻訳日:2024-01-03 01:42:47 公開日:2023-12-29
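To illustrate what a "congestion-avoiding path" can mean, here is a minimal sketch of Dijkstra's algorithm on a 4-connected grid where entering a cell costs 1 plus a penalty proportional to how many other agents are already planned through it. The `usage` counts and the linear `weight` penalty are hypothetical choices for illustration; the abstract does not specify the cost model the authors use.

```python
import heapq


def congestion_aware_path(grid, start, goal, usage, weight=1.0):
    """Dijkstra on a 4-connected grid (0 = free, 1 = obstacle).

    Entering cell v costs 1 + weight * usage.get(v, 0), where usage is a
    hypothetical count of agents already routed through v.
    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1.0 + weight * usage.get((nr, nc), 0)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

With `usage = {}` this reduces to an ordinary free-flow shortest path, the behavior the abstract identifies as a source of congestion.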
# 最新の参照なし画像・動画品質指標の敵対的攻撃に対する堅牢性の比較 Comparing the robustness of modern no-reference image- and video-quality metrics to adversarial attacks ( http://arxiv.org/abs/2310.06958v3 ) ライセンス: Link先を確認	Anastasia Antsiferova, Khaled Abud, Aleksandr Gushchin, Ekaterina Shumitskaya, Sergey Lavrushkin, Dmitriy Vatolin	(参考訳) 現在、ニューラルネットワークベースの画像および動画品質指標は、従来の方法よりも優れた性能を示している。しかし、視覚的品質を改善することなく指標のスコアを上げる敵対的攻撃に対しても、より脆弱になった。既存の品質指標のベンチマークは、主観的品質との相関と計算時間の観点から性能を比較している。しかし、画像品質指標の敵対的ロバスト性も研究に値する分野である。本稿では,様々な敵対的攻撃に対する最新の指標の堅牢性を分析する。コンピュータビジョンタスクの敵対的攻撃を適用し,15の参照なし画像/動画品質指標に対する攻撃の効率を比較した。いくつかの指標は敵対的攻撃に対して高い耐性を示し、脆弱な指標よりもベンチマークで安全に使用できる。このベンチマークは、攻撃に対して指標をより堅牢にしたい研究者や、ニーズに合った堅牢な指標を探したい研究者のために、新しい指標の投稿を受け付けている。 pip install robustness-benchmark でベンチマークを試すことができる。 Nowadays, neural-network-based image- and video-quality metrics perform better than traditional methods. However, they have also become more vulnerable to adversarial attacks that increase metrics' scores without improving visual quality. The existing benchmarks of quality metrics compare their performance in terms of correlation with subjective quality and calculation time. However, the adversarial robustness of image-quality metrics is also an area worth researching. In this paper, we analyse modern metrics' robustness to different adversarial attacks. We adopted adversarial attacks from computer vision tasks and compared attacks' efficiency against 15 no-reference image/video-quality metrics. Some metrics showed high resistance to adversarial attacks, which makes their usage in benchmarks safer than that of vulnerable metrics. The benchmark accepts new metrics submissions for researchers who want to make their metrics more robust to attacks or to find such metrics for their needs. Try our benchmark using pip install robustness-benchmark.	翻訳日:2024-01-03 01:36:53 公開日:2023-12-29
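The attacks in question perturb an input so that a differentiable metric reports a higher score without genuinely improving the image. Below is a toy version of one standard computer-vision attack, the fast gradient sign method (FGSM), applied to a 1-D "image" with a hypothetical sharpness-style metric (the sum of squared neighboring differences). Neither the metric nor this exact attack is taken from the paper; the point is only the mechanics of a score-increasing step bounded in the L-infinity norm.

```python
def sharpness(x):
    """Hypothetical differentiable 'quality score': sum of squared
    differences between neighboring pixels of a 1-D signal."""
    return sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))


def sharpness_grad(x):
    """Analytic gradient of sharpness() with respect to each pixel."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        d = x[i + 1] - x[i]
        g[i + 1] += 2 * d
        g[i] -= 2 * d
    return g


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_increase(x, eps):
    """One FGSM step that *increases* the metric: move every pixel by eps
    in the direction of the gradient sign, then clip to [0, 1]."""
    g = sharpness_grad(x)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, g)]
```

Each pixel moves by at most `eps`, yet the metric's score rises, which is the vulnerability the benchmark measures on real metrics.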
# 変分反実仮想推論を用いたオフライン模倣学習 Offline Imitation Learning with Variational Counterfactual Reasoning ( http://arxiv.org/abs/2310.04706v4 ) ライセンス: Link先を確認	Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma	(参考訳) オフライン模倣学習(IL)では、エージェントは追加のオンライン環境との相互作用なしに、最適な専門家の行動方策を学習することを目指す。しかし、ロボット操作のような現実世界の多くのシナリオでは、オフラインデータセットは報酬なしで準最適な振る舞いから収集される。専門家データが乏しいため、エージェントは通常、質の低い軌跡を単に記憶してしまい、環境の変化に弱く、新しい環境に汎化する能力を欠く。本稿では,高品質な専門家データを自動的に生成し,エージェントの汎化能力を向上させるために,反実仮想推論を行う Offline Imitation Learning with Counterfactual data Augmentation (OILCA) というフレームワークを提案する。特に、識別可能な変分オートエンコーダを利用して、専門家データ拡張のための反実仮想サンプルを生成する。生成した専門家データの影響と汎化の改善を理論的に分析する。さらに,本手法が分布内性能を測る DeepMind Control Suite ベンチマークと分布外汎化を測る CausalWorld ベンチマークの両方において,様々なベースラインを大幅に上回ることを示すために,広範な実験を行った。我々のコードは https://github.com/ZexuSun/OILCA-NeurIPS23 で入手できる。 In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) that performs counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization.
Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.	翻訳日:2024-01-03 01:35:45 公開日:2023-12-29
# 敵対的特徴脱感作によるロバスト性強化アップリフトモデリング Robustness-enhanced Uplift Modeling with Adversarial Feature Desensitization ( http://arxiv.org/abs/2310.04693v3 ) ライセンス: Link先を確認	Zexu Sun, Bowei He, Ming Ma, Jiakai Tang, Yuchen Wang, Chen Ma, Dugang Liu	(参考訳) アップリフトモデリングは、オンラインマーケティングにおいて非常に有望な結果を示している。しかし、既存の研究の多くは、いくつかの実用的な応用においてロバスト性の課題に直面しがちである。本稿では,まずこの現象に対する考えられる説明を提示する。様々な実世界のデータセットを用いて、オンラインマーケティングには特徴感度の問題が存在することを検証する。すなわち、いくつかの重要な特徴に摂動を加えると、アップリフトモデルの性能に深刻な影響を与え、逆の傾向さえ引き起こす。上記の問題を解決するために, 敵対的特徴脱感作を用いた新しいロバスト性強化アップリフトモデリングフレームワーク(RUAD)を提案する。具体的には,RUADは2つのカスタマイズされたモジュールにより,アップリフトモデルの特徴感度をより効果的に緩和する。1つは入力特徴から重要な部分集合を特定するための、マルチラベルの同時モデリングを備えた特徴選択モジュールであり、もう1つは選択された特徴の部分集合に対するモデルのロバスト性を高めるために、敵対的訓練とソフト補間操作を用いる敵対的特徴脱感作モジュールである。最後に、オンラインマーケティングにおけるRUADの有効性を検証するために、公開データセットと実際の製品データセットで広範な実験を行う。さらに、特徴感度に対するRUADのロバスト性や、さまざまなアップリフトモデルとの互換性も示す。 Uplift modeling has shown very promising results in online marketing. However, most existing works are prone to robustness challenges in some practical applications. In this paper, we first present a possible explanation for the above phenomenon. We verify that there is a feature sensitivity problem in online marketing using different real-world datasets, where the perturbation of some key features will seriously affect the performance of the uplift model and even cause the opposite trend. To solve the above problem, we propose a novel robustness-enhanced uplift modeling framework with adversarial feature desensitization (RUAD). Specifically, our RUAD can more effectively alleviate the feature sensitivity of the uplift model through two customized modules, including a feature selection module with joint multi-label modeling to identify a key subset from the input features and an adversarial feature desensitization module using adversarial training and soft interpolation operations to enhance the robustness of the model against this selected subset of features.
Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of our RUAD in online marketing. In addition, we also demonstrate the robustness of our RUAD to the feature sensitivity, as well as the compatibility with different uplift models.	翻訳日:2024-01-03 01:35:12 公開日:2023-12-29
# PixArt-$\alpha$:フォトリアリスティックなテキスト画像合成のための拡散Transformerの高速訓練 PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ( http://arxiv.org/abs/2310.00426v3 ) ライセンス: Link先を確認	Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li	(参考訳) 最も先進的なテキスト画像変換(T2I)モデルは、膨大なトレーニングコスト(例えば数百万GPU時間)を必要とし、AIGCコミュニティの根本的な革新を著しく妨げつつ、CO2排出量を増大させる。本稿では,画像生成品質が最新の画像生成器(Imagen、SDXL、さらにはMidjourneyなど)に匹敵し、商用応用に近い水準に達したTransformerベースのT2I拡散モデルPIXART-$\alpha$を紹介する。さらに、図1と図2に示すように、低いトレーニングコストで最大1024px解像度の高解像度画像合成をサポートする。 To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. その結果、PIXART-$\alpha$のトレーニング速度は既存の大規模T2Iモデルを大きく上回る。例えば、PIXART-$\alpha$は安定拡散v1.5のトレーニング時間の10.8%(675対6,250 A100 GPU日)しか必要とせず、約300,000ドルを節約し(26,000ドル対320,000ドル)、CO2排出量を90%削減する。さらに、より大きなSOTAモデルであるRAPHAELと比較して、トレーニングコストはわずか1%である。大規模な実験により、PIXART-$\alpha$は画質、芸術性、セマンティック制御に優れていることが示された。PIXART-$\alpha$が、AIGCコミュニティやスタートアップが高品質かつ低コストな生成モデルをスクラッチから構築することを加速させる新たな洞察を提供することを願っている。 The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering fundamental innovation in the AIGC community while increasing CO2 emissions.
This paper introduces PIXART-$\alpha$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figures 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-$\alpha$'s training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-$\alpha$ only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly \$300,000 (\$26,000 vs. \$320,000) and reducing CO2 emissions by 90%. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-$\alpha$ excels in image quality, artistry, and semantic control. We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.	翻訳日:2024-01-03 01:33:54 公開日:2023-12-29
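The reported training-time and cost figures are internally consistent, as a quick check of the arithmetic shows. This verifies only the arithmetic of the quoted numbers, not the underlying measurements.

```latex
\frac{675}{6250} = 0.108 = 10.8\%, \qquad
320{,}000 - 26{,}000 = 294{,}000 \approx 300{,}000\ \text{USD}, \qquad
\frac{26{,}000}{320{,}000} \approx 8.1\%
```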
# 非可換位相空間におけるディラック方程式に対するエーレンフェストの定理 Ehrenfest's Theorem for the Dirac Equation in Noncommutative Phase-Space ( http://arxiv.org/abs/2309.16360v2 ) ライセンス: Link先を確認	Ilyas Haouam	(参考訳) 本稿では,非可換位相空間におけるディラック方程式からエーレンフェストの定理を考察する。具体的には,非可換な設定の下で電磁場と相互作用するディラック粒子について,位置演算子と運動学的運動量演算子の時間微分を計算する。これにより、位相空間の非可換性がエーレンフェストの定理に及ぼす影響を調べることができる。非可換性は、線形BoppシフトとMoyal-Weyl積の両方を用いて導入する。 In this article, we investigate Ehrenfest's theorem from the Dirac equation in a noncommutative phase-space, where we calculate the time derivative of the position and kinetic momentum operators for Dirac particles interacting with an electromagnetic field within a noncommutative setting. This allows examining the effect of the phase-space noncommutativity on Ehrenfest's theorem. The noncommutativity is introduced using both the linear Bopp shift and the Moyal-Weyl product.	翻訳日:2024-01-03 01:33:15 公開日:2023-12-29
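For orientation, the standard ingredients the abstract refers to can be written as follows. This is a sketch in one common convention, not the article's own derivation; signs and factors of $2\hbar$ differ between papers. The charge $e$, the potentials $(\phi, \mathbf{A})$, and the two-dimensional isotropic choice $\theta_{ij} = \theta\epsilon_{ij}$, $\eta_{ij} = \eta\epsilon_{ij}$ are assumptions made for illustration. The last line follows from substituting the Bopp shift into $[\hat x_i, \hat p_j]$.

```latex
\begin{aligned}
% Ehrenfest's theorem for an operator A
&\frac{d}{dt}\langle A\rangle = \frac{i}{\hbar}\,\langle [H, A]\rangle + \Big\langle \frac{\partial A}{\partial t}\Big\rangle \\
% Commutative Dirac Hamiltonian for a charge e in potentials (\phi, \mathbf{A})
&H = c\,\boldsymbol{\alpha}\cdot(\mathbf{p} - e\mathbf{A}) + \beta m c^{2} + e\phi
\quad\Longrightarrow\quad \frac{d}{dt}\langle \mathbf{x}\rangle = c\,\langle \boldsymbol{\alpha}\rangle \\
% Noncommutative phase-space algebra
&[\hat x_i, \hat x_j] = i\theta_{ij}, \qquad [\hat p_i, \hat p_j] = i\eta_{ij}, \qquad [\hat x_i, \hat p_j] = i\hbar_{\mathrm{eff}}\,\delta_{ij} \\
% Linear Bopp shift in terms of commutative x_i, p_i (summation over j)
&\hat x_i = x_i - \frac{\theta_{ij}}{2\hbar}\,p_j, \qquad \hat p_i = p_i + \frac{\eta_{ij}}{2\hbar}\,x_j \\
% Resulting effective Planck constant in 2-D with \theta_{ij} = \theta\epsilon_{ij}, \eta_{ij} = \eta\epsilon_{ij}
&\hbar_{\mathrm{eff}} = \hbar\Big(1 + \frac{\theta\eta}{4\hbar^{2}}\Big)
\end{aligned}
```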
# 測度輸送による密度推定:生物科学への応用への展望 Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences ( http://arxiv.org/abs/2309.15366v2 ) ライセンス: Link先を確認	Vanessa Lopez-Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo	(参考訳) 測度輸送手法の利点の1つは、広いクラスの確率測度に従って分布するデータの処理と分析のための統一的な枠組みを可能にすることである。この文脈において、本研究では、生物科学研究を支援するためのワークフローの一部として、測度輸送技術、特に三角輸送写像の利用の可能性を評価することを目的とした計算研究の結果を示す。放射線生物学のような領域で一般的な、データが乏しいシナリオは特に興味深い。データが乏しい場合、疎な輸送写像が有利であることがわかった。特に、利用可能なデータサンプルの集合からランダムに選ばれた一連の部分集合で訓練された、一連の(疎な)適応輸送写像の計算から集めた統計量は、データに隠された情報を明らかにする。その結果、ここで考察した放射線生物学への応用において、この手法は放射線被曝下での遺伝子間の関係とそのダイナミクスに関する仮説を生成するためのツールを提供する。 One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scarce data scenarios, which are common in domains such as radiation biology, are of particular interest. We find that when data is scarce, sparse transport maps are advantageous. In particular, statistics gathered from computing series of (sparse) adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, lead to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.	翻訳日:2024-01-03 01:32:39 公開日:2023-12-29
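Triangular (Knothe-Rosenblatt) transport maps have a closed form in the Gaussian case, which makes their structure easy to see. The sketch below assumes a 2-D Gaussian with mean `mu` and covariance passed as `(s11, s12, s22)`. The map x ↦ L⁻¹(x − μ), where L is the lower-triangular Cholesky factor, pushes that Gaussian to a standard normal. Its first output depends only on x[0], and its second on x[0] and x[1]. The paper instead learns sparse, adaptive maps from data; the function names here are hypothetical.

```python
import math


def cholesky_2x2(s11, s12, s22):
    """Lower-triangular factor (l11, l21, l22) with L @ L.T equal to
    [[s11, s12], [s12, s22]]; the matrix is assumed positive definite."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return l11, l21, l22


def triangular_map(x, mu, cov):
    """Knothe-Rosenblatt map S pushing N(mu, cov) to N(0, I) in 2-D.

    S_1 depends only on x[0]; S_2 depends on x[0] and x[1]
    (lower-triangular and increasing in its last argument).
    """
    l11, l21, l22 = cholesky_2x2(*cov)
    z1 = (x[0] - mu[0]) / l11
    z2 = (x[1] - mu[1] - l21 * z1) / l22
    return z1, z2


def inverse_map(z, mu, cov):
    """Inverse of triangular_map: turns standard-normal z into a sample
    of N(mu, cov)."""
    l11, l21, l22 = cholesky_2x2(*cov)
    return mu[0] + l11 * z[0], mu[1] + l21 * z[0] + l22 * z[1]
```

Because each output coordinate depends only on the inputs up to that coordinate, such maps can be evaluated and inverted one dimension at a time. Sparsity in learned maps goes further by dropping dependencies on some earlier coordinates.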
# 心理測定学を用いた汎用AIの評価 Evaluating General-Purpose AI with Psychometrics ( http://arxiv.org/abs/2310.16379v2 ) ライセンス: Link先を確認	Xiting Wang, Liming Jiang, Jose Hernandez-Orallo, David Stillwell, Luning Sun, Fang Luo, Xing Xie	(参考訳) 大規模言語モデルのような汎用AIシステムの包括的かつ正確な評価は、リスクを効果的に軽減し、その能力のより深い理解を可能にする。現在の評価手法は、主に特定のタスクのベンチマークに基づいており、現在の技術では、予期せぬタスクのパフォーマンスを予測し、特定のタスク項目やユーザ入力におけるパフォーマンスを説明する科学的基盤が欠けているため、これらの多用途AIシステムを適切に評価することができない。さらに、特定のタスクの既存のベンチマークでは、信頼性と妥当性に関する懸念が高まっている。これらの課題に対処するため,タスク指向評価から構成概念指向評価への移行を提案する。心理学的測定の科学である心理測定学は、複数のタスクにわたるパフォーマンスの根底にある潜在的な構成概念を特定し測定するための厳密な方法論を提供する。そのメリットを議論し,潜在的な落とし穴に対して警告するとともに,それを実践するための枠組みを提案する。最後に、心理測定学と汎用AIシステムの評価を統合する将来の機会について検討する。 Comprehensive and accurate evaluation of general-purpose AI systems such as large language models allows for effective mitigation of their risks and deepened understanding of their capabilities. Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems, as present techniques lack a scientific foundation for predicting their performance on unforeseen tasks and explaining their varying performance on specific task items or user inputs. Moreover, existing benchmarks of specific tasks raise growing concerns about their reliability and validity. To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation. Psychometrics, the science of psychological measurement, provides a rigorous methodology for identifying and measuring the latent constructs that underlie performance across multiple tasks. We discuss its merits, warn against potential pitfalls, and propose a framework to put it into practice. Finally, we explore future opportunities of integrating psychometrics with the evaluation of general-purpose AI systems.	翻訳日:2024-01-03 01:25:19 公開日:2023-12-29
# NoteChat: 臨床ノートに基づく合成医師・患者会話のデータセット NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes ( http://arxiv.org/abs/2310.15959v2 ) ライセンス: Link先を確認	Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, Hong Yu	(参考訳) 本稿では,大規模言語モデル(LLM)を活用して患者と医師の対話を生成する、新しい協調型マルチエージェントフレームワークであるNoteChatを紹介する。NoteChatは、構造化されたロールプレイと戦略的なプロンプトによって、役割別のLLMのアンサンブルが割り当てられた役割をより効果的に遂行できるという原則を体現している。これらのロールプレイングLLM間の相乗効果により、まとまりのある効率的な対話生成が実現される。患者-医師対話とノートのペアからなるベンチマークデータセットであるMTS-dialogueでの評価では、NoteChatによって拡張された合成患者-医師対話で訓練されたモデルが、臨床ノート生成において他の最先端モデルを上回った。我々の包括的な自動評価と人手評価は、臨床ノートに基づく優れた合成患者-医師対話の生成において、ドメイン専門家の評価でNoteChatがChatGPTやGPT-4のような最先端モデルを最大22.78%上回ることを示している。NoteChatは、患者と直接対話し、医師の燃え尽きの主な原因である臨床文書作成を支援できる可能性がある。 We introduce NoteChat, a novel cooperative multi-agent framework leveraging Large Language Models (LLMs) to generate patient-physician dialogues. NoteChat embodies the principle that an ensemble of role-specific LLMs, through structured role-play and strategic prompting, can perform their assigned roles more effectively. The synergy among these role-playing LLMs results in a cohesive and efficient dialogue generation. Evaluation on MTS-dialogue, a benchmark dataset for patient-physician dialogue-note pairs, shows that models trained with the augmented synthetic patient-physician dialogues by NoteChat outperform other state-of-the-art models for generating clinical notes. Our comprehensive automatic and human evaluation demonstrates that NoteChat substantially surpasses state-of-the-art models like ChatGPT and GPT-4, by up to 22.78% as judged by domain experts, in generating superior synthetic patient-physician dialogues based on clinical notes. NoteChat has the potential to engage patients directly and help with clinical documentation, a leading cause of physician burnout.	翻訳日:2024-01-03 01:24:22 公開日:2023-12-29
# 誘電体膜を介した反跳注入によるダイヤモンド中の色中心の作製 Creation of color centers in diamond by recoil implantation through dielectric films ( http://arxiv.org/abs/2310.12484v2 ) ライセンス: Link先を確認	Yuyang Han, Christian Pederson, Bethany E. Matthews, Nicholas S. Yama, Maxwell F. Parsons, Kai-Mei C. Fu	(参考訳) 量子技術のためにダイヤモンドの表面近傍の色中心が必要とされていることから、結晶格子への特定の外因性不純物の制御されたドーピングが求められている。近年の実験では、イオン注入によって表面前駆体から運動量を移動させる「反跳注入」と呼ばれる手法でこれを実現できることが示されている。ここでは、この技術を拡張し、誘電体前駆体を用いてダイヤモンド中に窒素-空孔(NV)中心とシリコン-空孔(SiV)中心を形成する。具体的には、ダイヤモンド表面上の窒化ケイ素または二酸化ケイ素の薄膜にガリウム集束イオンビームを照射すると、外因性不純物と炭素空孔の両方が導入されることを示す。これらの欠陥は、アニール後に望ましい光学特性を持つ表面近傍のNV中心とSiV中心を生じさせる。 The need for near-surface color centers in diamond for quantum technologies motivates the controlled doping of specific extrinsic impurities into the crystal lattice. Recent experiments have shown that this can be achieved by momentum transfer from a surface precursor via ion implantation, an approach known as "recoil implantation." Here, we extend this technique to incorporate dielectric precursors for creating nitrogen-vacancy (NV) and silicon-vacancy (SiV) centers in diamond. Specifically, we demonstrate that gallium focused-ion-beam exposure to a thin layer of silicon nitride or silicon dioxide on the diamond surface results in the introduction of both extrinsic impurities and carbon vacancies. These defects subsequently give rise to near-surface NV and SiV centers with desirable optical properties after annealing.	翻訳日:2024-01-03 01:23:44 公開日:2023-12-29
# スコアベース生成事前分布を用いた証明可能な確率的イメージング Provable Probabilistic Imaging using Score-Based Generative Priors ( http://arxiv.org/abs/2310.10835v2 ) ライセンス: Link先を確認	Yu Sun, Zihui Wu, Yifan Chen, Berthy T. Feng, Katherine L. Bouman	(参考訳) 不確実性を定量化しながら高品質な画像を推定することは、不良設定逆問題を解く画像再構成アルゴリズムにとって望ましい2つの特徴である。本稿では,一般的な逆問題に対する可能な解の空間を特徴付けるための原理的な枠組みとして,プラグアンドプレイ・モンテカルロ(PMC)を提案する。PMCは、高品質な画像再構成のために表現力のあるスコアベース生成事前分布を組み込むと同時に、事後サンプリングによる不確実性の定量化も行うことができる。特に,従来のplug-and-play priors(PnP)アルゴリズムおよびregularization by denoising(RED)アルゴリズムのサンプリング版とみなせる2つのPMCアルゴリズムを導入する。また,PMCアルゴリズムの収束を特徴付ける理論解析を確立する。我々の解析は,非対数凹の尤度や不完全なスコアネットワークが存在する場合でも,両アルゴリズムに対して非漸近的な定常性の保証を与える。線形および非線形の順モデルを持つ複数の代表的な逆問題でPMCアルゴリズムの性能を示す。実験の結果, PMCは再構成品質を大幅に向上させ, 高忠実度な不確実性定量化を可能にすることが示された。 Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative priors for high-quality image reconstruction while also performing uncertainty quantification via posterior sampling. In particular, we introduce two PMC algorithms which can be viewed as the sampling analogues of the traditional plug-and-play priors (PnP) and regularization by denoising (RED) algorithms. We also establish a theoretical analysis for characterizing the convergence of the PMC algorithms. Our analysis provides non-asymptotic stationarity guarantees for both algorithms, even in the presence of non-log-concave likelihoods and imperfect score networks. We demonstrate the performance of the PMC algorithms on multiple representative inverse problems with both linear and nonlinear forward models. Experimental results show that PMC significantly improves reconstruction quality and enables high-fidelity uncertainty quantification.	翻訳日:2024-01-03 01:22:23 公開日:2023-12-29
# 構成能力は乗法的に創発する:合成タスクにおける拡散モデルの探索 Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task ( http://arxiv.org/abs/2310.09336v3 ) ライセンス: Link先を確認	Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka	(参考訳) 現代の生成モデルは、極めて現実的なデータを生成する前例のない能力を示している。しかし、実世界が本質的に構成的であることを考えると、これらのモデルを実用的な応用で確実に利用するには、新しい概念の組み合わせを構成して、訓練データセットに現れない出力を生成する能力が必要である。先行研究は、最近の拡散モデルが興味深い構成的汎化能力を示す一方で、予測不能に失敗することも示した。そこで本研究では,合成的な設定において条件付き拡散モデルの構成的汎化を理解するための統制された研究を行い,訓練データの様々な属性を変化させ,分布外のサンプルを生成するモデルの能力を測定した。結果は次のことを示している。(i) ある概念からサンプルを生成し、それらを構成する能力が出現する順序は、基礎となるデータ生成過程の構造によって支配される。(ii) 構成的タスクの性能は、構成要素となるタスクの性能に乗法的に依存するため、突然の「創発」を示す。これは生成モデルに見られる創発現象を部分的に説明する。(iii) 訓練データ中の頻度が低い概念を構成して分布外サンプルを生成するには、分布内サンプルを生成する場合と比べてかなり多くの最適化ステップが必要となる。全体として、本研究はデータ中心の観点から、生成モデルにおける能力と構成性を理解するための基礎を築くものである。 Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution.
Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden "emergence" due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.	翻訳日:2024-01-03 01:20:36 公開日:2023-12-29
# 確率的保証と実践による連続POMDP計画における複雑な観測モデルの簡略化 Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice ( http://arxiv.org/abs/2311.07745v3 ) ライセンス: Link先を確認	Idan Lev-Yehudi, Moran Barenboim, Vadim Indelman	(参考訳) カメラ画像のような高次元かつ連続的な観測を伴う部分観測マルコフ決定過程(POMDP)を解くことは、多くの実世界のロボット工学や計画問題で必要となる。近年の研究では、観測モデルとして機械学習による確率モデルが提案されているが、現状ではオンライン展開には計算コストが大きすぎる。我々は,解の品質に関する形式的な保証を維持しつつ,簡略化された観測モデルを計画に用いることがどのような意味を持つのかという問題に取り組む。主な貢献は、簡略化モデルの統計的全変動距離に基づく新しい確率的境界である。近年の粒子信念MDPの集中不等式の結果を一般化することで、この境界が、簡略化モデルを用いた経験的な計画値から、元のモデルに関する理論的なPOMDP値を抑えることを示す。計算はオフライン部分とオンライン部分に分けることができ、計画中にコストの高いモデルに一切アクセスすることなく形式的な保証が得られる。これも新しい結果である。最後に,既存の連続オンラインPOMDPソルバのルーチンにこの境界を統合する方法をシミュレーションで示す。 Solving partially observable Markov decision processes (POMDPs) with high dimensional and continuous observations, such as camera images, is required for many real life robotics and planning problems. Recent research has suggested machine-learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We address the question of what the implications of using simplified observation models for planning would be, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value with respect to the original model from the empirical planned value with the simplified model, by generalizing recent results of particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.	翻訳日:2024-01-03 01:13:40 公開日:2023-12-29
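The bound rests on the total variation (TV) distance. The sketch below gives the textbook definition for finite distributions and checks the standard inequality |E_p f − E_q f| ≤ (max f − min f) · TV(p, q), which is the basic reason a TV distance between models can control a gap in expected values. This is not the paper's bound, which additionally handles particle beliefs and planning; the function names are illustrative.

```python
def total_variation(p, q):
    """TV distance between two finite distributions given as
    {outcome: probability} dicts: 0.5 * sum |p(x) - q(x)|."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def expectation(dist, f):
    """Expected value of f under a finite distribution."""
    return sum(prob * f(x) for x, prob in dist.items())
```

For a bounded function such as a reward or value estimate, replacing p with q changes the expectation by at most the function's range times TV(p, q).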
# GLaMM: ピクセルレベルのグラウンディングを行う大規模マルチモーダルモデル GLaMM: Pixel Grounding Large Multimodal Model ( http://arxiv.org/abs/2311.03356v2 ) ライセンス: Link先を確認	Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, Fahad S. Khan	(参考訳) 大規模マルチモーダルモデル(LMM)は、大規模言語モデルを視覚領域に拡張する。初期のLMMは、画像全体とテキストプロンプトを使用して、グラウンディングされていないテキスト応答を生成していた。近年,視覚的にグラウンディングされた応答を生成するために領域レベルのLMMが用いられている。しかし、それらは一度に1つのオブジェクトカテゴリしか参照できない、ユーザが領域を指定する必要がある、あるいはピクセル単位の密なオブジェクトグラウンディングを提供できない、といった制約がある。本研究では,対応するオブジェクトのセグメンテーションマスクとシームレスに結びついた自然言語応答を生成できる最初のモデルであるGrounding LMM(GLaMM)を提案する。GLaMMは会話に現れるオブジェクトをグラウンディングするだけでなく、テキストとオプションの視覚的プロンプト(関心領域)の両方を入力として受け付けられる柔軟性を持つ。これによりユーザは、テキストと視覚の両方の領域において、さまざまな粒度でモデルと対話できる。視覚的にグラウンディングされた会話生成(GCG)という新しい設定には標準的なベンチマークがないため,我々はキュレーションしたグラウンディング会話を用いた包括的な評価プロトコルを導入する。提案するGCGタスクは、自然なシーンにおける密にグラウンディングされた概念を大規模に必要とする。そこで本研究では,提案する自動アノテーションパイプラインを用いて、セグメンテーションマスク付きの合計8億1,000万領域にグラウンディングされた750万のユニークな概念を含む、密にアノテーションされたGrounding-anything Dataset(GranD)を提案する。GCGに加えて、GLaMMは参照表現セグメンテーション、画像および領域レベルのキャプション生成、視覚と言語による会話など、いくつかの下流タスクでも効果的に機能する。 Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to only referring to a single object category at a time, require users to specify the regions, or cannot offer dense pixel-wise object grounding. In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks. GLaMM not only grounds objects appearing in the conversations but is flexible enough to accept both textual and optional visual prompts (region of interest) as input. This empowers users to interact with the model at various levels of granularity, both in textual and visual domains.
Due to the lack of standard benchmarks for the novel setting of visually Grounded Conversation Generation (GCG), we introduce a comprehensive evaluation protocol with our curated grounded conversations. Our proposed GCG task requires densely grounded concepts in natural scenes at a large-scale. To this end, we propose a densely annotated Grounding-anything Dataset (GranD) using our proposed automated annotation pipeline that encompasses 7.5M unique concepts grounded in a total of 810M regions available with segmentation masks. Besides GCG, GLaMM also performs effectively on several downstream tasks, e.g., referring expression segmentation, image and region-level captioning and vision-language conversations.	翻訳日:2024-01-03 01:12:07 公開日:2023-12-29
# LLM4Drive: A Survey of Large Language Models for Autonomous Driving ( http://arxiv.org/abs/2311.01043v3 ) License: see link	Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan	Autonomous driving technology, a catalyst for revolutionizing transportation and urban mobility, tends to transition from rule-based systems to data-driven strategies. Traditional module-based systems are constrained by cumulative errors among cascaded modules and inflexible pre-set rules. In contrast, end-to-end autonomous driving systems have the potential to avoid error accumulation due to their fully data-driven training process, although they often lack transparency due to their "black box" nature, complicating the validation and traceability of decisions. Recently, large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers. A natural thought is to utilize these abilities to empower autonomous driving. Combining LLMs with foundation vision models could open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems lack. In this paper, we systematically review a line of research on Large Language Models for Autonomous Driving (LLM4AD). This study evaluates the current state of technological advancements, distinctly outlining the principal challenges and prospective directions for the field. For the convenience of researchers in academia and industry, we provide real-time updates on the latest advances in the field as well as relevant open-source resources via the designated link: https://github.com/Thinklab-SJTU/Awesome-LLM4AD.	Translated: 2024-01-03 01:11:20 Published: 2023-12-29
# End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context ( http://arxiv.org/abs/2310.18131v3 ) License: see link	Yiran Guan, Zhuoguang Chen, Wenzheng Zeng, Zhiguo Cao, and Yang Xiao	In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning way, which has not yet received much attention. The main advantage of MCGaze is that the clue localization tasks for head, face, and eye can be solved jointly for gaze estimation in a one-step way, with joint optimization to seek optimal performance. During this process, spatial-temporal context exchange happens among the head, face, and eye clues. Accordingly, the final gazes obtained by fusing features from various queries can simultaneously be aware of global clues from heads and faces and local clues from eyes, which improves performance. Meanwhile, the one-step design also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our proposal. The source code will be released at https://github.com/zgchen33/MCGaze.	Translated: 2024-01-03 01:10:15 Published: 2023-12-29
# A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time ( http://arxiv.org/abs/2310.17878v2 ) License: see link	Ranran Shen, Pan Peng	We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.	Translated: 2024-01-03 01:09:54 Published: 2023-12-29
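The oracle's assumptions are stated in terms of inner and outer conductance. A brute-force sketch for tiny graphs, assuming one common pair of definitions (outer conductance $|E(C, V \setminus C)| / \mathrm{vol}(C)$; inner conductance as the minimum conductance over subsets of the subgraph induced by $C$). The paper's exact definitions may differ, and this enumeration is exponential, unlike the sublinear oracle itself:

```python
from itertools import combinations

def volume(adj, nodes):
    """Sum of degrees of `nodes` in the graph given by adjacency lists `adj`."""
    return sum(len(adj[u]) for u in nodes)

def cut_size(adj, nodes):
    """Number of edges with exactly one endpoint in `nodes`."""
    return sum(1 for u in nodes for v in adj[u] if v not in nodes)

def outer_conductance(adj, cluster):
    """|E(C, V - C)| / vol(C)."""
    return cut_size(adj, cluster) / volume(adj, cluster)

def inner_conductance(adj, cluster):
    """Conductance of the subgraph induced by `cluster` (brute force, tiny graphs only)."""
    sub = {u: [v for v in adj[u] if v in cluster] for u in cluster}
    total = volume(sub, cluster)
    best = float("inf")
    nodes = sorted(cluster)
    for r in range(1, len(nodes)):
        for combo in combinations(nodes, r):
            subset = set(combo)
            vol = volume(sub, subset)
            if 0 < vol <= total / 2:  # only the "smaller side" of each cut
                best = min(best, cut_size(sub, subset) / vol)
    return best

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster = {0, 1, 2}
print(outer_conductance(adj, cluster))  # 1/7: one boundary edge, volume 7
print(inner_conductance(adj, cluster))  # 1.0: the triangle is well connected
```

In this toy graph each triangle is a cluster in the paper's sense: large inner conductance and small outer conductance.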
# Exploring Iterative Refinement with Diffusion Models for Video Grounding ( http://arxiv.org/abs/2310.17189v2 ) License: see link	Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang	Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single-shot manner, resulting in the absence of a systematic prediction refinement process. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. In inference, DiffusionVG can generate the target span from Gaussian noise inputs by the learned reverse diffusion process conditioned on the video-sentence representations. Without bells and whistles, our DiffusionVG demonstrates superior performance compared to existing well-crafted models on the mainstream Charades-STA, ActivityNet Captions and TACoS benchmarks.	Translated: 2024-01-03 01:09:07 Published: 2023-12-29
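The fixed forward (noising) process that DiffusionVG learns to reverse can be sketched with the standard DDPM formula $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. This is a minimal illustration, not the paper's code: the linear beta schedule, the 1000 steps, and the (center, width) parametrization of the span are assumptions.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(span, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) I) for a span x_0."""
    a = alpha_bars[t]
    return tuple(math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in span)

rng = random.Random(0)
alpha_bars = linear_alpha_bars()
target = (0.4, 0.2)  # hypothetical normalized span: center 0.4, width 0.2
print(q_sample(target, 10, alpha_bars, rng))   # early step: still near the target
print(q_sample(target, 999, alpha_bars, rng))  # last step: close to pure Gaussian noise
```

A denoising network conditioned on the video-sentence representation would then be trained to invert these steps; that part is model-specific and omitted here.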
# On the Interplay Between Stepsize Tuning and Progressive Sharpening ( http://arxiv.org/abs/2312.00209v3 ) License: see link	Vincent Roulet, Atish Agarwala, Fabian Pedregosa	Recent empirical work has revealed an intriguing property of deep learning models by which the sharpness (largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability, given a fixed stepsize (Cohen et al, 2022). We investigate empirically how the sharpness evolves when using stepsize-tuners, the Armijo linesearch and Polyak stepsizes, that adapt the stepsize along the iterations to local quantities such as, implicitly, the sharpness itself. We find that the surprisingly poor performance of a classical Armijo linesearch in the deterministic setting may be well explained by its tendency to ever-increase the sharpness of the objective. On the other hand, we observe that Polyak stepsizes generally operate at the edge of stability or even slightly beyond, outperforming their Armijo and constant-stepsize counterparts in the deterministic setting. We conclude with an analysis that suggests unlocking stepsize tuners requires an understanding of the joint dynamics of the stepsize and the sharpness.	Translated: 2024-01-03 01:02:10 Published: 2023-12-29
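For reference, the two stepsize tuners studied here can be written in a few lines. The sketch below assumes a 1-D quadratic $f(x) = \tfrac{a}{2}x^2$, whose sharpness is $a$, and a known minimum value $f^* = 0$ for the Polyak rule; it computes one Armijo backtracking step and one Polyak step. The constants $c$ and $\tau$ are typical textbook choices, not values from the paper.

```python
def armijo_step(f, grad, x, eta0=1.0, c=1e-4, tau=0.5):
    """Backtracking Armijo line search for one gradient step (1-D for brevity):
    shrink eta until f(x - eta*g) <= f(x) - c * eta * g**2."""
    g = grad(x)
    eta = eta0
    while f(x - eta * g) > f(x) - c * eta * g * g:
        eta *= tau
    return eta

def polyak_step(f, grad, x, f_star=0.0):
    """Polyak stepsize (f(x) - f*) / |grad f(x)|^2, assuming f* is known."""
    g = grad(x)
    return 0.0 if g == 0 else (f(x) - f_star) / (g * g)

A = 4.0  # curvature of the quadratic, i.e. its sharpness

def f(x):
    return 0.5 * A * x * x

def grad(x):
    return A * x

# Gradient descent on this quadratic is stable only for stepsizes below 2 / A.
print(armijo_step(f, grad, 1.0), polyak_step(f, grad, 1.0), 2.0 / A)  # 0.25 0.125 0.5
```

On this fixed quadratic both tuned stepsizes stay below the stability threshold $2/a$; the sharpness growth and edge-of-stability behavior reported in the paper arise only on non-quadratic deep-learning objectives, which this toy does not model.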
# MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation ( http://arxiv.org/abs/2311.18829v2 ) License: see link	Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo	We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides text-to-video generation into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two significant advantages. a) It allows us to take full advantage of the recent advances in text-to-image models, such as Stable Diffusion, Midjourney, and DALLE, to generate photorealistic and highly detailed images. b) Leveraging the generated image, the model can allocate less focus to fine-grained appearance details, prioritizing the efficient learning of motion dynamics. To implement this strategy effectively, we introduce two core designs. First, we propose the Appearance Injection Network, enhancing the preservation of the appearance of the given image. Second, we introduce the Appearance Noise Prior, a novel mechanism aimed at maintaining the capabilities of pre-trained 2D diffusion models. These design elements empower MicroCinema to generate high-quality videos with precise motion, guided by the provided text prompts. Extensive experiments demonstrate the superiority of the proposed framework. Concretely, MicroCinema achieves SOTA zero-shot FVD of 342.86 on UCF-101 and 377.40 on MSR-VTT. See https://wangyanhui666.github.io/MicroCinema.github.io/ for video samples.	Translated: 2024-01-03 01:01:50 Published: 2023-12-29
# A Lightweight Clustering Framework for Unsupervised Semantic Segmentation ( http://arxiv.org/abs/2311.18628v2 ) License: see link	Yau Shing Jonathan Cheung, Xi Chen, Lihe Yang, Hengshuang Zhao	Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data. It is a widely researched area, as obtaining labeled datasets is expensive. While previous works in the field have demonstrated a gradual improvement in model accuracy, most required neural network training. This made segmentation equally expensive, especially when dealing with large-scale datasets. We thus propose a lightweight clustering framework for unsupervised semantic segmentation. We discovered that attention features of the self-supervised Vision Transformer exhibit strong foreground-background differentiability. Therefore, clustering can be employed to effectively separate foreground and background image patches. In our framework, we first perform multilevel clustering across the Dataset-level, Category-level, and Image-level, and maintain consistency throughout. Then, the extracted binary patch-level pseudo-masks are upsampled, refined and finally labeled. Furthermore, we provide a comprehensive analysis of the self-supervised Vision Transformer features and a detailed comparison between DINO and DINOv2 to justify our claims. Our framework demonstrates great promise in unsupervised semantic segmentation and achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.	Translated: 2024-01-03 01:01:27 Published: 2023-12-29
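The core observation, that foreground and background patch features separate well, can be illustrated with plain 2-means clustering. This is a hypothetical toy: the 2-D vectors stand in for self-supervised ViT (e.g. DINO) patch features, the deterministic farthest-point initialization is an assumption chosen for reproducibility, and the paper's multilevel (dataset, category, image) clustering and mask refinement are not reproduced.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=20):
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each patch goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned patches.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical patch features: three "foreground" patches near (1, 1),
# three "background" patches near (0, 0).
patches = [(0.9, 1.0), (1.1, 0.95), (1.0, 1.05), (0.05, 0.0), (0.0, 0.1), (0.1, 0.05)]
labels, _ = kmeans(patches, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The resulting per-patch labels play the role of a binary patch-level pseudo-mask, which the framework then upsamples and refines.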
# DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation ( http://arxiv.org/abs/2311.17812v4 ) License: see link	Ting Liu, Yue Hu, Wansen Wu, Youkai Wang, Kai Xu, Quanjun Yin	Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With strong representation capabilities, pretrained vision-and-language models are widely used in VLN. However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when they are used for VLN tasks. To address the problem, we propose a novel and model-agnostic domain-aware prompt learning (DAP) framework. To equip the pretrained models with specific object-level and scene-level cross-modal alignment in VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. Then we introduce soft visual prompts in the input space of the visual encoder in a pretrained model. DAP injects in-domain visual knowledge into the visual encoder of the pretrained model in an efficient way. Experimental results on both R2R and REVERIE show the superiority of DAP compared to existing state-of-the-art methods.	Translated: 2024-01-03 01:01:10 Published: 2023-12-29
# Event Detection in Time Series: Universal Deep Learning Approach ( http://arxiv.org/abs/2311.15654v2 ) License: see link	Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André	Event detection in time series is a challenging task due to the prevalence of imbalanced datasets, rare events, and time interval-defined events. Traditional supervised deep learning methods primarily employ binary classification, where each time step is assigned a binary label indicating the presence or absence of an event. However, these methods struggle to handle these specific scenarios effectively. To address these limitations, we propose a novel supervised regression-based deep learning approach that offers several advantages over classification-based methods. Our approach, with a limited number of parameters, can effectively handle various types of events within a unified framework, including rare events and imbalanced datasets. We provide theoretical justifications for its universality and precision and demonstrate its superior performance across diverse domains, particularly for rare events and imbalanced datasets.	Translated: 2024-01-03 01:00:15 Published: 2023-12-29
# Fully Authentic Visual Question Answering Dataset from Online Communities ( http://arxiv.org/abs/2311.15562v2 ) License: see link	Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari	Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We then characterize our dataset and how it relates to eight other VQA datasets. Observing that answers in our dataset tend to be much longer (e.g., with a mean of 173 words) and thus incompatible with standard VQA evaluation metrics, we next analyze which of the six popular metrics for longer text evaluation align best with human judgments. We then use the best-suited metrics to evaluate six state-of-the-art vision and language foundation models on VQAonline and reveal where they struggle most. The dataset can be found publicly at https://vqaonline.github.io/.	Translated: 2024-01-03 01:00:00 Published: 2023-12-29
# Quantum Mechanics on a background modulo observation ( http://arxiv.org/abs/2311.12493v2 ) License: see link	Jose A. Pereira Frugone	In this work we answer the following question: what remains of Quantum Mechanics when we transform the background space-time into a space modularized by observation or measurement regions? This new moduli space is constructed by identifying regions of space-time where quantum phase comparison (observation, measurement) is implied. We call it Observation Modular space (OM-space). In addition, in QM statements we replace the Planck constant ($h$) by the quantity $4\pi^2\zeta_0$ (where $\zeta_0$ is the Planck length) or, alternatively, replace $P_0$ (the Planck momentum) by $4\pi^2$. This maps Quantum Mechanics into a very rich dual number theory, which we call Observation Modular Quantum Mechanics (OM-QM). We find the OM-duals of the Dirac equation, the quantum wave function and a free particle's mass. The OM-QM counterpart of the energy turns out to be a simple function of the zeros of the Riemann zeta function. We also find the OM-QM correspondents of the electron spin, the electron charge, the electric field and the fine-structure constant. We also find the OM-QM correspondents of the Heisenberg uncertainty relation and of Einstein's General Relativity field equation, which emerge as certain limits of a unique OM-QM equation. We also get the OM-QM correspondents of the gravitational constant and the cosmological constant. We find the analog of holography on the OM-QM side and obtain an interpretation of spin as a high-dimensional curvature. An interpretation of the OM-QM correspondence is proposed as giving the part of QM information which is not measurement- or observation-dependent. Some potential future applications of this correspondence are discussed.	Translated: 2024-01-03 00:58:31 Published: 2023-12-29
# CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-modal Multi-label Emotion Recognition ( http://arxiv.org/abs/2312.10201v2 ) License: see link	Cheng Peng, Ke Chen, Lidan Shou, Gang Chen	Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities. The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data. Recent studies are mainly devoted to exploring various fusion strategies to integrate multi-modal information into a unified representation for all labels. However, such a learning scheme not only overlooks the specificity of each modality but also fails to capture individual discriminative features for different labels. Moreover, dependencies of labels and modalities cannot be effectively modeled. To address these issues, this paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task. Specifically, we devise a reconstruction-based fusion mechanism to better model fine-grained modality-to-label dependencies by contrastively learning modal-separated and label-specific features. To further exploit the modality complementarity, we introduce a shuffle-based aggregation strategy to enrich co-occurrence collaboration among labels. Experiments on two benchmark datasets, CMU-MOSEI and M3ED, demonstrate the effectiveness of CARAT over state-of-the-art methods. Code is available at https://github.com/chengzju/CARAT.	Translated: 2024-01-03 00:52:36 Published: 2023-12-29
# The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation ( http://arxiv.org/abs/2312.09085v3 ) License: see link	Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu	Large Language Models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research has mainly studied this susceptibility in a single-turn setting. However, beliefs can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.	Translated: 2024-01-03 00:51:04 Published: 2023-12-29
# Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems ( http://arxiv.org/abs/2312.06261v2 ) License: see link	Ruonan Liu, Quanhu Zhang, Te Han	Industrial Cyber-Physical Systems (ICPS) integrate the disciplines of computer science, communication technology, and engineering, and have emerged as integral components of contemporary manufacturing and industries. However, ICPS encounter various challenges in long-term operation, including equipment failures, performance degradation, and security threats. To achieve efficient maintenance and management, prognostics and health management (PHM) finds widespread application in ICPS for critical tasks, including failure prediction, health monitoring, and maintenance decision-making. The emergence of large-scale foundation models (LFMs) like BERT and GPT signifies a significant advancement in AI technology, and ChatGPT stands as a remarkable accomplishment within this research paradigm, harboring potential for General Artificial Intelligence. Considering the ongoing enhancement in data acquisition technology and data processing capability, LFMs are anticipated to assume a crucial role in the PHM domain of ICPS. However, at present, a consensus is lacking regarding the application of LFMs to PHM in ICPS, necessitating systematic reviews and roadmaps to elucidate future directions. To bridge this gap, this paper elucidates the key components and recent advances in the underlying model. A comprehensive examination and comprehension of the latest advances in grand modeling for PHM in ICPS can offer valuable references for decision makers and researchers in the industrial field while facilitating further enhancements in the reliability, availability, and safety of ICPS.	Translated: 2024-01-03 00:49:55 Published: 2023-12-29
# Decoupling SQL Query Hardness Parsing for Text-to-SQL ( http://arxiv.org/abs/2312.06172v2 ) License: see link	Jiawen Yi and Guo Chen	The fundamental goal of the Text-to-SQL task is to translate natural language questions into SQL queries. Current research primarily emphasizes the information coupling between natural language questions and schemas, and significant progress has been made in this area. Natural language questions, as the primary source of task requirements, determine the hardness of the corresponding SQL queries, yet the correlation between the two has been consistently overlooked. However, decoupling this correlation between questions and queries may simplify the task. In this paper, we introduce an innovative framework for Text-to-SQL based on decoupling SQL query hardness parsing. This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge. This greatly reduces the parsing pressure on the language model. We evaluate our proposed framework and achieve a new state-of-the-art performance among fine-tuning methods on Spider dev.	Translated: 2024-01-03 00:49:27 Published: 2023-12-29
# Graph Metanetworks for Processing Diverse Neural Architectures ( http://arxiv.org/abs/2312.04501v2 ) License: see link	Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas	Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks, i.e., neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.	Translated: 2024-01-03 00:48:11 Published: 2023-12-29
# MACCA:Causal Credit Assignmentによるオフラインマルチエージェント強化学習 MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment ( http://arxiv.org/abs/2312.03644v2 ) ライセンス: Link先を確認	Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang	(参考訳) オフラインマルチエージェント強化学習(MARL)は、オンラインインタラクションが非現実的またはリスクのあるシナリオで有用である。 MARLの独立した学習は柔軟性とスケーラビリティを提供するが、オフライン環境で個々のエージェントにクレジットを正確に割り当てることは、環境とのインタラクションが禁止されているため、課題となる。本稿では、オフラインMARL設定におけるクレジット割り当てに対処するため、MACCA(Multi-Agent Causal Credit Assignment)という新しいフレームワークを提案する。我々のアプローチであるMACCAは、生成過程を動的ベイズネットワークとして特徴づけ、環境変数、状態、行動、報酬の関係を捉える。このモデルをオフラインデータ上で推定すると、MACCAは個々の報酬の因果関係を分析し、正確かつ解釈可能なクレジット割り当てを確実にすることで、各エージェントの貢献を学習することができる。さらに、このアプローチのモジュラリティにより、様々なオフラインMARLメソッドとシームレスに統合できます。理論的には、オフラインデータセットの設定の下では、基礎となる因果構造とエージェントの個々の報酬を生成する関数が識別可能であることが証明され、モデリングの正確性の基礎となった。実験では,MACCAが最先端の手法より優れるだけでなく,他のバックボーンと統合した場合の性能も向上することを示した。 Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interactions with an environment are prohibited. In this paper, we propose a new framework, namely Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. Our approach, MACCA, characterizes the generative process as a Dynamic Bayesian Network that captures the relationships among environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to seamlessly integrate with various offline MARL methods. 
Theoretically, we prove that in the offline dataset setting, the underlying causal structure and the function generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.	翻訳日:2024-01-03 00:47:53 公開日:2023-12-29
# Beyond Isolation: ナレッジグラフ構築を改善するマルチエージェントシナジー Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction ( http://arxiv.org/abs/2312.03022v2 ) ライセンス: Link先を確認	Hongbin Ye, Honghao Gui, Aijia Zhang, Tong Liu, Wei Hua, Weiqiang Jia	(参考訳) 知識グラフ構築(KGC)は、エンティティ、関係、イベントの抽出を含む多面的な作業である。伝統的に、大規模言語モデル(LLM)はこの複雑な状況において単独のタスク解決エージェントと見なされてきた。しかし,本稿では,新しいフレームワークであるCooperKGCを導入することで,このパラダイムに挑戦する。従来のアプローチとは別に、CooperKGCは協調処理ネットワークを確立し、エンティティ、リレーショナル、イベント抽出タスクを同時に処理できるKGCコラボレーションチームを構成する。我々の実験は、CooperKGC内の多様なエージェント間の協調と情報相互作用の促進が、単独で動作している個々の認知プロセスよりも優れた結果をもたらすことを示した。重要な点として,CooperKGCによるコラボレーションは,複数のインタラクションをまたいだ知識選択,修正,集約能力の向上に寄与することが明らかとなった。 Knowledge graph construction (KGC) is a multifaceted undertaking involving the extraction of entities, relations, and events. Traditionally, large language models (LLMs) have been viewed as solitary task-solving agents in this complex landscape. However, this paper challenges this paradigm by introducing a novel framework, CooperKGC. Departing from the conventional approach, CooperKGC establishes a collaborative processing network, assembling a KGC collaboration team capable of concurrently addressing entity, relation, and event extraction tasks. Our experiments unequivocally demonstrate that fostering collaboration and information interaction among diverse agents within CooperKGC yields superior results compared to individual cognitive processes operating in isolation. Importantly, our findings reveal that the collaboration facilitated by CooperKGC enhances knowledge selection, correction, and aggregation capabilities across multiple rounds of interactions.	翻訳日:2024-01-03 00:47:02 公開日:2023-12-29
# ImputeFormer: 一般化時空間インプテーションのための低ランク性トランスフォーマー ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation ( http://arxiv.org/abs/2312.01728v2 ) ライセンス: Link先を確認	Tong Nie, Guoyang Qin, Wei Ma, Yuewen Mei, and Jian Sun	(参考訳) データの欠如は、科学と工学の両方のタスク、特に時空間データのモデリングにおいて広く問題となっている。この問題は、機械学習ソリューションに貢献するために多くの研究を惹きつける。既存の補完手法には、主に低ランクモデルとディープラーニングモデルが含まれる。一方、低ランクモデルは一般的な構造的事前仮定を持つが、モデルの容量は限られている。一方、深層学習モデルは、時空間過程の事前知識を欠きながら、表現性の健全な特徴を有する。両パラダイムの強みを活かし,低ランク性によるトランスフォーマーモデルを用いて,強い帰納バイアスと高いモデル表現率のバランスを実現する。時空間データの固有構造を活用することにより、バランスの取れた信号-雑音表現を学習し、様々な補完問題に応用できる。交通速度,交通量,太陽エネルギー,スマートメータリング,空気品質など,異種データセットの精度,効率,一般性において,その優位性を示す。包括的ケーススタディにより、解釈可能性をさらに強化する。有望な実証結果は、低ランク性のような時系列プリミティブを組み込むことで、広範囲の時空間補完問題にアプローチする一般化可能なモデルの開発を大幅に促進できるという強い確信を与える。 Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem has attracted many studies contributing machine learning solutions. Existing imputation solutions mainly include low-rank models and deep learning models. On the one hand, low-rank models assume general structural priors, but have limited model capacity. On the other hand, deep learning models possess salient features of expressivity, but lack prior knowledge of the spatiotemporal process. Leveraging the strengths of both paradigms, we demonstrate a low rankness-induced Transformer model to achieve a balance between strong inductive bias and high model expressivity. The exploitation of the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it versatile for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and generality in heterogeneous datasets, including traffic speed, traffic volume, solar energy, smart metering, and air quality. Comprehensive case studies are performed to further strengthen interpretability. 
Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rank properties, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.	翻訳日:2024-01-03 00:46:49 公開日:2023-12-29
# AccidentGPT:マルチモーダル大モデルによるV2X環境認識の事故解析と防止 AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model ( http://arxiv.org/abs/2312.13156v3 ) ライセンス: Link先を確認	Lening Wang, Yilong Ren, Han Jiang, Pinlong Cai, Daocheng Fu, Tianqi Wang, Zhiyong Cui, Haiyang Yu, Xuesong Wang, Hanchu Zhou, Helai Huang, Yinhai Wang	(参考訳) 交通事故は、人的被害と財産の被害の両方に重要な影響を及ぼすものであり、交通安全分野の多くの研究者にとって長らく研究の焦点となってきた。しかし、従来の研究では、静的環境アセスメントや動的運転分析、事故前予測や事故後ルール分析に焦点をあてた研究は、通常は孤立して行われている。交通安全の包括的な理解と応用を開発するための効果的な枠組みが欠如している。このギャップに対処するために,本研究では,総合的な事故解析・防止のためのマルチモーダル大規模モデルであるAccidentGPTを紹介する。AccidentGPTは,交通安全分野における事故解析と防止に対する総合的なアプローチを可能にする,マルチセンサ認識に基づくマルチモーダル情報インタラクションフレームワークを確立する。具体的には, 自律走行車では, 総合的な環境認識と, 車両の制御と衝突回避のための理解を提供する。人間の運転する車両では、プロアクティブな長距離安全警告と盲点警報を提供すると同時に、人間と機械の対話と対話を通じて安全運転の推奨と行動規範を提供する。さらに,交通警察や交通管理機関では,歩行者,車両,道路,環境などを含む交通安全のインテリジェントかつリアルタイムな分析を,複数の車両や道路試験装置からの協調的な認識を通じて支援している。このシステムはまた、車両衝突後の事故原因と責任を徹底的に分析することができる。我々のフレームワークは交通安全研究に総合的なシーン理解を統合する最初の大規模モデルである。プロジェクトページ: https://accidentgpt.github.io Traffic accidents, being a significant contributor to both human casualties and property damage, have long been a focal point of research for many scholars in the field of traffic safety. However, previous studies, whether focusing on static environmental assessments or dynamic driving analyses, as well as pre-accident predictions or post-accident rule analyses, have typically been conducted in isolation. There has been a lack of an effective framework for developing a comprehensive understanding and application of traffic safety. To address this gap, this paper introduces AccidentGPT, a comprehensive accident analysis and prevention multi-modal large model. AccidentGPT establishes a multi-modal information interaction framework grounded in multi-sensor perception, thereby enabling a holistic approach to accident analysis and prevention in the field of traffic safety. 
Specifically, our capabilities can be categorized as follows: for autonomous driving vehicles, we provide comprehensive environmental perception and understanding to control the vehicle and avoid collisions. For human-driven vehicles, we offer proactive long-range safety warnings and blind-spot alerts while also providing safe-driving recommendations and behavioral norms through human-machine dialogue and interaction. Additionally, for traffic police and management agencies, our framework supports intelligent and real-time analysis of traffic safety, encompassing pedestrians, vehicles, roads, and the environment through collaborative perception from multiple vehicles and road testing devices. The system is also capable of providing a thorough analysis of accident causes and liability after vehicle collisions. Our framework stands as the first large model to integrate comprehensive scene understanding into traffic safety studies. Project page: https://accidentgpt.github.io	翻訳日:2024-01-02 21:04:59 公開日:2023-12-29
# サドル支配スクランブルにおけるスプレッド複雑性 Spread complexity in saddle-dominated scrambling ( http://arxiv.org/abs/2312.12593v2 ) ライセンス: Link先を確認	Kyoung-Bum Huh, Hyun-Sik Jeong, Juan F. Pedraza	(参考訳) 近年、量子システムの複雑性とカオス性の尺度として、拡散複雑性(状態に対するKrylov複雑性)の概念が導入された。本稿では,サドル支配スクランブルを示す\emph{integrable}系における熱場二重状態の拡散複雑性について検討する。具体的には,サドル支配スクランブルを特徴とする量子力学系の代表的な例として,リプキン・メシュコフ・グリックモデルと逆調和振動子に着目した。 Lanczosアルゴリズムの適用により,これらのシステムにおける拡散複雑性は,特異なランプ・ピーク・スロープ・プラトーパターンを呈し,\emph{chaotic}システムに類似した特徴を示すことが明らかとなった。我々の結果は、拡散複雑性は貴重なプローブとして機能するが、真の量子カオスを正確に診断するには、一般に追加の物理的入力が必要であることを示している。また,拡散複雑性,スペクトル形状因子,クリロフ空間内の遷移確率との関係についても検討した。我々は,数値結果の解析的確認を行い,複雑性のEhrenfest定理を検証し,拡散複雑性の初期時間領域における明確な二次的挙動を同定する。 Recently, the concept of spread complexity, Krylov complexity for states, has been introduced as a measure of the complexity and chaoticity of quantum systems. In this paper, we study the spread complexity of the thermofield double state within \emph{integrable} systems that exhibit saddle-dominated scrambling. Specifically, we focus on the Lipkin-Meshkov-Glick model and the inverted harmonic oscillator as representative examples of quantum mechanical systems featuring saddle-dominated scrambling. Applying the Lanczos algorithm, our numerical investigation reveals that the spread complexity in these systems exhibits features reminiscent of \emph{chaotic} systems, displaying a distinctive ramp-peak-slope-plateau pattern. Our results indicate that, although spread complexity serves as a valuable probe, accurately diagnosing true quantum chaos generally necessitates additional physical input. We also explore the relationship between spread complexity, the spectral form factor, and the transition probability within the Krylov space. We provide analytical confirmation of our numerical results, validating the Ehrenfest theorem of complexity and identifying a distinct quadratic behavior in the early-time regime of spread complexity.	翻訳日:2024-01-02 21:03:36 公開日:2023-12-29
# 編集できますか? 大規模言語モデルのコード編集指示への追従能力の評価 Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ( http://arxiv.org/abs/2312.12450v3 ) ライセンス: Link先を確認	Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha	(参考訳) 様々なコード合成タスクのための大規模言語モデルの開発と評価に、かなりの量の研究が集中している。これには、自然言語命令からのコード合成、コードからのテストの合成、コードの説明の合成が含まれる。対照的に、LLMを用いた指示に基づくコード編集の振る舞いは十分に研究されていない。これらはモデルがプロンプトで提供されるコードのブロックを更新するよう指示されるタスクである。編集命令は、追加または削除する機能、バグの説明、修正の要求、異なる種類のソリューションの要求、その他の多くの一般的なコード編集タスクを要求できる。コード編集タスクのベンチマークを慎重に作成し,いくつかの最先端LLMを評価した。我々の評価は、最先端のオープンモデルとクローズドモデルの間の大きなギャップを露呈する。例えば、GPT-3.5-Turboでさえ、コード編集において最高のオープンモデルよりも8.8%良い。また、新しく、慎重にキュレートされ、パーミッシブにライセンスされたコード編集セットと自然言語命令も導入しました。このトレーニングセットを使うことで、オープンなCode LLMを微調整して、コード編集能力を大幅に改善できることを示します。 A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language instructions, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is instructed to update a block of code provided in a prompt. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, ask for a different kind of solution, or request many other common code editing tasks. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is 8.8% better than the best open model at editing code. We also introduce a new, carefully curated, permissively licensed training set of code edits coupled with natural language instructions. 
Using this training set, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities.	翻訳日:2024-01-02 21:03:15 公開日:2023-12-29
# 大規模言語モデルのための検索型生成:調査 Retrieval-Augmented Generation for Large Language Models: A Survey ( http://arxiv.org/abs/2312.10997v2 ) ライセンス: Link先を確認	Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang and Haofen Wang	(参考訳) 大きな言語モデル(LLM)は重要な能力を示すが、幻覚、時代遅れの知識、不透明で追跡不能な推論プロセスといった課題に直面している。 Retrieval-Augmented Generation (RAG) は、外部データベースからのリアルタイムデータを LLM 応答に組み込むことによって、これらの問題に対する有望な解決策として登場した。これによってモデル、特に知識集約型タスクの正確性と信頼性が向上し、継続的な知識更新とドメイン固有情報の統合が可能になる。 RAG は LLM の本質的な知識と外部データベースの巨大な動的リポジトリを相乗的に統合する。本稿では,RAGの進化を詳細に分析し,Naive RAG,Advanced RAG,Modular RAGの3つのパラダイムに着目した。 RAGシステムの3つの基本コンポーネント(レトリバー、ジェネレータ、拡張方法)を方法論的に検討し、各コンポーネント内の最先端技術について検討する。さらに、RAGモデルを評価するための新しいメトリクスと機能や、最新の評価フレームワークについても紹介する。最後に,今後の課題,モダリティの拡張,RAG技術スタックとエコシステムの開発という3つの視点から,今後の研究方向性を概説する。 Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution to these issues by incorporating real-time data from external databases into LLM responses. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This survey paper provides an in-depth analysis of the evolution of RAG, focusing on three key paradigms: Naive RAG, Advanced RAG, and Modular RAG. It methodically examines the three fundamental components of RAG systems: the retriever, the generator, and the augmentation methods, underscoring the cutting-edge technologies within each component. Additionally, the paper introduces novel metrics and capabilities for evaluating RAG models, as well as the most recent evaluation framework. 
Finally, the paper outlines future research directions from three perspectives: future challenges, modality extension, and the development of the RAG technical stack and ecosystem.	翻訳日:2024-01-02 21:01:58 公開日:2023-12-29
# 差分強度検出とパリティ検出に基づくマッハ・ツェンダー干渉計の最適非ガウス演算 Optimal non-Gaussian operations in difference-intensity detection and parity detection-based Mach-Zehnder interferometer ( http://arxiv.org/abs/2312.10774v2 ) ライセンス: Link先を確認	Manali Verma, Chandan Kumar, Karunesh K. Mishra, and Prasanta K. Panigrahi	(参考訳) 位相推定における確率的非ガウス演算の利点を差分強度とパリティ検出に基づくマッハ・ツェンダー干渉計(MZI)を用いて検討する。我々は,光子サブトラクション(PS),光子付加(PA),光子触媒(PC)の3種類の非ガウス的操作を単一モード圧縮真空(SSV)状態で行う実験的に実装可能なモデルを考える。差分強度検出に基づくMZIでは、2つのPC操作が最も最適であるのに対し、パリティ検出に基づくMZIでは2つのPA操作が最も最適なプロセスとして現れる。また,本研究は実験家にとって有益であるように,最高の性能で対応するスクイージングパラメータと透過率パラメータも提供してきた。さらに, モーメント生成関数の一般表現を導出し, ホモダイン検出や二次ホモダイン検出などの他の検出手法の探索に有用である。 We investigate the benefits of probabilistic non-Gaussian operations in phase estimation using difference-intensity and parity detection-based Mach-Zehnder interferometers (MZI). We consider an experimentally implementable model to perform three different non-Gaussian operations, namely photon subtraction (PS), photon addition (PA), and photon catalysis (PC) on a single-mode squeezed vacuum (SSV) state. In difference-intensity detection-based MZI, the two-PC operation is found to be optimal, while for parity detection-based MZI, the two-PA operation emerges as the optimal process. We have also provided the squeezing and transmissivity parameters that yield the best performance, making our study relevant for experimentalists. Further, we have derived a general expression for the moment-generating function, which should be useful in exploring other detection schemes such as homodyne detection and quadratic homodyne detection.	翻訳日:2024-01-02 21:01:10 公開日:2023-12-29
# 連続可変量子鍵分布の実用繊維による有限離開時のセキュリティ解析 The Security Analysis of Continuous-Variable Quantum Key Distribution under Limited Eavesdropping with Practical Fiber ( http://arxiv.org/abs/2312.16206v2 ) ライセンス: Link先を確認	Sheng Liu, Lu Fan, Zhengyu Li, Qiang Zhou, Yunbo Li, Dong Wang, Dechao Zhang, Yichen Zhang, and Han Li	(参考訳) 実用条件下での最適盗聴モデルの研究は、セキュアな情報伝達に量子鍵分布(QKD)システムを用いる場合の現実的なリスクを評価するのに役立つ。直感的には、繊維の損失は、盗聴者によって収穫されるのではなく、環境への光エネルギーの漏出につながり、qkdシステムの性能を実用的に向上しながら盗聴能力を制限する。しかし、チャネルが正規パートナーの制御外であり、漏洩信号が検出できないため、損失ファイバの存在下で最適な盗聴モデルを定義することは困難である。本稿では,2つの遠隔局と共用絡み込み源を必要とする遠隔地攻撃モデルに基づいて,ファイバロスが盗聴能力に与える影響について検討する。実際の損失により分散した絡み合いが制限されると、2つのテレポーテーションステーションを1つにマージして送信サイトの近くに配置すると最適な攻撃が起こり、これは絡み合い攻撃と類似するがワイヤーテーピング比が低下する。 Eveが最高のホロウコアファイバーを使用していると仮定すると、実用環境での秘密鍵レートは理想の盗聴よりも20%から40%高い。エンタングルメント蒸留技術が十分に成熟し、高品質な分散エンタングルメントを提供することができるなら、2つのテレポーテーションステーションは、盗聴性能を向上させるために遠距離分離されるべきであり、盗聴は最適な集団攻撃に近づくことさえ可能である。現在の絡み合い浄化技術の下では、避けられない繊維の損失は、盗聴能力を大幅に制限し、現実的なシステムの秘密鍵レートと送信距離を高め、実用的な応用シナリオにおけるQKDシステムの開発を促進することができる。 Research on optimal eavesdropping models under practical conditions will help to evaluate realistic risk when employing quantum key distribution (QKD) system for secure information transmission. Intuitively, fiber loss will lead to the optical energy leaking to the environment, rather than harvested by the eavesdropper, which also limits the eavesdropping ability while improving the QKD system performance in practical use. However, defining the optimal eavesdropping model in the presence of lossy fiber is difficult because the channel is beyond the control of legitimate partners and the leaked signal is undetectable. Here we investigate how the fiber loss influences the eavesdropping ability based on a teleportation-based collective attack model which requires two distant stations and a shared entanglement source. 
We find that if the distributed entanglement is limited due to the practical loss, the optimal attack occurs when the two teleportation stations are merged into one and placed close to the transmitter site, which performs similarly to the entangling-cloning attack but with a reduced wiretapping ratio. Assuming Eve uses the best available hollow-core fiber, the secret key rate in the practical environment can be 20% to 40% higher than that under ideal eavesdropping. If, however, entanglement distillation technology is mature enough to provide high-quality distributed entanglement, the two teleportation stations should be far apart for better eavesdropping performance, in which case the eavesdropping can even approach the optimal collective attack. Under the current level of entanglement purification technology, the unavoidable fiber loss can still greatly limit the eavesdropping ability as well as enhance the secret key rate and transmission distance of the realistic system, which promotes the development of QKD systems in practical application scenarios.	翻訳日:2024-01-02 20:27:08 公開日:2023-12-29
# SymmPI: グループ対称性を持つデータの予測推論 SymmPI: Predictive Inference for Data with Group Symmetries ( http://arxiv.org/abs/2312.16160v2 ) ライセンス: Link先を確認	Edgar Dobriban, Mengxin Yu	(参考訳) 予測の不確かさの定量化は、現代の統計学において核となる問題である。予測推論の手法は様々な仮定の下で開発されており、例えば標準共形予測では、置換群のような特別な変換群の下でデータの分布の不変性に依存することが多い。さらに,既存の予測推論手法の多くは,特徴量と結果の観測列において観測されていない結果を予測することを目的としている。一方、より一般的な観測モデル(例えば、部分的に観測された特徴)の下での予測推論や、より一般的な分布対称性を満たすデータ(例えば、物理学における回転不変あるいは座標非依存観察)に関心がある。本稿では,データ分布が任意の観測モデルにおいて一般的な群対称性を持つ場合の予測推論手法であるSymmPIを提案する。本手法は,分布不変性を維持しつつデータを処理する分布同変変換の新たな概念を利用する。SymmPIが分布不変性の下で有効なカバレッジを持つことを示し,分布シフト時の性能を特徴付け,最近の結果を特殊な場合として再現する。ネットワーク内の頂点に関連付けられた未観測値を予測するために,SymmPIを適用した。 2層階層モデルにおけるいくつかのシミュレーションと経験的データ分析の例では、SymmPIは既存の手法と比較して好適に機能する。 Quantifying the uncertainty of predictions is a core problem in modern statistics. Methods for predictive inference have been developed under a variety of assumptions, often -- for instance, in standard conformal prediction -- relying on the invariance of the distribution of the data under special groups of transformations such as permutation groups. Moreover, many existing methods for predictive inference aim to predict unobserved outcomes in sequences of feature-outcome observations. Meanwhile, there is interest in predictive inference under more general observation models (e.g., for partially observed features) and for data satisfying more general distributional symmetries (e.g., rotationally invariant or coordinate-independent observations in physics). Here we propose SymmPI, a methodology for predictive inference when data distributions have general group symmetries in arbitrary observation models. Our methods leverage the novel notion of distributional equivariant transformations, which process the data while preserving their distributional invariances. We show that SymmPI has valid coverage under distributional invariance and characterize its performance under distribution shift, recovering recent results as special cases. 
We apply SymmPI to predict unobserved values associated to vertices in a network, where the distribution is unchanged under relabelings that keep the network structure unchanged. In several simulations in a two-layer hierarchical model, and in an empirical data analysis example, SymmPI performs favorably compared to existing methods.	翻訳日:2024-01-02 20:26:36 公開日:2023-12-29
# ローゼン・モース散乱状態に対するルジャンドル関数の一般化 Generalization of Legendre functions applied to Rosen-Morse scattering states ( http://arxiv.org/abs/2312.15652v2 ) ライセンス: Link先を確認	F. L. Freitas	(参考訳) 関連するレジェンド関数の一般化が提案され、ローゼン・モースポテンシャルの散乱状態を記述するために用いられる。関数は、超幾何関数の言葉で明示的な式が与えられ、その漸近的な振る舞いを調べ、全反射領域と部分反射領域の状態の要求に合致するように示される。反射係数と透過係数の基本的な式が与えられ、一般化されたルジャンドル関数の積分恒等式が証明され、散乱状態に対する誘導積分変換のスペクトル測度が計算される。これらの手法は、経路積分法を必要とせず、ポテンシャルに対する完全な古典解を与える。 A generalization of associated Legendre functions is proposed and used to describe the scattering states of the Rosen-Morse potential. The functions are then given explicit formulas in terms of the hypergeometric function, their asymptotic behavior is examined and shown to match the requirements for states in the regions of total and partial reflection. Elementary expressions are given for reflection and transmission coefficients, and an integral identity for the generalized Legendre functions is proven, allowing the calculation of the spectral measure of the induced integral transform for the scattering states. These methods provide a complete classical solution to the potential, without need of path integral techniques.	翻訳日:2024-01-02 20:25:58 公開日:2023-12-29
# SOLAR 10.7B: 単純だが効果的な深度アップスケーリングによる大規模言語モデルのスケーリング SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling ( http://arxiv.org/abs/2312.15166v2 ) ライセンス: Link先を確認	Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim	(参考訳) 我々は107億のパラメータを持つ大規模言語モデル(LLM)であるSOLAR 10.7Bを紹介し、様々な自然言語処理(NLP)タスクにおいて優れた性能を示す。LLMを効率的にアップスケールする近年の取り組みに触発されて,深さ方向のスケーリングと継続事前学習からなる深度アップスケーリング(DUS, depth up-scaling)と呼ばれるLLMのスケーリング手法を提案する。mixture-of-expertsを用いる他のLLMアップスケーリング手法とは異なり、DUSは効率的な学習や推論のために複雑な変更を必要としない。実験により, DUS は単純だが, 小規模なモデルから高性能 LLM へのスケールアップに有効であることがわかった。 DUSモデルに基づいて、さらに、命令追従能力のために微調整された変種であるSOLAR 10.7B-Instructを提示する。これはMixtral-8x7B-Instructを上回る。 SOLAR 10.7BはApache 2.0ライセンスの下で公開されており、LLM分野における幅広いアクセスと応用を促進する。 We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes for efficient training and inference. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.	翻訳日:2024-01-02 20:25:47 公開日:2023-12-29
# SC-GS: 編集可能な動的シーンのためのスパース制御ガウススプラッティング SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes ( http://arxiv.org/abs/2312.14937v2 ) ライセンス: Link先を確認	Yi-Hua Huang and Yang-Tian Sun and Ziyi Yang and Xiaoyang Lyu and Yan-Pei Cao and Xiaojuan Qi	(参考訳) ダイナミックシーンのための新しいビュー合成は、コンピュータビジョンとグラフィックスにおいて依然として困難な問題である。近年,静的なシーンを表現し,高品質でリアルタイムな新規ビュー合成を実現するための堅牢な手法としてガウススプラッティングが登場している。この手法に基づき,動的シーンの動きと外観を,それぞれ疎な制御点と密なガウスに明示的に分解する新しい表現法を提案する。我々のキーとなる考え方は、ガウスよりもはるかに少ない数のスパース制御点を用いて、学習された補間重みによって局所的に補間可能なコンパクトな6 DoF変換基底を学習し、3次元ガウスの運動場を得ることである。変形MLPを用いて各制御点の時間変化6 DoF変換を予測し,学習の複雑さを低減し,学習能力を高め,時間的および空間的コヒーレントな動作パターンの獲得を容易にする。次に,3次元ガウス,制御点の標準空間位置,変形MLPを共同で学習し,3次元シーンの外観,幾何学,ダイナミックスを再構築する。学習中、異なる領域の異なる運動複雑度に対応するために制御点の位置と個数を適応的に調整し、as rigid as possibleの原理に従うARAP損失を導入して、学習された運動の空間的連続性と局所的剛性を強制する。最後に, 明示的なスパースモーション表現と外観からの分解により, 高忠実性を維持しつつ, ユーザ制御によるモーション編集を実現する。広汎な実験により,本手法が高速なレンダリング速度で新規ビュー合成において既存手法を上回り,外観を保持した新しいモーション編集アプリケーションを可能にすることを示す。プロジェクトページ:https://yihua7.github.io/SC-GS-web/ Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporally and spatially coherent motion patterns. 
Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the principle of as rigid as possible is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. Project page: https://yihua7.github.io/SC-GS-web/	翻訳日:2024-01-02 20:24:22 公開日:2023-12-29
# 量子実時間発展のためのテンソル繰り込み群法 Tensor Renormalization Group Methods for Quantum Real-time Evolution ( http://arxiv.org/abs/2312.14825v2 ) ライセンス: Link先を確認	Michael Hite and Yannick Meurice	(参考訳) 格子ゲージ理論における実時間発展のab-initio計算は、非常に興味深い応用の可能性を持つが、計算上の困難な側面を伴う。ユークリッド時間格子場理論の文脈で開発されたテンソル繰り込み群法が, トロッター化された時間発展演算子の実時間での計算に応用できることを示す。本稿では,各種観測量に対する切断手順の最適化について検討する。この数値解法を秩序相において外部横磁場を加えた1次元量子イジングモデルに適用し,$N_{s}=4$および8サイトの普遍量子計算と比較する。 Ab-initio calculations of real-time evolution for lattice gauge theory have very interesting potential applications but present challenging computational aspects. We show that tensor renormalization group methods developed in the context of Euclidean-time lattice field theory can be applied to the calculation of Trotterized evolution operators at real time. We discuss the optimization of truncation procedures for various observables. We apply the numerical methods to the 1D Quantum Ising Model with an external transverse field in the ordered phase and compare with universal quantum computing for $N_{s}=4$ and 8 sites.	翻訳日:2024-01-02 20:23:49 公開日:2023-12-29
# Large Language Model (LLM) Bias Index -- LLMBI Large Language Model (LLM) Bias Index -- LLMBI ( http://arxiv.org/abs/2312.14769v3 ) ライセンス: Link先を確認	Abiodun Finbarrs Oketunji, Muhammad Anas, Deepthi Saina	(参考訳) LLMBI(Large Language Model Bias Index)は、GPT-4のような大規模言語モデル(LLM)に固有のバイアスを定量化し、対処するための先駆的なアプローチである。多様な分野におけるLSMの普及と影響を認識している。本研究は,モデル応答を誘発する可能性のあるバイアスを系統的に測定し緩和する新しい計量 LLMBI を導入する。年齢,性別,人種的偏見に限らず,多次元の偏見を取り入れた複合スコアリングシステムを用いたLSMBIの定式化を行った。このメトリクスを運用するには, LLM応答の収集と注釈付け, バイアス検出のための洗練された自然言語処理(NLP)技術の適用, 特殊な数学的公式による LLMBI スコアの計算を含む多段階的なプロセスに携わる。この公式は、様々なバイアス次元の重み付け平均値、データセットの多様性の欠陥に対するペナルティ、感情バイアスに対する補正を統合する。 OpenAIのAPIからの応答を用いた実証分析では,バイアス検出の代表的な方法として,高度な感情分析を採用している。この研究は、LLMがテキスト生成において印象的な能力を示す一方で、異なる次元にまたがる様々なバイアスを示すことを明らかにしている。 LLMBIは、モデルと時間とともにバイアスを比較するための定量尺度を提供し、LLMの公平性と信頼性を高める上で、システムエンジニア、研究者、規制当局にとって重要なツールを提供する。偏見のない人間のような反応を模倣するLLMの可能性を強調している。さらに、社会規範や倫理基準の進化に合わせて、そのようなモデルを継続的に監視し、再検討する必要性を強調している。 The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. 
Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.	翻訳日:2024-01-02 20:23:39 公開日:2023-12-29
# 強化学習に基づく列生成のための複数カラム選択戦略 A Reinforcement-Learning-Based Multiple-Column Selection Strategy for Column Generation ( http://arxiv.org/abs/2312.14213v2 ) ライセンス: Link先を確認	Haofeng Yuan, Lichang Fang, Shiji Song	(参考訳) カラム生成(CG)は、大規模線形プログラミング(LP)問題を解決する最も成功した手法の一つである。非常に多くの変数(列)を持つLPが与えられた場合、CGの考え方は列のサブセットのみを明示的に考慮し、目的値を改善するために潜在的カラムを反復的に追加することである。最も負の被約費用を持つカラムを追加するとCGの収束が保証されるが、単一のカラムではなく、イテレーション毎に複数のカラムを追加することがより高速な収束につながることが示されている。しかし、多数の候補列から最も有望な列を選択するために、複数列選択戦略を設計することは依然として課題である。本稿では,新しい強化学習ベース(RL)マルチカラム選択戦略を提案する。私たちの知る限りでは、CGに対するRLベースの最初のマルチカラム選択戦略です。本手法の有効性は,カットストック問題とグラフカラー問題という2つの問題に対して評価される。 RLをベースとした複数カラム選択戦略は, 広く使用されている単一カラムと複数カラムの選択戦略と比較して, より高速に収束し, CGイテレーション数や実行時間を大幅に削減する。 Column generation (CG) is one of the most successful approaches for solving large-scale linear programming (LP) problems. Given an LP with a prohibitively large number of variables (i.e., columns), the idea of CG is to explicitly consider only a subset of columns and iteratively add potential columns to improve the objective value. While adding the column with the most negative reduced cost can guarantee the convergence of CG, it has been shown that adding multiple columns per iteration rather than a single column can lead to faster convergence. However, it remains a challenge to design a multiple-column selection strategy to select the most promising columns from a large number of candidate columns. In this paper, we propose a novel reinforcement-learning-based (RL) multiple-column selection strategy. To the best of our knowledge, it is the first RL-based multiple-column selection strategy for CG. The effectiveness of our approach is evaluated on two sets of problems: the cutting stock problem and the graph coloring problem. Compared to several widely used single-column and multiple-column selection strategies, our RL-based multiple-column selection strategy leads to faster convergence and achieves remarkable reductions in the number of CG iterations and runtime.	翻訳日:2024-01-02 20:22:22 公開日:2023-12-29
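The sketch below shows the classic pricing rules that the abstract compares against. It is not the paper's RL strategy. For a minimization LP, a candidate column $a_j$ with cost $c_j$ has reduced cost $\bar c_j = c_j - \pi^\top a_j$, where $\pi$ are the duals of the restricted master problem. The single-column rule adds the column with the most negative $\bar c_j$. A naive multiple-column rule adds the $k$ most negative ones. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def reduced_cost(cost: float, column: Sequence[float], duals: Sequence[float]) -> float:
    """Reduced cost c_j - pi^T a_j of one candidate column (minimization LP)."""
    return cost - sum(p * a for p, a in zip(duals, column))


def select_columns(costs: Sequence[float],
                   columns: Sequence[Sequence[float]],
                   duals: Sequence[float],
                   k: int = 1,
                   tol: float = 1e-9) -> List[int]:
    """Indices of up to k columns with the most negative reduced cost.

    k=1 is the classic single-column rule; k>1 is a naive greedy
    multiple-column rule. Columns with reduced cost >= -tol cannot
    improve the restricted master problem and are never selected.
    """
    scored = [(reduced_cost(c, col, duals), j)
              for j, (c, col) in enumerate(zip(costs, columns))]
    improving = sorted((rc, j) for rc, j in scored if rc < -tol)
    return [j for _, j in improving[:k]]


# Hypothetical toy data: 3 candidate patterns over 2 master-problem rows.
COSTS = [1.0, 1.0, 1.0]
COLUMNS = [[2, 0], [1, 1], [0, 3]]
DUALS = [0.6, 0.5]  # reduced costs: -0.2, -0.1, -0.5

print(select_columns(COSTS, COLUMNS, DUALS, k=1))  # [2]
print(select_columns(COSTS, COLUMNS, DUALS, k=2))  # [2, 0]
```

A learned strategy such as the one in the paper would replace the plain "sort by reduced cost" step with a policy that scores the candidates.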
# デュアルユニタリ回路の基本電荷 Fundamental charges for dual-unitary circuits ( http://arxiv.org/abs/2312.14148v2 ) ライセンス: Link先を確認	Tom Holden-Dye, Lluis Masanes, Arijeet Pal	(参考訳) デュアルユニタリ量子回路は、近年、多体量子力学の解析的扱いやすいモデルとして注目を集めている。「ブリックワーク」パターンで配置された2量子ゲートの1+1D格子を構成するこれらのモデルは、空間と時間の役割を交換しても各ゲートがユニタリのままでなければならないという制約によって定義される。この二重ユニタリ性は、これらの回路における局所作用素のダイナミクスを制限する:そのような作用素の支持は、回路の幾何学によって設定された因果光円錐の端の1つまたは両方に沿って、システムの有効光速で成長しなければならない。この特性を用いて、1+1D双対ユニタリ回路の場合、幅-$w$保存密度の集合($w$連続部位で支えられた演算子から構成される)は幅-$w$ソリトン演算子の集合と一対一の対応であり、乗法的位相を除いて、双対ユニタリ力学により光の有効速度で空間的に並進される。これらの多体ソリトンを構成するいくつかの方法(具体的には局所ヒルベルト空間次元$d=2$)が証明される: 第一に、より小さく構成的なソリトン積を含む単純な構成、第二に、より小さなソリトン積として単に理解できない構成によって、ヨルダン・ウィグナー変換の下でのフェルミオンの積の正確な解釈を持つ。これにより、複雑な多体ソリトン(量子ビット上の双対ユニタリ回路)の微視的構造を特徴づける部分的な進歩がもたらされる一方で、フェルミオンモデルと双対ユニタリ回路の間のリンクが確立され、この枠組みで探究できる物理学の理解が促進される。 Dual-unitary quantum circuits have recently attracted attention as an analytically tractable model of many-body quantum dynamics. Consisting of a 1+1D lattice of 2-qudit gates arranged in a 'brickwork' pattern, these models are defined by the constraint that each gate must remain unitary under swapping the roles of space and time. This dual-unitarity restricts the dynamics of local operators in these circuits: the support of any such operator must grow at the effective speed of light of the system, along one or both of the edges of a causal light cone set by the geometry of the circuit. Using this property, it is shown here that for 1+1D dual-unitary circuits the set of width-$w$ conserved densities (constructed from operators supported over $w$ consecutive sites) is in one-to-one correspondence with the set of width-$w$ solitons, i.e., operators which, up to a multiplicative phase, are simply translated in space at the effective speed of light by the dual-unitary dynamics. 
A number of ways to construct these many-body solitons (explicitly in the case where the local Hilbert space dimension $d=2$) are then demonstrated: firstly, via a simple construction involving products of smaller, constituent solitons; and secondly, via a construction which cannot be understood as simply in terms of products of smaller solitons, but which does have a neat interpretation in terms of products of fermions under a Jordan-Wigner transformation. This provides partial progress towards a characterisation of the microscopic structure of complex many-body solitons (in dual-unitary circuits on qubits), whilst also establishing a link between fermionic models and dual-unitary circuits, advancing our understanding of what kinds of physics can be explored in this framework.	翻訳日:2024-01-02 20:22:03 公開日:2023-12-29
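Dual-unitarity can be checked numerically. Reshuffle the indices of a two-site gate so that space and time swap roles, then test whether the result is still unitary. The sketch below assumes one common convention, $\langle k l|\tilde U|i j\rangle = \langle j l|U|i k\rangle$. Index conventions differ between papers. It uses qubits ($d=2$) and plain Python lists, and the helper names are hypothetical. SWAP is the textbook dual-unitary gate. The identity gate is unitary but not dual-unitary.

```python
from itertools import product
from typing import List

Matrix = List[List[complex]]
D = 2  # local Hilbert space dimension (qubits)


def idx(a: int, b: int) -> int:
    """Flatten the two-site basis state |a b> into a matrix index."""
    return a * D + b


def dual(u: Matrix) -> Matrix:
    """Space-time swapped gate: <k l|U~|i j> = <j l|U|i k> (assumed convention)."""
    out = [[0j] * (D * D) for _ in range(D * D)]
    for i, j, k, l in product(range(D), repeat=4):
        out[idx(k, l)][idx(i, j)] = u[idx(j, l)][idx(i, k)]
    return out


def is_unitary(m: Matrix, tol: float = 1e-12) -> bool:
    """Check M^dagger M == I entrywise."""
    n = len(m)
    for r in range(n):
        for c in range(n):
            s = sum(m[x][r].conjugate() * m[x][c] for x in range(n))
            if abs(s - (1 if r == c else 0)) > tol:
                return False
    return True


def is_dual_unitary(u: Matrix) -> bool:
    """A gate is dual-unitary if both it and its space-time dual are unitary."""
    return is_unitary(u) and is_unitary(dual(u))


# Rows index outputs (a, b); columns index inputs (c, d).
SWAP: Matrix = [[complex(a == d and b == c) for c, d in product(range(D), repeat=2)]
                for a, b in product(range(D), repeat=2)]
IDENTITY: Matrix = [[complex(a == c and b == d) for c, d in product(range(D), repeat=2)]
                    for a, b in product(range(D), repeat=2)]

print(is_dual_unitary(SWAP))      # True
print(is_dual_unitary(IDENTITY))  # False
```

Under this convention the dual of the identity is proportional to a rank-one projector, which is why the check fails for it.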
# DiffusionGAN3D: 3D GANとDiffusion Priorを併用したテキスト誘導3D生成とドメイン適応 DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors ( http://arxiv.org/abs/2312.16837v2 ) ライセンス: Link先を確認	Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie	(参考訳) テキスト誘導型ドメイン適応と3D認識ポートレートの生成は、様々な分野で多くの応用を見出した。しかしながら、トレーニングデータの欠如と、多種多様な幾何学と外観を扱うことの難しさから、これらのタスクの既存の方法は、柔軟性の欠如、不安定性、低忠実性といった問題に苦しめられている。本稿では,3D GANと拡散事前分布を組み合わせたテキスト誘導型3Dドメイン適応と生成を促進する新しいフレームワークDiffusionGAN3Dを提案する。具体的には,事前学習した3次元生成モデル(EG3Dなど)とテキストから画像への拡散モデルを統合する。前者はテキストから安定した高品質なアバター生成のための強力な基盤を提供する。そして、拡散モデルは強力な事前知識を提供し、柔軟かつ効率的なテキスト誘導ドメイン適応を実現するために、3Dジェネレータの微調整を情報的方向でガイドする。テキスト対アバターにおけるドメイン適応の多様性と生成能力を高めるために,それぞれ,相対的距離損失とケース固有の学習可能な三面体を導入する。さらに,上述の両タスクのテクスチャ品質を向上させるために,プログレッシブなテクスチャリファインメントモジュールを設計する。広範な実験により、提案フレームワークは、ドメイン適応とテキストからアバタータスクの両方において優れた結果を達成でき、生成品質と効率の点で既存の方法よりも優れています。プロジェクトのホームページはhttps://younglbw.github.io/DiffusionGAN3D-homepage/にある。 Text-guided domain adaption and generation of 3D-aware portraits find many applications in various fields. However, due to the lack of training data and the challenges in handling the high variety of geometry and appearance, the existing methods for these tasks suffer from issues like inflexibility, instability, and low fidelity. In this paper, we propose a novel framework DiffusionGAN3D, which boosts text-guided 3D domain adaption and generation by combining 3D GANs and diffusion priors. Specifically, we integrate pre-trained 3D generative models (e.g., EG3D) with text-to-image diffusion models. The former provides a strong foundation for stable and high-quality avatar generation from text. The diffusion models, in turn, offer powerful priors and guide the fine-tuning of the 3D generator in an informative direction, achieving flexible and efficient text-guided domain adaption. 
To enhance the diversity in domain adaption and the generation capability in text-to-avatar, we introduce a relative distance loss and a case-specific learnable triplane, respectively. In addition, we design a progressive texture refinement module to improve the texture quality for both tasks above. Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaption and text-to-avatar tasks, outperforming existing methods in terms of generation quality and efficiency. The project homepage is at https://younglbw.github.io/DiffusionGAN3D-homepage/.	翻訳日:2024-01-02 19:56:23 公開日:2023-12-29
# DarkShot: 低計算量かつ高画質で暗い画像を照らす DarkShot: Lighting Dark Images with Low-Compute and High-Quality ( http://arxiv.org/abs/2312.16805v2 ) ライセンス: Link先を確認	Jiazhang Zheng, Lei Li, Qiuping Liao, Cheng Li, Li Li, Yangxing Liu	(参考訳) 夜間の撮影は極端な低照度条件で困難が増しており、その主因は極めて低い信号対雑音比にある。現実のデプロイメントでは、実用的なソリューションは視覚的に魅力的な結果を生み出すだけでなく、最小限の計算も必要です。しかし、既存のほとんどの手法は修復性能の改善に焦点を当てているか、品質の犠牲で軽量モデルを採用するかのどちらかである。本稿では,計算量を最小限に抑えつつ,低照度強調タスクにおける既存のSOTA手法よりも優れた軽量ネットワークを提案する。提案ネットワークは,Siamese Self-Attention Block (SSAB) と Skip-Channel Attention (SCA) モジュールを組み込んで,グローバルな情報を集約するモデルの能力を高め,高解像度画像に適している。また,低照度画像復元プロセスの解析に基づいて,優れた結果を得るための2段階フレームワークを提案する。我々のモデルは、SOTA復元の品質を維持しながら、最小限の計算でUHD 4K解像度画像を復元することができる。 Nighttime photography encounters escalating challenges in extremely low-light conditions, primarily attributable to the ultra-low signal-to-noise ratio. For real-world deployment, a practical solution must not only produce visually appealing results but also require minimal computation. However, most existing methods are either focused on improving restoration performance or employ lightweight models at the cost of quality. This paper proposes a lightweight network that outperforms existing state-of-the-art (SOTA) methods in low-light enhancement tasks while minimizing computation. The proposed network incorporates Siamese Self-Attention Block (SSAB) and Skip-Channel Attention (SCA) modules, which enhance the model's capacity to aggregate global information and are well-suited for high-resolution images. Additionally, based on our analysis of the low-light image restoration process, we propose a Two-Stage Framework that achieves superior results. Our model can restore a UHD 4K resolution image with minimal computation while maintaining SOTA restoration quality.	翻訳日:2024-01-02 19:55:59 公開日:2023-12-29
# 正規および不規則時系列補完のための連続時間オートエンコーダ Continuous-time Autoencoders for Regular and Irregular Time Series Imputation ( http://arxiv.org/abs/2312.16581v2 ) ライセンス: Link先を確認	Hyowon Wi, Yehjin Shin, Noseong Park	(参考訳) 時系列補完は、時系列の最も基本的なタスクの1つである。実世界の時系列データセットはしばしば不完全であり(あるいは観測値の欠損により不規則であり)、その場合は補完が強く求められる。多くの異なる時系列補完法が提案されている。最近のセルフアテンションに基づく手法は最先端の補完性能を示している。しかし、連続時間リカレントニューラルネットワーク(RNN)、すなわちニューラル制御微分方程式(NCDE)に基づく補完法の設計は、長い間見過ごされてきた。この目的のために、NCDEに基づいて時系列(変分)オートエンコーダを再設計する。提案手法は連続時間オートエンコーダ(CTA)と呼ばれ、入力時系列サンプルを(隠れたベクトルではなく)連続した隠れ経路に符号化し、それをデコードして入力を再構成・補完する。 4つのデータセットと19のベースラインを用いた実験では、ほぼすべてのケースで最高の補完性能を示す。 Time series imputation is one of the most fundamental tasks in time series analysis. Real-world time series datasets are frequently incomplete (or irregular with missing observations), in which case imputation is strongly required. Many different time series imputation methods have been proposed. Recent self-attention-based methods achieve state-of-the-art imputation performance. However, imputation methods based on continuous-time recurrent neural networks (RNNs), i.e., neural controlled differential equations (NCDEs), have long been overlooked. To this end, we redesign time series (variational) autoencoders based on NCDEs. Our method, called continuous-time autoencoder (CTA), encodes an input time series sample into a continuous hidden path (rather than a hidden vector) and decodes it to reconstruct and impute the input. In our experiments with 4 datasets and 19 baselines, our method shows the best imputation performance in almost all cases.	翻訳日:2024-01-02 19:55:19 公開日:2023-12-29
# 弱教師付き3次元意味セグメンテーションに対するマルチモダリティアフィニティ推論 Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation ( http://arxiv.org/abs/2312.16578v2 ) ライセンス: Link先を確認	Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu	(参考訳) 3次元点群セマンティックセグメンテーションには幅広い応用がある。近年,シーンレベルのラベルを活用することで,高価な手作業によるアノテーション処理を緩和することを目的とした,弱教師付きポイントクラウドセグメンテーション手法が提案されている。しかし、これらの手法は、RGB-Dスキャンに存在するリッチな幾何学情報(形状やスケールなど)や外観情報(色やテクスチャなど)を効果的に活用していない。さらに、現在のアプローチでは、弱いシーンレベルのラベルから学ぶのに不可欠である特徴抽出ネットワークから推測できる点親和性を完全に活用できない。さらに、従来の研究は、弱教師付き3次元セマンティックセグメンテーションにおけるポイントクラウドデータのロングテール分布による有害な効果を見落としている。そこで本研究では,新たに導入された多モード点親和性推論モジュールを用いて,シーンレベルの弱教師付きポイントクラウドセグメンテーション手法を提案する。本論文で提案する点親和性は,複数モード(例えば,点雲とRGB)の特徴を特徴とし,分類器重みを正規化することにより,カテゴリ分布の事前知識を必要とせずに,ロングテール分布の有害な影響を軽減する。 ScanNetとS3DISベンチマークの大規模な実験により,提案手法の有効性が検証された。コードはhttps://github.com/Sunny599/AAAI24-3DWSSG-MMAで公開されている。 3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. 
The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state of the art by about 4% to 6% mIoU. Code is released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.	翻訳日:2024-01-02 19:54:36 公開日:2023-12-29
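A common generic way to keep a linear classifier from favoring head classes under a long-tailed distribution is to L2-normalize each class's weight vector. Without this, frequent classes can dominate the logits simply through larger weight norms. The sketch shows only that generic normalization step. Whether the paper uses exactly this form, for example with an added temperature or bias, is an assumption. The helper names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def normalize_rows(weights: Sequence[Sequence[float]], eps: float = 1e-12) -> List[List[float]]:
    """L2-normalize each class weight vector (one row per class)."""
    out = []
    for w in weights:
        norm = math.sqrt(sum(x * x for x in w))
        out.append([x / max(norm, eps) for x in w])
    return out


def logits(feature: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of one feature vector with every class weight vector."""
    return [sum(f * x for f, x in zip(feature, w)) for w in weights]


# Hypothetical classifier: class 0 (head) has a large norm, class 1 (tail) a small one.
W = [[4.0, 0.0], [0.0, 1.0]]
feature = [0.5, 0.8]  # points more toward the tail class direction

print(logits(feature, W))                  # [2.0, 0.8]  -> head class wins
print(logits(feature, normalize_rows(W)))  # [0.5, 0.8]  -> tail class wins
```

After normalization, the logits compare directions only. This is the property usually exploited to reduce head-class bias.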
# GRSDet:Few-shot Object Detectionのための局所逆サンプル生成学習 GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection ( http://arxiv.org/abs/2312.16571v2 ) ライセンス: Link先を確認	Hefei Mei, Taijin Zhao, Shiyuan Tang, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Fanman Meng, Hongliang Li	(参考訳) Few-shot Object Detection (FSOD) は、ごく少数の新規クラスの学習データのみを用いて物体検出を実現することを目的としている。既存の手法の多くは、ベースクラスの知識を伝達することで、新しいクラス分布を構築するための転移学習戦略を採用している。しかし、この直接的な方法は、決定空間における新しいクラスと他の類似のカテゴリとを簡単に混同する。この問題に対処するために,プロトタイプ参照フレームに局所逆サンプル(LRSamples)を生成し,新しいクラス分布の中心位置と境界範囲を適応的に調整し,FSODのより識別的な新しいクラスサンプルを学習する。まず, LRSamples の選択規則, LRSamples の生成元, 校正分布中心への拡張を含む Center Calibration Variance Augmentation (CCVA) モジュールを提案する。具体的には,クラス内特徴変換器(IFC)をCCVAの生成器として設計し,選択規則を学習する。 IFCは、ベーストレーニングから微調整に知識を移すことで、新しいクラス分布を校正するために、豊富な新しいサンプルを生成する。さらに,決定境界からの距離に応じて,サンプルの重要性を適応的に調整する特徴密度境界最適化 (FDBO) モジュールを提案する。類似クラスの高密度領域(決定境界に近い領域)の重要性を強調し、類似クラスの低密度領域(決定境界から遠い領域)の重みを減少させることで、各カテゴリの明確な決定境界を最適化することができる。提案手法の有効性を示すために広範な実験を行った。提案手法は,DeFRCN と MFDC ベースラインに基づく Pascal VOC と MS COCO データセットに対して一貫した改善を実現する。 Few-shot object detection (FSOD) aims to detect objects of novel classes using only a few training samples. Most existing methods adopt a transfer-learning strategy to construct the novel class distribution by transferring the base class knowledge. However, this direct approach easily confuses the novel class with other similar categories in the decision space. To address the problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution to learn more discriminative novel class samples for FSOD. First, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. 
Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selection rule. By transferring the IFC's knowledge from base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It emphasizes the high-density area of similar classes (the area closer to the decision boundary) and reduces the weight of their low-density area (the area farther from the decision boundary), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method. Our method achieves consistent improvement on the Pascal VOC and MS COCO datasets based on DeFRCN and MFDC baselines.	翻訳日:2024-01-02 19:54:10 公開日:2023-12-29
# PanGu-Draw: 時間分割学習と再利用可能なクープ拡散による資源効率の良いテキスト・画像合成 PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion ( http://arxiv.org/abs/2312.16486v2 ) ライセンス: Link先を確認	Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu	(参考訳) 現在の大規模拡散モデルは条件付き画像合成において大きな飛躍を示しており、テキスト、人間のポーズ、エッジといった多様な手がかりを解釈することができる。しかし、計算資源や膨大なデータ収集への依存は依然としてボトルネックとなっている。一方で、異なる制御とユニークな潜在空間での操作に特化した既存の拡散モデルの統合は、互換性のない画像解像度と潜在空間埋め込み構造のために課題となり、共同使用を妨げている。これらの制約に対処するため,複数の制御信号に対応可能な資源効率の高いテキスト・画像合成のための新しい潜時拡散モデルPanGu-Drawを提案する。まず,モノリシックなテキストから画像へのモデルを構造とテクスチャ生成器に分割した,リソース効率の高い時間分離トレーニング戦略を提案する。各ジェネレータは、データ利用と計算効率を最大化し、データ準備を48%削減し、トレーニングリソースを51%削減するレジームを使用してトレーニングされる。次に,異なる潜在空間と事前定義された解像度を持つ様々な事前学習拡散モデルを,統一されたノイズ除去プロセスの中で協調的に利用可能にするアルゴリズムであるCoop-Diffusionを提案する。これにより、追加データや再トレーニングを必要とせず、任意の解像度でマルチコントロール画像合成が可能となる。 PanGu-Drawの実証的検証は、テキスト対画像およびマルチコントロール画像生成における例外的な能力を示し、将来のモデルの学習効率と生成の汎用性に有望な方向を示している。最大の5B T2I PanGu-DrawモデルはAscendプラットフォームでリリースされた。プロジェクトページ: https://pangu-draw.github.io Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, the integration of existing diffusion models, each specialized for different controls and operating in unique latent spaces, poses a challenge due to incompatible image resolutions and latent space embedding structures, hindering their joint use. Addressing these constraints, we present "PanGu-Draw", a novel latent diffusion model designed for resource-efficient text-to-image synthesis that adeptly accommodates multiple control signals. 
First, we propose a resource-efficient Time-Decoupling Training Strategy, which splits the monolithic text-to-image model into structure and texture generators. Each generator is trained using a regimen that maximizes data utilization and computational efficiency, cutting data preparation by 48% and reducing training resources by 51%. Second, we introduce "Coop-Diffusion", an algorithm that enables the cooperative use of various pre-trained diffusion models with different latent spaces and predefined resolutions within a unified denoising process. This allows for multi-control image synthesis at arbitrary resolutions without the necessity for additional data or retraining. Empirical validation of PanGu-Draw shows its exceptional capability in text-to-image and multi-control image generation, suggesting a promising direction for more efficient model training and more versatile generation. The largest 5B T2I PanGu-Draw model is released on the Ascend platform. Project page: https://pangu-draw.github.io	翻訳日:2024-01-02 19:53:39 公開日:2023-12-29
# LLMファクトスコープ:内部状態解析によるLLMのFactual Discernmentの発見 LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis ( http://arxiv.org/abs/2312.16374v2 ) ライセンス: Link先を確認	Jinwen He, Yujia Gong, Kai Chen, Zijin Lin, Chengan Wei, Yue Zhao	(参考訳) 大規模言語モデル(llm)は、幅広い知識と創造性を備えた様々なドメインに革命をもたらした。しかし、LLMにおける重要な問題は、現実と異なる出力を生成する傾向にある。この現象は、正確性が最重要である医療相談や法的助言のような敏感な応用において特に関係している。本稿では,llmの内部状態を利用して事実検出を行う新しいシャムネットワークモデルであるllmfactoscopeを提案する。本研究は,LLMの内部状態における実物と非実物との区別可能なパターンを明らかにする。我々は,様々なアーキテクチャにおけるllmファクトスコープの有効性を実証し,96%以上の精度を実現した。本研究は, LLMの内部状態を事実検出に活用するための新たな道を開き, 信頼性と透明性を高めるため, LLMの内部動作のさらなる探索を奨励する。 Large Language Models (LLMs) have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.	翻訳日:2024-01-02 19:51:56 公開日:2023-12-29
# 電磁界と相互作用する加速原子の絡み合いダイナミクス Entanglement dynamics of accelerated atoms interacting with the Electromagnetic Field ( http://arxiv.org/abs/2312.16342v2 ) ライセンス: Link先を確認	M. S. Soares, N. F. Svaiter and G. Menezes	(参考訳) 開量子系の理論を用いたエンタングルメント力学における加速度の影響について検討する。このシナリオでは、異なる固有時で異なる双曲軌道に沿って移動する2つの原子を考える。一般化マスター方程式は、電磁場と相互作用する双極子対に使用される。本研究は, エンタングルメント収穫や突然死現象において, 固有加速度が重要な役割を担っていることを観察し, 原子の偏光がこれらの結果に与える影響について検討する。 We study the effects of acceleration on entanglement dynamics using the theory of open quantum systems. In this scenario, we consider two atoms moving along different hyperbolic trajectories with different proper times. The generalized master equation is used for a pair of dipoles interacting with the electromagnetic field. We observe that the proper acceleration plays an essential role in the entanglement harvesting and sudden death phenomena, and we study how the polarization of the atoms affects these results.	翻訳日:2024-01-02 19:51:31 公開日:2023-12-29
# DL3DV-10K:ディープラーニングに基づく3Dビジョンのための大規模シーンデータセット DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision ( http://arxiv.org/abs/2312.16256v2 ) ライセンス: Link先を確認	Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera	(参考訳) 我々は、ニューラルレイディアンス場(NeRF)に基づく3次元表現学習から、新しいビュー合成(NVS)への応用まで、ディープラーニングに基づく3次元視覚の進歩を目の当たりにしてきた。しかし、ディープラーニングに基づく3Dビジョンのための既存のシーンレベルのデータセットは、合成環境か現実世界のシーンの限られた選択に限られており、非常に不十分である。この不十分さは、既存の方法の包括的なベンチマークを妨げるだけでなく、深層学習に基づく3d分析で探せることの欠如を損なう。この重要なギャップに対処するため、DL3DV-10Kは大規模なシーンデータセットで、65種類のPOI(point-of-interest)位置から撮影された10,510の動画から51.2万フレームを特徴としている。我々は, DL3DV-10Kにおける最近のNVS手法の総合的なベンチマークを行い, 今後のNVS研究に有用な知見を明らかにした。さらに, DL3DV-10Kから一般化可能なNeRFを学習するためのパイロット実験の結果を得た。私たちのDL3DV-10Kデータセット、ベンチマーク結果、モデルはhttps://dl3dv-10k.github.io/DL3DV-10K/で公開されます。 We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. 
In addition, we obtained encouraging results in a pilot study on learning a generalizable NeRF from DL3DV-10K, which demonstrates the need for a large-scale scene-level dataset to pave the way toward a foundation model for learning 3D representations. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.	翻訳日:2024-01-02 19:51:21 公開日:2023-12-29
# 弾力性制約強化学習 Resilient Constrained Reinforcement Learning ( http://arxiv.org/abs/2312.17194v2 ) ライセンス: Link先を確認	Dongsheng Ding and Zhengyan Huan and Alejandro Ribeiro	(参考訳) 本研究では,複数の制約仕様をトレーニング前に特定しない制約強化学習(RL)問題のクラスについて検討する。報酬最大化目標と制約満足度との間に不明確なトレードオフがあるため、適切な制約仕様を特定することは困難である。この問題に対処するために、ポリシーと制約仕様を一緒に検索する新しい制約付きRLアプローチを提案する。本手法は、学習目的に導入される緩和コストに応じて制約を緩和する適応を特徴とする。この特徴は、生態系が操作を変えることによって破壊に適応する様子を模倣するので、我々のアプローチは弾力性制約付きRLと呼ばれる。具体的には、制約満足度と弾力性均衡の概念による報酬の最大化を両立させる十分条件を提供し、この均衡を最適解とする弾力性制約性ポリシー最適化の扱いやすい定式化を提案し、最適性ギャップと制約満足度に対する非漸近収束性保証を持つ2つの弾力性制約付きポリシー探索アルゴリズムを提唱する。さらに,計算実験において,本手法の利点と有効性を示す。 We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. It is challenging to identify appropriate constraint specifications due to the undefined trade-off between the reward maximization objective and the constraint satisfaction, which is ubiquitous in constrained decision-making. To tackle this issue, we propose a new constrained RL approach that searches for the policy and the constraint specifications jointly. This method adaptively relaxes the constraints according to a relaxation cost introduced in the learning objective. Since this feature mimics how ecological systems adapt to disruptions by altering operation, our approach is termed resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance the constraint satisfaction and the reward maximization through the notion of a resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and advocate two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and the effectiveness of our approach in computational experiments.	翻訳日:2024-01-02 19:08:28 公開日:2023-12-29
# DreamGaussian4D: 4Dガウシアンスプラッティング DreamGaussian4D: Generative 4D Gaussian Splatting ( http://arxiv.org/abs/2312.17142v2 ) ライセンス: Link先を確認	Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu	(参考訳) 最近、4Dコンテンツ生成で顕著な進歩を遂げた。しかし、既存の手法では、最適化時間が長く、動作制御性が欠如しており、詳細度が低い。本稿では,4次元ガウス分割表現に基づく効率的な4D生成フレームワークであるDreamGaussian4Dを紹介する。我々の重要な洞察は、ガウススプラッティングにおける空間変換の明示的なモデリングは、暗黙の表現よりも4次元生成設定に適しているということである。 dreamgaussian4dは最適化時間を数時間から数分に短縮し、生成された3dモーションを柔軟に制御し、3dエンジンで効率的にレンダリングできるアニメーションメッシュを生成する。 Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines.	翻訳日:2024-01-02 19:07:46 公開日:2023-12-29
# ARTrackV2: 自動回帰トラッカーの表示方法と説明方法 ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe ( http://arxiv.org/abs/2312.17133v2 ) ライセンス: Link先を確認	Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei	(参考訳) ARTrackV2は、追跡の2つの重要な側面、すなわち、どこを見るか(ローカライゼーション)と、ターゲットオブジェクトをビデオフレーム間でどのように記述するか(外観分析)の2点を統合する。 ARTrackV2は、前身の基盤の上に、オブジェクトの軌跡を「読み出し」し、その外観を自己回帰的に「書き直す」ための統一的な生成フレームワークを導入することで、概念を拡張している。このアプローチは、動きと視覚的特徴の合同進化をモデル化する時間連続的な方法論を育む。さらに、ARTrackV2はその効率性と単純さで際立つもので、効率の低いフレーム内自己回帰と外観更新のための手動パラメータを回避している。そのシンプルさにもかかわらず、ARTrackV2は、既存のベンチマークデータセットで最先端のパフォーマンスを実現し、優れた効率性を示している。特にARTrackV2は、GOT-10kで79.5%のAOスコア、TrackingNetで86.1%のAUCを達成し、ARTrackより3.6倍高速である。コードはリリースされます。 We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" the object's trajectory and "retell" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being $3.6\times$ faster than ARTrack. The code will be released.	翻訳日:2024-01-02 19:07:35 公開日:2023-12-29
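On GOT-10k, the average overlap (AO) is, roughly, the mean IoU between predicted and ground-truth boxes over the frames of a sequence, averaged over sequences. The sketch computes per-sequence AO for axis-aligned boxes. It assumes an (x, y, w, h) box format. It is not the official evaluation toolkit, which also handles details such as frames where the target is absent. The names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner and size


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Mean per-frame IoU for one sequence."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("need equally long, non-empty box sequences")
    return sum(iou(p, g) for p, g in zip(pred, gt)) / len(gt)


gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]  # exact hit, then a half-width shift
print(average_overlap(pred, gt))  # (1 + 1/3) / 2 = 0.666...
```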
# 因果決定のための大規模言語モデル Large Language Model for Causal Decision Making ( http://arxiv.org/abs/2312.17122v2 ) ライセンス: Link先を確認	Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song	(参考訳) 大規模言語モデル(LLM)は、一般的なトピックに対する言語理解と推論の成功を示している。しかし、因果決定のようなコーパス・レア概念におけるユーザ特定構造化データと知識に基づく推論能力はまだ限られている。本研究では,オープンソースのLLMをLLM4Causalに微調整することで,因果的タスクを識別し,対応する関数を実行し,ユーザのクエリと提供されるデータセットに基づいてその数値結果を解釈できる可能性を検討する。一方,より制御可能なGPTプロンプトのためのデータ生成プロセスを提案し,(1)因果問題識別と因果関数呼び出しのための入力パラメータ抽出のためのCausal-Retrieval-Bench,(2)文脈内因果解釈のためのCausal-Interpret-Benchの2つの命令チューニングデータセットを提案する。 3つのケーススタディで、LLM4Causalは因果問題に対するエンドツーエンドソリューションを提供し、理解しやすい回答を提供できることを示した。数値研究では、クエリによって与えられた正しい因果タスクを識別する能力も明らかにされている。 Large Language Models (LLMs) have shown their success in language understanding and reasoning on general topics. However, their ability to reason over user-specified structured data and knowledge about corpus-rare concepts such as causal decision-making is still limited. In this work, we explore the possibility of fine-tuning an open-source LLM into LLM4Causal, which can identify the causal task, execute a corresponding function, and interpret its numerical results based on users' queries and the provided dataset. Meanwhile, we propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench for causal problem identification and input parameter extraction for causal function calling and (2) Causal-Interpret-Bench for in-context causal interpretation. Through three case studies, we show that LLM4Causal can deliver end-to-end solutions for causal problems and provide easy-to-understand answers. Numerical studies also reveal that it has a remarkable ability to identify the correct causal task given a query.	翻訳日:2024-01-02 19:07:15 公開日:2023-12-29
# 完全スパース3次元パノプティカル占有予測 Fully Sparse 3D Panoptic Occupancy Prediction ( http://arxiv.org/abs/2312.17118v2 ) ライセンス: Link先を確認	Haisong Liu, Haiguang Wang, Yang Chen, Zetong Yang, Jia Zeng, Li Chen, Limin Wang	(参考訳) 占有予測は自動運転の領域において重要な役割を果たす。従来の手法では、通常、密集した3Dボリュームを構築し、シーン固有のスパース性を無視し、高い計算コストをもたらす。さらに、これらの手法は意味的占有に限られており、異なるインスタンスを区別できない。そこで本研究では,スパース性を活用しインスタンスを認識するため,SparseOccと呼ばれる完全スパースなパノプティック占有ネットワークを新たに導入する。 SparseOccは最初、視覚入力からスパース3D表現を再構築する。その後、スパースインスタンスクエリを使用して、スパース3D表現から各オブジェクトインスタンスを予測する。これらのインスタンスクエリはマスク誘導スパースサンプリングを介して2次元特徴と相互作用するため、コストのかかる高密度特徴やグローバルな注意を回避できる。さらに、視覚中心のパノプティック占有ベンチマークを初めて確立しました。 SparseOccはその有効性をOcc3D-nusデータセットで示し、平均IoU(mIoU)26.0を達成し、リアルタイムの推論速度は25.4 FPSを維持した。 SparseOccは、前の8フレームから時間的モデリングを取り入れることで、その性能をさらに向上させ、特別な工夫なしで30.9 mIoUを達成した。コードは利用可能になる。 Occupancy prediction plays a pivotal role in the realm of autonomous driving. Previous methods typically construct a dense 3D volume, neglecting the inherent sparsity of the scene, which results in a high computational cost. Furthermore, these methods are limited to semantic occupancy and fail to differentiate between distinct instances. To exploit the sparsity property and ensure instance-awareness, we introduce a novel fully sparse panoptic occupancy network, termed SparseOcc. SparseOcc initially reconstructs a sparse 3D representation from visual inputs. Subsequently, it employs sparse instance queries to predict each object instance from the sparse 3D representation. These instance queries interact with 2D features via mask-guided sparse sampling, thereby circumventing the need for costly dense features or global attention. Additionally, we have established the first-ever vision-centric panoptic occupancy benchmark. SparseOcc demonstrates its efficacy on the Occ3D-nus dataset by achieving a mean Intersection over Union (mIoU) of 26.0, while maintaining a real-time inference speed of 25.4 FPS. 
By incorporating temporal modeling from the preceding 8 frames, SparseOcc further improves its performance, achieving 30.9 mIoU without bells and whistles. Code will be made available.	翻訳日:2024-01-02 19:06:53 公開日:2023-12-29
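Occupancy results such as the 26.0 and 30.9 above are reported as mean Intersection over Union (mIoU) over semantic classes. For each class $c$, $\mathrm{IoU}_c = \mathrm{TP}_c / (\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c)$. These are averaged over classes and usually scaled by 100. The sketch below computes this from flat per-voxel label lists. Skipping classes absent from both prediction and ground truth, and using an ignore label of 255, are assumptions. Benchmarks differ in such details (e.g. camera-visibility masks), and the names are hypothetical.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int],
             num_classes: int, ignore_label: int = 255) -> float:
    """Class-averaged IoU from per-voxel predicted and ground-truth labels."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_label:
            continue  # voxel excluded from evaluation
        if p == g:
            tp[g] += 1
        else:
            fn[g] += 1
            if 0 <= p < num_classes:
                fp[p] += 1
    ious = [tp[c] / (tp[c] + fp[c] + fn[c])
            for c in range(num_classes) if tp[c] + fp[c] + fn[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so mIoU = 7/12.
print(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2))  # 0.5833...
```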
# 変圧器の長さ外挿:位置符号化の観点から Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding ( http://arxiv.org/abs/2312.17044v2 ) ライセンス: Link先を確認	Liang Zhao, Xiaocheng Feng, Xiachong Feng, Bing Qin, Ting Liu	(参考訳) Transformerは、シークエンスにおける複雑な依存関係をモデル化する優れた能力のため、誕生以来、自然言語処理(NLP)の分野を席巻してきた。ほぼ全てのNLPタスクにおけるトランスフォーマーに基づく事前学習言語モデル(PLM)の成功にもかかわらず、それらはすべて事前設定された長さ制限に苦しめられており、この成功は、学習時に見たデータを超えた長いシーケンス、すなわち長さの外挿問題にまで拡張することができない。長さ外挿は人間の言語能力の中核的な特徴であるため、研究者の間で大きな関心を集めている。トランスフォーマーの長さ外挿を強化するため,多くの手法が提案され,主に外挿可能な位置符号化に焦点が当てられている。本稿では,既存の手法をより深く理解し,今後の研究に刺激を与えることを目的として,位置符号化の観点から,これらの研究成果を統一的な表記法として整理的かつ体系的に検討する。 Transformer has taken the natural language processing (NLP) field by storm since its introduction, owing to its superior ability to model complex dependencies in sequences. Despite the great success of pretrained language models (PLMs) based on Transformer across almost all NLP tasks, they all suffer from a preset length limit and thus can hardly extend this success to sequences longer than those seen during training, namely the length extrapolation problem. Length extrapolation has aroused great interest among researchers, as it is a core feature of human language capacity. To enhance the length extrapolation of Transformers, a plethora of methods have been proposed, mostly focusing on extrapolatable position encodings. In this article, we provide an organized and systematic review of these research efforts in a unified notation from a position encoding perspective, aiming to give the reader a deep understanding of existing methods and to provide stimuli for future research.	翻訳日:2024-01-02 19:06:29 公開日:2023-12-29
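ALiBi (Attention with Linear Biases) is one concrete example of an extrapolation-friendly position encoding of the kind such a survey covers. It is shown here only for illustration, not as a summary of the survey. ALiBi drops positional embeddings and adds a head-specific linear penalty $-m_h (i - j)$ to the causal attention logit between query position $i$ and key position $j$. For $H$ heads, with $H$ a power of two, the slopes are the geometric sequence $m_h = 2^{-8h/H}$ for $h = 1, \dots, H$. The bias grows linearly with distance, so it is defined for any sequence length, including lengths longer than those seen during training. The sketch builds one head's bias matrix, and the function names are hypothetical.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Head slopes m_h = 2^(-8h/H); this sketch only handles power-of-two H."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles a power-of-two number of heads")
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len: int, slope: float) -> List[List[float]]:
    """Causal bias for one head: -slope*(i-j) for keys j <= i, -inf for future keys."""
    neg_inf = float("-inf")
    return [[-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)]


print(alibi_slopes(8))        # [0.5, 0.25, ..., 0.00390625]
print(alibi_bias(3, 0.5)[2])  # [-1.0, -0.5, -0.0]
```

The bias matrix is simply added to the query-key scores before the softmax.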
# ソフトウェア開発エージェントの体験的共同学習 Experiential Co-Learning of Software-Developing Agents ( http://arxiv.org/abs/2312.17025v2 ) ライセンス: Link先を確認	Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun	(参考訳) 近年の大規模言語モデル(llm)の発展は、特にllm駆動の自律エージェントを通じて、様々なドメインに大きな変化をもたらした。これらのエージェントは、シームレスに協調し、タスクを分割し、精度を高め、人間の関与の必要性を最小限に抑えることができる。しかし、これらのエージェントはしばしば、過去の経験から利益を得ることなく、独立した様々なタスクにアプローチする。この分離は、タスク解決における繰り返しのミスや非効率な試行につながる可能性がある。そこで,本稿では,教師とアシスタントエージェントが過去の軌跡からショートカット指向の体験を収集し,過去の経験を相互推論に利用するための新しい枠組みであるExperiential Co-Learningを紹介する。このパラダイムは、以前の経験に富んだもので、エージェントに見えないタスクをより効果的に対処させる。 Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. These agents are now capable of collaborating seamlessly, splitting tasks and enhancing accuracy, thus minimizing the need for human involvement. However, these agents often approach a diverse range of tasks in isolation, without benefiting from past experiences. This isolation can lead to repeated mistakes and inefficient trials in task solving. To this end, this paper introduces Experiential Co-Learning, a novel framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for mutual reasoning. This paradigm, enriched with previous experiences, equips agents to more effectively address unseen tasks.	翻訳日:2024-01-02 19:06:09 公開日:2023-12-29
# FFCA-Net:サイド情報の高速カスケードアライメントによるステレオ画像圧縮 FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information ( http://arxiv.org/abs/2312.16963v2 ) ライセンス: Link先を確認	Yichong Xia, Yujun Huang, Bin Chen, Haoqian Wang, Yaowei Wang	(参考訳) マルチビュー圧縮技術、特にステレオ画像圧縮(SIC)は、車載カメラや3D関連アプリケーションにおいて重要な役割を果たす。興味深いことに、分散ソース符号化(DSC)理論は、独立符号化と共同復号によって相関ソースの効率的なデータ圧縮を実現できることを示唆している。これが、近年急速に発展した深層分散SIC手法の動機となっている。しかし、これらのアプローチはステレオ撮像タスク固有の特性を無視しており、高い復号遅延を招く。この制限に対処するために、デコーダ側のサイド情報を十分に活用する特徴ベースの高速カスケードアライメントネットワーク(FFCA-Net)を提案する。 FFCAは粗から細への(coarse-to-fine)カスケードアライメントアプローチを採用する。最初の段階では、FFCAはステレオ事前知識に基づく特徴領域パッチマッチングモジュールを使用する。このモジュールは、単純なマッチング手法の探索空間における冗長性を低減し、さらにノイズの混入を緩和する。続く段階では、砂時計型(hourglass)のスパースステレオ精緻化ネットワークを用いて、より少ない計算コストで画像間特徴をさらに整列させる。さらに、整列された特徴をデコードするために、FFF(Fast Feature Fusion network)と呼ばれる軽量かつ高性能な特徴融合ネットワークを考案した。 InStereo2K、KITTI、Cityscapesデータセットによる実験結果から、提案手法が従来のSIC手法および学習ベースのSIC手法よりも大幅に優れていることが示された。特に、提案手法は他の手法よりも3倍から10倍高速な復号を実現する。 Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications. Interestingly, the Distributed Source Coding (DSC) theory suggests that efficient data compression of correlated sources can be achieved through independent encoding and joint decoding. This has motivated the deep distributed SIC methods that have developed rapidly in recent years. However, these approaches neglect the unique characteristics of stereo-imaging tasks and incur high decoding latency. To address this limitation, we propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information at the decoder. FFCA adopts a coarse-to-fine cascaded alignment approach. In the initial stage, FFCA utilizes a feature domain patch-matching module based on stereo priors. This module reduces redundancy in the search space of trivial matching methods and further mitigates the introduction of noise. 
In the subsequent stage, we utilize an hourglass-based sparse stereo refinement network to further align inter-image features with a reduced computational cost. Furthermore, we have devised a lightweight yet high-performance feature fusion network, called a Fast Feature Fusion network (FFF), to decode the aligned features. Experimental results on InStereo2K, KITTI, and Cityscapes datasets demonstrate the significant superiority of our approach over traditional and learning-based SIC methods. In particular, our approach achieves 3- to 10-fold faster decoding than other methods.	翻訳日:2024-01-02 19:05:55 公開日:2023-12-29
# 認知図形描画の自動採点:ゴールドスタンダードに対する機械採点の品質評価 Automatic Scoring of Cognition Drawings: Assessing the Quality of Machine-Based Scores Against a Gold Standard ( http://arxiv.org/abs/2312.16887v2 ) ライセンス: Link先を確認	Arne Bethmann, Marina Aoki, Charlotte Hunsicker, Claudia Weileder	(参考訳) 図形描画は認知症スクリーニングプロトコルの一部としてしばしば用いられる。 The Survey of Health Aging and Retirement in Europe (SHARE)は、認知に関する質問票モジュールの一部として、Addenbrooke's Cognitive Examination IIIの3つの描画テストを採用した。描画は通常、訓練を受けた臨床医が採点するが、SHAREでは面接を行う対面インタビュアーがフィールドワーク中に描画を採点する。インタビュアーは臨床訓練を受けていないため、採点の一貫性が低く、誤りを犯しやすい可能性があり、これはデータ品質にリスクをもたらしうる。そこで本稿では、最初の概念実証を報告し、ディープラーニングを用いた採点の自動化の実現可能性を評価する。ドイツにおけるSHAREパネル第8波の約2,000枚の描画と、それに対応するインタビュアースコアおよび独自に作成した「ゴールドスタンダード」スコアを用いて、いくつかの異なる畳み込みニューラルネットワーク(CNN)モデルを訓練した。結果は、このアプローチが実際に実現可能であることを示唆している。インタビュアースコアで訓練した場合と比べ、ゴールドスタンダードデータで訓練したモデルは予測精度を約10ポイント向上させる。最も性能の高いモデルであるConvNeXt Baseは約85%の精度を達成しており、これはインタビュアーの精度より5ポイント高い。これは有望な結果であるが、モデルは部分的に正しい描画の採点にまだ苦労しており、これはインタビュアーにとっても問題となっている。このことは、実運用レベルの予測精度を達成するには、より多くの、より良い訓練データが必要であることを示唆している。そこで、訓練例の質と量を改善するための次のステップについて議論する。 Figure drawing is often used as part of dementia screening protocols. The Survey of Health Aging and Retirement in Europe (SHARE) has adopted three drawing tests from Addenbrooke's Cognitive Examination III as part of its questionnaire module on cognition. While the drawings are usually scored by trained clinicians, SHARE uses the face-to-face interviewers who conduct the interviews to score the drawings during fieldwork. This may pose a risk to data quality, as interviewers may be less consistent in their scoring and more likely to make errors due to their lack of clinical training. This paper therefore reports a first proof of concept and evaluates the feasibility of automating scoring using deep learning. We train several different convolutional neural network (CNN) models using about 2,000 drawings from the 8th wave of the SHARE panel in Germany and the corresponding interviewer scores, as well as self-developed 'gold standard' scores. 
The results suggest that this approach is indeed feasible. Compared to training on interviewer scores, models trained on the gold standard data improve prediction accuracy by about 10 percentage points. The best performing model, ConvNeXt Base, achieves an accuracy of about 85%, which is 5 percentage points higher than the accuracy of the interviewers. While this is a promising result, the models still struggle to score partially correct drawings, which are also problematic for interviewers. This suggests that more and better training data is needed to achieve production-level prediction accuracy. We therefore discuss possible next steps to improve the quality and quantity of training examples.	翻訳日:2024-01-02 19:05:31 公開日:2023-12-29
# ClST:知識蒸留による自動変調認識のための畳み込みトランスフォーマフレームワーク ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation ( http://arxiv.org/abs/2312.17446v1 ) ライセンス: Link先を確認	Dongbin Hou, Lixin Li, Wensheng Lin, Junli Liang, Zhu Han	(参考訳) 近年のディープラーニング(DL)の急速な発展に伴い、DLを用いた自動変調認識(AMR)は高い精度を達成している。しかし、複雑なチャネル環境における訓練用信号データの不足と大規模なDLモデルは、DL手法の実運用を難しくする重要な要因である。そこで本研究では、畳み込み結合信号Transformer(ClST)と呼ばれる新しいニューラルネットワークと、信号知識蒸留(SKD)と呼ばれる新しい知識蒸留法を提案する。 ClSTは、畳み込みを含むTransformerの階層構造、並列空間チャネルアテンション(PSCA)機構と呼ばれる新しいアテンション機構、そして畳み込み射影を活用するための畳み込み-Transformer射影(CTP)と呼ばれる新しい畳み込みTransformerブロックという3つの主要な変更によって実現される。 SKDは、ニューラルネットワークのパラメータと複雑さを効果的に削減する知識蒸留法である。小型デバイスでニューラルネットワークを使用したいという要求に応えるため、SKDアルゴリズムを用いて2つの軽量ニューラルネットワークKD-CNNとKD-MobileNetを訓練する。シミュレーション結果から、ClSTはすべてのデータセットで先進的なニューラルネットワークを上回ることが示された。さらに、KD-CNNとKD-MobileNetはいずれも、より低いネットワーク複雑さでより高い認識精度を得られるため、小型通信デバイスへのAMRの展開に非常に有用である。 With the rapid development of deep learning (DL) in recent years, automatic modulation recognition (AMR) with DL has achieved high accuracy. However, insufficient training signal data in complicated channel environments and large-scale DL models are critical factors that make DL methods difficult to deploy in practice. To address these problems, we propose a novel neural network named convolution-linked signal transformer (ClST) and a novel knowledge distillation method named signal knowledge distillation (SKD). The ClST is accomplished through three primary modifications: a transformer hierarchy that incorporates convolution, a novel attention mechanism named parallel spatial-channel attention (PSCA) mechanism and a novel convolutional transformer block named convolution-transformer projection (CTP) to leverage a convolutional projection. The SKD is a knowledge distillation method to effectively reduce the parameters and complexity of neural networks. 
We train two lightweight neural networks using the SKD algorithm, KD-CNN and KD-MobileNet, to meet the demand that neural networks can be used on miniaturized devices. The simulation results demonstrate that the ClST outperforms advanced neural networks on all datasets. Moreover, both KD-CNN and KD-MobileNet obtain higher recognition accuracy with less network complexity, which is very beneficial for the deployment of AMR on miniaturized communication devices.	翻訳日:2024-01-02 14:07:25 公開日:2023-12-29
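The abstract does not give the details of SKD. For orientation only, the sketch below implements the standard Hinton-style distillation loss, which is a different and more generic technique than SKD. The loss mixes cross-entropy on hard labels with a temperature-softened KL term between the teacher's and the student's class distributions. The temperature `T` and weight `alpha` are illustrative defaults, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, hard labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_s = softmax(student_logits)
    n = len(labels)
    ce = -np.mean(np.log(p_s[np.arange(n), labels] + 1e-12))
    p_t_soft = softmax(teacher_logits, T)      # softened teacher distribution
    p_s_soft = softmax(student_logits, T)      # softened student distribution
    kl = np.mean(np.sum(p_t_soft * (np.log(p_t_soft + 1e-12) - np.log(p_s_soft + 1e-12)), axis=-1))
    return alpha * ce + (1 - alpha) * T ** 2 * kl   # T^2 keeps gradient scale comparable

teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
student = np.array([[1.0, 0.8, -0.5], [0.4, 0.9, 0.2]])
print(kd_loss(student, teacher, labels=[0, 1]))
```

The `T**2` factor follows the usual convention, so the soft-target term keeps a similar magnitude as `T` changes.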
# SMoT: ステートマシンで考える SMoT: Think in State Machine ( http://arxiv.org/abs/2312.17445v1 ) ライセンス: Link先を確認	Jia Liu, Jie Shuai	(参考訳) 言語モデル推論のための現在のプロンプト手法は、主に大規模言語モデル(LLM)による推論経路の自律的な探索に依存しており、誤った経路に遭遇した場合には避けられない後戻り操作に直面する。その後、代替の推論経路が探索される。一方、人間は問題から最適な解法を抽象化することに長けており、類似した問題を迅速かつ正確に推論して解決できる。これを踏まえ、専門家の知識を活用してLLMの問題解決能力を強化する可能性を探る。我々は、事前定義されたステートマシンを利用してLLMに効率的な推論経路を与え、無駄な探索をなくす新しいパラダイム、State Machine of Thought(SMoT)を導入する。さらに、SMoT推論の精度を高めることを目的として、エージェントごとに異なる目的を割り当てるマルチエージェント機構を提案する。配列推論タスクで得られた実験結果から、SMoTは95%という極めて高い精度を実現し、最先端のベースラインの性能を上回ることが明らかになった。 Current prompting approaches for language model inference mainly rely on the autonomous exploration of reasoning paths by large language models (LLMs) and confront an inevitable retracing operation when erroneous routes are encountered. This is followed by the pursuit of alternative reasoning paths. However, humans are adept at abstracting optimal solutions from problems, thereby facilitating swift and precise reasoning for the resolution of similar problems. In light of this, we delve into the potential of harnessing expert knowledge to enhance problem-solving within LLMs. We introduce a novel paradigm, the State Machine of Thought (SMoT), which employs predefined state machines to furnish LLMs with efficient reasoning paths, thereby eliminating fruitless exploration. Furthermore, we propose a multi-agent mechanism that assigns different objectives to agents, aiming to enhance the accuracy of SMoT reasoning. The experimental results, derived from an array reasoning task, reveal that SMoT realizes an extraordinary accuracy of 95%, surpassing the performance of the state-of-the-art baselines.	翻訳日:2024-01-02 14:06:59 公開日:2023-12-29
# 型にはまった分類から抜け出す:推薦システムにおけるミスキャリブレーション、バイアス、ステレオタイプを調べるための統一フレームワーク Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems ( http://arxiv.org/abs/2312.17443v1 ) ライセンス: Link先を確認	Yongsu Ahn and Yu-Ru Lin	(参考訳) 利用者のニーズに合わせてアイテムや情報をパーソナライズする利点がある一方で、推薦システムは人気アイテム、特定のカテゴリーのアイテム、支配的なユーザーグループに有利なバイアスをもたらす傾向があることがわかっている。本研究では、推薦システムの系統的誤差を特徴づけ、それがステレオタイプ、バイアス、ミスキャリブレーションといった様々な説明責任の問題としてどのように現れるかを明らかにすることを目的とする。本稿では、予測誤差の発生源を、個人レベルと集団レベルの両方で様々な種類のシステム誘発効果を定量化する一連の主要指標に区別する統一フレームワークを提案する。この測定フレームワークに基づき、映画推薦の文脈において最も広く採用されているアルゴリズムを検討した。その結果、3つの重要な知見が明らかになった。 (1) アルゴリズム間の違い: 単純なアルゴリズムによって生成される推薦は、より複雑なアルゴリズムによって生成されるものよりもステレオタイプ的であるが、バイアスは少ない傾向にある。 (2) グループや個人への不均等な影響: システム誘発のバイアスとステレオタイプは、非典型的なユーザーや少数派グループ(女性や高齢ユーザーなど)に不均等な影響を及ぼす。 (3) 緩和の機会: 構造方程式モデリングを用いて、ユーザー特性(典型性と多様性)、システム誘発効果、ミスキャリブレーションの間の相互作用を特定した。さらに、過小代表のグループや個人をオーバーサンプリングすることでシステム誘発効果を緩和できる可能性を調べ、これがステレオタイプの低減と推薦品質の向上に有効であることを見出した。本研究は、推薦システムにおけるシステム誘発効果とミスキャリブレーションだけでなく、ステレオタイプの問題も体系的に検討した初めての研究である。 Despite the benefits of personalizing items and information tailored to users' needs, it has been found that recommender systems tend to introduce biases that favor popular items or certain categories of items, and dominant user groups. In this study, we aim to characterize the systematic errors of a recommendation system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that distinguishes the sources of prediction errors into a set of key measures that quantify the various types of system-induced effects, both at the individual and collective levels. Based on our measuring framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. 
(2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate the possibility of mitigating system-induced effects by oversampling underrepresented groups and individuals, which was found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.	翻訳日:2024-01-02 14:06:40 公開日:2023-12-29
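The abstract does not define its miscalibration measure. The sketch below assumes a common KL-based definition from calibrated-recommendation work. It compares the genre distribution p of a user's history with the genre distribution q of their recommendations, and it smooths q with a small share of p so the divergence stays finite. The genres and items are hypothetical.

```python
import numpy as np

def genre_distribution(items, item_genres, genres):
    """Share of each genre among items; an item with k genres adds 1/k to each of them."""
    counts = dict.fromkeys(genres, 0.0)
    for item in items:
        item_gs = item_genres[item]
        for g in item_gs:
            counts[g] += 1.0 / len(item_gs)
    total = sum(counts.values())
    return np.array([counts[g] / total for g in genres])

def miscalibration_kl(p, q, alpha=0.01):
    """KL(p || q~) with q~ = (1 - alpha) q + alpha p, so q~ > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    q_tilde = (1 - alpha) * q + alpha * p
    mask = p > 0                                   # 0 * log(0 / x) contributes nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q_tilde[mask])))

# Hypothetical catalogue: the user watched a mix, but is recommended only action films.
genres = ["action", "comedy", "drama"]
item_genres = {"a": ["action"], "b": ["comedy"], "c": ["action", "drama"]}
p = genre_distribution(["a", "b", "c"], item_genres, genres)   # history
q = genre_distribution(["a", "a"], item_genres, genres)        # recommendations
print(p, q, miscalibration_kl(p, q))
```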
# ウィグダーソンらの不確実性原理へのアプローチについて On Wigdersons' approach to the uncertainty principle ( http://arxiv.org/abs/2312.17438v1 ) ライセンス: Link先を確認	Nuno Costa Dias, Franz Luef and João Nuno Prata	(参考訳) 我々は、A. Wigderson と Y. Wigderson が提案した観点から不確実性原理を再考する。このアプローチは一次不確実性原理に基づいており、そこから時間と周波数における同時の鋭い局在化が不可能であることを表すいくつかの不等式を導出できる。さらに、フーリエ変換の特別な性質を必要としないため、一次不確実性原理を満たすすべての作用素に容易に適用できる。 A. Wigderson と Y. Wigderson は高次元への多くの一般化も提案し、いくつかの予想を述べており、本論文ではそれらを扱う。我々は、著者らが提案した結果を証明するには、より一般的な一次不確実性原理を考える必要があると論じる。副産物として、Cowling-Priceの不確実性原理に類似した新たな不等式を得るとともに、一次不確実性原理からエントロピー不確実性原理を導出する。 We revisit the uncertainty principle from the point of view suggested by A. Wigderson and Y. Wigderson. This approach is based on a primary uncertainty principle from which one can derive several inequalities expressing the impossibility of a simultaneous sharp localization in time and frequency. Moreover, it requires no specific properties of the Fourier transform and can therefore be easily applied to all operators satisfying the primary uncertainty principle. A. Wigderson and Y. Wigderson also suggested many generalizations to higher dimensions and stated several conjectures which we address in the present paper. We argue that we have to consider a more general primary uncertainty principle to prove the results suggested by the authors. As a by-product we obtain some new inequalities akin to the Cowling-Price uncertainty principle and derive the entropic uncertainty principle from the primary uncertainty principles.	翻訳日:2024-01-02 14:06:09 公開日:2023-12-29
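The Wigdersons' primary uncertainty principle is not reproduced here. As a concrete illustration of the "no simultaneous sharp localization" phenomenon, the sketch below checks the classical discrete Donoho–Stark inequality. For any nonzero vector x of length N, |supp x| · |supp DFT(x)| ≥ N. A Dirac comb whose spacing divides N attains equality.

```python
import numpy as np

def support_size(v, tol=1e-9):
    """Number of entries whose magnitude exceeds a small numerical tolerance."""
    return int(np.sum(np.abs(v) > tol))

def donoho_stark_product(x):
    """|supp x| * |supp DFT(x)|; Donoho-Stark says this is >= len(x) for any x != 0."""
    return support_size(x) * support_size(np.fft.fft(x))

N = 12
comb = np.zeros(N)
comb[::3] = 1.0     # 4 spikes spaced 3 apart; the DFT has 3 spikes (at k = 0, 4, 8)
spike = np.zeros(N)
spike[0] = 1.0      # perfectly localized in time, completely spread in frequency

print(donoho_stark_product(comb), donoho_stark_product(spike))   # both attain N
```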
# 大規模言語モデルによるビデオ理解:調査 Video Understanding with Large Language Models: A Survey ( http://arxiv.org/abs/2312.17432v1 ) ライセンス: Link先を確認	Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu	(参考訳) オンライン動画プラットフォームの急成長と動画コンテンツの増大に伴い、優れた動画理解ツールの需要が著しく高まっている。大規模言語モデル(LLM)が主要な言語タスクで顕著な能力を示していることを受け、本調査ではLLMの力を活用した動画理解(Vid-LLM)の最近の進歩を詳しく概観する。 Vid-LLMの創発的能力は驚くほど高度であり、特にオープンエンドな時空間推論と常識的知識を組み合わせる能力は、将来の動画理解に向けた有望な道筋を示している。我々はVid-LLMのユニークな特徴と能力を検討し、そのアプローチをLLMベースのビデオエージェント、Vid-LLMの事前学習、Vid-LLMの指示チューニング、ハイブリッド手法の4つの主要なタイプに分類した。さらに、本調査ではVid-LLMのタスクとデータセット、および評価に用いられる手法についても包括的に検討する。さらに、本調査は様々な領域にわたるVid-LLMの幅広い応用を探り、実世界の動画理解の課題に取り組む上での顕著なスケーラビリティと汎用性を示す。最後に、既存のVid-LLMの限界と今後の研究の方向性をまとめる。詳細については、https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding のリポジトリを参照されたい。 With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. With Large Language Models (LLMs) showcasing remarkable capabilities in key language tasks, this survey provides a detailed overview of the recent advancements in video understanding harnessing the power of LLMs (Vid-LLMs). The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended spatial-temporal reasoning combined with commonsense knowledge, suggesting a promising path for future video understanding. We examine the unique characteristics and capabilities of Vid-LLMs, categorizing the approaches into four main types: LLM-based Video Agents, Vid-LLMs Pretraining, Vid-LLMs Instruction Tuning, and Hybrid Methods. Furthermore, this survey also presents a comprehensive study of the tasks and datasets for Vid-LLMs, along with the methodologies employed for evaluation. 
Additionally, the survey explores the expansive applications of Vid-LLMs across various domains, thereby showcasing their remarkable scalability and versatility in addressing challenges in real-world video understanding. Finally, the survey summarizes the limitations of existing Vid-LLMs and the directions for future research. For more information, we recommend readers visit the repository at https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding.	翻訳日:2024-01-02 14:05:53 公開日:2023-12-29
# MVPatch:現実世界の物体探知機に対する敵のカモフラージュ攻撃のより鮮明なパッチ MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World ( http://arxiv.org/abs/2312.17431v1 ) ライセンス: Link先を確認	Zheng Zhou, Hongbo Zhao, Ju Liu, Qiaosheng Zhang, Guangbiao Wang, Chunlei Wang and Wenquan Feng	(参考訳) 近年の研究では、敵のパッチがオブジェクト検出モデルからの出力を操作できることが示されている。しかし、これらのパッチの顕著なパターンは、より注意を引き、人間の間で疑念を喚起する可能性がある。さらに、既存の研究は主に個々のモデルの攻撃性能に重点を置いており、複数のオブジェクト検出モデルに対するアンサンブル攻撃のための敵パッチの生成を無視している。これらの問題に対処するため,従来のパラダイム,例えば識別の容易さや伝達性の低さを考慮しつつ,敵パッチの転送性やステルス性を改善することを目的とした「MVPatch(More Vivid Patch)」と呼ばれる新しいアプローチを提案する。本手法では, アンサンブル攻撃損失関数を用いて, 複数の物体検出器の物体信頼度を低減し, 対向パッチの転送性を向上させる。さらに,画像類似度比較(CSS)損失関数によって実現される軽量な視覚類似度測定アルゴリズムを提案する。拡張実験により,提案したMVPatchアルゴリズムは,デジタルドメインと物理ドメインの両方で類似したアルゴリズムよりも優れた攻撃伝達性を実現するとともに,より自然な外観を示すことを示した。これらの結果は,提案したMVPatch攻撃アルゴリズムの顕著なステルス性と伝達性を強調した。 Recent research has shown that adversarial patches can manipulate outputs from object detection models. However, the conspicuous patterns on these patches may draw more attention and raise suspicions among humans. Moreover, existing works have primarily focused on the attack performance of individual models and have neglected the generation of adversarial patches for ensemble attacks on multiple object detection models. To tackle these concerns, we propose a novel approach referred to as the More Vivid Patch (MVPatch), which aims to improve the transferability and stealthiness of adversarial patches while considering the limitations observed in prior paradigms, such as easy identification and poor transferability. Our approach incorporates an attack algorithm that decreases object confidence scores of multiple object detectors by using the ensemble attack loss function, thereby enhancing the transferability of adversarial patches. 
Additionally, we propose a lightweight visual similarity measurement algorithm realized by the Compared Specified Image Similarity (CSS) loss function, which allows for the generation of natural and stealthy adversarial patches without the reliance on additional generative models. Extensive experiments demonstrate that the proposed MVPatch algorithm achieves superior attack transferability compared to similar algorithms in both digital and physical domains, while also exhibiting a more natural appearance. These findings emphasize the remarkable stealthiness and transferability of the proposed MVPatch attack algorithm.	翻訳日:2024-01-02 14:05:26 公開日:2023-12-29
# LEFL: フェデレーション学習における低エントロピークライアントサンプリング LEFL: Low Entropy Client Sampling in Federated Learning ( http://arxiv.org/abs/2312.17430v1 ) ライセンス: Link先を確認	Waqwoya Abebe, Pablo Munoz, Ali Jannesari	(参考訳) Federated Learning(FL)は、複数のクライアントが協力し、それぞれのプライベートデータを使用して単一のグローバルモデルを最適化する機械学習パラダイムである。グローバルモデルは、一連の訓練ラウンドを通じてFL訓練プロセスを統括する中央サーバによって維持される。各ラウンドで、サーバはクライアントプールからクライアントをサンプリングし、さらなる最適化のために最新のグローバルモデルパラメータを送信する。単純なサンプリング戦略はランダムなクライアントサンプリングを実装し、プライバシ上の理由からクライアントのデータ分布を考慮しない。そこで我々は、データプライバシを尊重しつつ、モデルが学習した高レベル特徴に基づいてクライアントを一度だけクラスタリングする代替サンプリング戦略を提案する。これにより、サーバは各ラウンドでクラスタ間にわたる層化クライアントサンプリングを実行できる。このアプローチで選択されたクライアントのデータセットは、グローバルなデータ分布に対して低い相対エントロピーをもたらすことを示す。その結果、FL訓練のノイズが減り、いくつかの実験ではグローバルモデルの収束が最大7.4%改善する。さらに、目標精度の達成に必要な通信ラウンドも大幅に削減される。 Federated learning (FL) is a machine learning paradigm where multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates the FL training process through a series of training rounds. In each round, the server samples clients from a client pool before sending them its latest global model parameters for further optimization. Naive sampling strategies implement random client sampling and fail to factor in client data distributions for privacy reasons. Hence, we propose an alternative sampling strategy by performing a one-time clustering of clients based on their model's learned high-level features while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show that the datasets of clients sampled with this approach yield a low relative entropy with respect to the global data distribution. Consequently, the FL training becomes less noisy and significantly improves the convergence of the global model by as much as 7.4% in some experiments. Furthermore, it also significantly reduces the communication rounds required to achieve a target accuracy.	
翻訳日:2024-01-02 14:05:01 公開日:2023-12-29
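The LEFL abstract describes stratified client sampling across clusters and measures how close the sampled data are to the global distribution using relative entropy. The sketch below takes the cluster labels as already given, so it leaves out the feature-based clustering step. Allocating slots in proportion to cluster size is an assumption of this sketch. The 12-client setup is hypothetical: each cluster holds only one label, so drawing one client per cluster reproduces the global label mix exactly.

```python
import numpy as np

def stratified_sample(cluster_of, k, rng):
    """Pick about k clients, giving each cluster slots in proportion to its size (at least 1)."""
    clusters = {}
    for client, c in enumerate(cluster_of):
        clusters.setdefault(c, []).append(client)
    n = len(cluster_of)
    chosen = []
    for members in clusters.values():
        m = max(1, round(k * len(members) / n))
        picked = rng.choice(members, size=min(m, len(members)), replace=False)
        chosen.extend(picked.tolist())
    return chosen

def relative_entropy(p, q, eps=1e-12):
    """KL(p || q), e.g. sampled label distribution vs. global label distribution."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical setup: 12 clients, 3 clusters of 4; every client in cluster c holds only label c.
cluster_of = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
chosen = stratified_sample(cluster_of, k=3, rng=np.random.default_rng(0))
sampled = np.bincount([cluster_of[i] for i in chosen], minlength=3) / len(chosen)
print(chosen, relative_entropy(sampled, [1 / 3, 1 / 3, 1 / 3]))
```

By contrast, a uniform random draw of 3 clients can land entirely in one cluster. In that case the relative entropy is log 3 instead of 0.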
# ゼロショット自然言語ビデオローカライズのためのコモンセンス Commonsense for Zero-Shot Natural Language Video Localization ( http://arxiv.org/abs/2312.17429v1 ) ライセンス: Link先を確認	Meghana Holla, Ismini Lourentzou	(参考訳) Zero-shot Natural Language-Video Localization (NLVL)法は,ビデオセグメントと擬似クエリアノテーションを動的に生成することにより,生のビデオデータのみを用いたNLVLモデルのトレーニングにおいて有望な結果を示した。しかし、既存の擬似クエリーはソースビデオの基盤を欠くことが多く、構造化されていないコンテンツと解離したコンテンツをもたらす。本稿では,ゼロショットNLVLにおけるコモンセンス推論の有効性について検討する。具体的には、コモンセンスを利用したゼロショットNLVLフレームワークであるCORONETを紹介し、コモンセンス拡張モジュールを介してビデオと生成された擬似クエリ間のギャップを埋める。 CORONETは、知識グラフから抽出されたコモンセンス情報を符号化するグラフ畳み込みネットワーク(GCN)と、ローカライゼーションの前にエンコードされたビデオと擬似クエリ表現を強化するクロスアテンション機構を利用する。 2つのベンチマークデータセットに対する実証的な評価を通じて、CORONETがゼロショットと弱教師付きベースラインを越え、様々なリコールしきい値で最大32.13%、mIoUで最大6.33%の改善を達成したことを示す。これらの結果は, ゼロショットNLVLにおけるコモンセンス推論の活用の重要性を裏付けるものである。 Zero-shot Natural Language-Video Localization (NLVL) methods have exhibited promising results in training NLVL models exclusively with raw video data by dynamically generating video segments and pseudo-query annotations. However, existing pseudo-queries often lack grounding in the source video, resulting in unstructured and disjointed content. In this paper, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that leverages commonsense to bridge the gap between videos and generated pseudo-queries via a commonsense enhancement module. CORONET employs Graph Convolution Networks (GCN) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query representations prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements up to 32.13% across various recall thresholds and up to 6.33% in mIoU. These results underscore the significance of leveraging commonsense reasoning for zero-shot NLVL.	
翻訳日:2024-01-02 14:04:48 公開日:2023-12-29
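The CORONET abstract encodes commonsense knowledge with Graph Convolution Networks. The exact variant and the knowledge-graph construction are not described. The sketch below shows one standard graph-convolution layer in the widely used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W). The three-node graph and the random features are hypothetical.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops so nodes keep their own features
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric degree normalization
    return np.maximum(a_norm @ h @ w, 0.0)          # aggregate neighbors, project, ReLU

# Hypothetical 3-node concept graph (a path 0 - 1 - 2) with 4-d node features.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h = rng.standard_normal((3, 4))
w = rng.standard_normal((4, 2))
print(gcn_layer(adj, h, w))   # shape (3, 2)
```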
# ChangeNet: 多時期非対称変化検出データセット ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset ( http://arxiv.org/abs/2312.17428v1 ) ライセンス: Link先を確認	Deyi Ji, Siqi Gao, Mingyuan Tao, Hongtao Lu, Feng Zhao	(参考訳) 変化検出(CD)は、2時期データセットが利用可能になったことで大きな関心を集めている。しかし、多時期画像の取得とラベル付けに莫大なコストがかかるため、既存の変化検出データセットは量が少なく、時間的範囲が短く、実用性も低い。そのため、コミュニティの発展を促すには、幅広い時期をカバーする大規模で実用志向のデータセットが急務である。この目的のために、特に多時期変化検出向けのChangeNetデータセットを、「非対称変化検出(Asymmetric Change Detection)」という新しいタスクとともに提示する。具体的には、ChangeNetは31,000の多時期画像ペア、100都市にわたる多様で複雑なシーン、6つのピクセルレベルの注釈付きカテゴリで構成されており、LEVIR-CDやWHU Building CDなど既存のすべての変化検出データセットよりもはるかに優れている。さらに、ChangeNetには同じ領域の異なる時期における実世界の透視歪みが多く含まれており、変化検出アルゴリズムの実用化を促進できる。 ChangeNetデータセットはバイナリ変化検出(BCD)とセマンティック変化検出(SCD)の両方のタスクに適している。そこで、6つのBCD手法と2つのSCD手法でChangeNetデータセットをベンチマークし、広範な実験によりその難しさと大きな意義を実証した。データセットはhttps://github.com/jankyee/ChangeNetで公開されている。 Change Detection (CD) has been attracting extensive interests with the availability of bi-temporal datasets. However, due to the huge cost of multi-temporal images acquisition and labeling, existing change detection datasets are small in quantity, short in temporal, and low in practicability. Therefore, a large-scale practical-oriented dataset covering wide temporal phases is urgently needed to facilitate the community. To this end, the ChangeNet dataset is presented especially for multi-temporal change detection, along with the new task of "Asymmetric Change Detection". Specifically, ChangeNet consists of 31,000 multi-temporal images pairs, a wide range of complex scenes from 100 cities, and 6 pixel-level annotated categories, which is far superior to all the existing change detection datasets including LEVIR-CD, WHU Building CD, etc. In addition, ChangeNet contains amounts of real-world perspective distortions in different temporal phases on the same areas, which is able to promote the practical application of change detection algorithms. 
The ChangeNet dataset is suitable for both binary change detection (BCD) and semantic change detection (SCD) tasks. Accordingly, we benchmark the ChangeNet dataset on six BCD methods and two SCD methods, and extensive experiments demonstrate its challenges and great significance. The dataset is available at https://github.com/jankyee/ChangeNet.	翻訳日:2024-01-02 14:04:26 公開日:2023-12-29
# 非バイアスシーングラフ生成のためのコンテキストベース転送と効率的な反復学習 Context-based Transfer and Efficient Iterative Learning for Unbiased Scene Graph Generation ( http://arxiv.org/abs/2312.17425v1 ) ライセンス: Link先を確認	Qishen Chen, Xinyu Lyu, Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song	(参考訳) アンバイアスドシーングラフ生成(USGG)は、SGGにおける偏った予測に対処することを目的としている。この目的のために、粗粒度の述語を細粒度の述語に変換して不均衡な分布を緩和するデータ転送手法が設計されている。しかし、それらは「女性-テーブル」に対する「食べる」の不適切さのように、転送されたラベルと主語-目的語ペアの間の文脈的関連性を見落としている。さらに、それらは通常、データ転送用のモデルを事前学習してから、転送されたラベルを使ってスクラッチから学習するという、大きな計算コストを伴う2段階のプロセスを必要とする。そこで、段階的に強化されたデータを用いてSGGモデルを反復的に学習する、CITransというプラグアンドプレイ手法を提案する。まず、述語の意味空間内に主語-目的語の制約を課して細粒度のデータ転送を実現するContext-Restricted Transfer(CRT)を導入する。続いて、Efficient Iterative Learning(EIL)がモデルを反復的に学習し、モデルの学習状態と一貫した強化ラベルを段階的に生成することで、学習プロセスを加速する。最後に、広範な実験により、CITransが高い効率で最先端の結果を達成することを示した。 Unbiased Scene Graph Generation (USGG) aims to address biased predictions in SGG. To that end, data transfer methods are designed to convert coarse-grained predicates into fine-grained ones, mitigating imbalanced distribution. However, they overlook contextual relevance between transferred labels and subject-object pairs, such as unsuitability of 'eating' for 'woman-table'. Furthermore, they typically involve a two-stage process with significant computational costs, starting with pre-training a model for data transfer, followed by training from scratch using transferred labels. Thus, we introduce a plug-and-play method named CITrans, which iteratively trains SGG models with progressively enhanced data. First, we introduce Context-Restricted Transfer (CRT), which imposes subject-object constraints within predicates' semantic space to achieve fine-grained data transfer. Subsequently, Efficient Iterative Learning (EIL) iteratively trains models and progressively generates enhanced labels which are consistent with model's learning state, thereby accelerating the training process. Finally, extensive experiments show that CITrans achieves state-of-the-art results with high efficiency.	翻訳日:2024-01-02 14:04:03 公開日:2023-12-29
# 正規化偏差2乗統計量を用いたガウス混合フィルタの厳密整合性試験 Exact Consistency Tests for Gaussian Mixture Filters using Normalized Deviation Squared Statistics ( http://arxiv.org/abs/2312.17420v1 ) ライセンス: Link先を確認	Nisar Ahmed, Luke Burks, Kailah Cabral, Alyssa Bekai Rose	(参考訳) 確率的なシステム状態密度をガウス混合で近似する離散時間確率フィルタにおいて、動的一貫性を評価する問題を考える。動的一貫性とは、推定された確率分布が実際の不確かさを正しく記述していることを意味する。そのため、一貫性検定の問題は、推定器のチューニングや検証に関する応用において自然に生じる。しかし、関係する密度関数は一般に複雑であるため、混合分布ベースの推定器に対する一貫性検定の単純なアプローチは、定義も実装も難しいままである。本稿では、正規化偏差二乗(NDS)統計の枠組みにおいて、ガウス混合の一貫性検定に関する新たな厳密な結果を導出する。一般的な多変量ガウス混合モデルのNDS検定統計量は、効率的な計算ツールが利用可能な一般化カイ二乗分布の混合に厳密に従うことが示される。得られた一貫性検定の精度と有用性を、静的および動的な混合推定の例で数値的に示す。 We consider the problem of evaluating dynamic consistency in discrete time probabilistic filters that approximate stochastic system state densities with Gaussian mixtures. Dynamic consistency means that the estimated probability distributions correctly describe the actual uncertainties. As such, the problem of consistency testing naturally arises in applications with regards to estimator tuning and validation. However, due to the general complexity of the density functions involved, straightforward approaches for consistency testing of mixture-based estimators have remained challenging to define and implement. This paper derives a new exact result for Gaussian mixture consistency testing within the framework of normalized deviation squared (NDS) statistics. It is shown that NDS test statistics for generic multivariate Gaussian mixture models exactly follow mixtures of generalized chi-square distributions, for which efficient computational tools are available. The accuracy and utility of the resulting consistency tests are numerically demonstrated on static and dynamic mixture estimation examples.	翻訳日:2024-01-02 14:03:43 公開日:2023-12-29
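The paper's result concerns Gaussian mixtures. The sketch below covers only the familiar single-Gaussian case that such tests generalize. If an estimate with mean μ and covariance P is consistent, the NDS (x − μ)ᵀ P⁻¹ (x − μ) follows a chi-square distribution with n degrees of freedom, so its average is close to the state dimension n. An overconfident (too small) covariance inflates the NDS. The generalized chi-square mixture result for Gaussian-mixture estimates is not reproduced.

```python
import numpy as np

def nds(x, mean, cov):
    """Normalized deviation squared (x - mean)^T cov^{-1} (x - mean)."""
    e = np.asarray(x, float) - np.asarray(mean, float)
    return float(e @ np.linalg.solve(cov, e))

def average_nds(samples, mean, cov):
    """Mean NDS over many samples; close to n (the state dimension) when cov is consistent."""
    e = np.asarray(samples, float) - np.asarray(mean, float)
    sol = np.linalg.solve(cov, e.T).T            # row i holds cov^{-1} e_i
    return float(np.mean(np.sum(e * sol, axis=1)))

rng = np.random.default_rng(1)
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = rng.multivariate_normal(mean, cov, size=20000)   # truth drawn from the claimed cov
print(average_nds(samples, mean, cov))       # about 2: consistent
print(average_nds(samples, mean, cov / 4))   # about 8: overconfident covariance
```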
# ユニバーサルクトリットゲートのベンチマーク Benchmarking of Universal Qutrit Gates ( http://arxiv.org/abs/2312.17418v1 ) ライセンス: Link先を確認	David Amaro-Alcalá, Barry C. Sanders, Hubert de Guise	(参考訳) 本稿では、ユニバーサルなクトリットゲートセットの特性評価スキームを導入する。クトリット系への関心の高まりを動機として、我々の基準を適用し、我々の超二面体群がクトリットTゲートの性能を特性評価するスキームの基礎となることを示す。得られたクトリットスキームは、クトリットのクリフォード・ランダム化ベンチマークで用いられるものと同様のリソースとデータ解析手法しか必要としないため、実現可能である。クトリットに対する我々のTゲートベンチマーク手順と既知のクトリット・クリフォードゲートベンチマークを組み合わせることで、ユニバーサルなクトリットゲートセットの完全な特性評価が可能になる。 We introduce a characterisation scheme for a universal qutrit gate set. Motivated by rising interest in qutrit systems, we apply our criteria to establish that our hyperdihedral group underpins a scheme to characterise the performance of a qutrit T gate. Our resulting qutrit scheme is feasible, as it requires resources and data analysis techniques similar to resources employed for qutrit Clifford randomised benchmarking. Combining our T gate benchmarking procedure for qutrits with known qutrit Clifford-gate benchmarking enables complete characterisation of a universal qutrit gate set.	翻訳日:2024-01-02 14:03:29 公開日:2023-12-29
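The abstract does not write out the qutrit T gate. The sketch below assumes the common Howard–Vala form T = diag(1, ζ, ζ⁸) with ζ = e^{2πi/9}. It checks a few algebraic facts about this gate: it is unitary, its cube is the qutrit clock (Pauli Z) gate, and its ninth power is the identity. It also checks the clock and shift gates against the Weyl commutation relation ZX = ωXZ.

```python
import numpy as np

zeta = np.exp(2j * np.pi / 9)            # primitive 9th root of unity
omega = zeta ** 3                        # primitive 3rd root of unity
T = np.diag([1, zeta, zeta ** 8])        # qutrit T gate (assumed Howard-Vala form)
Z = np.diag([1, omega, omega ** 2])      # clock gate: |j> -> omega^j |j>
X = np.roll(np.eye(3), 1, axis=0)        # shift gate: |j> -> |j + 1 mod 3>

print(np.allclose(T.conj().T @ T, np.eye(3)))                   # unitary
print(np.allclose(np.linalg.matrix_power(T, 3), Z))             # T^3 = Z
print(np.allclose(np.linalg.matrix_power(T, 9), np.eye(3)))     # T^9 = I
print(np.allclose(Z @ X, omega * X @ Z))                        # Weyl relation
```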
# 近似ベイズ的な認識論的不確実性推定のための生成事後ネットワーク Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation ( http://arxiv.org/abs/2312.17411v1 ) ライセンス: Link先を確認	Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter	(参考訳) 多くの実世界の問題では、訓練データは限られているが、ラベルのないデータは豊富にある。本稿では、ラベルのないデータを用いて高次元問題における認識論的不確実性を推定する新しい手法、Generative Posterior Networks(GPN)を提案する。GPNは、関数上の事前分布が与えられたとき、事前分布からのサンプルに向けてネットワークを正則化することで事後分布を直接近似する生成モデルである。本手法が実際にベイズ事後分布を近似することを理論的に証明し、競合手法と比べて認識論的不確実性の推定とスケーラビリティが向上することを実験的に示す。 In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.	翻訳日:2024-01-02 14:03:15 公開日:2023-12-29
# Comparing roughness descriptors for distinct terrain surfaces in point cloud data (http://arxiv.org/abs/2312.17407v1) License: see link	Lei Fan and Yang Zhao	Terrain surface roughness, often described abstractly, is difficult to characterise quantitatively, and the literature offers many different descriptors for it. This study compares five commonly used roughness descriptors, exploring correlations among their quantified terrain surface roughness maps across three terrains with distinct spatial variations. Additionally, the study investigates the impacts of spatial scales and interpolation methods on these correlations. Dense point cloud data obtained through the Light Detection and Ranging (LiDAR) technique are used in this study. The findings highlight both global pattern similarities and local pattern distinctions in the derived roughness maps, emphasizing the significance of incorporating multiple descriptors in studies where local roughness values play a crucial role in subsequent analyses. The spatial scales were found to have a smaller impact on rougher terrain, while interpolation methods had minimal influence on roughness maps derived from different descriptors.	Translated: 2024-01-02 14:03:02 Published: 2023-12-29
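One widely used point-cloud roughness descriptor is the standard deviation of the residuals from a least-squares plane fitted to each point's neighbourhood. The NumPy sketch below computes it for a fixed horizontal search radius, which plays the role of the spatial scale. The function name `plane_fit_roughness`, the brute-force neighbour search, and the synthetic test surfaces are illustrative assumptions. This is not necessarily one of the five descriptors the study compares.

```python
import numpy as np

def plane_fit_roughness(points: np.ndarray, radius: float) -> np.ndarray:
    """Per-point roughness: std of residuals from a local plane z = a*x + b*y + c.

    points: (N, 3) array of x, y, z coordinates.
    radius: horizontal search radius, i.e. the spatial scale of the descriptor.
    Returns an (N,) array; NaN where fewer than 3 neighbours are found.
    """
    xy = points[:, :2]
    z = points[:, 2]
    rough = np.full(len(points), np.nan)
    for i in range(len(points)):
        # Brute-force neighbourhood search in the horizontal plane (fine for a sketch).
        mask = np.linalg.norm(xy - xy[i], axis=1) <= radius
        if mask.sum() < 3:
            continue
        # Least-squares plane fit; removing the plane removes the local slope.
        A = np.column_stack([xy[mask], np.ones(mask.sum())])
        coeffs, *_ = np.linalg.lstsq(A, z[mask], rcond=None)
        residuals = z[mask] - A @ coeffs
        rough[i] = residuals.std()
    return rough

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(500, 2))
flat = np.column_stack([xy, 0.2 * xy[:, 0] + 1.0])      # tilted but perfectly planar surface
bumpy = flat.copy()
bumpy[:, 2] += rng.normal(0.0, 0.05, size=len(bumpy))   # add height noise with std 0.05
print(np.nanmean(plane_fit_roughness(flat, radius=1.0)))
print(np.nanmean(plane_fit_roughness(bumpy, radius=1.0)))
```

The tilted plane scores essentially zero because the local plane fit absorbs the slope. The noisy surface scores close to its noise level. Changing `radius` shows how the spatial scale changes the descriptor's value.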
# Parameter Optimization with Conscious Allocation (POCA) (http://arxiv.org/abs/2312.17404v1) License: see link	Joshua Inman, Tanmay Khandait, Giulia Pedrielli, and Lalitha Sankar	The performance of modern machine learning algorithms depends upon the selection of a set of hyperparameters. Common examples of hyperparameters are the learning rate and the number of layers in a dense neural network. Auto-ML is a branch of optimization that has produced important contributions in this area. Within Auto-ML, hyperband-based approaches, which eliminate poorly performing configurations after evaluating them at low budgets, are among the most effective. However, the performance of these algorithms strongly depends on how effectively they allocate the computational budget to various hyperparameter configurations. We present Parameter Optimization with Conscious Allocation (POCA), a new hyperband-based algorithm that adaptively allocates the input budget to the hyperparameter configurations it generates following a Bayesian sampling scheme. We compare POCA to its nearest competitor at optimizing the hyperparameters of an artificial toy function and a deep neural network and find that POCA finds strong configurations faster in both settings.	Translated: 2024-01-02 14:02:46 Published: 2023-12-29
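Hyperband-based methods build on successive halving. Many configurations are evaluated on a small budget, the best fraction is kept, and the survivors receive a larger budget in the next rung. The plain-Python sketch below runs one successive-halving bracket. It samples configurations uniformly at random, whereas the abstract says POCA generates configurations with a Bayesian sampling scheme and allocates the budget adaptively. The toy objective `toy_loss`, the reduction factor `eta=3`, and the budget semantics are illustrative assumptions.

```python
import math
import random

def toy_loss(lr: float, budget: float) -> float:
    """Hypothetical objective: best near lr = 0.1; the 1/budget term mimics undertraining."""
    return (math.log10(lr) + 1.0) ** 2 + 1.0 / budget

def successive_halving(n_configs: int, min_budget: float, eta: int = 3, seed: int = 0):
    """One successive-halving bracket; returns (best config, budget of the last rung)."""
    rng = random.Random(seed)
    # Sample learning rates log-uniformly in [1e-4, 1] (plain random sampling, not Bayesian).
    configs = [10 ** rng.uniform(-4, 0) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        # Evaluate every surviving configuration at the current budget.
        ranked = sorted(configs, key=lambda lr: toy_loss(lr, budget))
        # Keep the best 1/eta of them (at least one) ...
        configs = ranked[: max(1, len(configs) // eta)]
        # ... and promote the survivors to an eta-times-larger budget.
        if len(configs) > 1:
            budget *= eta
    return configs[0], budget

best_lr, last_budget = successive_halving(n_configs=27, min_budget=1.0)
print(f"best lr = {best_lr:.4g}, last rung budget = {last_budget}")
```

The toy budget term is the same for every configuration, so the ranking never changes between rungs. With real training runs, low-budget scores are noisy, which is why the allocation strategy matters, as the abstract notes.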
# The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model (http://arxiv.org/abs/2312.17485v1) License: see link	Zelin Zhao, Zhaogui Xu, Jialong Zhu, Peng Di, Yuan Yao, Xiaoxing Ma	Automatic program repair (APR) techniques have the potential to reduce the manual effort of uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs of existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advances in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective use of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets, one from human reviewers and one from automated checkers. Experimental results show a repair rate of 72.97% with the best prompt, a substantial improvement in the effectiveness and practicality of automatic repair techniques.	Translated: 2024-01-02 13:44:25 Published: 2023-12-29
# Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning (http://arxiv.org/abs/2312.17484v1) License: see link	Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu	Despite the great success of large language models (LLMs) in various tasks, they are prone to generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonality constraints into the probes. Moreover, we introduce Random Peek, a systematic technique that considers an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. With this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Significant improvements are also observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset. Code: https://github.com/jongjyh/trfr	Translated: 2024-01-02 13:44:08 Published: 2023-12-29
# Maximizing the Yield of Bucket Brigade Quantum Random Access Memory using Redundancy Repair (http://arxiv.org/abs/2312.17483v1) License: see link	Dongmin Kim, Sovanmonynuth Heng, Sengthai Heng and Youngsun Han	Quantum Random Access Memory (qRAM) is an essential computing element for running oracle-based quantum algorithms. qRAM exploits the principle of quantum superposition to access all data stored in the memory cells simultaneously and guarantees the superior performance of quantum algorithms. A qRAM memory cell comprises logical qubits encoded through quantum error correction (QEC) so that qRAM operates successfully in the presence of various quantum noises. Beyond quantum noise, low-technology nodes based on silicon technology can increase qubit density but may introduce defective qubits. Because qRAM comprises many qubits, defective qubits reduce its yield, and they must be handled using a QEC scheme. However, a QEC scheme requires numerous physical qubits, which imposes a heavy resource overhead. To resolve this overhead problem, we propose a quantum memory architecture that compensates for defective qubits by introducing redundant qubits. We also analyze the yield improvement offered by our proposed architecture by varying the ideal fabrication error rate from 0.5% to 1% for different numbers of logical qubits in the qRAM. In a qRAM comprising 1,024 logical qubits, eight redundant logical qubits improved the yield by 95.92% relative to a qRAM without the redundant repair scheme.	Translated: 2024-01-02 13:43:52 Published: 2023-12-29
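The following sketch shows why a handful of spare qubits can raise yield sharply. It uses a simple model that is assumed here, not taken from the paper. Each logical qubit is independently defective with probability p, and a chip with r redundant logical qubits works whenever at most r of its n + r qubits are defective. The yield is then a binomial tail sum.

```python
from math import comb

def yield_with_redundancy(n: int, r: int, p: float) -> float:
    """P(at most r defects among n + r qubits), each defective independently with prob p.

    Assumed toy model: any r defective qubits can be replaced by the r spares.
    """
    total = n + r
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k) for k in range(r + 1))

n = 1024
for p in (0.005, 0.01):
    base = yield_with_redundancy(n, 0, p)
    spare = yield_with_redundancy(n, 8, p)
    print(f"p={p}: yield without spares {base:.4f}, with 8 spares {spare:.4f}")
```

This independent-defect model is not expected to reproduce the paper's reported 95.92% figure, which comes from the paper's own error and repair model.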
# MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining (http://arxiv.org/abs/2312.17482v1) License: see link	Jacob Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle	Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch because of the high cost of training. In the half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT. Here, we introduce MosaicBERT, a BERT-style encoder architecture and training recipe that is empirically optimized for fast pretraining. This efficient architecture incorporates FlashAttention, Attention with Linear Biases (ALiBi), Gated Linear Units (GLU), a module to dynamically remove padded tokens, and low-precision LayerNorm into the classic transformer encoder block. The training recipe includes a 30% masking ratio for the Masked Language Modeling (MLM) objective, bfloat16 precision, and a vocabulary size optimized for GPU throughput, in addition to best practices from RoBERTa and other encoder models. When pretrained from scratch on the C4 dataset, this base model achieves a downstream average GLUE (dev) score of 79.6 in 1.13 hours on 8 A100 80 GB GPUs at a cost of roughly $20. We plot extensive accuracy vs. pretraining speed Pareto curves and show that MosaicBERT base and large are consistently Pareto optimal when compared to a competitive BERT base and large. This empirical speedup in pretraining enables researchers and engineers to pretrain custom BERT-style models at low cost instead of fine-tuning existing generic models. We open source our model weights and code.	Translated: 2024-01-02 13:43:36 Published: 2023-12-29
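Of the components listed, Attention with Linear Biases (ALiBi) is easy to show in a few lines. Instead of position embeddings, each attention head adds a fixed penalty to its attention logits, proportional to the query-key distance. The NumPy sketch below uses the geometric head slopes from the original ALiBi formulation, 2^(-8h/H) for head h = 1..H, and assumes H is a power of two. It uses a symmetric |i - j| distance for bidirectional attention; whether MosaicBERT uses exactly this form is an assumption here.

```python
import numpy as np

def alibi_slopes(num_heads: int) -> np.ndarray:
    # Geometric sequence 2^(-8/H), 2^(-16/H), ..., 2^(-8); assumes num_heads is a power of two.
    return 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Return a (heads, seq_len, seq_len) bias added to the attention logits before softmax."""
    pos = np.arange(seq_len)
    distance = np.abs(pos[None, :] - pos[:, None])  # symmetric distance for an encoder
    return -alibi_slopes(num_heads)[:, None, None] * distance[None, :, :]

def attention_with_alibi(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """q, k, v: (heads, seq_len, head_dim). Scaled dot-product attention plus the ALiBi bias."""
    h, n, d = q.shape
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(d) + alibi_bias(h, n)
    logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(8, 16, 32))
out = attention_with_alibi(q, k, v)
print(out.shape)
```

Because the bias depends only on distance, the same model can in principle be applied to sequences longer than those seen in training, without learned position embeddings.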
# Quantum logarithmic multifractality (http://arxiv.org/abs/2312.17481v1) License: see link	Weitao Chen, Olivier Giraud, Jiangbin Gong, Gabriel Lemarié	Through a combination of rigorous analytical derivations and extensive numerical simulations, this work reports an exotic multifractal behavior, dubbed "logarithmic multifractality", in effectively infinite-dimensional systems undergoing the Anderson transition. In marked contrast to conventional multifractal critical properties observed at finite-dimensional Anderson transitions or scale-invariant second-order phase transitions, in the presence of logarithmic multifractality, eigenstate statistics, spatial correlations, and wave packet dynamics can all exhibit scaling laws which are algebraic in the logarithm of system size or time. Our findings offer crucial insights into strong finite-size effects and slow dynamics in complex systems undergoing the Anderson transition, such as the many-body localization transition.	Translated: 2024-01-02 13:43:06 Published: 2023-12-29
# Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning (http://arxiv.org/abs/2312.17479v1) License: see link	Nigini Oliveira, Jasmine Li, Koosha Khalvati, Rodolfo Cortes Barragan, Katharina Reinecke, Andrew N. Meltzoff, and Rajesh P. N. Rao	Constructing a universal moral code for artificial intelligence (AI) is difficult or even impossible, given that different human cultures have different definitions of morality and different societal norms. We therefore argue that the value system of an AI should be culturally attuned: just as a child raised in a particular culture learns the specific values and norms of that culture, we propose that an AI agent operating in a particular human community should acquire that community's moral, ethical, and cultural codes. How AI systems might acquire such codes from human observation and interaction has remained an open question. Here, we propose using inverse reinforcement learning (IRL) as a method for AI agents to acquire a culturally attuned value system implicitly. We test our approach using an experimental paradigm in which AI agents use IRL to learn different reward functions, which govern the agents' moral values, by observing the behavior of different cultural groups in an online virtual world requiring real-time decision making. We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior, and this learned value system can generalize to new scenarios requiring altruistic judgments. Our results provide, to our knowledge, the first demonstration that AI agents could potentially be endowed with the ability to continually learn their values and norms from observing and interacting with humans, thereby becoming attuned to the culture they are operating in.	Translated: 2024-01-02 13:42:50 Published: 2023-12-29
# Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters (http://arxiv.org/abs/2312.17476v1) License: see link	Manikanta Loya, Divya Anand Sinha, Richard Futrell	The advancement of Large Language Models (LLMs) has led to their widespread use across a broad spectrum of tasks, including decision making. Prior studies have compared the decision-making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and to variations in the prompt. In this study, we examine LLMs' performance on the Horizon decision-making task studied by Binz and Schulz (2023), analyzing how LLMs respond to variations in prompts and hyperparameters. By experimenting on three OpenAI language models with different capabilities, we observe that their decision-making abilities fluctuate with the input prompts and temperature settings. Contrary to previous findings, language models display a human-like exploration-exploitation tradeoff after simple adjustments to the prompt.	Translated: 2024-01-02 13:42:19 Published: 2023-12-29
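The temperature setting acts through the softmax that turns a model's logits into sampling probabilities: p_i = exp(z_i / T) / sum_j exp(z_j / T). A lower T concentrates probability on the top option, giving more exploitative choices. A higher T flattens the distribution, giving more exploratory choices. The sketch below uses made-up logits for two options. It illustrates only the mechanism and is not the study's experimental setup.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities at a given sampling temperature (> 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0]  # hypothetical scores for the two options of a two-armed choice
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: P(option 1)={probs[0]:.3f}")
```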
# EHR Interaction Between Patients and AI: NoteAid EHR Interaction (http://arxiv.org/abs/2312.17475v1) License: see link	Xiaocheng Zhang, Zonghai Yao, Hong Yu	With the rapid advancement of Large Language Models (LLMs) and their outstanding performance in semantic and contextual comprehension, the potential of LLMs in specialized domains warrants exploration. This paper introduces the NoteAid EHR Interaction Pipeline, an approach built on generative LLMs to assist in patient education, a task that stems from the need to help patients understand their Electronic Health Records (EHRs). Building on the NoteAid work, we designed two novel tasks from the patient's perspective: explaining EHR content that patients may not understand, and answering questions that patients pose after reading their EHRs. We extracted datasets containing 10,000 instances from MIMIC Discharge Summaries and 876 instances from the MADE medical notes collection, and executed the two tasks on these data through the NoteAid EHR Interaction Pipeline. Performance data of LLMs on these tasks were collected and assembled into the corresponding NoteAid EHR Interaction Dataset. Through a comprehensive evaluation of the entire dataset using LLM assessment and a rigorous manual evaluation of 64 instances, we showcase the potential of LLMs in patient education. In addition, the results provide valuable data support for future exploration and applications in this domain, and supply high-quality synthetic datasets for in-house system training.	Translated: 2024-01-02 13:42:08 Published: 2023-12-29
# FerKD: Surgical Label Adaptation for Efficient Distillation (http://arxiv.org/abs/2312.17473v1) License: see link	Zhiqiang Shen	We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation and intuition that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are used equally through their predictive probabilities derived from pretrained teacher models. However, relying solely on the predictions of a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to serve as context, using softened hard ground-truth labels. Our approach consists of hard-region mining plus calibration. We demonstrate empirically that this method can dramatically improve convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and the corresponding soft labels by mixing similar regions within the same image. FerKD is an intuitive, well-designed learning system that eliminates several heuristics and hyperparameters of the former FKD solution. More importantly, it achieves remarkable improvements on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by remarkable margins. Leveraging better pretrained weights and larger architectures, our fine-tuned ViT-G14 even achieves 89.9%. Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.	Translated: 2024-01-02 13:41:43 Published: 2023-12-29
# Demonstration of a low loss, highly stable and re-useable edge coupler for SOI correlated photon pair sources (http://arxiv.org/abs/2312.17464v1) License: see link	Jinyi Du, George F.R. Chen, Hongwei Gao, James A. Grieve, Dawn T.H. Tan, Alexander Ling	We report a stable, low-loss method for coupling light from silicon-on-insulator (SOI) photonic chips into optical fibers. The technique is realized using an on-chip tapered waveguide and a cleaved small-core optical fiber. The on-chip taper is monolithic and does not require a patterned cladding, which simplifies the chip fabrication process. The optical fiber segment is a centimeter-long piece of small-core fiber (UHNA7) spliced to SMF-28 fiber with less than 0.1 dB of loss. We observe an overall coupling loss of 0.64 dB with this design. The chip edge and fiber tip can be butt coupled without damaging the on-chip taper or the fiber. Friction between the surfaces maintains alignment, and we observe a coupling fluctuation of ±0.1 dB during a ten-day continuous measurement without any adhesive. This technique minimizes the potential for generating Raman noise in the fiber and is more stable than coupling strategies based on longer UHNA fibers or fragile lensed fibers. We also applied the edge coupler to a correlated photon pair source and observed a raw coincidence count rate of 1.21 million counts per second (cps) and a heralding efficiency of 21.3%. We achieved a heralded autocorrelation function g_H^(2)(0) as low as 0.0004 in the low pump power regime.	Translated: 2024-01-02 13:41:12 Published: 2023-12-29
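Two of the quoted figures become more familiar with short calculations. A loss of L dB corresponds to a transmitted power fraction of 10^(-L/10), so a 0.64 dB coupling loss means roughly 86% transmission. For heralded single photons, a commonly used estimator is g_H^(2)(0) = N_H * N_HST / (N_HS * N_HT). Here N_H counts herald detections, N_HS and N_HT count herald coincidences with the two output arms of a beam splitter, and N_HST counts triple coincidences. The counts in the example are hypothetical and chosen only to exercise the formula; the abstract does not report them.

```python
def db_loss_to_transmission(loss_db: float) -> float:
    """Fraction of optical power transmitted through a component with the given loss in dB."""
    return 10 ** (-loss_db / 10)

def heralded_g2(n_h: int, n_hs: int, n_ht: int, n_hst: int) -> float:
    """Heralded autocorrelation g_H^(2)(0) from herald, double and triple coincidence counts."""
    return n_h * n_hst / (n_hs * n_ht)

print(f"{db_loss_to_transmission(0.64):.3f}")  # 0.863
print(f"{db_loss_to_transmission(0.1):.3f}")   # 0.977
# Hypothetical counts: 1e6 heralds, 1e4 coincidences in each arm, 1 triple coincidence.
print(heralded_g2(1_000_000, 10_000, 10_000, 1))
```

Values of g_H^(2)(0) well below 0.5 indicate a source dominated by single photons. Values near 1 indicate classical (coherent) light.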
# Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift (http://arxiv.org/abs/2312.17463v1) License: see link	Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel	Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research. However, out-of-distribution generalization for regression, the analogous problem for modeling continuous targets, remains relatively unexplored. To tackle this problem, we return to first principles and analyze how the closed-form solution for Ordinary Least Squares (OLS) regression is sensitive to covariate shift. We characterize the out-of-distribution risk of the OLS model in terms of the eigenspectrum decomposition of the source and target data. We then use this insight to propose a method for adapting the weights of the last layer of a pretrained neural regression model so that it performs better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance on synthetic and real-world datasets.	Translated: 2024-01-02 13:40:49 Published: 2023-12-29
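The eigendecomposition of the source second-moment matrix Sigma_S = X^T X / n = sum_i lambda_i u_i u_i^T shows why OLS is sensitive to covariate shift. The noise-induced variance of the fitted weights along u_i scales as sigma^2 / (n lambda_i). The expected excess risk on a target distribution with second-moment matrix Sigma_T is therefore sigma^2 / n * sum_i (u_i^T Sigma_T u_i) / lambda_i, which blows up when the target varies strongly in a direction the source barely covers. The NumPy sketch below fits OLS in the source eigenbasis and can drop directions whose eigenvalue falls below a hypothetical threshold `min_eig`. That spectral truncation is an illustrative stand-in chosen for this example, not the adaptation procedure proposed in the paper.

```python
import numpy as np

def spectral_ols(X: np.ndarray, y: np.ndarray, min_eig: float = 0.0) -> np.ndarray:
    """OLS weights computed in the eigenbasis of the source matrix X^T X / n.

    Directions with eigenvalue <= min_eig are dropped (zero weight there).
    With min_eig = 0 and full-rank X this equals ordinary least squares.
    """
    n = X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
    proj = eigvecs.T @ (X.T @ y / n)  # X^T y / n expressed in the eigenbasis
    keep = eigvals > min_eig
    coef = np.zeros_like(eigvals)
    coef[keep] = proj[keep] / eigvals[keep]
    return eigvecs @ coef

def predicted_excess_risk(X_src: np.ndarray, X_tgt: np.ndarray, noise_var: float) -> float:
    """sigma^2 / n * sum_i (u_i^T Sigma_T u_i) / lambda_i for the fixed-design OLS estimate."""
    n = X_src.shape[0]
    lam, U = np.linalg.eigh(X_src.T @ X_src / n)
    sigma_t = X_tgt.T @ X_tgt / X_tgt.shape[0]
    target_var = np.einsum("ji,jk,ki->i", U, sigma_t, U)  # u_i^T Sigma_T u_i for each i
    return noise_var / n * float(np.sum(target_var / lam))

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
# Source data barely varies along the third feature; target data varies a lot there.
X_src = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.01])
X_tgt = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 3.0])
y_src = X_src @ w_true + rng.normal(scale=0.1, size=200)
w_ols = spectral_ols(X_src, y_src)
w_trunc = spectral_ols(X_src, y_src, min_eig=1e-3)  # drops the poorly covered direction
print(predicted_excess_risk(X_src, X_tgt, noise_var=0.01))
```

Truncation trades variance for bias. Setting a direction's weight to zero removes its noise amplification but ignores any true signal along it.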
# Balancing error budget for fermionic k-RDM estimation (http://arxiv.org/abs/2312.17452v1) License: see link	Nayuta Takemori, Yusuke Teranishi, Wataru Mizukami, and Nobuyuki Yoshioka	The reduced density matrix (RDM) is crucial in quantum many-body systems for understanding physical properties, as it contains all information about local physical quantities. This study aims to minimize the various sources of error that make higher-order RDM estimation challenging in quantum computing. We identify the optimal balance between statistical and systematic errors in higher-order RDM estimation, in particular when a cumulant expansion is used to suppress the sample complexity. Furthermore, we show via numerical demonstrations of quantum subspace methods for the one- and two-dimensional Fermi-Hubbard model that biased yet efficient estimations better suppress hardware noise in excited-state calculations. Our work paves a path towards cost-efficient practical quantum computing, which in reality is constrained by multiple sources of error.	Translated: 2024-01-02 13:40:35 Published: 2023-12-29
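A cumulant expansion writes a higher-order RDM as antisymmetrized products of lower-order RDMs plus a connected (cumulant) part. Truncating the cumulant trades systematic error for a lower sampling cost. The NumPy sketch below builds the lowest-order approximation of the 2-RDM from the 1-RDM. It uses the convention gamma[p,q] = <a_p^dag a_q> and D2[p,q,r,s] = <a_p^dag a_q^dag a_s a_r>. This truncation is exact for a single Slater determinant, which the example checks. The convention and function names are assumptions of this sketch, not the paper's notation.

```python
import numpy as np

def two_rdm_from_one_rdm(gamma: np.ndarray) -> np.ndarray:
    """Cumulant-truncated 2-RDM: D2[p,q,r,s] ~= gamma[p,r]*gamma[q,s] - gamma[p,s]*gamma[q,r].

    Convention: gamma[p,q] = <a_p^dag a_q>, D2[p,q,r,s] = <a_p^dag a_q^dag a_s a_r>.
    The neglected term is the two-body cumulant (connected part).
    """
    return np.einsum("pr,qs->pqrs", gamma, gamma) - np.einsum("ps,qr->pqrs", gamma, gamma)

def two_rdm_slater(occupations) -> np.ndarray:
    """Exact 2-RDM of a Slater determinant with 0/1 occupations of the spin orbitals."""
    n = len(occupations)
    d2 = np.zeros((n, n, n, n))
    for p in range(n):
        for q in range(n):
            if p != q and occupations[p] and occupations[q]:
                d2[p, q, p, q] = 1.0   # <n_p n_q>
                d2[p, q, q, p] = -1.0  # sign flip from swapping the two annihilators
    return d2

occ = [1, 1, 0, 1, 0, 0]
gamma = np.diag(np.array(occ, dtype=float))
approx = two_rdm_from_one_rdm(gamma)
print(np.allclose(approx, two_rdm_slater(occ)))  # True: the cumulant vanishes for a determinant
```

For correlated states the cumulant is nonzero, and how much of it to estimate directly on hardware is the kind of statistical versus systematic trade-off the abstract describes.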
# FedLED: Label-Free Equipment Fault Diagnosis with Vertical Federated Transfer Learning (http://arxiv.org/abs/2312.17451v1) License: see link	Jie Shen, Shusen Yang, Cong Zhao, Xuebin Ren, Peng Zhao, Yuqian Yang, Qing Han, Shuaijun Wu	Intelligent equipment fault diagnosis based on Federated Transfer Learning (FTL) has attracted considerable attention from both academia and industry. It allows real-world industrial agents with limited samples to construct a fault diagnosis model without jeopardizing the privacy of their raw data. Existing approaches, however, can address neither the intense sample heterogeneity caused by the different working conditions of practical agents nor the extreme scarcity of fault labels, sometimes zero, for newly deployed equipment. To address these issues, we present FedLED, the first unsupervised vertical FTL equipment fault diagnosis method, in which knowledge of the unlabeled target domain is further exploited for effective unsupervised model transfer. Results of extensive experiments using data from real equipment monitoring demonstrate that FedLED clearly outperforms SOTA approaches in terms of both diagnosis accuracy (up to 4.13 times) and generality. We expect our work to inspire further study on label-free equipment fault diagnosis systematically enhanced by target domain knowledge.	Translated: 2024-01-02 13:40:21 Published: 2023-12-29
# Information Fragility or Robustness Under Quantum Channels (http://arxiv.org/abs/2312.17450v1) License: see link	Nicholas Laracuente, Graeme Smith	Quantum states naturally decay under noise. Many earlier works have quantified and demonstrated lower bounds on the decay rate, showing exponential decay in a wide variety of contexts. Here we study the converse question: are there uniform upper bounds on the ratio of post-noise to initial information quantities when noise is sufficiently weak? In several scenarios, including classical ones, we find multiplicative converse bounds. However, this is not always the case. Even for simple noise such as qubit dephasing or depolarizing, mutual information may fall by an unbounded factor under arbitrarily weak noise. As an application, we find families of channels with non-zero private capacity despite an arbitrarily high probability of transmitting an arbitrarily good copy of the input to the environment.	Translated: 2024-01-02 13:40:03 Published: 2023-12-29
# ヒューマンインテント推論によるトラッキング Tracking with Human-Intent Reasoning ( http://arxiv.org/abs/2312.17448v1 ) ライセンス: Link先を確認	Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie	(参考訳) 知覚モデリングの進歩は、物体追跡の性能を大幅に改善した。しかし、初期フレームのターゲットオブジェクトを特定する現在のメソッドは、どちらかである。 1)ボックスまたはマスクテンプレートを使用するか、または 2) 明示的な言語記述を提供する。これらの方法は面倒で、トラッカーが自己推論能力を持つことを許さない。そこで本研究では,トラッカがビデオフレーム内で自動的にトラッキングを行うための暗黙的なトラッキング命令を提供する,新たなトラッキングタスク -- 命令追跡を提案する。本研究では,物体追跡のための大規模視覚言語モデル(lvlm)から知識と推論能力の統合について検討する。具体的には,複雑な推論に基づく追跡が可能なtrackgptと呼ばれるトラッカーを提案する。 TrackGPTは、まずLVLMを使用して、追跡命令を理解し、どのターゲットを追跡するかの手がかりを埋め込みを参照させる。そして、知覚成分は、埋め込みに基づいて追跡結果を生成する。 TrackGPTの性能を評価するため,インストラクション・チューニングと評価のためのインストラクション・ビデオ・ペアが1万を超えるインストラクション・トラッキング・ベンチマークであるInsTrackを構築した。実験によれば、 trackgpt はビデオオブジェクトのセグメンテーションベンチマークを参照して性能が向上し、例えば 66.5 $\mathcal{j}\&\mathcal{f}$ on refer-davis という新しいパフォーマンスが得られる。また、新しい評価プロトコル下での命令追跡の優れた性能を示す。コードとモデルは \href{https://github.com/jiawen-zhu/TrackGPT}{https://github.com/jiawen-zhu/TrackGPT} で公開されている。 Advances in perception modeling have significantly improved the performance of object tracking. However, the current methods for specifying the target object in the initial frame are either by 1) using a box or mask template, or by 2) providing an explicit language description. These manners are cumbersome and do not allow the tracker to have self-reasoning ability. Therefore, this work proposes a new tracking task -- Instruction Tracking, which involves providing implicit tracking instructions that require the trackers to perform tracking automatically in video frames. To achieve this, we investigate the integration of knowledge and reasoning capabilities from a Large Vision-Language Model (LVLM) for object tracking. Specifically, we propose a tracker called TrackGPT, which is capable of performing complex reasoning-based tracking. TrackGPT first uses LVLM to understand tracking instructions and condense the cues of what target to track into referring embeddings. 
The perception component then generates the tracking results based on the embeddings. To evaluate the performance of TrackGPT, we construct an instruction tracking benchmark called InsTrack, which contains over one thousand instruction-video pairs for instruction tuning and evaluation. Experiments show that TrackGPT achieves competitive performance on referring video object segmentation benchmarks, such as getting a new state-of-the-art performance of 66.5 $\mathcal{J}\&\mathcal{F}$ on Refer-DAVIS. It also demonstrates a superior performance of instruction tracking under new evaluation protocols. The code and models are available at \href{https://github.com/jiawen-zhu/TrackGPT}{https://github.com/jiawen-zhu/TrackGPT}.	翻訳日:2024-01-02 13:39:51 公開日:2023-12-29
# Darwin3:新しいISAとオンチップ学習を備えた大規模ニューロモルフィックチップ Darwin3: A large-scale neuromorphic chip with a Novel ISA and On-Chip Learning ( http://arxiv.org/abs/2312.17582v1 ) ライセンス: Link先を確認	De Ma, Xiaofei Jin, Shichun Sun, Yitao Li, Xundong Wu, Youneng Hu, Fangchao Yang, Huajin Tang, Xiaolei Zhu, Peng Lin and Gang Pan	(参考訳) スパイキングニューラルネットワーク(SNN)は、その生物学的妥当性と計算効率向上の可能性から注目が高まっている。SNN の高い時空間ダイナミクスに対応するため、ハードウェアのニューロン回路およびシナプス回路で SNN を直接実行するニューロモルフィックチップが強く求められている。本稿では、10個の基本命令と少数の拡張命令からなる新しい命令セットアーキテクチャ(ISA)を備えた、Darwin3 と呼ばれる大規模ニューロモルフィックチップを提案する。この ISA は柔軟なニューロンモデルのプログラミングと局所学習則の設計をサポートする。Darwin3 チップのアーキテクチャは、革新的なルーティングアルゴリズムを備えた計算ノードのメッシュとして設計されている。圧縮機構を用いてシナプス結合を表現し、メモリ使用量を大幅に削減した。Darwin3 チップは最大235万ニューロンをサポートし、同種のチップとしてはニューロン規模で最大である。実験の結果、Darwin3 ではコード密度が最大28.3倍向上し、接続圧縮によりニューロンコアのファンインとファンアウトは物理メモリ深さと比べてそれぞれ最大4096倍、3072倍に向上した。また Darwin3 チップは、畳み込みスパイキングニューラルネットワーク(CSNN)をチップにマッピングする際に6.8倍から200.8倍のメモリ節約を実現し、他のニューロモルフィックチップと比較して精度とレイテンシにおいて最先端の性能を示した。 Spiking Neural Networks (SNNs) are gaining increasing attention for their biological plausibility and potential for improved computational efficiency. To match the high spatial-temporal dynamics in SNNs, neuromorphic chips are highly desired to execute SNNs in hardware-based neuron and synapse circuits directly. This paper presents a large-scale neuromorphic chip named Darwin3 with a novel instruction set architecture (ISA), which comprises 10 primary instructions and a few extended instructions. It supports flexible neuron model programming and local learning rule designs. The Darwin3 chip architecture is designed in a mesh of computing nodes with an innovative routing algorithm. We used a compression mechanism to represent synaptic connections, significantly reducing memory usage. The Darwin3 chip supports up to 2.35 million neurons, making it the largest of its kind in neuron scale.
The experimental results showed that code density was improved up to 28.3x in Darwin3, and neuron core fan-in and fan-out were improved up to 4096x and 3072x by connection compression compared to the physical memory depth. Our Darwin3 chip also provided memory saving between 6.8X and 200.8X when mapping convolutional spiking neural networks (CSNN) onto the chip, demonstrating state-of-the-art performance in accuracy and latency compared to other neuromorphic chips.	翻訳日:2024-01-02 12:51:09 公開日:2023-12-29
# 時系列予測のための多目的進化アンサンブル学習を用いたLSTMネットワークにおける組込み特徴選択 Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting ( http://arxiv.org/abs/2312.17517v1 ) ライセンス: Link先を確認	Raquel Espinosa, Fernando Jiménez, José Palma	(参考訳) 時系列予測はさまざまな分野で重要な役割を果たしており、複雑な時間パターンを効果的に扱える頑健なモデルの開発が求められている。本稿では、多目的進化アルゴリズムを活用し、長・短期記憶(LSTM)ネットワークに埋め込まれた新しい特徴選択手法を提案する。本手法は LSTM の重みとバイアスを分割的に最適化し、進化アルゴリズムの各目的関数は特定のデータ分割における二乗平均平方根誤差(RMSE)を対象とする。アルゴリズムが同定した非支配予測モデルの集合を用いて、スタッキングに基づくアンサンブル学習によりメタモデルを構築する。さらに、非支配予測モデルの集合における各属性の選択頻度はその重要性を反映するため、提案手法は属性重要度を決定する手段も提供する。この属性重要度の知見は、予測プロセスに解釈可能な次元を加える。イタリアとスペイン南東部の大気質時系列データを用いた実験により、本手法が従来の LSTM の汎化能力を大幅に向上させ、過学習を効果的に低減することを示した。最先端の CancelOut 法および EAR-FS 法との比較分析により、本手法の優れた性能が示された。 Time series forecasting plays a crucial role in diverse fields, necessitating the development of robust models that can effectively handle complex temporal patterns. In this article, we present a novel feature selection method embedded in Long Short-Term Memory networks, leveraging a multi-objective evolutionary algorithm. Our approach optimizes the weights and biases of the LSTM in a partitioned manner, with each objective function of the evolutionary algorithm targeting the root mean square error in a specific data partition. The set of non-dominated forecast models identified by the algorithm is then utilized to construct a meta-model through stacking-based ensemble learning. Furthermore, our proposed method provides an avenue for attribute importance determination, as the frequency of selection for each attribute in the set of non-dominated forecasting models reflects their significance. This attribute importance insight adds an interpretable dimension to the forecasting process. Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs, effectively reducing overfitting.
Comparative analyses against state-of-the-art CancelOut and EAR-FS methods highlight the superior performance of our approach.	翻訳日:2024-01-02 12:50:47 公開日:2023-12-29
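The attribute-importance idea in the abstract above (an attribute's importance is how often it is selected across the non-dominated forecasting models) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each candidate model is reduced to a hypothetical pair of (selected attributes, per-partition RMSE vector), and the attribute names `NO2`, `O3`, `PM10` are placeholders. The LSTM training and the evolutionary search itself are not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# A candidate model: (selected attributes, RMSE on each data partition).
Candidate = Tuple[FrozenSet[str], Tuple[float, ...]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if error vector `a` Pareto-dominates `b` (no worse anywhere, better somewhere)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates whose RMSE vector is not dominated by any other candidate."""
    return [c for c in cands
            if not any(dominates(o[1], c[1]) for o in cands if o is not c)]


def attribute_importance(front: List[Candidate]) -> Dict[str, float]:
    """Fraction of non-dominated models that select each attribute."""
    counts = Counter(attr for attrs, _ in front for attr in attrs)
    return {attr: n / len(front) for attr, n in counts.items()}


# Hypothetical candidates evaluated on two data partitions.
cands = [
    (frozenset({"NO2", "O3"}), (1.0, 2.0)),
    (frozenset({"NO2", "PM10"}), (2.0, 1.0)),
    (frozenset({"O3"}), (2.5, 2.5)),  # dominated by the first candidate
]
front = non_dominated(cands)
print(attribute_importance(front))
```

Here the two surviving models trade off error across partitions, so both stay on the front; `NO2` appears in both and therefore gets importance 1.0.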
# その場での協力: Avalon ゲームにおけるアドホックなチームワークのための言語エージェントの探索 Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game ( http://arxiv.org/abs/2312.17515v1 ) ライセンス: Link先を確認	Zijing Shi, Meng Fang, Shunfeng Zheng, Shilong Deng, Ling Chen, Yali Du	(参考訳) 大規模言語モデル(LLM)を用いたマルチエージェント協調は基本的なタスクで熟練度を示しているが、より複雑なシナリオにおける効率性は未検討である。ゲーム環境では、これらのエージェントは確立された協調プロトコルがない状況にしばしば直面し、限られたデータからチームメイトについて知的な推論を行う必要がある。この問題は、エージェントが多様なチームメイトと協力して共通の目標を達成する可能性があるアドホックなチームワークという研究領域の動機となっている。本研究は、自然言語によって駆動される環境でエージェントが動作するアドホックなチームワーク問題に焦点を当てる。我々の結果は、チーム協調における LLM エージェントの可能性を明らかにするとともに、コミュニケーションにおける幻覚に関する問題を浮き彫りにした。この問題に対処するため、LLM に拡張メモリとコード駆動推論を備えさせ、部分的な情報を再利用して新しいチームメイトに迅速に適応できるようにする汎用エージェント CodeAct を開発した。 Multi-agent collaboration with Large Language Models (LLMs) demonstrates proficiency in basic tasks, yet its efficiency in more complex scenarios remains unexplored. In gaming environments, these agents often face situations without established coordination protocols, requiring them to make intelligent inferences about teammates from limited data. This problem motivates the area of ad hoc teamwork, in which an agent may potentially cooperate with a variety of teammates to achieve a shared goal. Our study focuses on the ad hoc teamwork problem where the agent operates in an environment driven by natural language. Our findings reveal the potential of LLM agents in team collaboration, highlighting issues related to hallucinations in communication. To address this issue, we develop CodeAct, a general agent that equips LLM with enhanced memory and code-driven reasoning, enabling the repurposing of partial information for rapid adaptation to new teammates.	翻訳日:2024-01-02 12:50:26 公開日:2023-12-29
# クエリ計画ガイダンスによるデータベースエンジンのテスト Testing Database Engines via Query Plan Guidance ( http://arxiv.org/abs/2312.17510v1 ) ライセンス: Link先を確認	Jinsheng Ba, Manuel Rigger	(参考訳) データベースシステムはデータの保存とクエリに広く使われている。このようなシステムの論理バグ、すなわちデータベースシステムに誤った結果を計算させるバグを見つけるために、テストオラクルが提案されている。完全に自動化されたテスト手法を実現するため、テストオラクルはテストケース生成技術と組み合わせられる。ここでテストケースとは、テストオラクルを適用できるデータベースの状態とクエリの組を指す。本研究では、自動テストを「興味深い」テストケースへと誘導するためのクエリプランガイダンス(QPG)の概念を提案する。SQL などのクエリ言語は宣言的である。そのため、クエリを実行する際、データベースシステムはソース言語の各演算子を、実行可能な潜在的に多数の物理演算子のいずれかに変換する。物理演算子の木はクエリプランと呼ばれる。我々の直感は、多様なクエリプランを探索するようにテストを誘導することで、より興味深い振る舞い(その一部は誤りを含む可能性がある)も探索できるというものである。そこで本研究では、データベースの状態に有望な変異を段階的に適用し、DBMS が後続のクエリに対して多様なクエリプランを生成するようにする変異手法を提案する。この手法を、成熟し、広く使われ、十分にテストされた3つのデータベースシステム SQLite、TiDB、CockroachDB に適用し、これまで知られていなかった53件のユニークなバグを発見した。本手法は、単純なランダム生成法より4.85〜408.48倍、コードカバレッジ誘導法より7.46倍多くのユニークなクエリプランを実行する。商用のものを含むほとんどのデータベースシステムはクエリプランをユーザーに公開しているため、QPG は一般的に適用可能なブラックボックス手法であると考えており、その中核となるアイデアは他の文脈(例えばテストスイートの品質の測定)にも適用できると考えている。 Database systems are widely used to store and query data. Test oracles have been proposed to find logic bugs in such systems, that is, bugs that cause the database system to compute an incorrect result. To realize a fully automated testing approach, such test oracles are paired with a test case generation technique; a test case refers to a database state and a query on which the test oracle can be applied. In this work, we propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases. SQL and other query languages are declarative. Thus, to execute a query, the database system translates every operator in the source language to one of potentially many so-called physical operators that can be executed; the tree of physical operators is referred to as the query plan. Our intuition is that by steering testing towards exploring diverse query plans, we also explore more interesting behaviors, some of which are potentially incorrect.
To this end, we propose a mutation technique that gradually applies promising mutations to the database state, causing the DBMS to create diverse query plans for subsequent queries. We applied our method to three mature, widely-used, and extensively-tested database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs. Our method exercises 4.85-408.48X more unique query plans than a naive random generation method and 7.46X more than a code coverage guidance method. Since most database systems, including commercial ones, expose query plans to the user, we consider QPG a generally applicable, black-box approach and believe that the core idea could also be applied in other contexts (e.g., to measure the quality of a test suite).	翻訳日:2024-01-02 12:50:07 公開日:2023-12-29
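A toy version of the Query Plan Guidance loop described above, sketched with Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. Assumptions: the table `t`, the query generator, the fixed mutation list, and the `patience` threshold are all hypothetical; the paper's tool selects promising mutations and pairs each query with a test oracle, both of which are omitted here. The point is only the feedback signal: count distinct plans, and mutate the database state once new plans stop appearing.

```python
import random
import sqlite3


def plan_signature(conn: sqlite3.Connection, query: str) -> tuple:
    """Fingerprint a query plan by the 'detail' column (index 3) of EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return tuple(row[3] for row in rows)


def random_query(rng: random.Random) -> str:
    # Hypothetical generator: a single-table equality filter.
    col = rng.choice(["a", "b"])
    return f"SELECT * FROM t WHERE {col} = {rng.randint(0, 9)}"


# Hypothetical state mutations, applied in order whenever plan discovery stalls.
MUTATIONS = [
    "CREATE INDEX IF NOT EXISTS idx_a ON t(a)",
    "CREATE INDEX IF NOT EXISTS idx_b ON t(b)",
    "ANALYZE",
]


def qpg_loop(iterations: int = 200, patience: int = 20, seed: int = 0) -> set:
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(i % 10, i % 7) for i in range(100)])
    pending = list(MUTATIONS)
    seen = set()
    since_new = 0
    for _ in range(iterations):
        query = random_query(rng)
        # A real tool would run its test oracle on (database state, query) here.
        sig = plan_signature(conn, query)
        if sig in seen:
            since_new += 1
        else:
            seen.add(sig)
            since_new = 0
        if since_new >= patience and pending:
            # No new plans recently: change the database state instead of the query.
            conn.execute(pending.pop(0))
            since_new = 0
    conn.close()
    return seen


plans = qpg_loop()
for plan in sorted(plans):
    print(plan)
```

Without indexes every query yields a full-scan plan; once a mutation adds an index, queries on that column switch to an index search, which registers as a new plan. The exact `detail` strings vary between SQLite versions.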
# インスタンスレベルの感情音声変換のための注意機構に基づく対話型分離ネットワーク Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion ( http://arxiv.org/abs/2312.17508v1 ) ライセンス: Link先を確認	Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie	(参考訳) 感情音声変換は、非感情的な成分を保持しながら、与えられた感情に従って音声を操作することを目的とする。既存の手法は、きめ細かな感情的属性をうまく表現できない。本稿では、インスタンス単位の感情知識を音声変換に活用する、注意機構に基づく対話型分離ネットワーク(AINN)を提案する。ネットワークを効果的に学習するために2段階のパイプラインを導入する。ステージIでは、発話間コントラスト学習によってきめ細かな感情をモデル化し、発話内分離学習によって感情と内容をより良く分離する。ステージIIでは、多視点一貫性機構によって変換を正則化することを提案する。この手法は、きめ細かな感情を転写しつつ音声の内容を保持するのに役立つ。大規模な実験の結果、AINN は客観的指標と主観的指標の両方で最先端手法を上回った。 Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components. Existing approaches cannot express fine-grained emotional attributes well. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effectively train our network: Stage I utilizes inter-speech contrastive learning to model fine-grained emotion and intra-speech disentanglement learning to better separate emotion and content. In Stage II, we propose to regularize the conversion with a multi-view consistency mechanism. This technique helps us transfer fine-grained emotion and maintain speech content. Extensive experiments show that our AINN outperforms state-of-the-art methods in both objective and subjective metrics.	翻訳日:2024-01-02 12:49:37 公開日:2023-12-29
# HIV-1に対する抗レトロウイルス治療成績予測のためのアウトオブディストリビューションロバスト性グラフニューラルネットワークモデル A graph neural network-based model with Out-of-Distribution Robustness for enhancing Antiretroviral Therapy Outcome Prediction for HIV-1 ( http://arxiv.org/abs/2312.17506v1 ) ライセンス: Link先を確認	Giulia Di Teodoro, Federico Siciliano, Valerio Guarrasi, Anne-Mieke Vandamme, Valeria Ghisetti, Anders Sönnerborg, Maurizio Zazzi, Fabrizio Silvestri, Laura Palagi	(参考訳) HIV-1 に対する抗レトロウイルス療法の結果予測は差し迫った臨床課題であり、特に有効性データが限られた薬剤を含む治療レジメンでは困難である。このデータ不足は、新薬が市場に導入された場合や、臨床での使用が限られている場合に生じうる。この問題に対処するため、全結合(FC)ニューラルネットワークとグラフニューラルネットワーク(GNN)の特徴を組み合わせた新しい joint fusion モデルを導入する。FC ネットワークは、最新の遺伝子型薬剤耐性検査で同定されたウイルス変異と治療に用いられた薬剤からなる特徴ベクトルを持つ表形式データを用いる。一方、GNN は、ウイルスの遺伝子配列から生体内での治療効果を推定するためのベンチマーク基準となるスタンフォード薬剤耐性変異表から得られた知識を活用して、有益なグラフを構築する。テストセットにおける分布外(Out-of-Distribution)薬剤に対するこれらのモデルの頑健性を、そのようなシナリオにおける GNN の役割に特に着目して評価した。包括的な分析により、提案モデルは FC モデルを一貫して上回り、特に分布外薬剤を考慮した場合にその差が顕著であることが示された。これらの結果は、スタンフォードのスコアをモデルに統合することで汎化性と頑健性が高まり、さらにデータが限られた実世界の応用にも有用性が広がるという利点を裏付けている。本研究は、抗レトロウイルス療法の結果予測に情報を与え、より情報に基づいた臨床判断に貢献する本手法の可能性を示している。 Predicting the outcome of antiretroviral therapies for HIV-1 is a pressing clinical challenge, especially when the treatment regimen includes drugs for which limited effectiveness data is available. This scarcity of data can arise either due to the introduction of a new drug to the market or due to limited use in clinical settings. To tackle this issue, we introduce a novel joint fusion model, which combines features from a Fully Connected (FC) Neural Network and a Graph Neural Network (GNN). The FC network employs tabular data with a feature vector made up of viral mutations identified in the most recent genotypic resistance test, along with the drugs used in therapy.
Conversely, the GNN leverages knowledge derived from Stanford drug-resistance mutation tables, which serve as benchmark references for deducing in-vivo treatment efficacy based on the viral genetic sequence, to build informative graphs. We evaluated these models' robustness against Out-of-Distribution drugs in the test set, with a specific focus on the GNN's role in handling such scenarios. Our comprehensive analysis demonstrates that the proposed model consistently outperforms the FC model, especially when considering Out-of-Distribution drugs. These results underscore the advantage of integrating Stanford scores in the model, thereby enhancing its generalizability and robustness, but also extending its utility in real-world applications with limited data availability. This research highlights the potential of our approach to inform antiretroviral therapy outcome prediction and contribute to more informed clinical decisions.	翻訳日:2024-01-02 12:49:23 公開日:2023-12-29
# カモフラージュインスタンスセグメンテーションへのオープンボキャブラリー拡散の活用 Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation ( http://arxiv.org/abs/2312.17505v1 ) ライセンス: Link先を確認	Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Binh-Son Hua, Nhat Minh Chung, Ivor W. Tsang, Sai-Kit Yeung	(参考訳) テキストから画像への拡散技術は、テキスト記述から高品質な画像を生成する卓越した能力を示している。これは、視覚領域とテキスト領域の間に強い相関が存在することを示唆する。さらに、CLIP のようなテキスト・画像識別モデルは、オープンな概念から得られる豊富で多様な情報のおかげで、テキストプロンプトによる画像ラベル付けに優れている。本稿では、これらの技術的進歩を活用して、コンピュータビジョンの困難な問題であるカモフラージュインスタンスセグメンテーションに取り組む。具体的には、最先端の拡散モデルを基盤とし、オープン語彙を活用してカモフラージュ物体の表現のためのマルチスケールなテキスト・視覚特徴を学習する手法を提案する。このようなクロスドメイン表現は、物体を背景から区別する視覚的手がかりが微妙なカモフラージュ物体のセグメンテーション、特に学習時に見られなかった新規物体のセグメンテーションにおいて望ましい。また、ドメイン横断の特徴を効果的に融合し、関連する特徴を各前景物体に対応付けるための補助的なコンポーネントも開発した。カモフラージュインスタンスセグメンテーションと一般的なオープン語彙インスタンスセグメンテーションの複数のベンチマークデータセットで提案手法を検証し、既存手法と比較した。実験結果により、既存手法に対する提案手法の優位性が確認された。今後の研究を支援するため、コードと事前学習済みモデルを公開する予定である。 Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. This indicates that there exists a strong correlation between the visual and textual domains. In addition, text-image discriminative models such as CLIP excel in image labelling from text prompts, thanks to the rich and diverse information available from open concepts. In this paper, we leverage these technical advances to solve a challenging problem in computer vision: camouflaged instance segmentation. Specifically, we propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations. Such cross-domain representations are desirable in segmenting camouflaged objects where visual cues are subtle to distinguish the objects from the background, especially in segmenting novel objects which are not seen in training. We also develop technically supportive components to effectively fuse cross-domain features and engage relevant features towards respective foreground objects.
We validate our method and compare it with existing ones on several benchmark datasets of camouflaged instance segmentation and generic open-vocabulary instance segmentation. Experimental results confirm the advances of our method over existing ones. We will publish our code and pre-trained models to support future research.	翻訳日:2024-01-02 12:48:53 公開日:2023-12-29
# HiBid: 階層的オフライン深層強化学習による予算配分を伴うクロスチャネル制約付き入札システム HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning ( http://arxiv.org/abs/2312.17503v1 ) ライセンス: Link先を確認	Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang	(参考訳) オンラインディスプレイ広告プラットフォームは、毎日数十億規模の広告リクエストに対してリアルタイム入札(RTB)を提供することで、多数の広告主にサービスを提供している。入札戦略は複数のチャネルにわたる広告リクエストを処理し、設定された財務上の制約(総予算やクリック単価(CPC)など)の下でクリック数を最大化する。主に単一チャネルの入札に焦点を当てた既存研究とは異なり、本研究では予算配分を伴うクロスチャネル制約付き入札を明示的に考慮する。具体的には、非競合的な予算配分のための補助損失を備えた高レベルプランナーと、割り当てられた予算に応じた適応的入札戦略のためのデータ拡張で強化された低レベル実行器からなる、階層的オフライン深層強化学習(DRL)フレームワーク「HiBid」を提案する。さらに、チャネル横断の CPC 制約を満たすため、CPC 誘導型の行動選択機構を導入する。大規模なログデータとオンライン A/B テストの両方による広範な実験を通じて、HiBid がクリック数、CPC 充足率、投資収益率(ROI)の点で6つのベースラインを上回ることを確認した。また、HiBid を Meituan の広告プラットフォームに展開しており、すでに毎日数万の広告主にサービスを提供している。 Online display advertising platforms service numerous advertisers by providing real-time bidding (RTB) for the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under the set financial constraints, i.e., total budget and cost-per-click (CPC), etc. Different from existing works mainly focusing on single channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called ``HiBid'', consisting of a high-level planner equipped with auxiliary loss for non-competitive budget allocation, and a data augmentation enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint.
Through extensive experiments on both the large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We have also deployed HiBid on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.	翻訳日:2024-01-02 12:48:29 公開日:2023-12-29
# 真の三者間エンタングルメントに対する忠実な幾何学的尺度 Faithful geometric measures for genuine tripartite entanglement ( http://arxiv.org/abs/2312.17496v1 ) ライセンス: Link先を確認	Xiaozhen Ge, Yong Wang, Yu Xiang, Guofeng Zhang, Lijun Liu, Li Li, and Shuming Cheng	(参考訳) 離散、連続、およびハイブリッドな量子系の真の三者間エンタングルメントに対する忠実な幾何学的描像を示す。まず、三角関係 $\mathcal{E}^\alpha_{i\|jk}\leq \mathcal{E}^\alpha_{j\|ik}+\mathcal{E}^\alpha_{k\|ij}$ が、すべての劣加法的な二者間エンタングルメント測度 $\mathcal{E}$、当事者 $i, j, k$ のすべての置換、すべての $\alpha \in [0, 1]$、およびすべての純粋三者状態に対して成り立つことを見いだす。これは、$\mathcal{E}^\alpha$ で測られた二分割のエンタングルメントが三角形の辺に対応するという幾何学的解釈を与え、$\alpha \in (0, 1)$ のときその面積は、基礎となる状態が真にエンタングルしている場合に限り非ゼロとなる。次に、$0<\alpha\leq 1/2$ の非鈍角三角形の面積が真の三者間エンタングルメントの尺度であることを厳密に証明する。これらの尺度に対する有用な下界と上界を得るとともに、結果の一般化も示す。最後に、量子ビットについてはこの結果が大幅に強化される。すなわち、劣加法的かつ非加法的な測度の集合が与えられたとき、任意の $\alpha>1$ に対して三角関係に違反する状態が常に見つかり、任意の $\alpha>1/2$ に対して三角形の面積は尺度とならない。したがって、本結果は離散および連続の多者間エンタングルメントの研究に大きな進展をもたらすと期待される。 We present a faithful geometric picture for genuine tripartite entanglement of discrete, continuous, and hybrid quantum systems. We first find that the triangle relation $\mathcal{E}^\alpha_{i\|jk}\leq \mathcal{E}^\alpha_{j\|ik}+\mathcal{E}^\alpha_{k\|ij}$ holds for all subadditive bipartite entanglement measures $\mathcal{E}$, all permutations of parties $i, j, k$, all $\alpha \in [0, 1]$, and all pure tripartite states. It provides a geometric interpretation that bipartition entanglement, measured by $\mathcal{E}^\alpha$, corresponds to the side of a triangle, of which the area with $\alpha \in (0, 1)$ is nonzero if and only if the underlying state is genuinely entangled. Then, we rigorously prove the non-obtuse triangle area with $0<\alpha\leq 1/2$ is a measure for genuine tripartite entanglement. Useful lower and upper bounds for these measures are obtained, and generalizations of our results are also presented. Finally, it is significantly strengthened for qubits that, given a set of subadditive and non-additive measures, some state is always found to violate the triangle relation for any $\alpha>1$, and the triangle area is not a measure for any $\alpha>1/2$.
Hence, our results are expected to aid significant progress in studying both discrete and continuous multipartite entanglement.	翻訳日:2024-01-02 12:48:05 公開日:2023-12-29
# 薬物特性予測のためのマルチモーダル融合深層学習における化学言語と分子グラフの統合 Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction ( http://arxiv.org/abs/2312.17495v1 ) ライセンス: Link先を確認	Xiaohua Lu, Liangxu Xie, Lei Xu, Rongzhi Mao, Shan Chang, Xiaojun Xu	(参考訳) 分子特性を正確に予測することは、難しいが創薬において不可欠な課題である。近年、多くの単一モーダル深層学習手法が分子特性予測に適用され成功を収めている。しかし、単一モーダル学習には分子表現の1つのモダリティのみに依存するという固有の限界があり、これが薬物分子の包括的な理解を制限し、データノイズに対する耐性を損なう。この限界を克服するため、異なる分子表現をカバーするマルチモーダル深層学習モデルを構築する。薬物分子を SMILES 符号化ベクトル、ECFP フィンガープリント、分子グラフという3つの分子表現に変換する。各モーダル情報の処理には、それぞれ Transformer エンコーダ、双方向ゲート付き回帰ユニット(BiGRU)、グラフ畳み込みネットワーク(GCN)を特徴学習に用い、相補的かつ自然に存在する生物情報学的情報を獲得するモデルの能力を高める。6つの分子データセットで三モーダルモデルを評価した。二モーダル学習モデルとは異なり、特定の特徴を捉え各モーダル情報の寄与をより良く活用するために5つの融合手法を採用する。単一モーダルモデルと比較して、我々のマルチモーダル融合深層学習(MMFDL)モデルは、精度、信頼性、およびノイズ耐性において単一モデルを上回る。さらに、PDBbind の refined set におけるタンパク質-リガンド複合体分子の結合定数の予測において、その汎化能力を示す。マルチモーダルモデルの利点は、適切なモデルと適切な融合手法を用いて多様なデータソースを処理できる点にあり、データの多様性を得つつモデルのノイズ耐性を高めることができる。 Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, the inherent limitation of mono-modal learning arises from relying solely on one modality of molecular representation, which restricts a comprehensive understanding of drug molecules and hampers their resilience against data noise. To overcome the limitations, we construct multimodal deep learning models to cover different molecular representations. We convert drug molecules into three molecular representations, SMILES-encoded vectors, ECFP fingerprints, and molecular graphs. To process the modal information, Transformer-Encoder, bi-directional gated recurrent units (BiGRU), and graph convolutional network (GCN) are utilized for feature learning respectively, which can enhance the model capability to acquire complementary and naturally occurring bioinformatics information. We evaluated our triple-modal model on six molecule datasets.
Different from bi-modal learning models, we adopt five fusion methods to capture the specific features and leverage the contribution of each modal information better. Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise. Moreover, we demonstrate its generalization ability in the prediction of binding constants for protein-ligand complex molecules in the refined set of PDBbind. The advantage of the multimodal model lies in its ability to process diverse sources of data using proper models and suitable fusion methods, which would enhance the noise resistance of the model while obtaining data diversity.	翻訳日:2024-01-02 12:47:25 公開日:2023-12-29
# QGFace: 混合品質顔認識のための品質誘導型共同学習 QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition ( http://arxiv.org/abs/2312.17494v1 ) ライセンス: Link先を確認	Youzhe Song and Feng Wang	(参考訳) 画像中の顔の切り抜き画像(face crop)の品質は、カメラの解像度、距離、照明条件などの多くの要因によって決まる。このため、実際の応用では品質の異なる顔画像の識別は困難な問題となる。しかし、既存の手法の多くは高品質(HQ)画像または低品質(LQ)画像に特化して設計されており、品質が混在する画像では性能が低下する。また、多くの手法は学習と評価を支えるために、事前学習済みの特徴抽出器やその他の補助構造を必要とする。本稿では、HQ 画像と LQ 画像の両方を同時によりよく理解するための鍵は、その品質に応じて異なる学習手法を適用することであると指摘する。我々は、単一のエンコーダで異なる品質の画像を同時に学習できる、混合品質顔認識のための新しい品質誘導型共同学習手法を提案する。品質による分割に基づき、HQ データの学習には分類に基づく手法を用いる。一方、ID 情報を欠く LQ 画像については、自己教師ありの画像間コントラスト学習によって学習する。モデルの更新に効果的に追従し、この共同学習の設定におけるコントラスト学習の識別性を高めるため、さらに本来のエンコーダからの特徴でコントラストペアを構成するプロキシ更新型リアルタイムキューを提案する。低品質データセット SCface と Tinyface、混合品質データセット IJB-B、および5つの高品質データセットでの実験により、品質の異なる顔画像の認識における提案手法の有効性が示された。 The quality of a face crop in an image is decided by many factors such as camera resolution, distance, and illumination condition. This makes the discrimination of face images with different qualities a challenging problem in realistic applications. However, most existing approaches are designed specifically for high-quality (HQ) or low-quality (LQ) images, and the performances would degrade for the mixed-quality images. Besides, many methods ask for pre-trained feature extractors or other auxiliary structures to support the training and the evaluation. In this paper, we point out that the key to better understanding both the HQ and the LQ images simultaneously is to apply different learning methods according to their qualities. We propose a novel quality-guided joint training approach for mixed-quality face recognition, which could simultaneously learn the images of different qualities with a single encoder. Based on quality partition, a classification-based method is employed for HQ data learning. Meanwhile, for the LQ images which lack identity information, we learn them with self-supervised image-image contrastive learning.
To effectively keep up with model updates and improve the discriminability of contrastive learning in our joint training scenario, we further propose a proxy-updated real-time queue to compose the contrastive pairs with features from the genuine encoder. Experiments on the low-quality datasets SCface and Tinyface, the mixed-quality dataset IJB-B, and five high-quality datasets demonstrate the effectiveness of our proposed approach in recognizing face images of different qualities.	翻訳日:2024-01-02 12:46:58 公開日:2023-12-29
# フェデレーテッド学習を用いた大規模言語モデルの微分プライベート低ランク適応 Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning ( http://arxiv.org/abs/2312.17493v1 ) ライセンス: Link先を確認	Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Meikang Qiu	(参考訳) 大規模言語モデル(LLM)への関心と応用の高まりにより、金融や医学などの特定の用途に合わせてこれらのモデルを微調整する動きが広がっている。しかし、特に複数の利害関係者が機密データを用いて協調的に LLM を強化しようとする場合、データプライバシーに関する懸念が生じている。このシナリオではフェデレーテッド学習が自然な選択となり、生データを中央サーバーに公開することなく分散的な微調整が可能になる。これを動機として、実践的なフェデレーテッド学習のアプローチによって LLM の微調整においてデータプライバシーをどのように確保できるかを検討し、複数の当事者が安全に貢献して LLM を強化できるようにする。しかし、課題もある。1) 生データの公開を避けても、モデルの出力から機密情報が推測されるリスクがある。2) LLM のフェデレーテッド学習には顕著な通信オーバーヘッドが伴う。これらの課題に対処するため、本稿では LLM 向けに設計された新しいフェデレーテッド学習アルゴリズム DP-LoRA を紹介する。DP-LoRA は、重みの更新にノイズを加えるガウス機構を用いることでデータプライバシーを保護し、個々のデータのプライバシーを維持しながら協調的なモデル学習を促進する。さらに DP-LoRA は、低ランク適応によって通信効率を最適化し、分散学習中に送信される更新済み重みを最小限に抑える。さまざまな LLM を用いた医療、金融、一般データセットにわたる実験結果から、DP-LoRA は通信オーバーヘッドを最小限に抑えつつ、厳格なプライバシー制約を効果的に保証することが示された。 The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs.
DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.	翻訳日:2024-01-02 12:46:33 公開日:2023-12-29
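A minimal sketch of the two ingredients named above, assuming a simple design that may differ from DP-LoRA itself: each client flattens its LoRA factors `B` ($d \times r$) and `A` ($r \times k$), clips the flattened update to L2 norm `clip_norm`, and adds Gaussian noise with standard deviation `noise_multiplier * clip_norm` before the server averages. The clipping scheme, the joint treatment of both factors, and the function names are assumptions; no privacy accounting is done, and the random client factors stand in for real local fine-tuning. `payload_sizes` shows where the communication saving comes from: a full update sends $dk$ numbers, a rank-$r$ one sends $r(d+k)$.

```python
import numpy as np


def privatize(update: np.ndarray, clip_norm: float, noise_multiplier: float,
              rng: np.random.Generator) -> np.ndarray:
    """Gaussian mechanism: clip to L2 norm `clip_norm`, then add N(0, (noise_multiplier*clip_norm)^2)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise


def federated_round(client_factors, clip_norm, noise_multiplier, rng):
    """Each client privatizes its flattened LoRA factors (B, A); the server averages them."""
    b_shape, a_shape = client_factors[0][0].shape, client_factors[0][1].shape
    noisy = [privatize(np.concatenate([B.ravel(), A.ravel()]), clip_norm, noise_multiplier, rng)
             for B, A in client_factors]
    mean = np.mean(noisy, axis=0)
    split = b_shape[0] * b_shape[1]
    # Note: averaging B and A separately only approximates averaging the products B @ A.
    return mean[:split].reshape(b_shape), mean[split:].reshape(a_shape)


def payload_sizes(d: int, k: int, r: int) -> tuple:
    """Numbers sent per client per round: full d x k update vs rank-r factors."""
    return d * k, r * (d + k)


rng = np.random.default_rng(0)
d, k, r = 64, 32, 4
# Hypothetical client updates: random factors in place of real local training.
clients = [(rng.normal(size=(d, r)), rng.normal(size=(r, k))) for _ in range(3)]
B_avg, A_avg = federated_round(clients, clip_norm=1.0, noise_multiplier=0.5, rng=rng)
delta_W = B_avg @ A_avg            # low-rank update applied to the frozen weight
print(delta_W.shape, payload_sizes(4096, 4096, 8))  # (64, 32) (16777216, 65536)
```

For a 4096 x 4096 weight matrix with rank 8, each client sends 65,536 numbers instead of about 16.8 million.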
# HEAP:Contrastive Groupingによる教師なしオブジェクト発見とローカライゼーション HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping ( http://arxiv.org/abs/2312.17492v1 ) ライセンス: Link先を確認	Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan	(参考訳) 教師なし物体発見・局在化は、いかなる教師信号も用いずに画像中の物体を検出またはセグメンテーションすることを目的とする。近年の研究は、自己教師ありトランスフォーマーの特徴を利用することで、顕著な前景物体を識別できる注目すべき可能性を示している。しかし、それらの対象範囲は画像内のパッチレベルの特徴のみに基づいており、より広いスケールでの領域・画像レベルの関係や画像間の関係を無視している。さらに、これらの手法は複数のインスタンスから異なる意味を区別できない。これらの問題に対処するため、対照的グループ化による階層的統合フレームワーク(HEAP)を導入する。具体的には、自己教師あり特徴間の相関に基づいて画像内パッチを意味的に一貫した領域へ適応的にグループ化する、クロスアテンション機構を備えた新しい軽量ヘッドを設計する。さらに、領域間の識別性を確保するため、画像間で類似した領域を互いに近づける領域レベルのコントラストクラスタリング損失を導入する。また、前景と背景の表現を引き離す画像レベルのコントラスト損失を導入し、これによって前景物体と背景がそれぞれ発見される。HEAP は効率的な階層的画像分解を可能にし、より正確な物体発見に寄与するとともに、さまざまなクラスの物体の区別も可能にする。セマンティックセグメンテーション検索、教師なし物体発見、顕著性検出の各タスクにおける大規模な実験結果は、HEAP が最先端の性能を達成することを示している。 Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull closer similar regions across images.
Also, an image-level contrastive loss is present to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.	翻訳日:2024-01-02 12:46:04 公開日:2023-12-29
# 双曲偏微分方程式の作用素学習 Operator learning for hyperbolic partial differential equations ( http://arxiv.org/abs/2312.17489v1 ) ライセンス: Link先を確認	Christopher Wang and Alex Townsend	(参考訳) 本研究では、2変数の双曲型偏微分方程式(PDE)の解作用素を入出力の訓練ペアから復元する、厳密に正当化された最初の確率的アルゴリズムを構築する。双曲型 PDE の解作用素を復元する際の主な課題は特性曲線の存在であり、それに沿って対応するグリーン関数は不連続となる。そのため、本アルゴリズムの中心的な構成要素は、特性曲線の近似的な位置を特定するランク検出スキームである。ランダム化特異値分解と領域の適応的階層分割を組み合わせることで、$O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ 個の入出力ペアを用いて、$\epsilon\to0$ のとき高い確率で作用素ノルムにおける相対誤差 $O(\Xi_\epsilon^{-1}\epsilon)$ の解作用素の近似を構築する。ここで $\Psi_\epsilon$ は解作用素の縮退特異値の存在を表し、$\Xi_\epsilon$ は訓練データの品質を測る。双曲型 PDE は楕円型・放物型 PDE のような「瞬間的な平滑化効果」を持たないことを考えると、双曲型 PDE の係数の正則性に関する我々の仮定は比較的弱く、係数の正則性が高まるほど復元率は向上する。 We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ input-output pairs with relative error $O(\Xi_\epsilon^{-1}\epsilon)$ in the operator norm as $\epsilon\to0$, with high probability. Here, $\Psi_\epsilon$ represents the existence of degenerate singular values of the solution operator, and $\Xi_\epsilon$ measures the quality of the training data.
Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak given that hyperbolic PDEs do not have the ``instantaneous smoothing effect'' of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.	翻訳日:2024-01-02 12:45:39 公開日:2023-12-29
# 予測プロセスモニタリングのための解釈可能かつ説明可能な機械学習手法:体系的文献レビュー Interpretable and Explainable Machine Learning Methods for Predictive Process Monitoring: A Systematic Literature Review ( http://arxiv.org/abs/2312.17584v1 ) ライセンス: Link先を確認	Nijat Mehdiyev, Maxim Majlatow and Peter Fettke	(参考訳) 本稿では, PRISMAフレームワークを用いて, 予測プロセスマイニングの文脈における機械学習モデルの説明可能性と解釈可能性について, 体系的文献レビュー(SLR)を提案する。人工知能(AI)とMLシステムの急速な進歩を踏まえ、これらの技術の「ブラックボックス」の性質を理解することがますます重要になっている。プロセスマイニングの領域に特化して、複雑なビジネスプロセスデータでトレーニングされたMLモデルを解釈する際の課題を考察する。我々は本質的に解釈可能なモデルとポストホックな説明技術を必要とするモデルとを区別し、現在の方法論とそれらの様々なアプリケーションドメインにまたがるアプリケーションの概要を提供する。本研究は厳密な書誌分析を通じて,予測プロセスマイニングにおける説明可能性と解釈可能性の状態を詳細に合成し,重要な傾向,課題,今後の方向性を明らかにする。本研究の目的は,より信頼性が高く,透明性が高く,効果的な知的システムを開発・実装する方法について,研究者や実践者により深く理解させることである。 This paper presents a systematic literature review (SLR) on the explainability and interpretability of machine learning (ML) models within the context of predictive process mining, using the PRISMA framework. Given the rapid advancement of artificial intelligence (AI) and ML systems, understanding the "black-box" nature of these technologies has become increasingly critical. Focusing specifically on the domain of process mining, this paper delves into the challenges of interpreting ML models trained with complex business process data. We differentiate between intrinsically interpretable models and those that require post-hoc explanation techniques, providing a comprehensive overview of the current methodologies and their applications across various application domains. Through a rigorous bibliographic analysis, this research offers a detailed synthesis of the state of explainability and interpretability in predictive process mining, identifying key trends, challenges, and future directions. Our findings aim to equip researchers and practitioners with a deeper understanding of how to develop and implement more trustworthy, transparent, and effective intelligent systems for predictive process analytics.	翻訳日:2024-01-02 10:18:55 公開日:2023-12-29
# 長い会議記録のアクションアイテム駆動型要約 Action-Item-Driven Summarization of Long Meeting Transcripts ( http://arxiv.org/abs/2312.17581v1 ) ライセンス: Link先を確認	Logan Golia, Jugal Kalita	(参考訳) オンライン会議の普及により、会議の要約を自動生成できるモデルの実用性は大きく高まった。本稿では,会議要約の生成を自動化する新しく効果的な手法を提案する。この問題に対する既存の手法は、会議を単なる長い対話とみなすため、一般的で基本的な要約しか生成しない。これに対し,本アルゴリズムは,会議の書き起こしに含まれるアクションアイテムによって駆動される抽象型の会議要約を生成できる。これは、要約を再帰的に生成し、会議の各セクションに対して並列にアクションアイテム抽出アルゴリズムを適用することで実現される。これらのセクション要約をすべて結合してさらに要約し、一貫性のあるアクションアイテム駆動の要約を作成する。さらに,アルゴリズムの時間効率を向上させるとともに,大規模言語モデル(LLM)が長期依存関係を忘れる問題を解決するために,長い書き起こしをトピックベースのセクションに分割する3つの新しい手法を提案する。我々のパイプラインは、AMIコーパス全体で64.98のBERTScoreを達成した。これは、ファインチューニングされたBART(Bidirectional and Auto-Regressive Transformers)モデルによる現在の最先端結果から約4.98%の向上である。 The increased prevalence of online meetings has significantly enhanced the practicality of a model that can automatically generate the summary of a given meeting. This paper introduces a novel and effective approach to automate the generation of meeting summaries. Current approaches to this problem generate general and basic summaries, considering the meeting simply as a long dialogue. However, our novel algorithms can generate abstractive meeting summaries that are driven by the action items contained in the meeting transcript. This is done by recursively generating summaries and employing our action-item extraction algorithm for each section of the meeting in parallel. All of these sectional summaries are then combined and summarized together to create a coherent and action-item-driven summary. In addition, this paper introduces three novel methods for dividing up long transcripts into topic-based sections to improve the time efficiency of our algorithm, as well as to resolve the issue of large language models (LLMs) forgetting long-term dependencies. 
Our pipeline achieved a BERTScore of 64.98 across the AMI corpus, which is an approximately 4.98% increase from the current state-of-the-art result produced by a fine-tuned BART (Bidirectional and Auto-Regressive Transformers) model.	翻訳日:2024-01-02 10:18:35 公開日:2023-12-29
# 分布型低ランク埋め込み Distribution-based Low-rank Embedding ( http://arxiv.org/abs/2312.17579v1 ) ライセンス: Link先を確認	Bardia Yousefi	(参考訳) 乳房異常の早期発見は極めて重要な課題である。特に,赤外線サーモグラフィーは乳癌検診や臨床乳房検査(CBE)において有用なツールとして注目されている。不均質な熱パターンを測定することは、計算動的サーモグラフィーを取り入れる鍵であり、これは行列分解手法によって実現できる。これらの手法は、熱シーケンス全体から支配的な熱パターンを抽出することに焦点を当てている。しかし, 時間的変化を効果的に表す支配的な画像を選び出す作業は, 計算サーモグラフィーの分野において依然として困難な課題である。そこで本稿では,この課題に対する2つの新しい戦略として,固有ベクトルに対するJames-Stein推定(JSE)とWeibull埋め込みを適用する手法を提案する。主な目的は、熱データストリームの低次元(LD)表現を作ることである。このLD近似は、早期乳癌検出のために、サーモミクスを抽出し、最適化されたハイパーパラメータで分類モデルを訓練するための基盤となる。さらに, 行列分解法に対する種々の埋め込み手法の比較分析を行う。提案手法は支配的な基底ベクトルの射影を改善し,Weibull埋め込みを用いた分類精度は81.7% (+/-5.2%)となり,我々が以前に提案した他の埋め込み手法を上回った。比較分析では、Sparse PCTとDeep SemiNMFがそれぞれ80.9%と78.6%と最も高い精度を示した。これらの結果から,JSEとWeibull埋め込み技術は,バイオマーカーとして重要な熱パターンの保存に大きく寄与し,CBEの改善と乳癌のごく早期の発見を可能にすることが示唆された。 The early detection of breast abnormalities is a matter of critical significance. Notably, infrared thermography has emerged as a valuable tool in breast cancer screening and clinical breast examination (CBE). Measuring heterogeneous thermal patterns is the key to incorporating computational dynamic thermography, which can be achieved by matrix factorization techniques. These approaches focus on extracting the predominant thermal patterns from the entire thermal sequence. Yet, the task of singling out the dominant image that effectively represents the prevailing temporal changes remains a challenging pursuit within the field of computational thermography. In this context, we propose applying James-Stein for eigenvector (JSE) and Weibull embedding approaches, as two novel strategies in response to this challenge. The primary objective is to create a low-dimensional (LD) representation of the thermal data stream. This LD approximation serves as the foundation for extracting thermomics and training a classification model with optimized hyperparameters, for early breast cancer detection. Furthermore, we conduct a comparative analysis of various embedding adjuncts to matrix factorization methods. 
The results of the proposed method indicate an enhancement in the projection of the predominant basis vector, yielding classification accuracy of 81.7% (+/-5.2%) using Weibull embedding, which outperformed other embedding approaches we proposed previously. In comparison analysis, Sparse PCT and Deep SemiNMF showed the highest accuracies having 80.9% and 78.6%, respectively. These findings suggest that JSE and Weibull embedding techniques substantially help preserve crucial thermal patterns as a biomarker leading to improved CBE and enabling the very early detection of breast cancer.	翻訳日:2024-01-02 10:18:13 公開日:2023-12-29
# 量子の砂上の楼閣 The Quantum House Of Cards ( http://arxiv.org/abs/2312.17570v1 ) ライセンス: Link先を確認	Xavier Waintal	(参考訳) 量子コンピュータは、新薬の発見、肥料生産のための新たな触媒の発見、暗号プロトコルの解読、金融ポートフォリオの最適化、新たな人工知能アプリケーションの実装など、多くの重要な問題を解決するものとして提案されてきた。しかし現在に至るまで、3に5を掛けるような単純なタスクでさえ、既存の量子ハードウェアの能力を超えている。本稿では、量子コンピュータがその約束を果たすために解決すべき困難について検討する。最上位層(実際のアルゴリズムと関連アプリケーション)から最下位層(量子ハードウェア、その制御エレクトロニクス、極低温技術など)まで、量子コンピュータを構築するために構想されてきた技術スタック全体を、重要な中間層である量子誤り訂正も含めて議論する。 Quantum computers have been proposed to solve a number of important problems such as discovering new drugs, new catalysts for fertilizer production, breaking encryption protocols, optimizing financial portfolios, or implementing new artificial intelligence applications. Yet, to date, a simple task such as multiplying 3 by 5 is beyond existing quantum hardware. This article examines the difficulties that would need to be solved for quantum computers to live up to their promises. I discuss the whole stack of technologies that has been envisioned to build a quantum computer from the top layers (the actual algorithms and associated applications) down to the very bottom ones (the quantum hardware, its control electronics, cryogeny, etc.) while not forgetting the crucial intermediate layer of quantum error correction.	翻訳日:2024-01-02 10:17:47 公開日:2023-12-29
# Few-Shot Neural Radiance Fieldsのための情報量の多い光線の選択 Informative Rays Selection for Few-Shot Neural Radiance Fields ( http://arxiv.org/abs/2312.17561v1 ) ライセンス: Link先を確認	Marco Orsingher, Anthony Dell'Eva, Paolo Zani, Paolo Medici, Massimo Bertozzi	(参考訳) NeRF(Neural Radiance Fields)は最近、画像ベースの3D再構成の強力な手法として登場したが、シーンごとの長い最適化が、特にリソース制約のある環境での実用を制限している。既存のアプローチは、入力ビューの数を減らし、複雑な損失関数または他のモダリティからの追加入力によって学習されたボリューム表現を正則化することでこの問題を解決している。本稿では,重要な情報を持つ光線に着目することで,少数ショットのシナリオでNeRFを訓練する簡易かつ効果的な手法であるKeyNeRFを提案する。このような光線は、まず、シーンのカバレッジを保証しつつベースラインの多様性を促進するビュー選択アルゴリズムによりカメラレベルで選択され、次に、局所的な画像エントロピーに基づく確率分布からのサンプリングにより画素レベルで選択される。提案手法は,既存のNeRFコードベースへの変更を最小限に抑えつつ,最先端の手法と比べて良好な性能を示す。 Neural Radiance Fields (NeRF) have recently emerged as a powerful method for image-based 3D reconstruction, but the lengthy per-scene optimization limits their practical usage, especially in resource-constrained settings. Existing approaches solve this issue by reducing the number of input views and regularizing the learned volumetric representation with either complex losses or additional inputs from other modalities. In this paper, we present KeyNeRF, a simple yet effective method for training NeRF in few-shot scenarios by focusing on key informative rays. Such rays are first selected at camera level by a view selection algorithm that promotes baseline diversity while guaranteeing scene coverage, then at pixel level by sampling from a probability distribution based on local image entropy. Our approach performs favorably against state-of-the-art methods, while requiring minimal changes to existing NeRF codebases.	翻訳日:2024-01-02 10:17:34 公開日:2023-12-29
# 動脈瘤性くも膜下出血後の頭部CTにおける深層学習ベースの血液セグメンテーションのためのSwin Transformerを用いた完全自動パイプライン A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage ( http://arxiv.org/abs/2312.17553v1 ) ライセンス: Link先を確認	Sergio Garcia Garcia, Santiago Cepeda, Ignacio Arrese, Rosario Sarabia	(参考訳) 背景: 自発性くも膜下出血(SAH)の正確な体積評価は, その臨床的・予後的意義から重要となりうるが, 現在の手動・半自動の手法では労働集約的な作業である。本研究では,トランスフォーマーベースのSwin UNETRアーキテクチャを用いて,非造影CT(noncontrast computed tomography, NCCT)スキャンからSAH患者の血液を完全自動でセグメンテーションする人工知能ツールを開発・検証した。方法: 動脈瘤性くも膜下出血(aSAH)と確定診断された患者のNCCTスキャンを,Swin UNETRによるセグメンテーションを用いて後ろ向きに解析した。提案手法の性能は,Diceスコア,IoU(intersection over union),体積類似度指数(VSI),対称平均表面距離(SASD),感度および特異度などの指標を用いて,手動でセグメンテーションした正解データと比較して評価した。モデルの汎化性能を検証するため,外部施設の検証コホートも含めた。結果: モデルは,内部および外部の検証コホートにわたって堅牢な性能指標とともに高い精度を示した。特に高いDice係数(0.873),IoU(0.810),VSI(0.840),感度(0.821),特異度(0.996)と低いSASD(1.866)を達成し,SAH患者の血液セグメンテーションにおける能力が示唆された。モデルの効率性は処理速度に表れており,リアルタイム応用の可能性を示している。結論: 本研究のSwin UNETRベースのモデルは,NCCT画像におけるaSAH後の血液の自動セグメンテーションを大きく前進させるものである。計算負荷は大きいものの,本モデルはユーザーフレンドリーなインターフェースを備えた標準的なハードウェア上で効果的に動作し,より広い臨床導入を促進する。臨床的信頼性を確認するには,多様なデータセットでのさらなる検証が必要である。 Background: Accurate volumetric assessment of spontaneous subarachnoid hemorrhage (SAH) is a labor-intensive task performed with current manual and semiautomatic methods that might be relevant for its clinical and prognostic implications. In the present research, we sought to develop and validate an artificial intelligence-driven, fully automated blood segmentation tool for SAH patients via noncontrast computed tomography (NCCT) scans employing a transformer-based Swin UNETR architecture. Methods: We retrospectively analyzed NCCT scans from patients with confirmed aneurysmal subarachnoid hemorrhage (aSAH) utilizing the Swin UNETR for segmentation. 
The performance of the proposed method was evaluated against manually segmented ground truth data using metrics such as Dice score, intersection over union (IoU), the volumetric similarity index (VSI), the symmetric average surface distance (SASD), and sensitivity and specificity. A validation cohort from an external institution was included to test the generalizability of the model. Results: The model demonstrated high accuracy with robust performance metrics across the internal and external validation cohorts. Notably, it achieved high Dice coefficient (0.873), IoU (0.810), VSI (0.840), sensitivity (0.821) and specificity (0.996) values and a low SASD (1.866), suggesting proficiency in segmenting blood in SAH patients. The model's efficiency was reflected in its processing speed, indicating potential for real-time applications. Conclusions: Our Swin UNETR-based model offers significant advances in the automated segmentation of blood after aSAH on NCCT images. Despite the computational intensity, the model operates effectively on standard hardware with a user-friendly interface, facilitating broader clinical adoption. Further validation across diverse datasets is warranted to confirm its clinical reliability.	翻訳日:2024-01-02 10:17:16 公開日:2023-12-29
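The overlap metrics reported in the abstract above have standard set-based definitions. A minimal sketch, assuming binary masks flattened to equal-length 0/1 sequences with a non-empty foreground; SASD is omitted because it needs surface-distance computations, and `overlap_metrics` is a hypothetical helper, not the authors' evaluation code.

```python
def overlap_metrics(pred, truth):
    """Overlap metrics for two binary masks given as flat, equal-length 0/1 sequences.

    Assumes both masks contain at least one foreground voxel and the ground
    truth contains at least one background voxel (otherwise a denominator is 0).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    tn = len(pred) - tp - fp - fn
    vol_pred, vol_truth = tp + fp, tp + fn  # voxel counts of each mask
    return {
        "dice": 2 * tp / (vol_pred + vol_truth),
        "iou": tp / (tp + fp + fn),
        # volumetric similarity compares only the two volumes, not their overlap
        "vsi": 1 - abs(vol_pred - vol_truth) / (vol_pred + vol_truth),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
m = overlap_metrics(pred, truth)
print({k: round(v, 3) for k, v in m.items()})
```

In this toy case both masks contain 3 voxels, so VSI is 1.0 even though Dice is only about 0.667; this is why volumetric similarity is usually reported alongside an overlap score rather than instead of one.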
# 自然言語推論を用いた効率的なユニバーサル分類器の構築 Building Efficient Universal Classifiers with Natural Language Inference ( http://arxiv.org/abs/2312.17543v1 ) ライセンス: Link先を確認	Moritz Laurer, Wouter van Atteveldt, Andreu Casas, Kasper Welbers	(参考訳) 生成型大規模言語モデル(LLM)は、テキスト生成の汎用性のおかげで、フューショット学習とゼロショット学習の主流の選択肢となっている。しかし、分類タスクを自動化したいだけの多くのユーザーにとって、生成LLMの幅広い機能は必要ない。より小さなBERT系モデルも汎用タスクを学習でき、ファインチューニングなしで任意のテキスト分類タスクを実行したり(ゼロショット分類)、ほんの数例で新しいタスクを学習したり(フューショット)できる一方で、生成LLMよりもはるかに効率的である。本稿では、(1)自然言語推論(NLI)を、生成LLMの指示ファインチューニングと同様の原理に従う汎用分類タスクとして利用する方法を説明し、(2)汎用分類器を構築するための再利用可能なJupyterノートブック付きのステップバイステップガイドを提供し、(3)389の多様なクラスを持つ33のデータセットで訓練した汎用分類器を公開する。我々が公開するコードの一部は、2023年12月時点でHugging Face Hub経由で5500万回以上ダウンロードされた旧ゼロショット分類器の訓練に使用されている。新しい分類器はゼロショット性能を9.4%向上させる。 Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.	翻訳日:2024-01-02 10:16:41 公開日:2023-12-29
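The NLI reformulation described in the abstract above turns each candidate label into a hypothesis and picks the label whose hypothesis the text most strongly entails. A minimal sketch of that decision rule, assuming an `entail_prob(premise, hypothesis)` callable that returns an entailment probability from some NLI model; the template string and the toy keyword scorer are hypothetical stand-ins, not the authors' released classifier.

```python
from typing import Callable, Dict, Sequence


def zero_shot_classify(
    text: str,
    labels: Sequence[str],
    entail_prob: Callable[[str, str], float],
    template: str = "This text is about {}.",
) -> Dict[str, float]:
    """Score each label by how strongly `text` entails that label's hypothesis."""
    # One NLI query per label: premise = the text, hypothesis = the filled template.
    return {label: entail_prob(text, template.format(label)) for label in labels}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hyp_words = hypothesis.lower().rstrip(".").split()
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)


scores = zero_shot_classify(
    "The new budget law sparked a heated politics debate",
    ["sports", "politics", "science"],
    toy_entail_prob,
)
prediction = max(scores, key=scores.get)
print(prediction, scores)
```

In this sketch, swapping in a real NLI model only changes `entail_prob`; adding a new class only means adding a new hypothesis string, so no retraining is needed, which is the property that lets one NLI model act as a universal classifier.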
# 説明可能な二分分類のための距離誘導生成逆ネットワーク Distance Guided Generative Adversarial Network for Explainable Binary Classifications ( http://arxiv.org/abs/2312.17538v1 ) ライセンス: Link先を確認	Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan	(参考訳) データ拡張の潜在的な利点はデータ不足を軽減することであるが、従来の拡張手法は主にドメイン内の知識に依存している。一方,gans (advanced generative adversarial networks) では,多種多様なドメイン間サンプルを生成する。これらの手法は二項分類における決定境界の記述に限定的な貢献をする。本稿では,超平面空間における生成サンプルの変動度を制御する距離誘導型GAN(DisGAN)を提案する。具体的には、2つの方法を組み合わせてDisGANのアイデアをインスタンス化する。第1の方法は垂直距離GAN(VerDisGAN)であり、ドメイン間の生成は垂直距離で条件付けられる。第2の方法は水平距離GAN(HorDisGAN)であり、ドメイン内生成は水平距離に条件付けられる。さらに、VerDisGANは、ソースイメージをハイパープレーンにマッピングすることで、クラス固有の領域を生成することができる。実験結果から, DisGAN は GAN に基づく拡張法よりも一貫した性能を示した。提案手法は異なる分類アーキテクチャに適用でき,マルチクラス分類に拡張できる可能性がある。 Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two ways. The first way is vertical distance GAN (VerDisGAN) where the inter-domain generation is conditioned on the vertical distances. The second way is horizontal distance GAN (HorDisGAN) where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. 
The proposed method can apply to different classification architectures and has potential to extend to multi-class classification.	翻訳日:2024-01-02 10:16:16 公開日:2023-12-29
# Olapa-MCoT: LLMの中国語数学的推論能力の向上 Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs ( http://arxiv.org/abs/2312.17535v1 ) ライセンス: Link先を確認	Shaojie Zhu, Zhaobin Wang, Chengxiang Zhuo, Hui Lu, Bo Hu and Zang Li	(参考訳) CoT(Chain-of-Thought)は、LLMの推論問題を解決する方法である。近年,LLMのCoT性能向上に向けた研究が数多く行われている。本研究では,Llama2-13B PLMをベースとし,ファインチューニングとアライメント学習を行ったLLMであるOlapa-MCoTを提案する。アライメント訓練では,SimRRHFアルゴリズムと誤答データ再学習(Incorrect Data Relearning)を提案し,主にOlapa-MCoTの中国語数学推論能力の最適化に注力した。実験では大きな成果が得られ、中国語数学推論の精度は50%に達し、Llama2-13Bと比べて36%向上した。さらに、英語の推論能力の精度も4%近く向上した。 CoT (Chain-of-Thought) is a way to solve reasoning problems for LLMs. Recently, many studies have aimed to improve the CoT capability of LLMs. In this work, we propose Olapa-MCoT, an LLM based on the Llama2-13B PLM, obtained through finetuning and alignment learning. During the alignment training, we proposed the SimRRHF algorithm and Incorrect Data Relearning, focusing mainly on optimizing the Chinese mathematical reasoning ability of Olapa-MCoT. The experiments achieved significant results: the accuracy of Chinese mathematical reasoning reached 50%, a 36% rise compared to Llama2-13B. In addition, the accuracy of English reasoning also increased by nearly 4%.	翻訳日:2024-01-02 10:15:59 公開日:2023-12-29
# 次元知覚による大規模言語モデルの量的推論能力の向上 Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception ( http://arxiv.org/abs/2312.17532v1 ) ライセンス: Link先を確認	Yuncheng Huang, Qianyu He, Jiaqing Liang, Sihang Jiang, Yanghua Xiao and Yunwen Chen	(参考訳) 量は、エンティティの大きさに関する性質を特徴づける、テキスト中の独特かつ重要な要素であり、特に推論タスクにおいて自然言語を理解するための正確な視点を提供する。近年、大規模言語モデル(LLM)に基づく推論タスクの研究が盛んに行われているが、そのほとんどは数値のみに焦点を当てており、その重要性にもかかわらず、単位を伴う量の次元という概念を無視している。我々は、次元の概念は量を正確に理解するために不可欠であり、LLMが量的推論を行う上で非常に重要であると主張する。しかし、次元知識と量関連ベンチマークの欠如により、LLMの性能は低いものにとどまっている。そこで,我々は,次元知覚に基づいて言語モデルの量的推論能力を高める枠組みを提案する。まず,この領域の知識ギャップに対処するため,次元単位知識ベース(DimUnitKB)を構築する。また,LLMの次元知覚スキルを探り,向上させるために,3カテゴリ7タスクからなるベンチマークDimEvalを提案する。本手法の有効性を評価するために,量的推論タスクを提案し,実験を行う。その結果, GPT-4と比較して, 量的推論タスクの精度が劇的に向上する(43.55%→50.67%)ことが示された。 Quantities are distinct and critical components of texts that characterize the magnitude properties of entities, providing a precise perspective for the understanding of natural language, especially for reasoning tasks. In recent years, there has been a flurry of research on reasoning tasks based on large language models (LLMs), most of which solely focus on numerical values, neglecting the dimensional concept of quantities with units despite its importance. We argue that the concept of dimension is essential for precisely understanding quantities and of great significance for LLMs to perform quantitative reasoning. However, the lack of dimension knowledge and quantity-related benchmarks has resulted in low performance of LLMs. Hence, we present a framework to enhance the quantitative reasoning ability of language models based on dimension perception. We first construct a dimensional unit knowledge base (DimUnitKB) to address the knowledge gap in this area. We propose a benchmark DimEval consisting of seven tasks of three categories to probe and enhance the dimension perception skills of LLMs. To evaluate the effectiveness of our methods, we propose a quantitative reasoning task and conduct experiments. 
The experimental results show that our dimension perception method dramatically improves accuracy (43.55%->50.67%) on quantitative reasoning tasks compared to GPT-4.	翻訳日:2024-01-02 10:15:47 公開日:2023-12-29
# RS-DGC: リモートセンシング画像解釈における動的勾配圧縮のための近傍統計の探索 RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation ( http://arxiv.org/abs/2312.17530v1 ) ライセンス: Link先を確認	Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li	(参考訳) 地球観測プログラムによって日々生成されるオープンデータ量の増加がもたらす課題のため、分散深層学習は近年リモートセンシング(RS)応用において注目を集めている。しかし、複数ノード間でモデル更新を送信する高い通信コストは、スケーラブルな分散学習の大きなボトルネックである。勾配スパース化は、通信コストを削減し、ひいては訓練速度を向上させる効果的な勾配圧縮(GC)技術として検証されてきた。既存の最先端の勾配スパース化手法は主に「絶対値が大きいほど重要」という基準に基づいており、一般に性能に影響することが観察されている小さな勾配の重要性を無視している。近傍情報からの多様体構造の情報表現に着想を得て,RS画像解釈のために近傍統計指標を活用した,簡易かつ効果的な動的勾配圧縮手法RS-DGCを提案する。まず、勾配近傍を導入することで勾配間の相互依存性を高め、ランダムノイズの影響を低減する。RS-DGCの主要な構成要素は近傍統計指標(NSI: Neighborhood Statistical Indicator)であり、各ノード上の指定された近傍内での勾配の重要性を定量化し、各イテレーションで勾配を送信する前に局所勾配をスパース化する。さらに, 各層の重要性の変化をリアルタイムで追跡するために, 層ごとの動的圧縮方式を提案する。広範な下流タスクにより,RS画像の知的解釈における提案手法の優位性が検証された。例えば、VGG-19ネットワークを用いて、NWPU-RESISC45データセット上で50倍以上の通信圧縮を行いながら、0.51%の精度向上を達成した。 Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating the training speed. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion, ignoring the importance of small gradients, which is generally observed to affect the performance. 
Inspired by informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistical Indicator (NSI), which can quantify the importance of gradients within a specified neighborhood on each node to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method in terms of intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using VGG-19 network.	翻訳日:2024-01-02 10:15:26 公開日:2023-12-29
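The abstract does not specify RS-DGC's Neighborhood Statistical Indicator in enough detail to reproduce, so the sketch below shows only the magnitude-based ("larger-absolute-more-important") baseline that the abstract argues against, in the common variant that keeps untransmitted values as a local residual. It assumes gradients are flat Python lists; `topk_sparsify` and `k` are illustrative names, not the paper's API.

```python
from typing import Dict, List, Sequence, Tuple


def topk_sparsify(
    grad: Sequence[float], residual: Sequence[float], k: int
) -> Tuple[Dict[int, float], List[float]]:
    """Magnitude-based top-k sparsification with local residual accumulation.

    Returns the {index: value} entries to transmit and the new local residual.
    """
    # Add back the gradient mass that was not transmitted in earlier iterations.
    acc = [g + r for g, r in zip(grad, residual)]
    # "larger-absolute-more-important": keep the k entries with the largest |value|.
    keep = sorted(range(len(acc)), key=lambda i: abs(acc[i]), reverse=True)[:k]
    sent = {i: acc[i] for i in sorted(keep)}
    # Untransmitted coordinates stay on this node for the next iteration.
    new_residual = [0.0 if i in sent else v for i, v in enumerate(acc)]
    return sent, new_residual


grad = [0.1, -2.0, 0.05, 1.5, -0.3]
sent, residual = topk_sparsify(grad, [0.0] * len(grad), k=2)
print(sent)      # {1: -2.0, 3: 1.5}
print(residual)  # [0.1, 0.0, 0.05, 0.0, -0.3]
```

Sending 2 of 5 entries is a 2.5x reduction in this toy case. The small entries are not discarded: they accumulate in `residual` until they are large enough to be selected, which is one standard way to soften the information loss from ignoring small gradients.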
# 画像超解像の初期訓練におけるノイズフリー最適化 Noise-free Optimization in Early Training Steps for Image Super-Resolution ( http://arxiv.org/abs/2312.17526v1 ) ライセンス: Link先を確認	MinKyu Lee, Jae-Pil Heo	(参考訳) 近年の深層学習に基づく単一画像超解像法(SISR)は,高分解能(HR)画像に対する画素幅の最小化により,ネットワークをトレーニングする典型的な手法である。しかし, 基本訓練方式が主流であるにもかかわらず, 不正な逆問題の文脈での利用については, 十分に検討されていない。本研究では,対象のHR画像から複数のHR画像に対する期待値である最適セントロイドと,HR画像とセントロイドの残差として定義される固有ノイズの2つのサブコンポーネントに分解することで,基礎となる構成成分をよりよく理解することを目的とする。以上の結果から,現在のトレーニング手法ではSISRの不正な性質を捉えられず,特に早期訓練の段階では固有のノイズ項に弱いことが示唆された。そこで本研究では,バニラ訓練の初期段階における固有雑音項を,最適な遠心率を推定し,直接的最適化を行うことで効果的に除去できる新しい最適化手法を提案する。実験の結果,提案手法はバニラ訓練の安定性を効果的に向上し,全体の性能向上につながることが示された。コードはgithub.com/2minkyulee/ECOで入手できる。 Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance whereas typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite the basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituent by decomposing target HR images into two subcomponents: (1) the optimal centroid which is the expectation over multiple potential HR images, and (2) the inherent noise defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain. 
Codes are available at github.com/2minkyulee/ECO.	翻訳日:2024-01-02 10:14:59 公開日:2023-12-29
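The centroid/noise decomposition described in the abstract above can be made concrete with a standard identity. This is a short derivation, assuming the pixel-wise distance is the squared L2 loss (for an L1 loss the minimizer would be the conditional median instead), with $x$ the low-resolution input, $Y$ the random HR image consistent with it, and $f$ the network. The notation is ours, not the paper's.

```latex
\begin{aligned}
\mu(x) &= \mathbb{E}[Y \mid x], \qquad n = Y - \mu(x), \qquad \mathbb{E}[n \mid x] = 0,\\
\mathbb{E}\big[\lVert f(x) - Y \rVert_2^2 \mid x\big]
  &= \lVert f(x) - \mu(x) \rVert_2^2
   - 2\,\big(f(x) - \mu(x)\big)^{\top}\mathbb{E}[n \mid x]
   + \mathbb{E}\big[\lVert n \rVert_2^2 \mid x\big]\\
  &= \lVert f(x) - \mu(x) \rVert_2^2 + \mathbb{E}\big[\lVert n \rVert_2^2 \mid x\big].
\end{aligned}
```

The noise term does not depend on $f$, so in expectation the loss is minimized at $f(x) = \mu(x)$, the optimal centroid. A single sampled HR target, however, pulls each gradient step toward $\mu(x) + n$ rather than $\mu(x)$. This is one way to read the abstract's point that vanilla training is exposed to the inherent noise term, especially in early steps.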
# 強化学習アプローチによる近似計算手法の設計空間探索 Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach ( http://arxiv.org/abs/2312.17525v1 ) ライセンス: Link先を確認	Sepide Saeedi, Alessandro Savino, Stefano Di Carlo	(参考訳) 近似コンピューティング(AxC)技術は、様々なアプリケーションのパフォーマンス向上の正確さのトレードオフにおいて、ますます人気が高まっている。あるアプリケーションに最適なaxcテクニックを選択するのは困難です。設計空間を探索するための提案手法のうち、強化学習(rl)のような機械学習アプローチは有望な結果を示している。本稿では,精度の低下とパワー,計算時間の削減を両立させるアプリケーションの近似バージョンを求めるために,rlを用いた多目的設計空間探索手法を提案する。実験の結果,いくつかのベンチマークにおいて,精度低下と消費電力減少と計算時間とのトレードオフが良好であった。 Approximate Computing (AxC) techniques have become increasingly popular in trading off accuracy for performance gains in various applications. Selecting the best AxC techniques for a given application is challenging. Among proposed approaches for exploring the design space, Machine Learning approaches such as Reinforcement Learning (RL) show promising results. In this paper, we proposed an RL-based multi-objective Design Space Exploration strategy to find the approximate versions of the application that balance accuracy degradation and power and computation time reduction. Our experimental results show a good trade-off between accuracy degradation and decreased power and computation time for some benchmarks.	翻訳日:2024-01-02 10:14:37 公開日:2023-12-29
# CHIP2023におけるPromptCBLUE共有タスクの概要 Overview of the PromptCBLUE Shared Task in CHIP2023 ( http://arxiv.org/abs/2312.17522v1 ) ライセンス: Link先を確認	Wei Zhu, Xiaoling Wang, Mosha Chen, Buzhou Tang	(参考訳) 本稿では,CHIP-2023会議で開催されたPromptCBLUE共有タスク(http://cips-chip.org.cn/2023/eval1)の概要を紹介する。この共有タスクはCBLUEベンチマークを再構成したもので、一般的な医療自然言語処理において、中国語のオープンドメインまたは医療ドメインの大規模言語モデル(LLM)のための優れたテストベッドを提供する。2つのトラックが設けられた: (a) LLMのマルチタスク・プロンプトチューニングを調査するプロンプトチューニング・トラック、(b) オープンソースLLMの文脈内学習能力を検証するトラック。産業界と学術界の両方から多くのチームが共有タスクに参加し、上位チームは素晴らしいテスト結果を達成した。本稿では,両トラックのタスク,データセット,評価指標,および上位システムについて述べる。最後に,参加チームが検討した様々なアプローチの手法と評価結果をまとめる。 This paper presents an overview of the PromptCBLUE shared task (http://cips-chip.org.cn/2023/eval1) held in the CHIP-2023 Conference. This shared task reformulates the CBLUE benchmark and provides a good testbed for Chinese open-domain or medical-domain large language models (LLMs) in general medical natural language processing. Two different tracks are held: (a) a prompt tuning track, investigating the multitask prompt tuning of LLMs, and (b) a track probing the in-context learning capabilities of open-sourced LLMs. Many teams from both the industry and academia participated in the shared tasks, and the top teams achieved amazing test results. This paper describes the tasks, the datasets, evaluation metrics, and the top systems for both tasks. Finally, the paper summarizes the techniques and results of the evaluation of the various approaches explored by the participating teams.	翻訳日:2024-01-02 10:14:28 公開日:2023-12-29
# 悲観的二レベル最適化による決定指向予測: 計算的研究 Decision-focused predictions via pessimistic bilevel optimization: a computational study ( http://arxiv.org/abs/2312.17640v1 ) ライセンス: Link先を確認	Víctor Bucarey, Sophia Calderón, Gonzalo Muñoz, Frederic Semet	(参考訳) 最適化パラメータの不確実性に対処することは、重要かつ長年の課題である。通常は、不確実なパラメータを正確に予測した上で決定論的最適化問題を解く。しかし、このいわゆる\emph{predict-then-optimize}手順による決定は、不確実なパラメータに非常に敏感になりうる。本研究は,\emph{決定指向}の予測、すなわち、それを用いて行われる決定に関する\emph{後悔(regret)}尺度を最小化することを目的として構築された予測モデルを作る最近の取り組みに貢献する。厳密な期待後悔最小化を悲観的二レベル最適化モデルとして定式化する。次に、双対性の議論を用いて、これを非凸二次最適化問題として再定式化する。最後に,計算上の扱いやすさを実現するための様々な計算手法を示す。コストベクトルが不確実な最短経路問題インスタンスに関する広範な計算結果を報告する。その結果,提案手法は, 決定指向学習の最先端手法である Elmachtoub と Grigas (2022) のアプローチよりも訓練性能を改善できることが示された。 Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions produced by this so-called \emph{predict-then-optimize} procedure can be highly sensitive to uncertain parameters. In this work, we contribute to recent efforts in producing \emph{decision-focused} predictions, i.e., to build predictive models that are constructed with the goal of minimizing a \emph{regret} measure on the decisions taken with them. We formulate the exact expected regret minimization as a pessimistic bilevel optimization model. Then, using duality arguments, we reformulate it as a non-convex quadratic optimization problem. Finally, we show various computational techniques to achieve tractability. We report extensive computational results on shortest-path instances with uncertain cost vectors. Our results indicate that our approach can improve training performance over the approach of Elmachtoub and Grigas (2022), a state-of-the-art method for decision-focused learning.	翻訳日:2024-01-02 09:52:50 公開日:2023-12-29
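The regret measure and the pessimistic reading of the bilevel model in the abstract above can be illustrated on a tiny shortest-path instance. A minimal sketch, assuming paths are enumerated explicitly as edge lists and costs are integer dictionaries (edge names and numbers are hypothetical): when several paths are optimal under the predicted costs, the pessimistic follower picks the one that is worst under the true costs. This brute-force enumeration is only for illustration and is not the paper's non-convex quadratic reformulation.

```python
from typing import Dict, List, Sequence

Path = List[str]


def path_cost(path: Path, cost: Dict[str, int]) -> int:
    """Total cost of a path given per-edge costs."""
    return sum(cost[e] for e in path)


def pessimistic_regret(
    paths: Sequence[Path], c_hat: Dict[str, int], c_true: Dict[str, int]
) -> int:
    """True-cost regret of the decision taken with predicted costs c_hat.

    Ties among c_hat-optimal paths are broken against the decision maker.
    Integer costs make the tie test exact; float costs would need a tolerance.
    """
    best_hat = min(path_cost(p, c_hat) for p in paths)
    tied = [p for p in paths if path_cost(p, c_hat) == best_hat]
    chosen = max(path_cost(p, c_true) for p in tied)  # pessimistic tie-breaking
    optimal = min(path_cost(p, c_true) for p in paths)
    return chosen - optimal


# Three s-t paths in a toy graph: s-a-t, s-b-t, and the direct edge s-t.
paths = [["sa", "at"], ["sb", "bt"], ["st"]]
c_true = {"sa": 1, "at": 1, "sb": 2, "bt": 2, "st": 3}
c_hat = {"sa": 2, "at": 2, "sb": 1, "bt": 1, "st": 2}
print(pessimistic_regret(paths, c_hat, c_true))  # 2
```

Under `c_hat`, paths s-b-t and s-t tie at cost 2. An optimistic tie-break would pick s-t (true cost 3, regret 1), while the pessimistic one picks s-b-t (true cost 4, regret 2); that gap is what the pessimistic formulation guards against.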
# マルチモーダルICUデータを用いた病院内死亡予測のためのXAI XAI for In-hospital Mortality Prediction via Multimodal ICU Data ( http://arxiv.org/abs/2312.17624v1 ) ライセンス: Link先を確認	Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, and Fengxiang He	(参考訳) 集中治療室(ICU)患者の院内死亡予測は最終的な臨床転帰の鍵となる。AIは優れた精度を示してきたが、説明可能性の欠如という問題を抱えている。この問題に対処するために,マルチモーダルICUデータを用いた院内死亡予測のための,効率的かつ説明可能なAIソリューションであるeXplainable Multimodal Mortality Predictor (X-MMP)を提案する。我々は,臨床データから異種の入力を受け取り,意思決定を行えるマルチモーダル学習をフレームワークに採用する。さらに,LRP法をTransformerへ適切に拡張した説明手法であるLayer-Wise Propagation to Transformerを導入し,マルチモーダル入力に対する説明を生成して,予測に寄与する顕著な特徴を明らかにする。加えて, 臨床転帰に対する各モダリティの寄与を可視化でき, 臨床医が意思決定の背後にある理由を理解するのを支援する。我々はMIMIC-IIIとMIMIC-III Waveform Database Matched Subsetに基づくマルチモーダルデータセットを構築した。ベンチマークデータセットでの包括的な実験により,提案フレームワークが競争力のある予測精度を保ちつつ妥当な解釈を実現できることが示された。特に、本フレームワークは他の臨床タスクにも容易に転用でき、医療研究における重要な要因の発見を促進する。 Predicting in-hospital mortality for intensive care unit (ICU) patients is key to final clinical outcomes. AI has shown superior accuracy but suffers from a lack of explainability. To address this issue, this paper proposes an eXplainable Multimodal Mortality Predictor (X-MMP), an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Furthermore, we introduce an explainable method, namely Layer-Wise Propagation to Transformer, as a proper extension of the LRP method to Transformers, producing explanations over multimodal inputs and revealing the salient features attributed to prediction. Moreover, the contribution of each modality to clinical outcomes can be visualized, assisting clinicians in understanding the reasoning behind decision-making. We construct a multimodal dataset based on MIMIC-III and MIMIC-III Waveform Database Matched Subset. Comprehensive experiments on benchmark datasets demonstrate that our proposed framework can achieve reasonable interpretation with competitive prediction accuracy. 
In particular, our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.	翻訳日:2024-01-02 09:52:33 公開日:2023-12-29
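The X-MMP abstract builds on layer-wise relevance propagation (LRP). As background only, the sketch below applies the standard LRP-ε rule to a single bias-free dense layer in plain Python. It is not the authors' extension of LRP to Transformers. The function name, the weight layout, and the toy numbers are illustrative assumptions.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of a bias-free dense layer.

    The layer computes z_j = sum_i a_i * w_ij, where weights[i][j] connects
    input i to output j. The epsilon rule gives
        R_i = sum_j (a_i * w_ij) / (z_j + eps * sign(z_j)) * R_j.
    """
    n_in, n_out = len(activations), len(relevance_out)
    z = [sum(activations[i] * weights[i][j] for i in range(n_in)) for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # The stabilizer keeps the denominator away from zero and follows the sign of z_j.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(n_in):
            relevance_in[i] += activations[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


print(lrp_epsilon_dense([1.0, 2.0], [[1.0, 0.0], [1.0, 1.0]], [3.0, 2.0]))
```

With a small ε and no bias, the input relevances sum (up to the ε term) to the output relevances. This conservation property is what lets LRP scores be read as a decomposition of the prediction.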
# エンタングルメント・ウィットネスに基づくエンタングルメント定量化量の下界 Lower Bounds of Entanglement Quantifiers Based On Entanglement Witnesses ( http://arxiv.org/abs/2312.17620v1 ) ライセンス: Link先を確認	Xian Shi	(参考訳) ある絡み合い測度によって二部系の絡み合いを定量化することは一般に難しい問題であり、系に関する情報が少ない場合にはさらに困難になる。本稿では、2種類のエンタングルメント基準に基づいて、エンタングルメント測度であるコンカレンス、形成のエンタングルメント、幾何学的エンタングルメント測度の下界を求める方法を提案する。 To quantify the entanglement of bipartite systems in terms of some entanglement measure is a challenging problem in general, and it is even harder when less information about the system is available. In this manuscript, based on two classes of entanglement criteria, we present a method to obtain lower bounds on the entanglement measures concurrence, entanglement of formation, and geometrical entanglement measure.	翻訳日:2024-01-02 09:52:09 公開日:2023-12-29
# 生成情報抽出のための大規模言語モデル:調査 Large Language Models for Generative Information Extraction: A Survey ( http://arxiv.org/abs/2312.17617v1 ) ライセンス: Link先を確認	Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Enhong Chen	(参考訳) 情報抽出(ie)は、自然言語テキストから構造的知識(エンティティ、関係、イベントなど)を抽出することを目的としている。近年,ジェネレーティブ・Large Language Models (LLM) はテキスト理解と生成において顕著な能力を示し,様々な領域やタスクをまたいだ一般化を実現している。その結果、LLMの能力を活用し、生成パラダイムに基づいたIEタスクに実行可能なソリューションを提供するための多くの研究が提案されている。そこで本研究では,IE タスクにおける LLM の取り組みを総合的に検討し,最近の進歩を調査する。まず,これらの課題を多種多様なIEサブタスクと学習パラダイムで分類し,先進的な手法を実証的に分析し,LLMによるIEタスクの出現傾向を明らかにする。徹底的なレビューに基づいて,今後の研究にふさわしい技術と有望な研究方向性について,いくつかの知見を見出している。パブリックリポジトリをメンテナンスし、関連するリソースを継続的に更新します。 Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness abilities of LLMs and offer viable solutions for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and learning paradigms, then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. Based on thorough review conducted, we identify several insights in technique and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related resources at: \url{https://github.com/quqxui/Awesome-LLM4IE-Papers}.	翻訳日:2024-01-02 09:52:00 公開日:2023-12-29
# グラフ畳み込みネットワークのワンショットマルチレートプルーニング One-Shot Multi-Rate Pruning of Graph Convolutional Networks ( http://arxiv.org/abs/2312.17615v1 ) ライセンス: Link先を確認	Hichem Sahbi	(参考訳) 本稿では、ネットワークトポロジと重みを同時に学習する、マルチレート・マグニチュード・プルーニング(Multi-Rate Magnitude Pruning, MRMP)と呼ばれる新しい軽量グラフ畳み込みネットワーク(GCN)設計を提案する。本手法は変分的であり、学習したネットワークの重み分布を事前分布に整合させることで進行する。一方で、これにより任意の固定プルーニング率を実現でき、設計した軽量GCNの汎化性能も向上する。他方で、MRMPは共有重みの上で複数のGCNを同時に学習し、重みを再学習することなく任意の目標プルーニング率で高精度なネットワークを外挿する。骨格ベースの認識という難しいタスクで行った大規模な実験は、特に非常に高いプルーニング率の領域において、我々の軽量GCNが大幅な性能向上を示すことを示している。 In this paper, we devise a novel lightweight Graph Convolutional Network (GCN) design dubbed Multi-Rate Magnitude Pruning (MRMP) that jointly trains network topology and weights. Our method is variational and proceeds by aligning the weight distribution of the learned networks with an a priori distribution. On the one hand, this allows implementing any fixed pruning rate and also enhances the generalization performance of the designed lightweight GCNs. On the other hand, MRMP achieves a joint training of multiple GCNs, on top of shared weights, in order to extrapolate accurate networks at any targeted pruning rate without retraining their weights. Extensive experiments conducted on the challenging task of skeleton-based recognition show substantial gains for our lightweight GCNs, particularly in very high pruning regimes.	翻訳日:2024-01-02 09:51:40 公開日:2023-12-29
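The abstract describes extracting networks at several pruning rates on top of shared weights. The minimal plain-Python sketch below derives one magnitude-pruned sub-network per rate from a single shared weight vector, without retraining. It deliberately omits the variational alignment of the weight distribution with a prior that the abstract describes. The function names and the flat weight list are hypothetical simplifications, not the MRMP implementation.

```python
def magnitude_mask(weights, rate):
    """Binary mask that zeroes the `rate` fraction of weights with the smallest |w|."""
    n_prune = int(round(rate * len(weights)))
    # Stable sort by magnitude: the same ranking is reused for every rate.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def multi_rate_subnetworks(shared_weights, rates):
    """One masked view of the SAME shared weights per pruning rate (no retraining)."""
    return {
        r: [w * m for w, m in zip(shared_weights, magnitude_mask(shared_weights, r))]
        for r in rates
    }


w = [0.5, -0.1, 0.9, -0.3, 0.05]
for rate, sub in multi_rate_subnetworks(w, [0.2, 0.6]).items():
    print(rate, sub)
```

The ranking is computed once on the shared weights. As a result, the weights kept at a higher rate are always a subset of those kept at a lower rate, and every sub-network reuses the same parameters.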
# 印刷多層パーセプトロン向けの積和演算と活性化のベスポーク近似 Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons ( http://arxiv.org/abs/2312.17612v1 ) ライセンス: Link先を確認	Florentia Afentaki, Gurol Saglam, Argyris Kokkinis, Kostas Siozios, Georgios Zervakis, Mehdi B Tahoori	(参考訳) Printed Electronics (PE) は際立った特徴を備えており、真のユビキタスコンピューティングを実現するための有力な技術となっている。これは、これまでコンピューティングの浸透が限られていた、コンフォーマルかつ超低コストなソリューションを必要とするアプリケーション領域に特に関係している。シリコンベースの技術とは異なり、PEは非反復エンジニアリングコスト、超低製造コスト、コンフォーマルでフレキシブル、無毒かつ伸縮可能なハードウェアのオンデマンド製造といった比類のない特長を提供する。しかし、PEは特徴サイズが大きいために一定の制約を抱えており、機械学習分類器のような複雑な回路の実現が妨げられている。本研究では、近似計算とベスポーク(完全カスタマイズ)設計の原理を活用し、これらの制約に対処する。超低消費電力の多層パーセプトロン(MLP)分類器を設計するための自動化フレームワークを提案する。このフレームワークは、MLPのニューロンのすべての機能、すなわち乗算、累算、活性化を近似する包括的アプローチを初めて採用する。さまざまな規模のMLPを包括的に評価した結果、検討した中で最も複雑なMLPアーキテクチャであってもバッテリ駆動での動作が可能となり、現状の技術を大きく上回ることが示された。 Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes, which impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully-customized) design. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. 
Through comprehensive evaluation across various MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.	翻訳日:2024-01-02 09:51:26 公開日:2023-12-29
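As a rough illustration of what approximating multiplication, accumulation, and activation can mean in integer hardware, the sketch below truncates low-order bits of each product and of the final sum before a ReLU. Bit truncation is a generic approximate-computing technique chosen here only for illustration. It is an assumption, not the approximation scheme of the paper, and the names and bit widths are hypothetical. In a bespoke circuit the weights would be hardwired constants, which is why they are passed as plain arguments here.

```python
def truncate_lsbs(value, k):
    """Zero the k least-significant bits of |value|, keeping the sign."""
    magnitude = (abs(value) >> k) << k
    return magnitude if value >= 0 else -magnitude


def approx_neuron(inputs, weights, k_mul=2, k_acc=2):
    """Integer neuron with truncated products, a truncated sum, and a ReLU.

    k_mul and k_acc (hypothetical knobs) control how many low-order bits are
    dropped; k_mul = k_acc = 0 gives the exact neuron.
    """
    acc = 0
    for x, w in zip(inputs, weights):
        acc += truncate_lsbs(x * w, k_mul)  # approximate multiplication
    acc = truncate_lsbs(acc, k_acc)         # approximate accumulation result
    return max(0, acc)                      # ReLU activation


print(approx_neuron([3, 5, 7], [2, -1, 3]))                    # approximate
print(approx_neuron([3, 5, 7], [2, -1, 3], k_mul=0, k_acc=0))  # exact
```

Dropping bits shrinks the adder and multiplier logic that a printed circuit must implement. The cost is a small output error, here 20 instead of 22.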
# P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion ( http://arxiv.org/abs/2312.17611v1 ) ライセンス: Link先を確認	Linlian Jiang, Pan Chen, Ye Wang, Tieru Wu, Rui Ma	(参考訳) 大きく遮蔽された点群から欠落領域を推定することは非常に困難である。特に、幾何や構造の詳細が豊富な3次元形状では、未知の部分に固有の曖昧さが存在する。既存のアプローチは、教師ありの方法で1対1の写像を学習するか、生成モデルを訓練して欠落点を合成することで3D点群形状を補完する。しかし、これらの方法は補完過程の制御性に欠けており、結果は決定論的であるか、制御されない多様性を示す。そこで我々は、プロンプト駆動型のデータ生成と編集に着想を得て、より制御可能で多様な形状補完を可能にする、P2M2-Netと呼ばれる新しいプロンプト誘導型点群補完フレームワークを提案する。入力の部分点群と、欠落領域の意味や構造といったパーツ単位の情報を記述するテキストプロンプトが与えられると、Transformerベースの補完ネットワークはマルチモーダル特徴を効率的に融合し、プロンプトの指示に従って多様な結果を生成できる。我々は、新しい大規模PartNet-PromptデータセットでP2M2-Netを学習し、2つの難しい形状補完ベンチマークで広範な実験を行う。定量的および定性的な結果は、より制御可能なパーツ単位の点群補完と生成のためにプロンプトを組み込むことの有効性を示している。コードとデータは https://github.com/JLU-ICL/P2M2-Net で公開されている。 Inferring missing regions from severely occluded point clouds is highly challenging. Especially for 3D shapes with rich geometry and structure details, the unknown parts are inherently ambiguous. Existing approaches either learn a one-to-one mapping in a supervised manner or train a generative model to synthesize the missing points for the completion of 3D point cloud shapes. These methods, however, lack controllability over the completion process, and their results either are deterministic or exhibit uncontrolled diversity. Inspired by prompt-driven data generation and editing, we propose a novel prompt-guided point cloud completion framework, coined P2M2-Net, to enable more controllable and more diverse shape completion. Given an input partial point cloud and a text prompt describing part-aware information such as the semantics and structure of the missing region, our Transformer-based completion network can efficiently fuse the multimodal features and generate diverse results following the prompt guidance. 
We train the P2M2-Net on a new large-scale PartNet-Prompt dataset and conduct extensive experiments on two challenging shape completion benchmarks. Quantitative and qualitative results show the efficacy of incorporating prompts for more controllable part-aware point cloud completion and generation. Code and data are available at https://github.com/JLU-ICL/P2M2-Net.	翻訳日:2024-01-02 09:51:03 公開日:2023-12-29
# アクチュエータ劣化シナリオにおける四足ロボットの適応制御戦略 Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios ( http://arxiv.org/abs/2312.17606v1 ) ライセンス: Link先を確認	Xinyuan Wu, Wentao Dong, Hang Lai, Yong Yu and Ying Wen	(参考訳) 四足歩行ロボットは極端な環境に強い適応性を持つが、欠点を経験することもある。これらの障害が発生したら、ロボットはタスクに戻る前に修理されなければならない。これらの欠点の1つがアクチュエータ劣化であり、デバイス老化や予期せぬ運用イベントなどの要因に起因する。伝統的に、この問題に対処するには複雑なフォールトトレラント設計に大きく依存している。学習に基づくアプローチは、これらの制限を緩和する効果的な方法を提供するが、現実世界の四足ロボットにそのような方法を効果的に配置する研究上のギャップが存在する。本稿では,Actuator Degradation Adaptation Transformer (ADAPT) という,強化学習に根ざした先駆的な教師学習フレームワークについて紹介する。このフレームワークは統合された制御戦略を生み出し、ロボットは内部センサーにのみ依存しながら、突然の関節アクチュエータ障害にもかかわらず、移動を維持およびタスクを実行することができる。 unitree a1プラットフォームにおける経験的評価は、実世界の四足ロボットへの適応の展開可能性と有効性を検証し、このアプローチの堅牢性と実用性を確認する。 Quadruped robots have strong adaptability to extreme environments but may also experience faults. Once these faults occur, robots must be repaired before returning to the task, reducing their practical feasibility. One prevalent concern among these faults is actuator degradation, stemming from factors like device aging or unexpected operational events. Traditionally, addressing this problem has relied heavily on intricate fault-tolerant design, which demands deep domain expertise from developers and lacks generalizability. Learning-based approaches offer effective ways to mitigate these limitations, but a research gap exists in effectively deploying such methods on real-world quadruped robots. This paper introduces a pioneering teacher-student framework rooted in reinforcement learning, named Actuator Degradation Adaptation Transformer (ADAPT), aimed at addressing this research gap. This framework produces a unified control strategy, enabling the robot to sustain its locomotion and perform tasks despite sudden joint actuator faults, relying exclusively on its internal sensors. 
Empirical evaluations on the Unitree A1 platform validate the deployability and effectiveness of ADAPT on real-world quadruped robots, and affirm the robustness and practicality of our approach.	翻訳日:2024-01-02 09:50:37 公開日:2023-12-29
# 物体中心の運動制約の抽象化を用いた統合タスクと運動計画 Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints ( http://arxiv.org/abs/2312.17605v1 ) ライセンス: Link先を確認	Alejandro Agostini, Justus Piater	(参考訳) タスク・アンド・モーション・プランニング(TAMP)では、タスク計画法で使用される抽象記述の曖昧さと決定不足のために、タスクを成功させるために必要な物理的制約を特徴付けることが難しい。通常のアプローチは、タスク計画レベルではそのような制約を無視し、実現可能な解が見つかるまで、実行不可能な動作に対する多数の呼び出し、計画修正、再計画を繰り返す高コストなサブシンボリック幾何学的推論手法を実装することである。本稿では、タスク計画と動作計画を単一のヒューリスティック探索に統合する、代替のTAMP手法を提案する。提案手法は、運動制約の物体中心の抽象化に基づいており、既存のAIヒューリスティック探索の計算効率を活用して、物理的に実現可能な計画を生成できる。これらの計画は、集中的なサブシンボリック幾何学的推論を必要とせずに、タスク実行のための物体および動作パラメータに直接変換できる。 In task and motion planning (TAMP), the ambiguity and underdetermination of abstract descriptions used by task planning methods make it difficult to characterize physical constraints needed to successfully execute a task. The usual approach is to overlook such constraints at the task planning level and to implement expensive sub-symbolic geometric reasoning techniques that perform multiple calls on infeasible actions, plan corrections, and re-planning until a feasible solution is found. We propose an alternative TAMP approach that unifies task and motion planning into a single heuristic search. Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI heuristic search to yield physically feasible plans. These plans can be directly transformed into object and motion parameters for task execution without the need for intensive sub-symbolic geometric reasoning.	翻訳日:2024-01-02 09:50:17 公開日:2023-12-29
# 量子グレードナノダイヤモンドによる生細胞内の超高輝度スピン検出 Quantum-grade nanodiamonds for ultrabright spin detection in live cells ( http://arxiv.org/abs/2312.17603v1 ) ライセンス: Link先を確認	Keisuke Oshimi, Hiromu Nakashima, Sara Mandić, Hina Kobayashi, Minori Teramoto, Hirokazu Tsuji, Yoshiki Nishibayashi, Yutaka Shikano, Toshu An, and Masazumi Fujiwara	(参考訳) 光学的にアクセス可能なスピン活性ナノ材料は、生体試料を調べる量子ナノセンサーとして有望である。しかし、これらの材料でバイオイメージングレベルの輝度と高品質なスピン特性を両立させることは難しく、量子バイオセンシングへの応用を妨げている。ここでは、スピンを持たない¹²C炭素同位体の濃縮と置換型窒素スピン不純物の低減によるスピン環境工学により、0.6-1.3 ppmの窒素空孔(NV)中心を含む超高輝度蛍光ナノダイヤモンド(ND)を実証する。培養細胞に容易に導入できるこのNDは、大幅に狭い光検出磁気共鳴(ODMR)スペクトルを示し、従来のIb型NDと同等のODMR深度を得るのに必要なマイクロ波励起電力は16分の1であった。平均スピン緩和時間は T1 = 0.68 ms、T2 = 1.6 μs(最大 1.6 ms、2.7 μs)であり、それぞれIb型の5倍および10倍長い。本研究で示したバルク結晶並みのNVスピン特性と明るい蛍光は、生体応用向けND量子センサの感度を大きく向上させる。 Optically accessible spin-active nanomaterials are promising as quantum nanosensors for probing biological samples. However, achieving bioimaging-level brightness and high-quality spin properties for these materials is challenging and hinders their application in quantum biosensing. Here, we demonstrate ultrabright fluorescent nanodiamonds (NDs) containing 0.6-1.3-ppm nitrogen-vacancy (NV) centers by spin-environment engineering via enriching spin-less ¹²C-carbon isotopes and reducing substitutional nitrogen spin impurities. The NDs, readily introduced into cultured cells, exhibited substantially narrow optically detected magnetic resonance (ODMR) spectra, requiring 16-times less microwave excitation power to give an ODMR depth comparable to that of conventional type-Ib NDs. They show average spin-relaxation times of T1 = 0.68 ms and T2 = 1.6 μs (1.6 ms and 2.7 μs maximum) that were 5- and 10-fold longer than those of type-Ib, respectively. The bulk-like NV spin properties and bright fluorescence demonstrated in this study significantly improve the sensitivity of ND-based quantum sensors for biological applications.	翻訳日:2024-01-02 09:49:59 公開日:2023-12-29
# タスク指向llmシステムの設計における可能性の専制性:スコーピング調査 The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey ( http://arxiv.org/abs/2312.17601v1 ) ライセンス: Link先を確認	Dhruv Dhamani and Mary Lou Maher	(参考訳) 本調査は,タスク指向LLMシステムの設計空間の現在の理解に焦点を当て,利用可能な設計パラメータの定義と関係について詳述する。本論文は、タスク指向のLLMシステムを定義し、複雑なソフトウェア開発タスクにおける多様なLLMシステム構成(単一LLM、単一LLMエージェント、複数のLLMエージェントシステムを含む)の性能を考察し、その結果を仮説化する思考実験を通して、そのようなシステムの設計空間を探求することから始まる。結果のパターンを議論し,それを3つの予想に定式化する。これらの予想は一部は誤った仮定に基づいているかもしれないが、将来の研究の出発点となる。次に,LLM増補研究の包括・組織化,技術推進,不確実性評価など,いくつかの設計パラメータについて検討した。本稿は,これらの分野の研究評価において,計算とエネルギー効率に重点が置かれていないことを指摘する。本研究は,プロンプト手法をマルチエージェントシステムと見なすことのできるレンズを提供するプロンプト手法のエージェント中心の投影を可能にするために,リニアおよび非線形コンテキストの概念を開発するための基礎を提供する。本稿では、llmプロンシングとllmベースのマルチエージェントシステム間の研究のクロスポーリン化、および既存のプロンシング技術に基づく合成トレーニングデータの生成における、このレンズの意義について述べる。いずれにせよ、スコーピング調査は将来の研究の指針となる7つの予想を提示している。 This scoping survey focuses on our current understanding of the design space for task-oriented LLM systems and elaborates on definitions and relationships among the available design parameters. The paper begins by defining a minimal task-oriented LLM system and exploring the design space of such systems through a thought experiment contemplating the performance of diverse LLM system configurations (involving single LLMs, single LLM-based agents, and multiple LLM-based agent systems) on a complex software development task and hypothesizes the results. We discuss a pattern in our results and formulate them into three conjectures. While these conjectures may be partly based on faulty assumptions, they provide a starting point for future research. The paper then surveys a select few design parameters: covering and organizing research in LLM augmentation, prompting techniques, and uncertainty estimation, and discussing their significance. The paper notes the lack of focus on computational and energy efficiency in evaluating research in these areas. 
Our survey findings provide a basis for developing the concept of linear and non-linear contexts, which we define and use to enable an agent-centric projection of prompting techniques providing a lens through which prompting techniques can be viewed as multi-agent systems. The paper discusses the implications of this lens, for the cross-pollination of research between LLM prompting and LLM-based multi-agent systems; and also, for the generation of synthetic training data based on existing prompting techniques in research. In all, the scoping survey presents seven conjectures that can help guide future research efforts.	翻訳日:2024-01-02 09:49:35 公開日:2023-12-29
# 製造業における環境サステナビリティのためのエクステンデッド・リアリティの応用と可能性を探る Exploring the current applications and potential of extended reality for environmental sustainability in manufacturing ( http://arxiv.org/abs/2312.17595v1 ) ライセンス: Link先を確認	Huizhong Cao, Henrik Söderlund, Mélanie Derspeisse and Björn Johansson	(参考訳) インダストリー5.0への転換に対応して、デジタルツールの新たな応用とともに、環境持続可能性を優先する製造システムを求める声が高まっている。仮想現実(VR)、拡張現実(AR)、複合現実(MR)を含むエクステンデッド・リアリティ(XR)は、インダストリー5.0を実現する技術の一つとして認識されている。XRは、より持続可能な製造の原動力にもなりうるが、その潜在的な環境上の利点はあまり注目されていない。本稿では、環境サステナビリティの原則に関連するXR技術分野における、製造業での現在の応用と研究を探る。本稿の目的は、(1)製造における環境持続可能性に取り組む、文献や研究で現在検討されているXR技術のユースケースを特定すること、(2)環境的に持続可能な製造実践にXRを導入するためのユースケース、ツールボックス、方法論、ワークフローについて、産業界や企業にガイダンスと参照先を提供すること、の2点である。米国国立標準技術研究所(NIST)が策定した持続可能性指標の分類に基づき、著者らは製造における実用的なXRユースケースという基準で現在の文献を分析し、マッピングした。この調査により、環境持続可能性を推進する可能性のある、製造におけるXR技術の現在の応用とユースケースがマッピングされた。結果は文献を参照したユースケースとして提示され、XRを環境持続可能性の推進力として用いる将来の研究者や産業での実装に向けたガイダンスとインスピレーションとなる。さらに、著者らは、環境持続可能性の推進力としてのXRへの関心を高めるための今後の研究に向けた議論を提起している。 In response to the transformation towards Industry 5.0, there is a growing call for manufacturing systems that prioritize environmental sustainability, alongside the emerging application of digital tools. Extended Reality (XR), including Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR), is one of the technologies identified as an enabler for Industry 5.0. XR could potentially also be a driver for more sustainable manufacturing; however, its potential environmental benefits have received limited attention. This paper aims to explore the current manufacturing applications and research within the field of XR technology connected to the environmental sustainability principle. 
The objectives of this paper are two-fold: (1) identify the use cases of XR technology currently explored in literature and research that address environmental sustainability in manufacturing; (2) provide industry and companies with guidance and references on use cases, toolboxes, methodologies, and workflows for implementing XR in environmentally sustainable manufacturing practices. Based on the categorization of sustainability indicators developed by the National Institute of Standards and Technology (NIST), the authors analyzed and mapped the current literature, using pragmatic XR use cases for manufacturing as the selection criterion. The exploration resulted in a mapping of the current applications and use cases of XR technology within manufacturing that have the potential to drive environmental sustainability. The results are presented as use cases with references to the literature, serving as guidance and inspiration for future researchers and for industrial implementations that use XR as a driver for environmental sustainability. Furthermore, the authors open a discussion of future work to draw more attention to XR as a driver for environmental sustainability.	翻訳日:2024-01-02 09:49:08 公開日:2023-12-29
# 効率的な仮想製造のためのVRインタラクション:マルチユーザーVRナビゲーションプラットフォームのためのミニマップ VR interaction for efficient virtual manufacturing: mini map for multi-user VR navigation platform ( http://arxiv.org/abs/2312.17593v1 ) ライセンス: Link先を確認	Huizhong Cao, Henrik Söderlund, Mélanie Despeisse, Francisco Garcia Rivera, and Björn Johansson	(参考訳) 過去10年間で、インダストリー4.0以降の潮流の高まりに伴い、製造業におけるVRアプリケーションの価値と可能性は大きな注目を集めている。レイアウト計画、仮想設計レビュー、オペレータトレーニングにおける有効性は、これまでの研究で十分に確立されている。しかし、製造におけるVRの多くの機能要件とインタラクションパラメータは、あいまいにしか定義されていない。まだ探究されていない分野の一つが空間認識と空間学習であり、これは仮想製造システム内のナビゲーションを理解し、空間データを処理するうえで不可欠である。これは、仮想空間における参加者の空間認識が会議やデザインレビューの効率に大きく影響するマルチユーザーVRアプリケーションにおいて特に重要である。本稿では、マルチユーザーVRのインタラクションパラメータを調査し、仮想工場レイアウト計画のためのインタラクティブな位置表示マップに着目して、ナビゲーション支援としてのデジタルマップのユーザインタラクション設計を探る。VRゲーム業界でよく使われる手法やインタラクティブマップを把握するために、文献調査を行った。さまざまなインタラクティブマップのデモンストレータをUnityゲームエンジンを用いてマルチユーザーVRプラットフォームに実装し、包括的なA/Bテストを行った。5種類のインタラクティブマップのプロトタイプを20人の参加者がテスト・評価・採点し、40の検証済みデータストリームを収集した。これにより、インタラクティブマップの最も効率的なインタラクション設計を分析・考察した。 Over the past decade, the value and potential of VR applications in manufacturing have gained significant attention in accordance with the rise of Industry 4.0 and beyond. Its efficacy in layout planning, virtual design reviews, and operator training has been well-established in previous studies. However, many functional requirements and interaction parameters of VR for manufacturing remain ambiguously defined. One area awaiting exploration is spatial recognition and learning, crucial for understanding navigation within the virtual manufacturing system and processing spatial data. This is particularly vital in multi-user VR applications where participants' spatial awareness in the virtual realm significantly influences the efficiency of meetings and design reviews. This paper investigates the interaction parameters of multi-user VR, focusing on interactive positioning maps for virtual factory layout planning and exploring the user interaction design of digital maps as a navigation aid. 
A literature study was conducted to identify frequently used techniques and interactive maps from the VR gaming industry. Multiple demonstrators of different interactive maps were implemented in a multi-user VR platform using the Unity game engine and compared in a comprehensive A/B test. Five different prototypes of interactive maps were tested, evaluated, and graded by 20 participants, and 40 validated data streams were collected. The most efficient interaction design for interactive maps is then analyzed and discussed.	翻訳日:2024-01-02 09:48:38 公開日:2023-12-29
# ロバスト性向上と説明誘導型学習によるテキスト分類の忠実な説明に向けて Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training ( http://arxiv.org/abs/2312.17591v1 ) ライセンス: Link先を確認	Dongfang Li, Baotian Hu, Qingcai Chen, Shan He	(参考訳) 特徴帰属法は、モデル予測の説明として重要な入力トークンを強調するものであり、信頼できるAIに向けてディープニューラルネットワークに広く適用されてきた。しかし、近年の研究では、これらの手法による説明は忠実性と堅牢性の面で課題を抱えていることが示されている。本稿では、テキスト分類においてより忠実な説明を得るための、ロバスト性向上と説明誘導型学習による手法(REGEX)を提案する。まず、入力勾配正則化手法と仮想敵対的学習によりモデルのロバスト性を改善する。第二に、顕著度ランキングを用いてノイズの多いトークンをマスクし、モデルのアテンションと特徴帰属の類似性を最大化する。これは外部情報を導入しない自己学習の手順とみなすことができる。我々は、5つの帰属手法を用いて6つのデータセットで広範な実験を行い、ドメイン外設定における忠実性も評価する。その結果、REGEXは全ての設定で説明の忠実度指標を改善し、さらに2つのランダム化テストにおいても一貫した向上を達成することがわかった。さらに、REGEXが生成したハイライト説明を用いてselect-then-predictモデルを学習すると、エンドツーエンド手法に匹敵するタスク性能が得られることを示す。 Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness through input gradient regularization and virtual adversarial training. Secondly, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. 
Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method.	翻訳日:2024-01-02 09:48:18 公開日:2023-12-29
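The REGEX abstract mentions two ingredients: a saliency ranking used to mask noisy tokens, and the similarity between model attention and feature attribution. A minimal plain-Python sketch of both follows. The keep ratio, the `[MASK]` token, and the use of cosine similarity are illustrative assumptions, not the paper's exact choices.

```python
import math


def mask_low_saliency(tokens, attributions, keep_ratio=0.5, mask_token="[MASK]"):
    """Keep the top-ranked tokens by attribution score and mask the rest."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [tok if i in keep else mask_token for i, tok in enumerate(tokens)]


def cosine_similarity(u, v):
    """Cosine similarity between, e.g., an attention vector and an attribution vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


print(mask_low_saliency(["the", "movie", "was", "great"], [0.1, 0.6, 0.05, 0.9]))
print(cosine_similarity([0.2, 0.8], [0.25, 0.75]))
```

In a training loop, `1 - cosine_similarity(attention, attribution)` could serve as an auxiliary loss term. The abstract does not state how the paper weights such a term.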
# 対話型進化アルゴリズムを用いたシェーダの手続き生成ツール A Tool for the Procedural Generation of Shaders using Interactive Evolutionary Algorithms ( http://arxiv.org/abs/2312.17587v1 ) ライセンス: Link先を確認	Elio Sasso, Daniele Loiacono, Pier Luca Lanzi	(参考訳) 本稿では,ゲーム開発のための商用ツールUnityエディタと統合されたインタラクティブな進化的アルゴリズムを用いて,シェーダの設計空間を探索するツールを提案する。我々のフレームワークは、最近のシェーダエディタの基盤となるグラフベースの表現とインタラクティブな進化を活用し、デザイナが既存のシェーダからいくつかのビジュアルオプションを探索できるようにする。我々のフレームワークは、現在のシェーダーのグラフ表現を染色体としてエンコードし、シェーダー個体群の進化を誘導する。グラフベースの組換えと突然変異をヒューリスティックに応用し、実現可能なシェーダーを作成する。このフレームワークはUnityエディタの拡張であり、進化計算(およびシェーダープログラミング)の知識がほとんどないデザイナは、ゲームシーンの作業に使用するのと同じビジュアルインターフェースを使用して、基盤となる進化エンジンと対話することができる。 We present a tool for exploring the design space of shaders using an interactive evolutionary algorithm integrated with the Unity editor, a well-known commercial tool for video game development. Our framework leverages the underlying graph-based representation of recent shader editors and interactive evolution to allow designers to explore several visual options starting from an existing shader. Our framework encodes the graph representation of a current shader as a chromosome used to seed the evolution of a shader population. It applies graph-based recombination and mutation with a set of heuristics to create feasible shaders. The framework is an extension of the Unity editor; thus, designers with little knowledge of evolutionary computation (and shader programming) can interact with the underlying evolutionary engine using the same visual interface used for working on game scenes.	翻訳日:2024-01-02 09:47:57 公開日:2023-12-29
# ファズドライバ生成のためのプロンプトファジング Prompt Fuzzing for Fuzz Driver Generation ( http://arxiv.org/abs/2312.17677v1 ) ライセンス: Link先を確認	Yunlong Lyu, Yuxuan Xie, Peng Chen and Hao Chen	(参考訳) 高品質なファズドライバを書くのは時間がかかり、ライブラリを深く理解する必要がある。しかし、最先端の自動ファズドライバ生成技術の性能には、まだ多くの改善の余地がある。利用側コード(consumer code)から学習されたファズドライバは深い状態に到達できるが、その外部入力に制限される。一方、解釈型ファジングはほとんどのAPIを探索できるが、広大な探索空間で多数の試行を必要とする。我々は、プロンプトファジングのためのカバレッジ誘導型ファザであるPromptFuzzを提案する。PromptFuzzは、未発見のライブラリコードを探索するためにファズドライバを反復的に生成する。プロンプトファジングの過程でファズドライバにおけるAPIの使い方を探索するために、指示的プログラム生成、誤ったプログラムのサニタイズ、カバレッジ誘導型プロンプト変異、制約付きファザ融合といういくつかの重要な手法を提案する。PromptFuzzを実装し、14の実世界ライブラリ上でOSS-Fuzzおよび最先端のファズドライバ生成手法(Hopper)と比較してその有効性を評価した。実験の結果、PromptFuzzが生成したファズドライバは、OSS-Fuzzの1.61倍、Hopperの1.67倍のブランチカバレッジを達成した。さらに、PromptFuzzが生成したファズドライバは、計44件のクラッシュのうち、これまで知られていなかった真のバグ33件を検出し、そのうち27件はそれぞれのコミュニティによって確認された。 Writing high-quality fuzz drivers is time-consuming and requires a deep understanding of the library. However, the performance of the state-of-the-art automatic fuzz driver generation techniques leaves a lot to be desired. Fuzz drivers, which are learned from consumer code, can reach deep states but are restricted to their external inputs. On the other hand, interpretative fuzzing can explore most APIs but requires numerous attempts in a vast search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program sanitization, coverage-guided prompt mutation, and constrained fuzzer fusion. We implemented PromptFuzz and evaluated its effectiveness on 14 real-world libraries, comparing it against OSS-Fuzz and the state-of-the-art fuzz driver generation solution (i.e., Hopper). 
The experiment results demonstrate that the fuzz drivers generated by PromptFuzz achieve branch coverage 1.61 times that of OSS-Fuzz and 1.67 times that of Hopper. In addition, the fuzz drivers generated by PromptFuzz successfully detect 33 previously unknown true bugs out of a total of 44 crashes, and 27 of these bugs have been confirmed by the respective communities.	翻訳日:2024-01-02 09:24:20 公開日:2023-12-29
# Jatmo: タスク特化ファインタニングによるプロンプトインジェクション防御 Jatmo: Prompt Injection Defense by Task-Specific Finetuning ( http://arxiv.org/abs/2312.17673v1 ) ライセンス: Link先を確認	Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner	(参考訳) 大きな言語モデル(LLM)は、命令追従能力によって大きな研究の注目を集めており、ユーザや開発者は様々なタスクにLLMを利用することができる。しかし、LSMはプロンプトインジェクション攻撃に弱い:モデルの命令追従能力をハイジャックする攻撃のクラスで、望ましくない、おそらく悪質な攻撃に対して応答を変更する。本稿では,プロンプトインジェクション攻撃にレジリエントなタスク固有モデルを生成する方法であるjatmoを紹介する。 Jatmo は LLM が命令チューニングを受けたときのみ命令に従うことができるという事実を活用している。教師がチューニングしたモデルを使用してタスク固有のデータセットを生成し、ベースモデルを微調整する(非インストラクションチューニングされたモデル)。 Jatmoはタスクプロンプトとタスクの入力のデータセットのみを必要とし、教師モデルを使用して出力を生成する。既存のデータセットが存在しない状況では、Jatmoは単一の例、場合によってはまったく使用せず、完全な合成データセットを生成することができる。 6つのタスクに対する実験の結果,ジャトモモデルでは標準LLMと同じ品質の出力が得られる一方で,インジェクションの応答性も高いことがわかった。 GPT-3.5-Turboに対する90%以上の成功率に対して、最良の攻撃は、我々のモデルに対する0.5%未満のケースで成功した。 Jatmoはhttps://github.com/wagner-group/prompt-injection-defense.comでリリースしています。 Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks: a class of attacks that hijack the model's instruction-following abilities, changing responses to prompts to undesired, possibly malicious ones. In this work, we introduce Jatmo, a method for generating task-specific models resilient to prompt-injection attacks. Jatmo leverages the fact that LLMs can only follow instructions once they have undergone instruction tuning. It harnesses a teacher instruction-tuned model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs. 
For situations with no pre-existing datasets, Jatmo can use a single example, or in some cases none at all, to produce a fully synthetic dataset. Our experiments on six tasks show that Jatmo models provide the same quality of outputs on their specific task as standard LLMs, while being resilient to prompt injections. The best attacks succeeded in less than 0.5% of cases against our models, versus over 90% success rate against GPT-3.5-Turbo. We release Jatmo at https://github.com/wagner-group/prompt-injection-defense.	翻訳日:2024-01-02 09:23:53 公開日:2023-12-29
# 非相互作用電子の格子リングにおける測定誘起クロック Measurement-induced Clock in a Lattice Ring of Non-interacting Electrons ( http://arxiv.org/abs/2312.17672v1 ) ライセンス: Link先を確認	David S. Schlegel, Stefan Kehrein	(参考訳) 本研究では, 外部駆動を伴わない非相互作用定常量子系における周期性の出現について検討した。具体的には、弱い局所位置測定を行う非相互作用電子の格子環を考える。本研究では, 定常二時間相関関数の周期構造を解析し, 系の群速度と周期性の関係を明らかにする。本研究は、非平衡定常状態の2時間相関器における周期的挙動を強調し、最小相互作用量子系における周期的現象の理解に寄与する測定誘起クロック機構を示す。 We examine the emergence of periodicity in a non-interacting steady-state quantum system without external drive inspired by quantum time crystals' spontaneous time-translation symmetry breaking. Specifically, we consider a lattice ring of non-interacting electrons undergoing weak local position measurements. Our analysis uncovers time-periodic structures in steady-state two-time correlation functions, with periodicity linked to the system's group velocity. This study demonstrates a measurement-induced clock mechanism, highlighting periodic behaviors in two-time correlators of a non-equilibrium steady state, contributing to understanding time-periodic phenomena in minimally interactive quantum systems.	翻訳日:2024-01-02 09:23:07 公開日:2023-12-29
# TopCoWチャレンジによるCoWのベンチマーク: CTAとMRAのためのウィリス動脈輪のトポロジー認識解剖学的セグメンテーション Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA ( http://arxiv.org/abs/2312.17670v1 ) ライセンス: Link先を確認	Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris Vos, Ynte Ruigrok, Birgitta Velthuis, Hugo Kuijf, Julien Hämmerli, Catherine Wurster, Philippe Bijlenga, Laura Westphal, Jeroen Bisschop, Elisa Colombo, Hakim Baazaoui, Andrew Makmur, James Hallinan, Bene Wiestler, Jan S. Kirschke, Roland Wiest, Emmanuel Montagnon, Laurent Letourneau-Guillon, Adrian Galdran, Francesco Galati, Daniele Falcetta, Maria A. Zuluaga, Chaolong Lin, Haoran Zhao, Zehan Zhang, Sinyoung Ra, Jongyun Hwang, Hyunjin Park, Junqiang Chen, Marek Wodzinski, Henning Müller, Pengcheng Shi, Wei Liu, Ting Ma, Cansu Yalçin, Rachika E. Hamadache, Joaquim Salvi, Xavier Llado, Uma Maria Lal-Trehan Estrada, Valeriia Abramova, Luca Giancardo, Arnau Oliver, Jialu Liu, Haibin Huang, Yue Cui, Zehang Lin, Yusheng Liu, Shunzhi Zhu, Tatsat R. Patel, Vincent M. Tutino, Maysam Orouskhani, Huayu Wang, Mahmud Mossa-Basha, Chengcheng Zhu, Maximilian R. Rokuss, Yannick Kirchhoff, Nico Disch, Julius Holzschuh, Fabian Isensee, Klaus Maier-Hein, Yuki Sato, Sven Hirsch, Susanne Wegener, Bjoern Menze	(参考訳) ウィリス動脈輪(英: Circle of Willis、CoW)は、脳の主要な循環を繋ぐ重要な動脈網である。その血管構造は、重度の神経血管疾患のリスク、重症度、および臨床転帰に影響すると考えられている。しかし、個人差の大きいCoWの解剖を特徴付けることは、いまだに手作業で時間のかかる専門家のタスクである。CoWは通常、磁気共鳴血管造影(MRA)とCT血管造影(CTA)の2つの血管造影モダリティによって画像化されるが、CoWの解剖、特にCTAに関するアノテーションを付加した公開データセットは限られている。そこで2023年、注釈付きCoWデータセットの公開とともにTopCoW Challengeを開催し、CoWセグメンテーションタスクへの投稿を世界中から募ったところ、4大陸から140人以上の登録参加者を集めた。TopCoWデータセットは、仮想現実(VR)技術によって実現された、CoWの13の血管構成要素に対するボクセルレベルのアノテーションを備えた最初の公開データセットである。また、同一患者のMRAとCTAをペアにした最初のデータセットでもある。TopCoWチャレンジの目的は、トポロジカルな評価指標を重視したマルチクラス解剖学的セグメンテーションタスクとして、CoWの特徴付けの問題に取り組むことであった。上位チームは多くのCoW構成要素をDiceスコア約90%でセグメンテーションできたが、交通動脈や稀な変異ではスコアが低かった。また、Diceスコアが高い予測にもトポロジー上の誤りがあった。追加のトポロジー解析により、特定のCoW構成要素の検出や、CoW変異のトポロジーを正確に一致させる点で、さらなる改善の余地が明らかになった。TopCoWは、MRAとCTAにおけるCoW解剖学的セグメンテーションタスクを、形態学的およびトポロジー的にベンチマークする最初の試みであった。 The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited public datasets with annotations on CoW anatomy, especially for CTA. Therefore we organized the TopCoW Challenge in 2023 with the release of an annotated CoW dataset and invited submissions worldwide for the CoW segmentation task, which attracted over 140 registered participants from four continents. The TopCoW dataset was the first public dataset with voxel-level annotations for CoW's 13 vessel components, made possible by virtual-reality (VR) technology. It was also the first dataset with paired MRA and CTA from the same patients. 
TopCoW challenge aimed to tackle the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. The top performing teams managed to segment many CoW components to Dice scores around 90%, but with lower scores for communicating arteries and rare variants. There were also topological mistakes for predictions with high Dice scores. Additional topological analysis revealed further areas for improvement in detecting certain CoW components and matching CoW variant's topology accurately. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.	翻訳日:2024-01-02 09:22:38 公開日:2023-12-29
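The Dice scores quoted above measure per-class voxel overlap, 2|P∩T| / (|P| + |T|). The sketch below is a minimal NumPy version. It assumes integer label volumes in which each vessel component has its own label value. The label values and toy arrays are hypothetical, and this is not the TopCoW evaluation code.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice coefficient 2|P∩T| / (|P| + |T|) for one class in two label volumes."""
    p = pred == label
    t = target == label
    denom = int(p.sum() + t.sum())
    if denom == 0:
        return 1.0  # class absent from both volumes: counted as agreement (a convention)
    return 2.0 * int(np.logical_and(p, t).sum()) / denom

def per_class_dice(pred, target, labels):
    """Dice score for each requested label value."""
    return {label: dice_score(pred, target, label) for label in labels}

# Toy 1-D "volumes"; label values 1 and 2 stand in for two hypothetical vessel classes.
pred = np.array([0, 1, 1, 2, 2, 0])
target = np.array([0, 1, 2, 2, 2, 0])
print(per_class_dice(pred, target, labels=[1, 2]))
```

Dice counts voxels, so a prediction can score highly while still breaking a thin communicating artery. This is why the challenge also reported topological metrics.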
# AIJack: 機械学習のためのセキュリティとプライバシリスクシミュレータ AIJack: Security and Privacy Risk Simulator for Machine Learning ( http://arxiv.org/abs/2312.17667v1 ) ライセンス: Link先を確認	Hideaki Takahashi	(参考訳) 本稿では,機械学習モデルのトレーニングとデプロイメントに関連するセキュリティとプライバシのリスクを評価可能な,オープンソースのライブラリであるAIJackを紹介する。ビッグデータとAIへの関心が高まる中、機械学習の研究とビジネスの進歩が加速している。しかし、最近の研究では、トレーニングデータの盗難や悪意のある攻撃者によるモデルの操作など、潜在的な脅威が明らかになっている。したがって、機械学習のセキュリティとプライバシの脆弱性に関する包括的な理解は、機械学習を現実世界の製品に安全に統合するために不可欠である。 AIJackは、統一されたAPIを通じて、さまざまな攻撃および防御メソッドを備えたライブラリを提供することで、このニーズに対処することを目指している。ライブラリはGitHubで公開されている(https://github.com/Koukyosyumei/AIJack)。 This paper introduces AIJack, an open-source library designed to assess security and privacy risks associated with the training and deployment of machine learning models. Amid the growing interest in big data and AI, advancements in machine learning research and business are accelerating. However, recent studies reveal potential threats, such as the theft of training data and the manipulation of models by malicious attackers. Therefore, a comprehensive understanding of machine learning's security and privacy vulnerabilities is crucial for the safe integration of machine learning into real-world products. AIJack aims to address this need by providing a library with various attack and defense methods through a unified API. The library is publicly available on GitHub (https://github.com/Koukyosyumei/AIJack).	翻訳日:2024-01-02 09:22:00 公開日:2023-12-29
# ユーザの戦略化と信頼できるアルゴリズム User Strategization and Trustworthy Algorithms ( http://arxiv.org/abs/2312.17666v1 ) ライセンス: Link先を確認	Sarah H. Cen, Andrew Ilyas, Aleksander Madry	(参考訳) 推薦システムや採用決定ツールなど、多くの人間向けアルゴリズムは、ユーザが提供するデータに基づいて訓練されている。これらのアルゴリズムの開発者は、データ生成プロセスが外生的であるという仮定を採用するのが一般的である。つまり、ユーザーが与えられたプロンプト(例えば、レコメンデーションや採用提案)にどのように反応するかは、そのプロンプトに依存し、それを生成したアルゴリズムには依存しない。例えば、ある人の行動が真の分布(ground truth)に従うという仮定は、外生性の仮定である。実際には、アルゴリズムが人間と対話する場合、ユーザーは戦略的に振る舞いうるため、この仮定はほとんど成り立たない。例えば最近の研究では、TikTokユーザーが、TikTokがスクロール行動をフィードのキュレーションに使っていることを知った後にスクロールの振る舞いを変えたことや、Uberのドライバーが、Uberのアルゴリズムの変更に応じて乗車の受け入れとキャンセルの仕方を変えたことが報告されている。本研究は,ユーザとデータ駆動型プラットフォーム間のインタラクションを,反復的な2人プレイヤーゲームとしてモデル化することで,この戦略的行動の影響を考察する。まず、ユーザの戦略化が短期的にはプラットフォームにとって実際に有益であることを見出す。次に、それがプラットフォームのデータを損ない、最終的に反実仮想的な意思決定を行う能力を損なうことを示す。この現象をユーザの信頼と結びつけ,信頼に値するアルゴリズムの設計が正確な推定と両立しうることを示す。最後に、潜在的な介入を促す信頼性の形式化を提供する。 Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term.
We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.	翻訳日:2024-01-02 09:21:40 公開日:2023-12-29
# Shape-IoU: バウンディングボックスの形状とスケールを考慮したより高精度な指標 Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale ( http://arxiv.org/abs/2312.17663v1 ) ライセンス: Link先を確認	Hao Zhang, Shuaijie Zhang	(参考訳) 検出器のローカライゼーションブランチの重要な構成要素として、バウンディングボックス回帰損失はオブジェクト検出タスクにおいて重要な役割を果たす。既存のバウンディングボックス回帰法は,通常,GTボックスと予測ボックスの幾何学的関係を考慮し,バウンディングボックスの相対位置と形状を用いて損失を算出する一方,バウンディングボックスの形状やスケールといった固有の特性がバウンディングボックス回帰に与える影響を無視している。本稿では,既存研究の欠点を補うために,バウンディングボックス自体の形状とスケールに着目したバウンディングボックス回帰法を提案する。まず,バウンディングボックスの回帰特性を分析し,バウンディングボックス自体の形状とスケールの要因が回帰結果に影響を及ぼすことを見出した。以上の結論に基づいて,バウンディングボックス自体の形状とスケールに着目して損失を計算し,バウンディングボックス回帰をより正確にするShape-IoU法を提案する。最後に,本手法を多数の比較実験により検証し,検出性能を効果的に向上させ,既存の手法を上回り,さまざまな検出タスクで最先端の性能を達成することを実証した。 As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks. The existing bounding box regression methods usually consider the geometric relationship between the GT box and the predicted box, and calculate the loss by using the relative position and shape of the bounding boxes, while ignoring the influence of inherent properties such as the shape and scale of the bounding boxes on bounding box regression. In order to make up for the shortcomings of existing research, this article proposes a bounding box regression method that focuses on the shape and scale of the bounding box itself. Firstly, we analyzed the regression characteristics of the bounding boxes and found that the shape and scale factors of the bounding boxes themselves will have an impact on the regression results. Based on the above conclusions, we propose the Shape IoU method, which can calculate the loss by focusing on the shape and scale of the bounding box itself, thereby making the bounding box regression more accurate.
Finally, we validated our method through a large number of comparative experiments, which showed that our method can effectively improve detection performance and outperform existing methods, achieving state-of-the-art performance in different detection tasks. Code is available at https://github.com/malagoutou/Shape-IoU	翻訳日:2024-01-02 09:21:17 公開日:2023-12-29
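The abstract argues that box shape affects regression. Plain IoU already shows why. The sketch below implements ordinary IoU and the 1 - IoU loss, not the Shape-IoU loss itself, because the abstract does not give the weighting. It assumes axis-aligned (x1, y1, x2, y2) boxes. Shifting a wide box by the same distance along its short side costs far more IoU than shifting it along its long side.

```python
from typing import Tuple

# Axis-aligned box (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]

def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred: Box, gt: Box) -> float:
    """Plain IoU loss (1 - IoU); this is not the paper's Shape-IoU loss."""
    return 1.0 - iou(pred, gt)

gt = (0.0, 0.0, 4.0, 1.0)       # wide ground-truth box
shift_x = (0.5, 0.0, 4.5, 1.0)  # shifted 0.5 along the long side
shift_y = (0.0, 0.5, 4.0, 1.5)  # shifted 0.5 along the short side
print(iou(gt, shift_x), iou(gt, shift_y))
```

The two shifts have equal magnitude, yet IoU drops to about 0.78 for the long-side shift and to about 0.33 for the short-side shift. This is the kind of shape dependence that Shape-IoU is designed to account for.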
# Gemini in Reasoning: マルチモーダル大規模言語モデルにおけるコモンセンスの解明 Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models ( http://arxiv.org/abs/2312.17661v1 ) ライセンス: Link先を確認	Yuqing Wang, Yun Zhao	(参考訳) OpenAIのGPT-4V(ision)のようなMLLM(Multimodal Large Language Models)への関心の高まりは、学術界と産業界の両方に大きな影響を与えている。これらのモデルは、大規模言語モデル(LLM)に高度な視覚理解機能を付与し、様々なマルチモーダルタスクへの応用を可能にする。最近、Googleはマルチモーダル統合に特化した最先端のMLLMであるGeminiを発表した。その進歩にもかかわらず、予備的なベンチマークはGeminiがコモンセンス推論タスクにおいてGPTモデルより遅れていることを示している。しかしながら、この評価は限られたデータセット(すなわちHellaSWAG)に基づいており、Geminiの真のコモンセンス推論能力を完全には捉えていない。このギャップに対処するため,本研究では,モダリティをまたいだコモンセンス知識の統合を必要とする複雑な推論タスクにおけるGeminiの性能を徹底的に評価する。一般的なタスクからドメイン固有のタスクまで,12のコモンセンス推論データセットを包括的に分析した。これには言語のみに焦点を当てた11のデータセットと、マルチモーダル要素を含む1つのデータセットが含まれる。 4つのLLMと2つのMLLMにわたる実験は、Geminiの競争力のあるコモンセンス推論能力を示す。さらに,既存のLLMやMLLMがコモンセンス問題に取り組む際に直面する共通の課題を明らかにし,これらのモデルのコモンセンス推論能力のさらなる向上の必要性を強調した。 The burgeoning interest in Multimodal Large Language Models (MLLMs), such as OpenAI's GPT-4V(ision), has significantly impacted both academic and industrial realms. These models enhance Large Language Models (LLMs) with advanced visual understanding capabilities, facilitating their application in a variety of multimodal tasks. Recently, Google introduced Gemini, a cutting-edge MLLM designed specifically for multimodal integration. Despite its advancements, preliminary benchmarks indicate that Gemini lags behind GPT models in commonsense reasoning tasks. However, this assessment, based on a limited dataset (i.e., HellaSWAG), does not fully capture Gemini's authentic commonsense reasoning potential. To address this gap, our study undertakes a thorough evaluation of Gemini's performance in complex reasoning tasks that necessitate the integration of commonsense knowledge across modalities. We carry out a comprehensive analysis of 12 commonsense reasoning datasets, ranging from general to domain-specific tasks.
This includes 11 datasets focused solely on language, as well as one that incorporates multimodal elements. Our experiments across four LLMs and two MLLMs demonstrate Gemini's competitive commonsense reasoning capabilities. Additionally, we identify common challenges faced by current LLMs and MLLMs in addressing commonsense problems, underscoring the need for further advancements in enhancing the commonsense reasoning abilities of these models.	翻訳日:2024-01-02 09:20:54 公開日:2023-12-29
# 正規表現を用いたリトアニア語テキストの正規化 Normalization of Lithuanian Text Using Regular Expressions ( http://arxiv.org/abs/2312.17660v1 ) ライセンス: Link先を確認	Pijus Kasparaitis	(参考訳) テキスト正規化は、あらゆる音声合成システムに不可欠な部分である。自然言語のテキストには、数、日付、略語など、別のセミオティッククラスに属する要素がある。これらは非標準語(NSW)と呼ばれ、通常の語に展開する必要がある。この目的のためには、各NSWのセミオティッククラスを特定する必要がある。本研究では、リトアニア語に適応したセミオティッククラスの分類体系を提示する。正規表現に基づいてNSWを検出および展開するためのルールセットを作成する。 3つの全く異なるデータセットで実験を行い、精度を評価した。誤りの原因を説明し、テキスト正規化ルールの開発に向けた推奨事項を示す。 Text Normalization is an integral part of any text-to-speech synthesis system. In a natural language text, there are elements such as numbers, dates, abbreviations, etc. that belong to other semiotic classes. They are called non-standard words (NSW) and need to be expanded into ordinary words. For this purpose, it is necessary to identify the semiotic class of each NSW. The taxonomy of semiotic classes adapted to the Lithuanian language is presented in the work. Sets of rules are created for detecting and expanding NSWs based on regular expressions. Experiments with three completely different data sets were performed and the accuracy was assessed. Causes of errors are explained and recommendations are given for the development of text normalization rules.	翻訳日:2024-01-02 09:20:30 公開日:2023-12-29
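The detection step described above (regex rules that assign each NSW a semiotic class) can be sketched minimally in Python. The rule set is hypothetical and heavily simplified: the patterns, the class names, and the example abbreviations pvz., proc. and val. are illustrative, not the paper's rules. Rules are tried in order, so specific classes such as dates claim their spans before the generic number rule. Expanding each NSW into Lithuanian words, which must agree in case and gender with the context, is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical, heavily simplified rules: (semiotic class, pattern).
# Order matters: specific patterns (dates, times) are tried before generic numbers.
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),   # ISO-style date
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\b")),       # hh:mm
    ("NUMBER", re.compile(r"\b\d+(?:,\d+)?\b")),      # Lithuanian uses a decimal comma
    ("ABBR", re.compile(r"\b(?:pvz|proc|val)\.")),    # a few example abbreviations
]

def find_nsws(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, class, token) for each detected NSW, in text order."""
    claimed: List[Tuple[int, int]] = []
    found: List[Tuple[int, int, str, str]] = []
    for cls, pattern in RULES:
        for m in pattern.finditer(text):
            start, end = m.span()
            # Skip spans already covered by an earlier, more specific rule.
            if any(start < c_end and c_start < end for c_start, c_end in claimed):
                continue
            claimed.append((start, end))
            found.append((start, end, cls, m.group()))
    return sorted(found)

for start, end, cls, token in find_nsws("Susitikimas 2023-12-29 14:30, pvz. 3,5 val."):
    print(cls, repr(token))
```

Without the overlap check, the generic NUMBER rule would also fire on the digits inside the date and the time. Rule ordering plus span claiming is one simple way to resolve such conflicts.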
# 機械学習モデルに基づくUTEQにおける太陽放射予測 Solar Radiation Prediction in the UTEQ based on Machine Learning Models ( http://arxiv.org/abs/2312.17659v1 ) ライセンス: Link先を確認	Jordy Anchundia Troncoso, Ángel Torres Quijije, Byron Oviedo and Cristian Zambrano-Vega	(参考訳) 本研究は、ケベド国立工科大学(UTEQ)中央キャンパスにおいて、太陽放射の予測に用いられる様々な機械学習(ML)モデルの有効性を検討するものである。データは、キャンパスの高所に戦略的に設置された日射計(パイラノメーター)から取得した。この装置は2020年以来、日射量データを継続的に記録しており、様々な気象条件と時間変動を含む包括的なデータセットを提供している。相関分析の結果,日射量に影響を及ぼす関連する気象変数として,気温と時刻が特定された。評価指標である平均二乗誤差(MSE)、二乗平均平方根誤差(RMSE)、平均絶対誤差(MAE)、決定係数($R^2$)を用いて、線形回帰、k近傍法、決定木、勾配ブースティングなどの異なる機械学習アルゴリズムを比較した。その結果、Gradient Boosting Regressorが優れた性能を示し、Random Forest Regressorがそれに僅差で続いた。これらのモデルは、低いMSEと高い$R^2$値が示すように、太陽放射の非線形パターンを効果的に捉えた。 MLモデルの性能を評価するため、我々はUTEQにおける太陽放射予測のためのWebベースのツールを開発した。その結果,日射予測におけるMLモデルの有効性を実証し,太陽エネルギーの効率的な管理を支援するリアルタイム日射予測における実用的有用性を示した。 This research explores the effectiveness of various Machine Learning (ML) models used to predict solar radiation at the Central Campus of the State Technical University of Quevedo (UTEQ). The data was obtained from a pyranometer, strategically located in a high area of the campus. This instrument has continuously recorded solar irradiance data since 2020, offering a comprehensive dataset encompassing various weather conditions and temporal variations. After a correlation analysis, temperature and the time of day were identified as the relevant meteorological variables that influenced the solar irradiance. Different machine learning algorithms such as Linear Regression, K-Nearest Neighbors, Decision Tree, and Gradient Boosting were compared using the evaluation metrics Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination ($R^2$). The study revealed that Gradient Boosting Regressor exhibited superior performance, closely followed by the Random Forest Regressor. These models effectively captured the non-linear patterns in solar radiation, as evidenced by their low MSE and high $R^2$ values.
To assess the performance of our ML models, we developed a web-based tool for solar radiation forecasting at the UTEQ, available at https://solarradiationforecastinguteq.streamlit.app/. The results obtained demonstrate the effectiveness of our ML models in solar radiation prediction and offer practical utility for real-time solar radiation forecasting, aiding efficient solar energy management.	翻訳日:2024-01-02 09:20:23 公開日:2023-12-29
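The four evaluation metrics named in this abstract (MSE, RMSE, MAE and $R^2$) can be computed directly from paired observations and predictions. The sketch below is a minimal NumPy version. The irradiance values are made up for illustration, and `regression_metrics` is a hypothetical helper name. scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` compute the same quantities.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))                       # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical irradiance readings (W/m^2) and model predictions.
print(regression_metrics([100, 200, 300, 400], [110, 190, 310, 390]))
```

RMSE and MAE share the units of the target (here W/m²), whereas $R^2$ is unitless. This is why the abstract uses $R^2$ alongside the error metrics to compare models.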
# 行列積状態を用いたフェルミオン回路の高速エミュレーション Fast emulation of fermionic circuits with matrix product states ( http://arxiv.org/abs/2312.17657v1 ) ライセンス: Link先を確認	Justin Provazza, Klaas Gunst, Huanchen Zhai, Garnet K.-L. Chan, Toru Shiozaki, Nicholas C. Rubin, Alec F. White	(参考訳) 本稿では,Fermionic Quantum Emulator (FQE)ソフトウェアライブラリのための行列積状態(MPS)拡張について述べる。スピン1/2フェルミオンの多体波動関数を近似するための対称性適応行列積状態の理論について論じ、MPSに対応したFQEインタフェースのオープンソース実装(MPS-FQE)を提示する。このソフトウェアは、ほとんどの基本テンソル演算にオープンソースのpyblock3とblock2ライブラリを使用し、FQEのドロップイン代替としておおむね使用でき、より大きなフェルミオン回路を、より効率的だが近似的にエミュレーションすることができる。最後に,より大きな系の近似エミュレーションが有用と期待される短期的およびフォールトトレラントな量子アルゴリズムについて,量子位相推定のための状態準備戦略のキャラクタリゼーション,変分量子固有値ソルバーのさまざまなアンザッツのテスト,トロッター誤差の数値評価,一般的な量子ダイナミクス問題のシミュレーションなど,いくつかの応用例を示す。これらすべての例において、MPS-FQEによる近似エミュレーションにより、フルステートベクターエミュレータで扱えるものよりもはるかに大きな系を扱うことができる。 We describe a matrix product state (MPS) extension for the Fermionic Quantum Emulator (FQE) software library. We discuss the theory behind symmetry adapted matrix product states for approximating many-body wavefunctions of spin-1/2 fermions, and we present an open-source, MPS-enabled implementation of the FQE interface (MPS-FQE). The software uses the open-source pyblock3 and block2 libraries for most elementary tensor operations, and it can largely be used as a drop-in replacement for FQE that allows for more efficient, but approximate, emulation of larger fermionic circuits. Finally, we show several applications relevant to both near-term and fault-tolerant quantum algorithms where approximate emulation of larger systems is expected to be useful: characterization of state preparation strategies for quantum phase estimation, the testing of different variational quantum eigensolver Ansätze, the numerical evaluation of Trotter errors, and the simulation of general quantum dynamics problems.
In all these examples, approximate emulation with MPS-FQE allows us to treat systems that are significantly larger than those accessible with a full statevector emulator.	翻訳日:2024-01-02 09:19:53 公開日:2023-12-29
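The bond dimension of a matrix product state is the knob that trades accuracy for cost in this kind of approximate emulation. The sketch below is a minimal, symmetry-free NumPy illustration. It splits a state vector into MPS tensors by successive SVDs (the standard TT-SVD construction) and optionally caps each bond. It does not reproduce the symmetry-adapted MPS that MPS-FQE builds through pyblock3/block2, and the function names are hypothetical.

```python
from typing import List, Optional
import numpy as np

def state_to_mps(psi: np.ndarray, n_sites: int, d: int = 2,
                 max_bond: Optional[int] = None, tol: float = 1e-12) -> List[np.ndarray]:
    """Split a d**n_sites state vector into tensors of shape (left, d, right)
    by successive SVDs, optionally truncating every bond to max_bond."""
    tensors = []
    rest = np.asarray(psi, dtype=complex).reshape(1, -1)
    for _ in range(n_sites - 1):
        left = rest.shape[0]
        rest = rest.reshape(left * d, -1)              # split off the next site index
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        keep = max(1, int(np.sum(s > tol)))            # drop numerically zero values
        if max_bond is not None:
            keep = min(keep, max_bond)                 # approximation: cap the bond
        tensors.append(u[:, :keep].reshape(left, d, keep))
        rest = s[:keep, None] * vh[:keep, :]           # carry weights to the right
    tensors.append(rest.reshape(rest.shape[0], d, 1))
    return tensors

def mps_to_state(tensors: List[np.ndarray]) -> np.ndarray:
    """Contract MPS tensors back into a full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))   # contract the shared bond
    return out.reshape(-1)

def bond_dims(tensors: List[np.ndarray]) -> List[int]:
    return [t.shape[2] for t in tensors[:-1]]

# A 4-qubit GHZ state needs bond dimension 2 on every cut.
ghz = np.zeros(16)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
mps = state_to_mps(ghz, 4)
print(bond_dims(mps), np.allclose(mps_to_state(mps), ghz))
```

Weakly entangled states compress to small bonds, while generic states need bonds that grow exponentially toward the middle of the chain. Capping `max_bond` keeps the cost polynomial at the price of an approximation, and the truncated state is left unnormalized here.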
# スケーラブルな自動運転を実現する視覚的点群予測 Visual Point Cloud Forecasting enables Scalable Autonomous Driving ( http://arxiv.org/abs/2312.17655v1 ) ライセンス: Link先を確認	Zetong Yang, Li Chen, Yanan Sun, Hongyang Li	(参考訳) 一般的なビジョンに関する広範な研究とは対照的に、スケーラブルな視覚ベース自動運転のための事前学習は、ほとんど検討されていない。視覚ベースの自動運転アプリケーションは、認識、予測、計画を統合して行うために、セマンティクス、3次元幾何、時間情報を同時に含む特徴を必要とし、これは事前学習にとって大きな課題となる。これを解決するために、視覚的点群予測と呼ばれる新しい事前学習タスクを導入する。これは過去の視覚入力から将来の点群を予測するタスクである。このタスクの重要な利点は、セマンティクス、3D構造、時間ダイナミクスの相乗的な学習を捉えられることである。そのため、様々な下流タスクにおいて優位性を示す。この新たな問題に対処するために、下流の視覚エンコーダを事前学習するための汎用モデルViDARを提案する。まずエンコーダによって過去の埋め込みを抽出する。これらの表現は、将来の点群予測のために、新しいLatent Rendering演算子を介して3次元幾何空間に変換される。実験では、3D検出におけるNDSの3.1%向上、動き予測における約10%の誤差削減、計画における約15%の衝突率削減など、下流タスクで顕著な向上が示された。 In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we bring up a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergic learning of semantics, 3D structures, and temporal dynamics. Hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings by the encoder. These representations are then transformed to 3D geometric space via a novel Latent Rendering operator for future point cloud prediction. Experiments show significant gain in downstream tasks, e.g., 3.1% NDS on 3D detection, ~10% error reduction on motion forecasting, and ~15% less collision rate on planning.	翻訳日:2024-01-02 09:19:30 公開日:2023-12-29
# 効果的なクロスモーダル蒸留によるビジュアルグラウンディングのモダリティギャップの解消 Bridging Modality Gap for Visual Grounding with Effective Cross-modal Distillation ( http://arxiv.org/abs/2312.17648v1 ) ライセンス: Link先を確認	Jiaxi Wang, Wenhui Hu, Xueyang Liu, Beihu Wu, Yuting Qiu, YingYing Cai	(参考訳) ビジュアルグラウンディングは、画像の特定の領域の視覚情報を対応する自然言語表現と整合させることを目的としている。現在のビジュアルグラウンディング手法は、事前学習された視覚バックボーンと言語バックボーンを別々に活用し、視覚的特徴と言語的特徴を得る。これら2種類の特徴は精巧に設計されたネットワークを介して融合されるが、特徴の異質性によってマルチモーダル推論には適さない。この問題は、現在のビジュアルグラウンディング手法で使用される単一モーダル事前学習バックボーン間のドメインギャップから生じており、従来のエンドツーエンドの学習手法ではほとんど克服できない。そこで本研究では,マルチモーダル事前学習モデルを蒸留してビジュアルグラウンディングタスクを導くEmpowering Pre-trained Model for Visual Grounding (EpmVG)フレームワークを提案する。 EpmVGは新しいクロスモーダル蒸留機構に基づいており、事前学習モデルにおける画像とテキストの一貫性情報を効果的に導入して、バックボーンネットワークに存在するドメインギャップを低減し、ビジュアルグラウンディングタスクにおけるモデルの性能を向上させる。一般的に使用される5つのデータセットで大規模な実験を行い,本手法が最先端手法よりも優れた性能を発揮することを示した。 Visual grounding aims to align visual information of specific regions of images with corresponding natural language expressions. Current visual grounding methods leverage pre-trained visual and language backbones separately to obtain visual features and linguistic features. Although these two types of features are then fused via delicately designed networks, the heterogeneity of the features makes them inapplicable for multi-modal reasoning. This problem arises from the domain gap between the single-modal pre-training backbone used in current visual grounding methods, which can hardly be overcome by the traditional end-to-end training method. To alleviate this, our work proposes an Empowering pre-trained model for Visual Grounding (EpmVG) framework, which distills a multimodal pre-trained model to guide the visual grounding task. EpmVG is based on a novel cross-modal distillation mechanism, which can effectively introduce the consistency information of images and texts in the pre-trained model, to reduce the domain gap existing in the backbone networks, thereby improving the performance of the model in the visual grounding task.
Extensive experiments are carried out on five conventionally used datasets, and results demonstrate that our method achieves better performance than state-of-the-art methods.	翻訳日:2024-01-02 09:19:12 公開日:2023-12-29
# 異文化的観点からのマルチモーダル知覚・認知の法則の研究 -海外の中国庭園を例として- Research on the Laws of Multimodal Perception and Cognition from a Cross-cultural Perspective -- Taking Overseas Chinese Gardens as an Example ( http://arxiv.org/abs/2312.17642v1 ) ライセンス: Link先を確認	Ran Chen, Xueqi Yao, Jing Zhao, Shuhan Xu, Sirui Zhang, Yijun Mao	(参考訳) 本研究では,マルチモーダルデータ解析における知覚的相互作用と認知的相互作用の複雑な関係を,海外の中国庭園における空間体験デザインを中心に検討することを目的とする。ソーシャルメディア上での評価内容や画像は個人の関心や感情反応を反映し,感情情報とイメージベースの認知情報の両方を含む認知研究のための豊富なデータ基盤を提供することが分かった。深層学習技術を活用し,ソーシャルメディアからのテキストデータと視覚データを分析し,海外の中国庭園の文脈における人々の知覚と感情認知の関係を明らかにする。さらに,本研究では,AIエージェントとともにマルチエージェントシステム(MAS)を導入する。各エージェントは、チャットシーンのシミュレーションとWeb検索を組み合わせることで、美的認知の法則を探求する。本研究は、知覚を感情スコアに変換する従来のアプローチを超え、テキストを直接分析し、意見データをより深く掘り下げるという点で研究手法の拡張を可能にする。本研究は、多様な文化的文脈における美的体験とその建築・景観デザインへの影響を理解するための新しい視点を提供するものであり、文化コミュニケーションと美的理解の分野への重要な貢献である。 This study aims to explore the complex relationship between perceptual and cognitive interactions in multimodal data analysis, with a specific emphasis on spatial experience design in overseas Chinese gardens. It is found that evaluation content and images on social media can reflect individuals' concerns and sentiment responses, providing a rich data base for cognitive research that contains both sentimental and image-based cognitive information. Leveraging deep learning techniques, we analyze textual and visual data from social media, thereby unveiling the relationship between people's perceptions and sentiment cognition within the context of overseas Chinese gardens. In addition, our study introduces a multi-agent system (MAS) alongside AI agents. Each agent explores the laws of aesthetic cognition through chat scene simulation combined with web search. This study goes beyond the traditional approach of translating perceptions into sentiment scores, allowing for an extension of the research methodology in terms of directly analyzing texts and digging deeper into opinion data.
This study provides new perspectives for understanding aesthetic experience and its impact on architecture and landscape design across diverse cultural contexts, which is an essential contribution to the field of cultural communication and aesthetic understanding.	翻訳日:2024-01-02 09:18:49 公開日:2023-12-29
# MoD2T: モデル・データ駆動型の動的・静的物体追跡手法 MoD2T: Model-Data-Driven Motion-Static Object Tracking Method ( http://arxiv.org/abs/2312.17641v1 ) ライセンス: Link先を確認	Yang Feng, Liao Pan, Wu Di, Liu Bo, Zhang Xingle	(参考訳) マルチオブジェクト追跡(MOT)は、ビデオ解析の分野において極めて重要である。しかし、この分野における従来の手法と深層学習に基づくアプローチは、いずれも固有の限界を抱えている。データのみによって駆動される深層学習手法は物体の運動状態を正確に識別することが難しく、一方で包括的な数学的モデルに依存する従来の手法は、追跡精度が最適とはいえないものにとどまる可能性がある。これらの課題に対処するために、モデル・データ駆動型の動的・静的物体追跡手法(MoD2T)を導入する。本稿では,従来の数学的モデリングと深層学習に基づくMOTフレームワークを巧みに融合させ,確立された手法や高度な深層学習技術のみに依存することに伴う制約を効果的に緩和する新しいアーキテクチャを提案する。 MoD2Tにおける数学的モデリングと深層学習の融合により、物体の運動判定の精度が向上し、その結果として追跡精度が向上する。我々の実証実験は、UAVによる空中監視や街路レベルの追跡など、様々なシナリオでMoD2Tの有効性を確かに裏付けている。物体の運動状態の判別におけるMoD2Tの能力を評価するため,MVF1指標を導入する。この新しい性能指標は運動状態の分類精度を測定するために設計されており、MoD2Tの性能を包括的に評価できる。綿密な実験により、MVF1の定式化の妥当性が裏付けられた。 MoD2Tの性能を総合的に評価するために、さまざまなデータセットに丁寧にアノテーションを付与し、厳密なテストを行う。得られたMVF1スコア(運動状態の分類精度を測るもの)は、カメラの動きが最小または軽度のシナリオで特に注目に値し、KITTIデータセットで0.774、MOT17で0.521、UAVDTで0.827である。 The domain of Multi-Object Tracking (MOT) is of paramount significance within the realm of video analysis. However, both traditional methodologies and deep learning-based approaches within this domain exhibit inherent limitations. Deep learning methods driven exclusively by data exhibit challenges in accurately discerning the motion states of objects, while traditional methods relying on comprehensive mathematical models may suffer from suboptimal tracking precision. To address these challenges, we introduce the Model-Data-Driven Motion-Static Object Tracking Method (MoD2T). We propose a novel architecture that adeptly amalgamates traditional mathematical modeling with deep learning-based MOT frameworks, thereby effectively mitigating the limitations associated with sole reliance on established methodologies or advanced deep learning techniques. MoD2T's fusion of mathematical modeling and deep learning augments the precision of object motion determination, consequently enhancing tracking accuracy.
Our empirical experiments robustly substantiate MoD2T's efficacy across a diverse array of scenarios, including UAV aerial surveillance and street-level tracking. To assess MoD2T's proficiency in discerning object motion states, we introduce the MVF1 metric. This novel performance metric is designed to measure the accuracy of motion state classification, providing a comprehensive evaluation of MoD2T's performance. Meticulous experiments substantiate the rationale behind MVF1's formulation. To provide a comprehensive assessment of MoD2T's performance, we meticulously annotate diverse datasets and subject MoD2T to rigorous testing. The achieved MVF1 scores, which measure the accuracy of motion state classification, are particularly noteworthy in scenarios marked by minimal or mild camera motion, with values of 0.774 on the KITTI dataset, 0.521 on MOT17, and 0.827 on UAVDT.	翻訳日:2024-01-02 09:18:29 公開日:2023-12-29
# 双対ボソニックラダーをたどることによる高次元量子コンピューティングの強化 Empowering high-dimensional quantum computing by traversing the dual bosonic ladder ( http://arxiv.org/abs/2312.17741v1 ) ライセンス: Link先を確認	Long B. Nguyen, Noah Goss, Karthik Siva, Yosep Kim, Ed Younis, Bingcheng Qing, Akel Hashim, David I. Santiago, Irfan Siddiqi	(参考訳) 高次元量子情報処理は、ハードウェアの限界を超え、量子技術のフロンティアを前進させる有望な手段として登場した。いわゆるクディット(qudit)の未開拓の可能性を活用するには、確立された量子ビットの手法を超えた量子プロトコルの開発が必要である。本稿では,ラマン支援2光子相互作用を用いて多次元固体系を操作するための、ロバストでハードウェア効率がよく拡張可能な手法を提案する。その有効性を示すために,我々はマルチキュービット演算の集合を構築し,原子スクイーズド状態やシュレーディンガーの猫状態を含む高度に絡み合った多次元状態を実現し,qudit配列に沿ったプログラム可能なエンタングルメント分配を実装した。我々の研究は、強く駆動されたマルチクディット系の量子電磁力学を明らかにし、高次元量子アプリケーションの今後の開発のための実験的基礎を提供する。 High-dimensional quantum information processing has emerged as a promising avenue to transcend hardware limitations and advance the frontiers of quantum technologies. Harnessing the untapped potential of the so-called qudits necessitates the development of quantum protocols beyond the established qubit methodologies. Here, we present a robust, hardware-efficient, and extensible approach for operating multidimensional solid-state systems using Raman-assisted two-photon interactions. To demonstrate its efficacy, we construct a set of multi-qubit operations, realize highly entangled multidimensional states including atomic squeezed states and Schrödinger cat states, and implement programmable entanglement distribution along a qudit array. Our work illuminates the quantum electrodynamics of strongly driven multi-qudit systems and provides the experimental foundation for the future development of high-dimensional quantum applications.	翻訳日:2024-01-02 08:54:04 公開日:2023-12-29
# 量子正則化による2次元質量レスQCDの位相 Phases of 2d massless QCD with qubit regularization ( http://arxiv.org/abs/2312.17734v1 ) ライセンス: Link先を確認	Hanqing Liu, Tanmoy Bhattacharya, Shailesh Chandrasekharan and Rajan Gupta	(参考訳) 我々は,2d SU(N)ゲージ理論の連続体物理学を1つの無質量ディラックフェルミオンに結合して再現する可能性を検討する。連続体理論は、紫外線(UV)におけるN自由フェルミオンと赤外線(IR)におけるコセットウェス・ズミノ・ウィッテン(WZW)モデルによって記述される。本研究では,有限次元リンクヒルベルト空間と一般化ハバードカップリングを持つkogut-susskindハミルトニアンを用いて,これらの特徴の再現性について検討する。強結合展開を用いて, スピン鎖で表される二量体相と他の相のギャップが現れることを示す。さらに、N=2の場合、テンソルネットワーク法を用いて、2次相転移が存在することを示す。遷移における臨界理論はsu(2)_1 wzwモデルとして理解でき、このモデルにおける位相図を定量的に決定できる。モデルの閉じ込め特性を利用することで、自由フェルミオンの紫外線物理学がいかに出現するかを議論するが、我々のモデルにさらなる修正を加える必要がある。 We investigate the possibility of reproducing the continuum physics of 2d SU(N) gauge theory coupled to a single flavor of massless Dirac fermions using qubit regularization. The continuum theory is described by N free fermions in the ultraviolet (UV) and a coset Wess-Zumino-Witten (WZW) model in the infrared (IR). In this work, we explore how well these features can be reproduced using the Kogut-Susskind Hamiltonian with a finite-dimensional link Hilbert space and a generalized Hubbard coupling. Using strong coupling expansions, we show that our model exhibits a gapped dimer phase and another phase described by a spin-chain. Furthermore, for N=2, using tensor network methods, we show that there is a second-order phase transition between these two phases. The critical theory at the transition can be understood as an SU(2)_1 WZW model, using which we determine the phase diagram of our model quantitatively. Using the confinement properties of the model we argue how the UV physics of free fermions could also emerge, but may require further modifications to our model.	翻訳日:2024-01-02 08:53:46 公開日:2023-12-29
# 時間内の光子液化 Photon liquefaction in time ( http://arxiv.org/abs/2312.17732v1 ) ライセンス: Link先を確認	Eduardo Zubizarreta Casalengua and Elena del Valle and Fabrice P. Laussy	(参考訳) 液体中の空間相関と同じ特性を持つフォトンストリームに局所的時間相関をインプリントするメカニズムを提供する。この写真では、単光子放射体は(時空)ガスに対応し、非相関光は理想気体である。我々は、良い単一光子源は、そのような時間的液体の特徴、すなわち(線形依存とは対照的に)短い時間相関の高原と、光子時間順序付けの直接的現示である後の振動を示すものであると主張する。我々は「液体光」の広いファミリーの2階コヒーレンス関数に対する一般の閉形式解析式を得るが、完全に結晶化されることはない。 We provide a mechanism to imprint local temporal correlations in photon streams which have the same character as spatial correlations in liquids. Usual single-photon emitters correspond, in this picture, to a (temporal) gas while uncorrelated light is the ideal gas. We argue that good single-photon sources are those that exhibit such temporal liquid features, i.e., with a plateau for their short-time correlations (as opposed to a linear dependence) and oscillations at later times, which is a direct manifestation of photon time-ordering. We obtain general, closed-form analytical expressions for the second-order coherence function of a broad family of "liquid light" which can be arbitrarily correlated, though never completely crystallized.	翻訳日:2024-01-02 08:53:25 公開日:2023-12-29
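The second-order coherence function $g^{(2)}(\tau)$ discussed above can be estimated experimentally from photon arrival times. One way is to histogram the delays between all ordered pairs of detections and normalize by the count expected for an uncorrelated (Poisson) stream with the same mean rate. The NumPy sketch below does this under stated assumptions. It ignores edge effects, which is valid only when the delay window is much shorter than the record. It uses a dead-time renewal process as a simple stand-in for an antibunched emitter; this is not the paper's "liquid light" model.

```python
import numpy as np

def g2_from_timestamps(t, bin_width: float, n_bins: int) -> np.ndarray:
    """Estimate g2(tau) on bins [k*w, (k+1)*w), k = 0..n_bins-1, from one stream
    of photon arrival times, by histogramming delays between all ordered pairs."""
    t = np.sort(np.asarray(t, dtype=float))
    n = len(t)
    duration = t[-1] - t[0]
    idx = np.arange(n)
    edges = bin_width * np.arange(n_bins + 1)
    # cumulative[k]: number of pairs (a < b) with t[b] - t[a] < edges[k]
    cumulative = np.array([
        np.maximum(np.searchsorted(t, t + e, side="left") - (idx + 1), 0).sum()
        for e in edges
    ])
    counts = np.diff(cumulative)
    # A Poisson stream of rate n / duration gives about n * (n / duration) * bin_width
    # pairs per bin; edge effects are ignored (assumes n_bins * bin_width << duration).
    expected = n * (n / duration) * bin_width
    return counts / expected

rng = np.random.default_rng(0)
poisson = np.cumsum(rng.exponential(1.0, 200_000))       # uncorrelated light
dead = np.cumsum(0.5 + rng.exponential(1.0, 200_000))    # no two photons closer than 0.5
print(g2_from_timestamps(poisson, 0.1, 20).round(2))
print(g2_from_timestamps(dead, 0.1, 20).round(2))
```

For the Poisson stream the estimate stays near 1 at all delays. For the dead-time stream it is exactly 0 for delays below 0.5, which is the strict time-ordering of photons that the abstract associates with good single-photon sources.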
# 大規模JavaシステムにおけるIAST(Interactive Application Security Testing)とRASP(Runtime Application Self-Protection)ツールの有効性と効率の比較 Comparing Effectiveness and Efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) Tools in a Large Java-based System ( http://arxiv.org/abs/2312.17726v1 ) ライセンス: Link先を確認	Aishwarya Seth, Saikath Bhattacharya, Sarah Elder, Nusrat Zahan, Laurie Williams	(参考訳) セキュリティリソースは乏しく、実践者はサイバーセキュリティ業界で利用可能な技術やツールを効果的かつ効率的に利用するためのガイダンスを必要としている。新たな2つのツールタイプであるInteractive Application Security Testing (IAST) とRuntime Application Self-Protection (RASP) は、Dynamic Application Security Testing (DAST) や Static Application Security Testing (SAST) といった確立したツールと比較して、十分に評価されていない。本研究の目的は、さまざまな脆弱性検出・防止技術やツールと比較した有効性と効率の分析を通じて、対話型アプリケーションセキュリティテスト(IAST)と実行時アプリケーション自己保護(RASP)ツールの使用について、実践者が十分な情報に基づいて選択できるよう支援することである。オープンソースのJavaベースのオンラインアプリケーションであるOpenMRSにIASTとRASPを適用する。IASTとRASPの効率と有効性を、先行研究でOpenMRSに適用された手法と比較した。効率と有効性は、1時間あたりに検出・防止された脆弱性の数と種類によって測定する。本研究は,IASTが他の手法と比較して比較的優れており,効率と有効性の両方において第2位であることを示す。 IASTはOWASP Top 10のセキュリティリスクのうち8つを検出した(SMPTは9つ、EMPT、DAST、SASTは7つ)。 IASTはSMPTよりも多くの脆弱性を発見した。 IASTの効率(2.14 VpH)はEMPT(2.22 VpH)に次ぐ第2位である。これらの結果から,ブラックボックスセキュリティテストを行う際にIASTを用いることが有益であったことが示唆された。 OpenMRSのような大規模でエンタープライズ規模のWebアプリケーションのコンテキストでは、RASPは脆弱性検出の代わりにはならない一方、IASTは他の手法を補完する強力なツールである。 Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST).
The goal of this research is to aid practitioners in making informed choices about the use of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools through an analysis of their effectiveness and efficiency in comparison with different vulnerability detection and prevention techniques and tools. We apply IAST and RASP on OpenMRS, an open-source Java-based online application. We compare the efficiency and effectiveness of IAST and RASP with techniques applied on OpenMRS in prior work. We measure efficiency and effectiveness in terms of the number and type of vulnerabilities detected and prevented per hour. Our study shows IAST performed relatively well compared to other techniques, performing second-best in both efficiency and effectiveness. IAST detected eight Top-10 OWASP security risks compared to nine by SMPT and seven for EMPT, DAST, and SAST. IAST found more vulnerabilities than SMPT. The efficiency of IAST (2.14 VpH) is second to only EMPT (2.22 VpH). These findings imply that our study benefited from using IAST when conducting black-box security testing. In the context of a large, enterprise-scale web application such as OpenMRS, RASP does not replace vulnerability detection, while IAST is a powerful tool that complements other techniques.	翻訳日:2024-01-02 08:53:11 公開日:2023-12-29
# Genuinely quantum families of 2-unitary matrices ( http://arxiv.org/abs/2312.17719v1 ) License: see the link	Rafał Bistroń, Jakub Czartowski and Karol Życzkowski	As quantum computing develops, the problem of implementing entangling and disentangling quantum gates in a controllable manner reemerges in multiple contexts. Among the newest applications of such disentangling channels are quantum convolutional neural networks, where the core idea lies in systematically decreasing the number of qudits without losing the information encoded in entangled states. In this work, we focus on quantum analogues of convolution and pooling, the basic building blocks of convolutional networks, and construct and characterize parametrizable "quantum convolution" channels as coherifications of permutation tensors. Operations constructed in this manner generically provide high (dis)entangling power. In particular, we identify the conditions necessary for the convolution channels constructed using our method to possess maximal entangling power. Based on this, we establish new, continuous classes of bipartite 2-unitary matrices of dimension $d^2$ for $d = 7$ and $d = 9$, with $2$ and $4$ free nonlocal parameters, corresponding to perfect tensors of rank $4$ or, equivalently, $4$-partite absolutely maximally entangled states. 
The newly established families may serve as prototypes for trainable convolution/pooling layers in quantum convolutional neural networks.	Translated: 2024-01-02 08:52:41 Published: 2023-12-29
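For readers unfamiliar with the term: a bipartite unitary $U$ on $\mathbb{C}^d \otimes \mathbb{C}^d$ is 2-unitary when $U$, its realignment $U^R$, and its partial transpose $U^\Gamma$ are all unitary. The NumPy sketch below checks this definition by reshuffling the indices of $U$ viewed as a $d \times d \times d \times d$ tensor. Its test cases are simple permutation gates, not the paper's new continuous families:

- The permutation $|c, e\rangle \mapsto |c+e,\, c+2e\rangle \pmod 3$, built from two orthogonal Latin squares of order 3, is 2-unitary.
- SWAP and the identity are not.

The helper names are illustrative.

```python
import numpy as np


def is_unitary(m):
    return np.allclose(m.conj().T @ m, np.eye(m.shape[0]))


def reshuffles(t):
    """U, its realignment U^R and its partial transpose U^Gamma.

    t[a, b, c, e] = <a b|U|c e> for a gate acting on C^d (x) C^d.
    """
    d = t.shape[0]
    u = t.reshape(d * d, d * d)                          # rows (a,b), cols (c,e)
    u_r = t.transpose(0, 2, 1, 3).reshape(d * d, d * d)  # rows (a,c), cols (b,e)
    u_g = t.transpose(0, 3, 2, 1).reshape(d * d, d * d)  # rows (a,e), cols (c,b)
    return u, u_r, u_g


def is_two_unitary(t):
    return all(is_unitary(m) for m in reshuffles(t))


def permutation_tensor(d, f):
    """Tensor of the permutation gate |c, e> -> |f(c, e)>."""
    t = np.zeros((d, d, d, d))
    for c in range(d):
        for e in range(d):
            a, b = f(c, e)
            t[a, b, c, e] = 1.0
    return t


ols3 = permutation_tensor(3, lambda c, e: ((c + e) % 3, (c + 2 * e) % 3))
swap3 = permutation_tensor(3, lambda c, e: (e, c))
print(is_two_unitary(ols3), is_two_unitary(swap3))  # True False
```

Permuting the rows or columns does not affect unitarity, so other index conventions for $U^R$ and $U^\Gamma$ give the same verdicts.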
# Principled Gradient-based Markov Chain Monte Carlo for Text Generation ( http://arxiv.org/abs/2312.17710v1 ) License: see the link	Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell	Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as their limiting distribution. We propose several faithful gradient-based sampling algorithms that sample correctly from the target energy-based text distribution, and we study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers are able to generate more fluent text while adhering better to the control objectives.	Translated: 2024-01-02 08:52:18 Published: 2023-12-29
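The notion of a faithful sampler can be illustrated on a continuous toy problem. The sketch below is the standard Metropolis-adjusted Langevin algorithm (MALA) on a one-dimensional target $p(x) \propto \exp(-E(x))$. It is not one of the paper's discrete text samplers. A gradient step proposes each move, and a Metropolis-Hastings accept/reject step corrects the discretization error, so that $p$ is exactly the stationary distribution. Dropping that correction (unadjusted Langevin) leaves a biased chain, which is one way a gradient-based sampler can fail to be faithful. The Gaussian target, step size, and seed are arbitrary choices.

```python
import math
import random


def mala(energy, grad, x0, step, n_steps, rng):
    """Metropolis-adjusted Langevin sampler for p(x) proportional to exp(-energy(x))."""

    def log_q(x_to, x_from):
        # Log density (up to a constant) of the Langevin proposal
        # x_to ~ N(x_from - step * grad(x_from), 2 * step).
        mean = x_from - step * grad(x_from)
        return -((x_to - mean) ** 2) / (4.0 * step)

    x, samples = x0, []
    for _ in range(n_steps):
        prop = x - step * grad(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction: this is what makes the chain faithful.
        log_alpha = energy(x) - energy(prop) + log_q(x, prop) - log_q(prop, x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = prop
        samples.append(x)
    return samples


# Target: Gaussian with mean 2 and variance 1, i.e. E(x) = (x - 2)^2 / 2.
rng = random.Random(0)
chain = mala(lambda x: 0.5 * (x - 2.0) ** 2, lambda x: x - 2.0, 0.0, 0.5, 20000, rng)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(round(mean, 2), round(var, 2))
```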
# Boson-fermion complementarity in a linear interferometer ( http://arxiv.org/abs/2312.17709v1 ) License: see the link	Michael G. Jabbour and Nicolas J. Cerf	Bosonic and fermionic statistics are well known to give rise to opposite behaviors, most notably boson bunching vs. fermion antibunching. Here, we establish a fundamental relation that combines bosonic and fermionic multiparticle interferences in an arbitrary linear interferometer. The bosonic and fermionic transition probabilities appear together in the same equation, which constrains their values and hence expresses a boson-fermion complementarity that is independent of the details of the interaction. For two particles in any interferometer, for example, it implies that the average of the bosonic and fermionic probabilities must coincide with the probability obeyed by classical particles. Incidentally, this fundamental relation also provides a heretofore unknown mathematical identity connecting the squared moduli of the permanent and determinant of arbitrary complex matrices.	Translated: 2024-01-02 08:52:04 Published: 2023-12-29
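The two-particle statement can be checked directly. Consider two particles that enter modes 1 and 2 of a $2\times 2$ interferometer $U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and leave one per output mode. The three probabilities are:

- bosons: $|\mathrm{perm}\,U|^2 = |ad + bc|^2$;
- fermions: $|\det U|^2 = |ad - bc|^2$;
- distinguishable (classical) particles: $|ad|^2 + |bc|^2$.

By the parallelogram law $|x+y|^2 + |x-y|^2 = 2(|x|^2 + |y|^2)$, the average of the first two equals the third for any complex matrix. The short Python check below is a sketch for this case only, not the paper's general $n$-particle relation. It verifies the identity for a balanced beam splitter, where bosons never exit separately (the Hong-Ou-Mandel effect) and fermions always do. It then repeats the check for random complex matrices.

```python
import math
import random


def two_particle_probs(u):
    """Probabilities that two particles entering modes 0 and 1 of the 2x2
    interferometer u = [[a, b], [c, d]] exit one per output mode."""
    (a, b), (c, d) = u
    boson = abs(a * d + b * c) ** 2                # |permanent|^2
    fermion = abs(a * d - b * c) ** 2              # |determinant|^2
    classical = abs(a * d) ** 2 + abs(b * c) ** 2  # distinguishable particles
    return boson, fermion, classical


s = 1 / math.sqrt(2)
print(two_particle_probs([[s, s], [s, -s]]))  # balanced beam splitter

rng = random.Random(0)
for _ in range(5):
    u = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
    boson, fermion, classical = two_particle_probs(u)
    assert math.isclose((boson + fermion) / 2, classical)
```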
# The six ways to build trust and reduce privacy concern in a Central Bank Digital Currency (CBDC) ( http://arxiv.org/abs/2312.17708v1 ) License: see the link	Alex Zarifis and Xusen Cheng	Central Bank Digital Currencies (CBDCs) have been implemented by only a handful of countries, but many more are exploring them. CBDCs are digital currencies issued and backed by a central bank. Consumer trust can encourage or discourage the adoption of this currency, which is also a payment system and a technology. This research seeks to understand consumer trust in CBDCs so that the development and adoption stages are more effective and satisfying for all stakeholders.	Translated: 2024-01-02 08:51:48 Published: 2023-12-29
# TuPy-E:新しいデータセットと包括的なモデル分析によるブラジルのソーシャルメディアにおけるヘイトスピーチの検出 TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models ( http://arxiv.org/abs/2312.17704v1 ) ライセンス: Link先を確認	Felipe Oliveira, Victoria Reis, Nelson Ebecken	(参考訳) ソーシャルメディアは人間の対話に不可欠なものとなり、コミュニケーションと表現のためのプラットフォームを提供している。しかし、これらのプラットフォームでのヘイトスピーチの台頭は個人やコミュニティに重大なリスクをもたらす。ヘイトスピーチの検出と対処は、その豊富な語彙、複雑な文法、地域によって異なるため、ポルトガル語のような言語では特に困難である。そこで我々は,ヘイトスピーチ検出のためのポルトガル最大の注釈付きコーパスTuPy-Eを紹介する。 TuPy-Eはオープンソースアプローチを活用し、研究コミュニティ内のコラボレーションを促進する。 BERTモデルのような高度な技術を用いて詳細な分析を行い、学術的理解と実践的応用に寄与する。 Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques like BERT models, contributing to both academic understanding and practical applications	翻訳日:2024-01-02 08:51:38 公開日:2023-12-29
# Mapping of valley-splitting by conveyor-mode spin-coherent electron shuttling ( http://arxiv.org/abs/2312.17694v1 ) License: see the link	Mats Volmer, Tom Struck, Arnau Sala, Bingjie Chen, Max Oberländer, Tobias Offermann, Ran Xue, Lino Visser, Jhih-Sian Tu, Stefan Trellenkamp, Łukasz Cywiński, Hendrik Bluhm, Lars R. Schreiber	In Si/SiGe heterostructures, the low-lying excited valley state seriously limits the operability and scalability of electron spin qubits. Fast probing methods with high spatial and energy resolution for characterizing and understanding local variations in valley splitting are lacking. Leveraging the spatial control granted by conveyor-mode spin-coherent electron shuttling, we introduce a method for two-dimensional mapping of the local valley splitting by detecting magnetic-field-dependent anticrossings of ground and excited valley states, using entangled electron spin pairs as a probe. The method has sub-μeV energy accuracy and nanometer lateral resolution. The histogram of valley splittings spanning a large area of 210 nm by 18 nm matches well with statistics obtained by the established but time-consuming magnetospectroscopy method. For the specific heterostructure, we find a nearly Gaussian distribution of valley splittings and a correlation length similar to the quantum dot size. Our mapping method may become a valuable tool for engineering Si/SiGe heterostructures for scalable quantum computing.	Translated: 2024-01-02 08:51:27 Published: 2023-12-29
# Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization ( http://arxiv.org/abs/2312.17686v1 ) License: see the link	Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos	Action localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding-box detections pre-computed at high resolution and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. Single-stage methods, on the other hand, target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, trading performance for speed. These methods add a DETR head with learnable queries that, after cross- and self-attention, are sent to corresponding MLPs to detect each person's bounding box and action. However, DETR-like architectures are challenging to train and can incur high complexity. In this paper, we observe that a straight bipartite matching loss can be applied to the output tokens of a vision transformer. 
This results in a backbone + MLP architecture that can perform both tasks without the need for an extra encoder-decoder head and learnable queries. We show that a single MViT-S architecture trained with bipartite matching to perform both tasks surpasses the same MViT-S when trained with RoI align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our MViTv2-S model achieves +3 mAP on AVA2.2 w.r.t. the two-stage counterpart. Code and models will be released after paper revision.	Translated: 2024-01-02 08:51:10 Published: 2023-12-29
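The key ingredient above is a DETR-style set loss applied directly to the transformer's output tokens. Each ground-truth person is assigned to exactly one token by a minimum-cost bipartite matching. In DETR-style training, unmatched tokens are then typically supervised as "no object". The Python sketch below shows only the matching step. The cost function (L1 box distance minus the predicted probability of the true action) and the toy boxes are assumptions for illustration. The exhaustive search stands in for the Hungarian algorithm used in practice.

```python
from itertools import permutations


def pair_cost(pred_box, pred_probs, gt_box, gt_label):
    """Hypothetical matching cost: L1 box distance minus class probability."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return l1 - pred_probs[gt_label]


def match(tokens, targets):
    """Minimum-cost assignment of each target to a distinct token.

    Exhaustive search over injective assignments: fine for a toy example,
    whereas real implementations use the Hungarian algorithm.
    """
    cost = [[pair_cost(box, probs, gt_box, label) for box, probs in tokens]
            for gt_box, label in targets]
    best, best_cost = None, float("inf")
    for toks in permutations(range(len(tokens)), len(targets)):
        c = sum(cost[g][t] for g, t in enumerate(toks))
        if c < best_cost:
            best, best_cost = list(toks), c
    return best, best_cost


# Three output tokens (box as x1, y1, x2, y2; probabilities over 2 actions).
tokens = [((0.1, 0.1, 0.4, 0.5), [0.7, 0.3]),
          ((0.6, 0.2, 0.9, 0.8), [0.2, 0.8]),
          ((0.0, 0.0, 1.0, 1.0), [0.5, 0.5])]
# Two ground-truth people with their action labels.
targets = [((0.1, 0.1, 0.4, 0.5), 0), ((0.6, 0.2, 0.9, 0.8), 1)]
assignment, total = match(tokens, targets)
print(assignment)  # [0, 1]: token 2 is left unmatched ("no person")
```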
# Malware Detection in IOT Systems Using Machine Learning Techniques ( http://arxiv.org/abs/2312.17683v1 ) License: see the link	Ali Mehrban, Pegah Ahadian	Malware detection in IoT environments necessitates robust methodologies. This study introduces a CNN-LSTM hybrid model for IoT malware identification and evaluates its performance against established methods. Using K-fold cross-validation, the proposed approach achieved 95.5% accuracy, surpassing existing methods. The CNN component enabled the construction of a superior learning model, and the LSTM classifier exhibited high classification accuracy. Comparative analysis against prevalent techniques demonstrated the efficacy of the proposed model, highlighting its potential for enhancing IoT security. The study advocates future exploration of SVMs as alternatives, emphasizes the need for distributed detection strategies, and underscores the importance of predictive analyses for stronger IoT security. This research serves as a platform for developing more resilient security measures in IoT ecosystems.	Translated: 2024-01-02 08:50:35 Published: 2023-12-29
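The 95.5% figure above comes from K-fold cross-validation. The functions below are a minimal, framework-agnostic sketch of that protocol, not the authors' code. They split sample indices into k disjoint test folds and average a score over the folds. `train_and_score` is a hypothetical callback. It stands in for training the CNN-LSTM on the training indices and measuring accuracy on the test indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, k, train_and_score):
    """Average the score returned by train_and_score(train_idx, test_idx)."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Usage with a dummy scorer that reports the test fraction of each fold.
print(cross_validate(10, 3, lambda tr, te: len(te) / (len(tr) + len(te))))
```

Shuffling before splitting is a common default. For time-ordered traffic captures, a split that respects time order may be more appropriate.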
# FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis ( http://arxiv.org/abs/2312.17681v1 ) License: see the link	Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu	Diffusion models have transformed image-to-image (I2I) synthesis and are now permeating into video. However, the advancement of video-to-video (V2V) synthesis has been hampered by the challenge of maintaining temporal consistency across video frames. This paper proposes a consistent V2V synthesis framework that jointly leverages spatial conditions and temporal optical-flow clues within the source video. Contrary to prior methods that strictly adhere to optical flow, our approach harnesses its benefits while handling the imperfections in flow estimation. We encode the optical flow by warping from the first frame and use the result as a supplementary reference in the diffusion model. This enables our model to synthesize video by editing the first frame with any prevalent I2I model and then propagating the edits to successive frames. Our V2V model, FlowVid, demonstrates remarkable properties: (1) Flexibility: FlowVid works seamlessly with existing I2I models, facilitating various modifications, including stylization, object swaps, and local edits. 
(2) Efficiency: Generating a 4-second video at 30 FPS and 512x512 resolution takes only 1.5 minutes, which is 3.1x, 7.2x, and 10.5x faster than CoDeF, Rerender, and TokenFlow, respectively. (3) High quality: In user studies, FlowVid is preferred 45.7% of the time, outperforming CoDeF (3.5%), Rerender (10.2%), and TokenFlow (40.4%).	Translated: 2024-01-02 08:50:23 Published: 2023-12-29
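FlowVid encodes the optical flow by warping the first frame and feeding the warped frame to the diffusion model as an extra reference. The NumPy sketch below shows only that warping step, under simple assumptions: a dense backward flow given in pixels and nearest-neighbour sampling. It is not FlowVid's implementation. The returned mask marks pixels whose source falls outside the first frame. This is one place where a flow-warped reference is necessarily imperfect.

```python
import numpy as np


def backward_warp(first_frame, flow):
    """Warp first_frame toward a later frame using a dense backward flow.

    flow has shape (H, W, 2); flow[y, x] = (dy, dx) means target pixel (y, x)
    is copied from (y + dy, x + dx) in first_frame. Nearest-neighbour
    sampling; pixels whose source lies outside the image are left at zero
    and flagged False in the returned validity mask.
    """
    h, w = first_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.rint(ys + flow[..., 0]).astype(int)
    src_x = np.rint(xs + flow[..., 1]).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    warped = np.zeros_like(first_frame)
    warped[valid] = first_frame[src_y[valid], src_x[valid]]
    return warped, valid


# Content that moved one pixel to the right: every target pixel comes from x - 1.
frame = np.arange(12, dtype=float).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 1] = -1.0
warped, valid = backward_warp(frame, flow)
print(warped)
print(valid)
```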
# Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models ( http://arxiv.org/abs/2312.17679v1 ) License: see the link	Kay Liu, Hengrui Zhang, Ziqing Hu, Fangxin Wang, Philip S. Yu	Graph outlier detection is a prominent research and application task in the realm of graph neural networks. It identifies the outlier nodes that deviate from the majority of the graph. One of the fundamental challenges confronting supervised graph outlier detection algorithms is class imbalance: the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Conventional methods mitigate the imbalance by reweighting instances in the loss function, assigning higher weights to outliers and lower weights to inliers. Nonetheless, these strategies are prone to overfitting and underfitting, respectively. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential for data augmentation in supervised graph outlier detection remains largely underexplored. 
To bridge this gap, we introduce GODM, a novel data augmentation method for mitigating class imbalance in supervised Graph Outlier detection with latent Diffusion Models. Specifically, our proposed method consists of three key components: (1) a Variational Encoder maps the heterogeneous information inherent in the graph data into a unified latent space; (2) a Graph Generator synthesizes graph data that are statistically similar to real outliers from the latent space; and (3) a Latent Diffusion Model learns the latent-space distribution of real organic data by iterative denoising. Extensive experiments on multiple datasets substantiate the effectiveness and efficiency of GODM. A case study further demonstrates the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM in a plug-and-play package and release it on the Python Package Index (PyPI).	Translated: 2024-01-02 08:49:57 Published: 2023-12-29
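GODM's latent diffusion component learns the latent distribution "by iterative denoising". The sketch below assumes standard DDPM-style training. A clean latent $z_0$ is corrupted in closed form as $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$, and a denoiser is trained to predict $\varepsilon$ from $(z_t, t)$. Generation then runs the learned reverse process from pure noise. The linear $\beta$ schedule ($10^{-4}$ to $0.02$) is a common default, not a value taken from the paper, and the denoiser itself is omitted.

```python
import math
import random


def alpha_bars(n_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s), linear betas."""
    out, prod = [], 1.0
    for t in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * t / (n_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_latent(z0, t, abar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) z_0, (1 - abar_t) I).

    Returns z_t and the noise eps, which is the regression target of the denoiser.
    """
    scale, sigma = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [scale * z + sigma * e for z, e in zip(z0, eps)]
    return zt, eps


abar = alpha_bars(1000)
rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]  # a toy latent vector, e.g. one encoded outlier node
zt, eps = noise_latent(z0, 500, abar, rng)
print(abar[0], abar[-1])
```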

# Difficulties in Dynamic Analysis of Drone Firmware and Its Solutions ( http://arxiv.org/abs/2312.16818v2 )

License: see the link

Yejun Kim, Kwangsoo Cho, Seungjoo Kim

With the advancement of Internet of Things (IoT) technology, its applications span various sectors such as public, industrial, private and military. In particular, the drone sector has gained significant attention for both commercial and military purposes. As a result, there has been a surge in research focused on vulnerability analysis of drones. However, most security research to mitigate threats to IoT devices has focused primarily on networks, firmware and mobile applications. Of these, the use of fuzzing to analyse the security of firmware requires emulation of the firmware. However, when it comes to drone firmware, the industry lacks emulation and automated fuzzing tools. This is largely due to challenges such as limited input interfaces, firmware encryption and signatures. While it may be tempting to assume that existing emulators and automated analysers for IoT devices can be applied to drones, practical applications have proven otherwise. In this paper, we discuss the challenges of dynamically analysing drone firmware and propose potential solutions. In addition, we demonstrate the effectiveness of our methodology by applying it to DJI drones, which have the largest market share.

Translated: 2024-03-18 11:18:35 Published: 2023-12-29

# Simple client-side encryption of personal information with Web Assembly ( http://arxiv.org/abs/2312.17689v1 )

License: see the link

Marco Falda, Angela Grassi

The HTTPS protocol has enforced a higher level of robustness against several attacks; however, it is not easy to set up the required certificates on intranets. Nor is HTTPS effective when the server's confidentiality cannot be relied upon, as with cloud services, or when the server could be compromised. A simple method is proposed to encrypt the data on the client side using Web Assembly, so that data are never transferred to the server as clear text. Searching fields on the server is made possible by an encoding scheme that ensures a stable prefix correspondence between ciphertext and plaintext. The method has been developed for a semantic medical database. It allows personal data to be accessed with an additional password while non-sensitive information is kept in clear form. Web Assembly was chosen to guarantee fast and efficient execution of encryption/decryption operations, and because it produces modules that are very robust against reverse engineering. The code is available at https://github.com/mfalda/client-encdec.

Translated: 2024-03-18 11:08:48 Published: 2023-12-29
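The abstract above does not spell out the encoding. The following is therefore only a hypothetical Python illustration of what a "stable prefix correspondence between ciphertext and plaintext" means. It is not the scheme in the linked repository, which runs as Web Assembly in the browser. Each byte is XORed with a pad byte that depends only on the secret key and the byte's position. The encryption of a prefix is therefore a prefix of the encryption, and the server can answer prefix queries on ciphertexts it cannot read. Because every field reuses the same pad, this toy construction leaks a great deal of information and should not be used as-is.

```python
import hashlib
import hmac


def _pad_byte(key: bytes, position: int) -> int:
    # Position-dependent pad byte derived from the key (HMAC-SHA256 as a PRF).
    return hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()[0]


def encrypt_field(key: bytes, text: str) -> bytes:
    data = text.encode("utf-8")
    return bytes(b ^ _pad_byte(key, i) for i, b in enumerate(data))


def decrypt_field(key: bytes, blob: bytes) -> str:
    return bytes(b ^ _pad_byte(key, i) for i, b in enumerate(blob)).decode("utf-8")


key = b"client-side secret"  # hypothetical; would be derived from the user's password
stored = encrypt_field(key, "Rossi Mario")
query = encrypt_field(key, "Ross")      # the client encrypts the search prefix
print(stored.startswith(query))         # True: the server matches ciphertext prefixes
print(decrypt_field(key, stored))       # Rossi Mario
```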

# Continuum limit of the Green function in scaled affine $\varphi^4_4$ quantum Euclidean covariant relativistic field theory ( http://arxiv.org/abs/2402.10903v1 )

License: see the link

Riccardo Fantoni

We prove through path integral Monte Carlo computer experiments that the affine quantization of the $\varphi^4_4$ scaled Euclidean covariant relativistic scalar field theory is a valid quantum field theory with a well-defined continuum limit of the one- and two-point functions. Affine quantization leads to a completely satisfactory quantization of field theories in situations that involve scaled behavior. This gives rise to an unexpected $\hbar^2/\varphi^2$ term that arises only in the quantum aspects.

Translated: 2024-03-18 07:28:31 Published: 2023-12-29

# Enabling Technologies for Web 3.0: A Comprehensive Survey ( http://arxiv.org/abs/2401.10901v1 )

License: see the link

Md Arif Hassan, Mohammad Behdad Jamshidi, Bui Duc Manh, Nam H. Chu, Chi-Hieu Nguyen, Nguyen Quang Hieu, Cong T. Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Nguyen Van Huynh, Mohammad Abu Alsheikh and Eryk Dutkiewicz

Web 3.0 represents the next stage of Internet evolution, aiming to empower users with increased autonomy, efficiency, quality, security, and privacy. This evolution can potentially democratize content access by utilizing the latest developments in enabling technologies. In this paper, we conduct an in-depth survey of enabling technologies in the context of Web 3.0, such as blockchain, semantic web, 3D interactive web, Metaverse, Virtual reality/Augmented reality, Internet of Things technology, and their roles in shaping Web 3.0. We commence by providing a comprehensive background of Web 3.0, including its concept, basic architecture, potential applications, and industry adoption. Subsequently, we examine recent breakthroughs in IoT, 5G, and blockchain technologies that are pivotal to Web 3.0 development. Following that, other enabling technologies, including AI, semantic web, and 3D interactive web, are discussed. Utilizing these technologies can effectively address the critical challenges in realizing Web 3.0, such as ensuring decentralized identity, platform interoperability, data transparency, reducing latency, and enhancing the system's scalability. Finally, we highlight significant challenges associated with Web 3.0 implementation, emphasizing potential solutions and providing insights into future research directions in this field.

Translated: 2024-01-28 16:07:58 Published: 2023-12-29

# Holonic Learning: A Flexible Agent-based Distributed Machine Learning Framework ( http://arxiv.org/abs/2401.10839v1 )

License: see the link

Ahmad Esmaeili, Zahra Ghorrati, Eric T. Matson

The ever-increasing ubiquity of data and computational resources in the last decade has propelled a notable transition in the machine learning paradigm towards more distributed approaches. Such a transition seeks not only to tackle scalability and resource-distribution challenges but also to address pressing privacy and security concerns. To contribute to the ongoing discourse, this paper introduces Holonic Learning (HoL), a collaborative and privacy-focused learning framework designed for training deep learning models. By leveraging holonic concepts, the HoL framework establishes a structured, self-similar hierarchy in the learning process. This enables more nuanced control over collaborations through the individual model aggregation approach of each holon, along with its intra-holon commitment and communication patterns. In its general form, HoL offers extensive design potential and flexibility. For empirical analysis and to demonstrate its effectiveness, this paper implements HoloAvg, a special variant of HoL that employs weighted averaging for model aggregation across all holons. The convergence of the proposed method is validated through experiments on both IID and Non-IID settings of the standard MNIST dataset. Furthermore, the performance behaviors of HoL are investigated under various holarchical designs and data distribution scenarios. The presented results affirm HoL's prowess in delivering competitive performance, particularly in the context of Non-IID data distributions.

Translated: 2024-01-28 16:06:33 Published: 2023-12-29
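HoloAvg aggregates models by weighted averaging across a self-similar hierarchy of holons. The sketch below is a hypothetical, framework-free reading of that idea. Each holon averages its members' parameter vectors, weighted by their number of training samples. It then passes the result, with the summed sample count, up to its super-holon. The sample-count weighting and the dictionary layout are assumptions, because the abstract does not specify the weights. With these weights, the hierarchical result coincides with a flat, FedAvg-style weighted average.

```python
def holon_average(holon):
    """Recursively aggregate a holarchy by weighted parameter averaging.

    A leaf holon is {"params": [...], "n": number_of_samples};
    a composite holon is {"members": [sub-holons]}.
    Returns (averaged_params, total_samples).
    """
    if "members" not in holon:
        return holon["params"], holon["n"]
    results = [holon_average(member) for member in holon["members"]]
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    averaged = [sum(p[i] * n for p, n in results) / total for i in range(dim)]
    return averaged, total


a = {"params": [1.0, 0.0], "n": 10}
b = {"params": [3.0, 2.0], "n": 30}
c = {"params": [0.0, 4.0], "n": 20}
root = {"members": [{"members": [a, b]}, c]}
params, total = holon_average(root)
print(params, total)
```

A holarchy could instead weight members uniformly or by validation score. Those choices would break the equivalence with the flat average.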

# ReliCD: A Reliable Cognitive Diagnosis Framework with Confidence Awareness ( http://arxiv.org/abs/2401.10749v1 )

License: see the link

Yunfei Zhang, Chuan Qin, Dazhong Shen, Haiping Ma, Le Zhang, Xingyi Zhang, Hengshu Zhu

During the past few decades, cognitive diagnosis modeling, which can quantify students' learning status and knowledge mastery levels, has attracted increasing attention in the computational education community. Indeed, recent advances in neural networks have greatly enhanced the performance of traditional cognitive diagnosis models by learning deep representations of students and exercises. Nevertheless, existing approaches often suffer from overconfidence when predicting students' mastery levels. This is primarily caused by the unavoidable noise and sparsity in real-world student-exercise interaction data, and it severely hinders the educational application of diagnostic feedback. To address this, we propose a novel Reliable Cognitive Diagnosis (ReliCD) framework, which can quantify the confidence of diagnostic feedback and is flexible across different cognitive diagnostic functions. Specifically, we first propose a Bayesian method that explicitly estimates each student's state uncertainty for different knowledge concepts, which enables confidence quantification of the diagnostic feedback. In particular, to account for potential differences, we model individual prior distributions for the latent variables of different ability concepts using a pre-trained model. Additionally, we introduce a logical hypothesis for ranking confidence levels. Along this line, we design a novel calibration loss that optimizes the confidence parameters by modeling the process of student performance prediction. Finally, extensive experiments on four real-world datasets clearly demonstrate the effectiveness of our ReliCD framework.
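Overconfidence of the kind described above is commonly measured with the expected calibration error (ECE), which compares average confidence with empirical accuracy inside confidence bins. The Python sketch below is a generic ECE computation and not ReliCD's calibration loss. The choice of 10 bins is arbitrary.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               num_bins: int = 10) -> float:
    """ECE: bin-size-weighted average of |accuracy - mean confidence| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * num_bins), num_bins - 1)  # confidence 1.0 falls into the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        accuracy = sum(ok for _, ok in b) / len(b)
        mean_conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(accuracy - mean_conf)
    return ece


# Predictions made with 95% confidence that are right only half the time: ECE ≈ 0.45.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
```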

翻訳日:2024-01-28 16:06:10 公開日:2023-12-29

# テンソル畳み込みニューラルネットワークを用いた製造におけるブースティング欠陥検出

Boosting Defect Detection in Manufacturing using Tensor Convolutional Neural Networks ( http://arxiv.org/abs/2401.01373v1 )

ライセンス: Link先を確認

Pablo Martin-Ramiro, Unai Sainz de la Maza, Roman Orus, Samuel Mugel

(参考訳) 欠陥検出は製造業における品質管理の段階において最も重要かつ困難な課題の1つである。本稿では,テンソル畳み込みニューラルネットワーク(t-cnn)を導入し,ロバート・ボッシュ製造工場で製造される超音波センサの構成要素の1つである実欠陥検出への応用について検討する。我々の量子インスパイアされたT-CNNは、精度を犠牲にすることなく、等価なCNNモデルのトレーニング速度と性能を大幅に向上するために、縮小モデルパラメータ空間で動作する。より具体的には、t-cnnが従来のcnnと同じ性能を品質指標で測定し、最大15分の1のパラメータと4%から19%の速さで達成できることを実証する。以上の結果から,T-CNNは従来の人間の視覚検査の結果を大きく上回り,製造における実際の応用に価値をもたらすことが示された。

Defect detection is one of the most important yet challenging tasks in the quality control stage of the manufacturing sector. In this work, we introduce a Tensor Convolutional Neural Network (T-CNN) and examine its performance on a real defect detection application involving one of the components of the ultrasonic sensors produced at Robert Bosch's manufacturing plants. Our quantum-inspired T-CNN operates on a reduced model parameter space to substantially improve the training speed and performance of an equivalent CNN model without sacrificing accuracy. More specifically, we demonstrate how T-CNNs are able to reach the same performance as classical CNNs, as measured by quality metrics, with up to fifteen times fewer parameters and 4% to 19% faster training times. Our results demonstrate that the T-CNN greatly outperforms traditional human visual inspection, providing value in a current, real manufacturing application.
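The sketch below makes the parameter savings concrete by counting parameters for a dense convolution kernel and for the same kernel written in CP (canonical polyadic) form. The abstract does not say which tensor decomposition the T-CNN uses, so CP is an assumption chosen for illustration. The layer sizes and rank are also hypothetical.

```python
def conv_params(c_out: int, c_in: int, k: int) -> int:
    """Weights in a dense k x k convolution kernel (bias ignored)."""
    return c_out * c_in * k * k


def cp_conv_params(c_out: int, c_in: int, k: int, rank: int) -> int:
    """Weights when the 4-way kernel is a sum of `rank` outer products of
    vectors of lengths c_out, c_in, k and k (CP decomposition)."""
    return rank * (c_out + c_in + 2 * k)


dense = conv_params(64, 64, 3)                 # 36864
factored = cp_conv_params(64, 64, 3, rank=16)  # 2144
print(f"compression: {dense / factored:.1f}x")
```

The achievable rank, and therefore the real compression, depends on how much accuracy one is willing to give up. The abstract reports up to fifteen times fewer parameters for the authors' models.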

翻訳日:2024-01-15 09:54:35 公開日:2023-12-29

# 大規模言語モデルを用いたオンライン投稿の脅威検出の有効性

Efficacy of Utilizing Large Language Models to Detect Public Threat Posted Online ( http://arxiv.org/abs/2401.02974v1 )

ライセンス: Link先を確認

Taeksoo Kwon (Algorix Convergence Research Office), Connor Kim (Centennial High School)

(参考訳) 本稿では,大規模言語モデル(LLM)を用いてオンライン投稿された公的な脅威を検出する方法を提案する。暴力に対するレトリックや先進的な警告の拡散に対する懸念が高まっている中、自動コンテンツ分析技術は早期の識別とモデレーションに役立つ可能性がある。カスタムデータ収集ツールは、500の非脅威例と20の脅威からなる、韓国の人気のあるオンラインコミュニティからの投稿タイトルを集めるために開発された。様々なLSM(GPT-3.5、GPT-4、PaLM)は個々のポストを「脅威」または「安全」に分類するよう促された。統計的分析では、全てのモデルが強い精度を示し、脅威と非脅威の識別の両方に対して適合性テストの2乗精度を渡した。 GPT-4は総じて97.9%の非脅威と100%の脅威精度で性能が向上した。 PaLM APIの価格設定はコスト効率が高かった。以上の結果から,LLMは大規模コンテンツモデレーションを効果的に強化し,新たなオンラインリスクを軽減できる可能性が示唆された。しかし、バイアス、透明性、倫理的監視は、現実の実施前に重要な考慮事項である。

This paper examines the efficacy of utilizing large language models (LLMs) to detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to amass post titles from a popular Korean online community, comprising 500 non-threat examples and 20 threats. Various LLMs (GPT-3.5, GPT-4, PaLM) were prompted to classify individual posts as either "threat" or "safe." Statistical analysis found that all models demonstrated strong accuracy, passing chi-square goodness-of-fit tests for both threat and non-threat identification. GPT-4 performed best overall, with 97.9% non-threat and 100% threat accuracy. An affordability analysis also showed that PaLM API pricing is highly cost-efficient. The findings indicate that LLMs can effectively augment human content moderation at scale to help mitigate emerging online risks. However, biases, transparency, and ethical oversight remain vital considerations before real-world implementation.
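The chi-square goodness-of-fit test mentioned above compares observed category counts with the counts expected under a null hypothesis. The abstract does not give the exact hypotheses or counts, so the Python sketch below uses made-up numbers. The value 3.841 is the standard critical value for one degree of freedom at the 5% significance level.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson's chi-square statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_DF1_ALPHA05 = 3.841  # chi-square critical value for df = 1, alpha = 0.05

# Hypothetical two-category counts (df = number of categories - 1 = 1).
stat = chi_square_statistic([18, 22], [20, 20])
reject_null = stat > CRITICAL_DF1_ALPHA05  # stat = 0.4, so the null hypothesis is not rejected
```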

翻訳日:2024-01-15 09:31:35 公開日:2023-12-29

# ANALYTiC:機械学習における決定境界と次元化の理解

ANALYTiC: Understanding Decision Boundaries and Dimensionality Reduction in Machine Learning ( http://arxiv.org/abs/2401.05418v1 )

ライセンス: Link先を確認

Salman Haidri

(参考訳) コンパクトでハンドヘルドなデバイスが登場したことで、追跡された動きデータのプールができ、それを使ってトレンドやパターンを推測できるようになりました。動物、人間、車両等の様々な軌跡データの洪水により、ANALYTiCのアイデアは、ラベル付きデータの集合から学習することで、軌跡から意味的アノテーションを推論するアクティブラーニングによって生まれた。本研究は,データ中のパターンやクラスタを強調表示し,現在あるアクティブラーニングと組み合わせて,次元削減と決定境界の適用について検討する。これらの特徴を3つの異なる軌道データセットでテストし,ラベル付きデータの活用と解釈性の向上を目標とした。実験により,これらの組み合わせ手法がトラジェクティブラベリングの効率と精度を向上させる可能性を実証した。この研究は、運動データ分析の文脈における機械学習と視覚的手法のより広範な統合に向けた足掛かりとなる。

The advent of compact, handheld devices has given us a pool of tracked movement data that can be used to infer useful trends and patterns. This flood of trajectory data from animals, humans, vehicles, and more gave rise to ANALYTiC, which uses active learning to infer semantic annotations for trajectories by learning from sets of labeled data. This study explores applying dimensionality reduction and decision boundaries together with the existing active learning, highlighting patterns and clusters in the data. We test these features on three different trajectory datasets, with the objective of exploiting the already labeled data and enhancing its interpretability. Our experimental analysis illustrates the potential of these combined methods to improve the efficiency and accuracy of trajectory labeling. This study serves as a stepping stone toward broader integration of machine learning and visual methods in the context of movement data analysis.

翻訳日:2024-01-15 08:20:54 公開日:2023-12-29

# 慣性センサ信号強調のためのウェーブレット動的選択ネットワーク

Wavelet Dynamic Selection Network for Inertial Sensor Signal Enhancement ( http://arxiv.org/abs/2401.05416v1 )

ライセンス: Link先を確認

Yifeng Wang, Yi Zhao

(参考訳) 姿勢や動きを感知するコンポーネントとして、慣性センサーは様々な携帯機器で広く使われている。しかし、慣性センサーの過酷なエラーは、特に軌道回復と意味認識の機能を阻害する。主流信号処理法として、ウェーブレット基底関数が豊富で多様なため、ウェーブレットは信号の数学的顕微鏡として評価される。しかし、慣性センサの複雑なノイズタイプと応用シナリオにより、ウェーブレットベースパープレキシングが選択される。本研究では,可変慣性信号に対して適切なウェーブレット基底をインテリジェントに選択するウェーブレット動的選択ネットワーク(wdsnet)を提案する。さらに、既存のディープラーニングアーキテクチャは、入力データから特徴を抽出する上で優れているが、カテゴリ認識能力の向上に不可欠である対象カテゴリの特徴を学習することを無視し、ウェーブレットベースの選択を改善する。そこで本研究では,トレーニング可能なパラメータを増やすことなく,カテゴリの特徴を抽出し,表現できるカテゴリ表現機構を提案する。さらにcrmは、共通の完全連結ネットワークをカテゴリ表現に変換し、遠方かつ自明な1つのホットな分類ラベルよりも特徴抽出器を注意深く監視する。本稿では,ネットワーク上で解釈可能性を設定し,特徴抽出器の特徴監督機構を監督するプロセスと呼び,その効果を実験的・理論的に実証する。拡張された慣性信号は、軌道再構成などの元の信号に関して実行不能なタスクを実行できる。定量的およびビジュアルな結果は、WDSNetが既存の手法より優れていることを示している。注目すべきは、WDSNetは弱教師付き手法として、比較された全教師付き手法の最先端性能を達成することである。

As attitude and motion sensing components, inertial sensors are widely used in various portable devices. However, their severe errors limit their function, especially for trajectory recovery and semantic recognition. The wavelet transform is a mainstream signal processing method and is hailed as the mathematical microscope of signals because of its plentiful and diverse wavelet basis functions. However, the complicated noise types and application scenarios of inertial sensors make selecting a wavelet basis difficult. To this end, we propose a wavelet dynamic selection network (WDSNet), which intelligently selects the appropriate wavelet basis for variable inertial signals. In addition, existing deep learning architectures excel at extracting features from input data but neglect to learn the characteristics of target categories. Learning those characteristics is essential for enhancing category awareness and thereby improving the selection of the wavelet basis. Therefore, we propose a category representation mechanism (CRM), which enables the network to extract and represent category features without increasing trainable parameters. Furthermore, CRM transforms the common fully connected network into category representations, which provide closer supervision to the feature extractor than distant, trivial one-hot classification labels. We call this process of imposing interpretability on a network and using it to supervise the feature extractor the feature supervision mechanism, and we demonstrate its effectiveness both experimentally and theoretically. The enhanced inertial signal enables tasks that are impracticable with the original signal, such as trajectory reconstruction. Both quantitative and visual results show that WDSNet outperforms existing methods. Remarkably, although WDSNet is a weakly supervised method, it achieves state-of-the-art performance relative to all the compared fully supervised methods.
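WDSNet's contribution is choosing among many wavelet bases. The sketch below shows only what one basis does, using the simplest one (Haar): one decomposition level followed by soft thresholding of the detail coefficients, which is a classic wavelet-denoising recipe. It is a generic illustration rather than the paper's method, and the threshold value is arbitrary.

```python
import math
from typing import List, Tuple


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_dwt."""
    s = 1 / math.sqrt(2)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values with |v| <= t become 0."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def denoise(x: List[float], t: float) -> List[float]:
    """Haar decomposition, soft-threshold the details, then reconstruct."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, [soft_threshold(d, t) for d in detail])


smoothed = denoise([1.0, 1.2, 0.9, 1.1], t=0.5)  # ≈ [1.1, 1.1, 1.0, 1.0]
```

With `t=0` the signal is reconstructed exactly. Larger thresholds remove more of the small, noise-like differences between neighbouring samples.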

翻訳日:2024-01-15 08:20:39 公開日:2023-12-29

# グローバル外交実践における生成AIの役割:戦略的枠組み

The Role of Generative AI in Global Diplomatic Practices: A Strategic Framework ( http://arxiv.org/abs/2401.05415v1 )

ライセンス: Link先を確認

Muneera Bano, Zahid Chaudhri, Didar Zowghi

(参考訳) 21世紀に人工知能(AI)が外交の領域を変革するにつれ、この研究はこれらの進歩の双対性を評価することの必要性に対処し、それらがもたらす課題と彼らが与える機会の両方を解き放つ。 OpenAIによるChatGPTのローンチから1年近くが経ち、さまざまな作業領域にその機能を持たせた。これらの能力を外交に応用する範囲はまだ完全には解明されていない。我々の研究目的は、デジタル・AI外交に関する現在の言論を体系的に検討し、現代の外交実践におけるジェネレーティブ・AIの役割のための包括的枠組みの開発を知らせることである。 230の学術論文の体系的な分析を通じて、我々は機会と課題のスペクトルを特定し、ジェネレーティブAIの統合のための多面的概念を捉え、外交における将来の研究と革新のコースを設定する戦略的枠組みに到達した。

As Artificial Intelligence (AI) transforms the domain of diplomacy in the 21st century, this research addresses the pressing need to evaluate the dualistic nature of these advancements, unpacking both the challenges they pose and the opportunities they offer. It has been almost a year since OpenAI launched ChatGPT, which has revolutionised various work domains with its capabilities. The extent to which these capabilities can be applied to diplomacy is yet to be fully explored or understood. Our research objective is to systematically examine the current discourse on Digital and AI Diplomacy, thus informing the development of a comprehensive framework for the role of Generative AI in modern diplomatic practices. Through the systematic analysis of 230 scholarly articles, we identified a spectrum of opportunities and challenges, culminating in a strategic framework that captures the multifaceted concepts for integrating Generative AI and sets a course for future research and innovation in diplomacy.

翻訳日:2024-01-15 08:20:15 公開日:2023-12-29

# 固有状態遷移におけるスケール不変臨界ダイナミクス

Scale-invariant critical dynamics at eigenstate transitions ( http://arxiv.org/abs/2309.16005v2 )

ライセンス: Link先を確認

Miroslav Hopjan, Lev Vidmar

(参考訳) スケール不変ダイナミクスの概念は、スペクトル形式因子(sff)のランプの出現によって示されるように、量子カオス系において後期によく確立されている。先行論文[phys. rev. lett. 131, 060404 (2023)]の結果に基づいて,臨界点における生存確率とsffのスケール不変ダイナミクス,すなわち量子カオスから局在への固有状態遷移の特徴を考察する。量子カオス状態とは対照的に、臨界における量子力学は、後期のスケール不変性を示すだけでなく、中間時力学と呼ばれるより短い時間で現れることを示す。結果は二次モデルと相互作用モデルの両方に適用できる。具体的には,3次元から5次元のアンダーソンモデル,前者のパワールールランダムバンド行列,後者の量子太陽モデルと超メトリックモデル,およびローゼンツヴァイク・ポーターモデルについて検討した。

The notion of scale-invariant dynamics is well established at late times in quantum chaotic systems, as illustrated by the emergence of a ramp in the spectral form factor (SFF). Building on the results of the preceding Letter [Phys. Rev. Lett. 131, 060404 (2023)], we explore features of scale-invariant dynamics of the survival probability and the SFF at criticality, i.e., at eigenstate transitions from quantum chaos to localization. We show that, in contrast to the quantum chaotic regime, the quantum dynamics at criticality exhibit scale invariance not only at late times but also at much shorter times, which we refer to as mid-time dynamics. Our results apply to both quadratic and interacting models. Specifically, we study Anderson models in dimensions three to five and power-law random banded matrices for the former, and the quantum sun model and the ultrametric model for the latter, as well as the Rosenzweig-Porter model.
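The two observables studied above can be computed directly from a list of eigenvalues. The Python sketch below uses one common normalisation of the spectral form factor, $K(t) = |\sum_n e^{-iE_n t}|^2 / N$, together with the survival probability of a state with overlaps $p_n = |\langle n|\psi\rangle|^2$. Normalisation conventions differ between papers. The sketch also omits the spectral unfolding and ensemble averaging that are needed to see a clean ramp or mid-time scale invariance.

```python
import cmath
from typing import Sequence


def spectral_form_factor(energies: Sequence[float], t: float) -> float:
    """K(t) = |sum_n exp(-i E_n t)|^2 / N for a single spectrum (no ensemble average)."""
    z = sum(cmath.exp(-1j * e * t) for e in energies)
    return abs(z) ** 2 / len(energies)


def survival_probability(energies: Sequence[float], overlaps: Sequence[float], t: float) -> float:
    """P(t) = |<psi| exp(-iHt) |psi>|^2, with overlaps[n] = |<n|psi>|^2 in the eigenbasis of H."""
    z = sum(p * cmath.exp(-1j * e * t) for e, p in zip(energies, overlaps))
    return abs(z) ** 2
```

At $t = 0$ this normalisation gives $K(0) = N$ and $P(0) = 1$ for a normalised state.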

翻訳日:2024-01-03 19:52:20 公開日:2023-12-29

# ハイブリッドモデリングデザインパターン

Hybrid Modeling Design Patterns ( http://arxiv.org/abs/2401.00033v1 )

ライセンス: Link先を確認

Maja Rudolph, Stefan Kurz, Barbara Rakitsch

(参考訳) デザインパターンは、繰り返し発生するモデリング課題にソリューションを伝達する体系的な方法を提供する。本稿では、第一原理に基づくモデリングとデータ駆動モデリング技術を組み合わせたハイブリッドモデリングの設計パターンを紹介する。どちらのアプローチも相補的な利点がある一方で、それらをハイブリッドモデルに組み合わせる方法は多々あり、適切な解決策は目前にある問題に依存します。本稿では、データ駆動コンポーネントとドメイン知識をハイブリッドアプローチに組み合わせるための青写真として機能する4つの基本パターンを提案する。さらに,基本パターンとより複雑なハイブリッドモデルの組み合わせを規定する2つの構成パターンも提示する。各デザインパターンは、気候モデリング、工学、物理学といった応用分野の典型的なユースケースによって示される。

Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While the two approaches have complementary advantages, there are often multiple ways to combine them into a hybrid model, and the appropriate solution depends on the problem at hand. In this paper, we provide four base patterns that can serve as blueprints for combining data-driven components with domain knowledge into a hybrid approach. In addition, we present two composition patterns that govern how the base patterns combine into more complex hybrid models. Each design pattern is illustrated by typical use cases from application areas such as climate modeling, engineering, and physics.
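One simple and widely used way to combine first principles with a data-driven component is a physics model whose output is corrected by an additive residual fitted to data. The Python sketch below implements that arrangement. We do not claim it corresponds to any specific base pattern in the paper. The physics law `2.0 * x` and the linear residual model are hypothetical placeholders.

```python
from typing import Callable, List


def physics_model(x: float) -> float:
    """Hypothetical first-principles component: a known linear law."""
    return 2.0 * x


def fit_residual_hybrid(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Fit r(x) = a*x + b to the residuals y - physics_model(x) by ordinary least
    squares and return the hybrid predictor physics_model(x) + r(x)."""
    residuals = [y - physics_model(x) for x, y in zip(xs, ys)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    a = sxr / sxx
    b = mean_r - a * mean_x
    return lambda x: physics_model(x) + a * x + b


# Observations follow y = 2.5x + 1, so the physics model misses 0.5x + 1.
hybrid = fit_residual_hybrid([0.0, 1.0, 2.0, 3.0], [1.0, 3.5, 6.0, 8.5])
prediction = hybrid(4.0)  # ≈ 11.0
```

The residual model only has to learn what the physics misses, so it can stay small. Replacing the linear fit with a neural network keeps the same structure.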

翻訳日:2024-01-03 19:18:58 公開日:2023-12-29

# 意思決定基盤モデルのための自己指導型事前学習: 定式化, パイプライン, 課題

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges ( http://arxiv.org/abs/2401.00031v1 )

ライセンス: Link先を確認

Xiaoqian Liu, Jianbin Jiao, Junge Zhang

(参考訳) 意思決定(Decision-making)は、選択と最適なポリシーを見つけるために知覚、記憶、推論を必要とする動的なプロセスである。意思決定の伝統的なアプローチはサンプルの効率と一般化に苦しむ一方で、大規模な自己教師付き事前学習は言語やビジョンにおける微調整や少数ショット学習による迅速な適応を可能にしている。そこで我々は,大規模な自己指導型事前学習から得られる知識を下流の意思決定問題に統合する。本稿では,事前学習と下流推定のためのデータ収集,事前学習目標,適応戦略に関する最近の研究について述べる。最後に,総合的かつ柔軟な自己指導型事前学習の助けを借りて,意思決定基盤モデル開発における重要な課題と今後の方向性を明らかにする。

Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from poor sample efficiency and generalization, while large-scale self-supervised pretraining has enabled fast adaptation via fine-tuning or few-shot learning in language and vision. We thus argue for integrating knowledge acquired from generic large-scale self-supervised pretraining into downstream decision-making problems. We propose a Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives, and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation models with the help of generic and flexible self-supervised pretraining.

翻訳日:2024-01-03 19:18:47 公開日:2023-12-29

# OCRのスケーリング法則に関する実証的研究

An Empirical Study of Scaling Law for OCR ( http://arxiv.org/abs/2401.00028v1 )

ライセンス: Link先を確認

Miao Rang, Zhenni Bi, Chuanjian Liu, Yunhe Wang, Kai Han

(参考訳) モデルサイズ、データボリューム、計算、モデル性能の法則は自然言語処理(nlp)の分野で広く研究されてきた。しかし、光学文字認識(OCR)におけるスケーリング法則はまだ研究されていない。そこで本研究では,テキスト認識分野におけるモデルの性能とスケール,データボリューム,計算の相関関係を総合的に検討し,他の要因が一定である場合に,性能とモデルサイズ間のスムーズなパワー則と,データボリュームのトレーニングを行う。さらに,600万実サンプルと1800万合成サンプルからなる,rebu-synと呼ばれる大規模データセットを構築した。スケーリング法則と新しいデータセットに基づいて、シーンテキスト認識モデルをトレーニングし、トップ1の平均精度97.42%の6つの一般的なテストベンチマーク上で、最先端の新たなテストを実現しました。

Scaling laws relating model size, data volume, and computation to model performance have been extensively studied in the field of Natural Language Processing (NLP). However, scaling laws in Optical Character Recognition (OCR) have not yet been investigated. To address this, we conducted comprehensive studies examining the correlation between performance and the scale of models, data volume, and computation in the field of text recognition. The study demonstrates smooth power laws between performance and model size, and between performance and training data volume, when other influencing factors are held constant. Additionally, we have constructed a large-scale dataset called REBU-Syn, which comprises 6 million real samples and 18 million synthetic samples. Based on our scaling law and new dataset, we have successfully trained a scene text recognition model, achieving a new state of the art on 6 common test benchmarks with a top-1 average accuracy of 97.42%.
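A smooth power law of the kind reported above is usually fitted as a straight line in log-log space. The sketch below fits that line with ordinary least squares. The functional form error ≈ a · size^(−b), with no irreducible-error offset, is an assumption, and the synthetic data are for illustration only. This is not the paper's fitted law.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], errors: List[float]) -> Tuple[float, float]:
    """Fit error ≈ a * size**(-b) by least squares on log(error) = log(a) - b*log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


# Synthetic data generated from error = 3 * size**-0.5 (hypothetical model sizes).
sizes = [1e6, 4e6, 16e6, 64e6]
errors = [3.0 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, errors)  # a ≈ 3.0, b ≈ 0.5
```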

翻訳日:2024-01-03 19:18:17 公開日:2023-12-29

# 学習可能な離散ウェーブレット変換を用いたブラインド動作劣化のための高能率マルチスケールネットワーク

Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring ( http://arxiv.org/abs/2401.00027v1 )

ライセンス: Link先を確認

Xin Gao, Tianheng Qiu, Xinyu Zhang, Hanlin Bai, Kang Liu, Xuan Huang, Hu Wei, Guoying Zhang, Huaping Liu

(参考訳) しかし、ディープラーニングの文脈では、既存のマルチスケールアルゴリズムでは、低スケールのrgb画像とディープセマンティクスの融合に複雑なモジュールを使用するだけでなく、手作業で十分な信頼性を持たない低解像度のイメージ対を生成する必要がある。本研究では,simo(single-input and multiple-outputs)に基づくマルチスケールネットワークを提案する。これにより、粗大なスキームに基づくアルゴリズムの複雑さを単純化する。マルチスケールアーキテクチャを用いて得られた詳細情報に影響を及ぼす復元欠陥を軽減するため,実世界のぼやけた軌跡の特徴を学習可能なウェーブレット変換モジュールと組み合わせて,ぼやけた画像から鋭い画像へのステップバイステップ遷移の方向連続性と周波数特性に着目した。そこで本研究では,実世界の分散データセットにおいて,主観的および客観的品質と計算効率の両面で最先端の性能を示す学習可能な離散ウェーブレット変換(mlwnet)を用いたマルチスケールネットワークを提案する。

Coarse-to-fine schemes are widely used in traditional single-image motion deblurring. In the context of deep learning, however, existing multi-scale algorithms not only require complex modules for fusing the features of low-scale RGB images with deep semantics, but also manually generate low-resolution image pairs that do not have sufficient confidence. In this work, we propose a multi-scale network based on single-input and multiple-outputs (SIMO) for motion deblurring, which reduces the complexity of algorithms based on a coarse-to-fine scheme. A multi-scale architecture introduces restoration defects that affect detail information. To alleviate them, we combine the characteristics of real-world blurring trajectories with a learnable wavelet transform module that focuses on the directional continuity and frequency features of the step-by-step transitions from blurred images to sharp images. In conclusion, we propose a multi-scale network with a learnable discrete wavelet transform (MLWNet), which exhibits state-of-the-art performance on multiple real-world deblurring datasets in terms of subjective and objective quality as well as computational efficiency.
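The sketch below shows one level of a fixed 2-D Haar wavelet transform, which splits an image into a low-frequency subband and three directional detail subbands. In MLWNet the wavelet filters are learnable. Here they are hard-coded Haar filters, so the sketch only illustrates the kind of decomposition such a module builds on. Naming conventions for the LH and HL subbands vary between sources.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar2d(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One level of the orthonormal 2-D Haar transform.

    Returns four half-resolution subbands (LL, LH, HL, HH) for an image with even height and width.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # local average (low frequency)
            lh[i // 2][j // 2] = (a + b - c - d) / 2  # difference between rows: horizontal edges
            hl[i // 2][j // 2] = (a - b + c - d) / 2  # difference between columns: vertical edges
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


# A vertical edge puts all detail energy into the HL subband.
ll, lh, hl, hh = haar2d([[1.0, 0.0], [1.0, 0.0]])  # ll=[[1.0]], lh=[[0.0]], hl=[[1.0]], hh=[[0.0]]
```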

翻訳日:2024-01-03 19:18:01 公開日:2023-12-29

# 「『マルチパーティ量子相互情報:代替定義』へのコメント」への回答

Reply to "Comment on `Multiparty quantum mutual information: An alternative definition'" ( http://arxiv.org/abs/2401.00026v1 )

ライセンス: Link先を確認

Asutosh Kumar

(参考訳) 我々はLeeらの主張を再確認する。 [先述の A 108, 066401 (2023)] は、以前の研究(A. Kumar, Phys. Rev. A 96, 012332 (2017))で提案された量子相対エントロピーの観点から、多部系における量子双対全相関の式は正しくない。量子相対エントロピーの観点から、量子双対全相関の代替式(s)を提供する。しかし、量子双対全相関の計算では、フォン・ノイマンのエントロピーの観点からその表現を使うべきであると仮定する。

We reaffirm the claim of Lee et al. [preceding Comment, Phys. Rev. A 108, 066401 (2023)] that the expression of quantum dual total correlation of a multipartite system in terms of quantum relative entropy as proposed in previous work [A. Kumar, Phys. Rev. A 96, 012332 (2017)] is not correct. We provide alternate expression(s) of quantum dual total correlation in terms of quantum relative entropy. We, however, prescribe that in computing quantum dual total correlation one should use its expression in terms of von Neumann entropy.
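For reference, the von Neumann entropy form can be obtained from the classical definition of dual total correlation, $D = H(X_1,\dots,X_n) - \sum_i H(X_i \mid X_{\bar{i}})$, by replacing Shannon entropies with von Neumann entropies and using $S(i \mid \bar{i}) = S(\rho) - S(\rho_{\bar{i}})$. We believe this is the standard expression, but readers should check it against the Reply itself. Here $\rho_{\bar{i}}$ denotes the reduced state with party $i$ traced out.

```latex
\begin{aligned}
D(\rho) &= S(\rho) - \sum_{i=1}^{n} \bigl[ S(\rho) - S(\rho_{\bar{i}}) \bigr] \\
        &= \sum_{i=1}^{n} S(\rho_{\bar{i}}) - (n-1)\, S(\rho)
\end{aligned}
```

As a sanity check, for $n = 2$ this reduces to $S(\rho_A) + S(\rho_B) - S(\rho_{AB})$, the quantum mutual information.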

翻訳日:2024-01-03 19:17:39 公開日:2023-12-29

# モデルミス種別を用いた適応線形二次制御の漸近回帰解析

Nonasymptotic Regret Analysis of Adaptive Linear Quadratic Control with Model Misspecification ( http://arxiv.org/abs/2401.00073v1 )

ライセンス: Link先を確認

Bruce D. Lee, Anders Rantzer, Nikolai Matni

(参考訳) 多様なデータセット上で大きなモデルを事前学習し、特定のアプリケーション用に微調整するという戦略は、コンピュータビジョン、自然言語処理、ロボット制御に素晴らしい結果をもたらした。この戦略は適応制御において大きな可能性を秘めており、限られたデータで変化する条件に迅速に適応する必要がある。適応制御のための事前学習の利点を具体的に理解するために,学習者がダイナミクスのための基底行列の集合の事前知識を有する場合の適応線形二次制御問題について検討する。この根拠は、基盤となるデータ生成プロセスのダイナミクスを完全に表現できないという意味では不明確である。先行する知識を用いて,システムと$t$インタラクションを行った後,期待する後悔の上限を証明できるアルゴリズムを提案する。 t$ が小さいレジームでは、上限は、学習者が利用可能な事前知識に応じて、$\texttt{poly}(\log t)$ または $\sqrt{t}$ の項スケールによって支配される。 t$ が大きければ、後悔は$\delta t$ で成長する言葉によって支配され、$\delta$ は誤特定のレベルを定量化する。この線形項は、不特定の基底を用いて基礎となる力学を完全に推定できないため、基底行列がオンラインでも適用されない限り避けられない。しかし、基底行列の重みを推定する誤りによって生じる部分線型項が無視できるようになった後、大きな t$ に対してのみ支配的である。我々は解析を検証するシミュレーションを提供する。また,本シミュレーションでは,関連システムの集合からのオフラインデータを事前学習段階の一部として使用して,適応制御器で使用される不特定なダイナミクスベースを推定する。

The strategy of pre-training a large model on a diverse dataset and then fine-tuning it for a particular application has yielded impressive results in computer vision, natural language processing, and robotic control. This strategy has vast potential in adaptive control, where it is necessary to rapidly adapt to changing conditions with limited data. Toward concretely understanding the benefit of pre-training for adaptive control, we study the adaptive linear quadratic control problem in the setting where the learner has prior knowledge of a collection of basis matrices for the dynamics. This basis is misspecified in the sense that it cannot perfectly represent the dynamics of the underlying data generating process. We propose an algorithm that uses this prior knowledge, and prove upper bounds on the expected regret after $T$ interactions with the system. In the regime where $T$ is small, the upper bounds are dominated by a term that scales with either $\texttt{poly}(\log T)$ or $\sqrt{T}$, depending on the prior knowledge available to the learner. When $T$ is large, the regret is dominated by a term that grows with $\delta T$, where $\delta$ quantifies the level of misspecification. This linear term arises due to the inability to perfectly estimate the underlying dynamics using the misspecified basis, and is therefore unavoidable unless the basis matrices are also adapted online. However, it only dominates for large $T$, after the sublinear terms arising due to the error in estimating the weights for the basis matrices become negligible. We provide simulations that validate our analysis. Our simulations also show that offline data from a collection of related systems can be used as part of a pre-training stage to estimate a misspecified dynamics basis, which is in turn used by our adaptive controller.
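The weight-estimation step referred to above can be illustrated with ordinary least squares. If the dynamics are modelled as $x_{t+1} \approx \big(\sum_i \theta_i A_i\big) x_t$, the weights $\theta_i$ are linear parameters. The Python sketch below assumes an autonomous system (no control input), exactly two basis matrices, and noise-free data. It is not the paper's algorithm, which must also choose inputs and handle misspecification. Under misspecification the same estimator approaches the best approximation within the basis, leaving a model error that does not shrink with more data. This is the intuition behind the $\delta T$ term.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def estimate_basis_weights(basis: List[Matrix], xs: List[Vector]) -> Tuple[float, float]:
    """Least-squares theta for x_{t+1} ≈ (theta_1 A_1 + theta_2 A_2) x_t, via 2x2 normal equations."""
    if len(basis) != 2:
        raise ValueError("this sketch handles exactly two basis matrices")
    g11 = g12 = g22 = h1 = h2 = 0.0
    for x, x_next in zip(xs[:-1], xs[1:]):
        z1, z2 = matvec(basis[0], x), matvec(basis[1], x)  # regressors A_1 x_t and A_2 x_t
        g11 += dot(z1, z1)
        g12 += dot(z1, z2)
        g22 += dot(z2, z2)
        h1 += dot(z1, x_next)
        h2 += dot(z2, x_next)
    det = g11 * g22 - g12 * g12
    return (g22 * h1 - g12 * h2) / det, (g11 * h2 - g12 * h1) / det  # Cramer's rule


# Hypothetical basis {identity, shift}; the true dynamics lie in its span.
basis = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]]
true_a = [[0.9, 0.2], [0.0, 0.9]]  # = 0.9 * basis[0] + 0.2 * basis[1]
xs = [[1.0, 1.0]]
for _ in range(10):
    xs.append(matvec(true_a, xs[-1]))
theta = estimate_basis_weights(basis, xs)  # ≈ (0.9, 0.2)
```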

翻訳日:2024-01-03 19:06:24 公開日:2023-12-29

# 磁気材料の機械学習モデル

Machine-learned models for magnetic materials ( http://arxiv.org/abs/2401.00072v1 )

ライセンス: Link先を確認

Paweł Leszczyński, Kamil Kutorasiński, Marcin Szewczyk, and Jarosław Pawłowski

(参考訳) 本稿では,ディープニューラルネットワークを用いた材料モデリングのための汎用フレームワークを提案する。多次元特性(測定を模倣する)で表される材料は、教師なしの方法で神経オートエンコーダモデルを訓練するために使用される。エンコーダは、デコーダ部分で使用される理論モデルの材料パラメータを予測しようとしている。デコーダは予測パラメータを用いて入力特性を再構成する。ニューラルモデルは、様々な物質的振る舞いをカバーできる合成的に生成された特性の集合を捉えるように訓練され、単一の測定のためにモデルパラメータを最適化するのではなく、基礎となる物理学を一般化できるモデルへと導かれる。モデルの設定後、周波数領域と電流領域の磁性物質を同時にモデル化する複雑な問題において、その有用性を証明する。

We present a general framework for modeling materials using deep neural networks. Materials represented by multidimensional characteristics (which mimic measurements) are used to train a neural autoencoder model in an unsupervised manner. The encoder predicts the material parameters of a theoretical model, which is then used in the decoder part. The decoder reconstructs the input characteristics from the predicted parameters. The neural model is trained on a synthetically generated set of characteristics that covers a broad range of material behaviors, which yields a model that can generalize to the underlying physics rather than just optimize the model parameters for a single measurement. After setting up the model, we demonstrate its usefulness on the complex problem of modeling magnetic materials in the frequency and current (out-of-linear range) domains simultaneously.

翻訳日:2024-01-03 19:05:56 公開日:2023-12-29

# 可変相互作用によるボソニックダイナミクスの有限性決定

Deciding finiteness of bosonic dynamics with tunable interactions ( http://arxiv.org/abs/2401.00069v1 )

ライセンス: Link先を確認

David Edward Bruschi, André Xuereb and Robert Zeier

(参考訳) この研究では、ボソニック量子力学の分解に動機付けられ、対応するリー代数(無限次元かもしれない)を研究する。このような因子分解を特徴付けるために、これらのリー代数の条件を有限次元とする。各自由ハミルトン項がそれ自体が生成リー代数の元である場合を考える。提案手法では,スキュー・エルミートボソニック作用素を適切な部分空間に体系的に分割し,リー代数自体の次元を測るために用いられるスキュー・エルミート作用素の特定の列を構成する新しいツールを開発する。この結果の意義は、特定のハミルトニアンの独立制御生成子のみを制約する条件に依存するため、生成されたリー代数の有限性を検証する効果的なアルゴリズムを提供する。さらに、この結果は、生成および消滅作用素の多項式をワイル代数(weyl algebra)と呼ぶ数学的仕事と密接に結びついている。私たちの研究は、量子制御と量子技術に関連するボソニックダイナミクスの分解をよりよく理解するための道を開くものです。

Motivated by the factorization of bosonic quantum dynamics, we study the corresponding Lie algebras, which can potentially be infinite dimensional. To characterize such factorization, we identify conditions for these Lie algebras to be finite dimensional. We consider cases where each free Hamiltonian term is itself an element of the generated Lie algebra. In our approach, we develop new tools to systematically divide skew-Hermitian bosonic operators into appropriate subspaces, and we construct specific sequences of skew-Hermitian operators that are used to gauge the dimensionality of the Lie algebras themselves. The significance of our result lies in conditions that constrain only the independently controlled generators in a particular Hamiltonian, thereby providing an effective algorithm for verifying the finiteness of the generated Lie algebra. In addition, our results are tightly connected to mathematical work in which the polynomials of creation and annihilation operators are known as the Weyl algebra. Our work paves the way for a better understanding of the factorization of bosonic dynamics relevant to quantum control and quantum technology.
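A textbook example, not taken from the paper, shows why the degree of the generators matters. For a single mode with $[\hat a, \hat a^\dagger] = 1$, quadratic terms close under commutation, while commutators of cubic terms already produce a quartic term:

```latex
\begin{aligned}
[\hat a^\dagger \hat a,\ \hat a^2] &= -2\,\hat a^2, \qquad
[\hat a^\dagger \hat a,\ \hat a^{\dagger 2}] = 2\,\hat a^{\dagger 2}, \qquad
[\hat a^2,\ \hat a^{\dagger 2}] = 4\,\hat a^\dagger \hat a + 2, \\
[\hat a^3,\ \hat a^{\dagger 3}] &= 9\,\hat a^{\dagger 2}\hat a^2 + 18\,\hat a^\dagger \hat a + 6 .
\end{aligned}
```

The real span of the skew-Hermitian operators $i\hat a^\dagger\hat a$, $i(\hat a^2 + \hat a^{\dagger 2})$, $\hat a^2 - \hat a^{\dagger 2}$ and $i\mathbb{1}$ is therefore closed and four-dimensional. Nested commutators of cubic terms can keep raising the polynomial degree, which is the typical way a generated algebra becomes infinite dimensional.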

翻訳日:2024-01-03 19:05:42 公開日:2023-12-29

# 任意領域における粒子形状モデリング

Particle-Based Shape Modeling for Arbitrary Regions-of-Interest ( http://arxiv.org/abs/2401.00067v1 )

ライセンス: Link先を確認

Hong Xu, Alan Morris, Shireen Y. Elhabian

(参考訳) 統計的形状モデリング (SSM) は解剖学的構造の形態変化を定量的に解析する手法である。これらの分析は、特定の形態学的特徴に焦点を当てるために、対象の解剖学的領域の建築モデルを必要とすることが多い。任意の領域の形状モデリングを可能にするために,広く使用されているssmフレームワークである \particle-based shape modeling (psm) の拡張を提案する。興味のある領域を定義する既存の方法は計算コストが高く、トポロジカルな制限がある。これらの欠点に対処するために、メッシュフィールドを使用して自由形式の制約を定義し、任意の領域の関心を形状面に分割することができる。さらに,モデル最適化に二次ペナルティ法を付加することにより,切削面と自由形式の制約の組み合わせを計算効率良く実行できるようにする。本手法の有効性を,難解な合成データセットと2つの医学データセットに示す。

Statistical Shape Modeling (SSM) is a quantitative method for analyzing morphological variations in anatomical structures. These analyses often necessitate building models on targeted anatomical regions of interest to focus on specific morphological features. We propose an extension to particle-based shape modeling (PSM), a widely used SSM framework, to allow shape modeling of arbitrary regions of interest. Existing methods to define regions of interest are computationally expensive and have topological limitations. To address these shortcomings, we use mesh fields to define free-form constraints, which allow arbitrary regions of interest to be delimited on shape surfaces. Furthermore, we add a quadratic penalty method to the model optimization to enable computationally efficient enforcement of any combination of cutting-plane and free-form constraints. We demonstrate the effectiveness of this method on a challenging synthetic dataset and two medical datasets.
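The quadratic penalty idea can be shown on a single particle with one cutting-plane constraint. The Python sketch below omits the entropy-based particle interactions that PSM actually optimizes as well as the mesh-field free-form constraints. The target, plane, penalty weight `mu`, and step size are all hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def penalized_step(x: Vector, target: Vector, normal: Vector, offset: float,
                   mu: float, lr: float) -> Vector:
    """One gradient-descent step on ||x - target||^2 + mu * max(0, normal.x - offset)^2.

    The penalty is zero on the allowed side of the plane (normal.x <= offset)
    and grows quadratically with the violation on the other side.
    """
    violation = max(0.0, dot(normal, x) - offset)
    grad = [2.0 * (xi - ti) + 2.0 * mu * violation * ni
            for xi, ti, ni in zip(x, target, normal)]
    return [xi - lr * gi for xi, gi in zip(x, grad)]


# A particle pulled toward (2, 0) but constrained to the half-space x <= 1.
x = [0.0, 0.0]
for _ in range(2000):
    x = penalized_step(x, target=[2.0, 0.0], normal=[1.0, 0.0], offset=1.0, mu=100.0, lr=0.004)
# x[0] ≈ (2 + mu) / (1 + mu) ≈ 1.0099: the constraint holds approximately, more tightly as mu grows.
```

Because the penalty is smooth, it only adds a term to the gradient and needs no projection step. This is what makes combinations of many constraints cheap to enforce.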

翻訳日:2024-01-03 19:05:23 公開日:2023-12-29

# 新規金属合金の3d印刷のための加速プロセス開発

Accelerating Process Development for 3D Printing of New Metal Alloys ( http://arxiv.org/abs/2401.00065v1 )

ライセンス: Link先を確認

David Guirguis, Conrad Tucker, Jack Beuth

(参考訳) 3Dプリントされた金属の品質の不確実性や変動に対処することで、この技術が広く使われるようになる。新しい合金のプロセスマッピングは、許容できる印刷品質を一貫して生み出す最適なプロセスパラメータを決定するために不可欠である。プロセスマッピングは通常、従来の方法で行われ、実験の設計や印刷部品のex situ characterizationに使用される。一方,In situ手法は観測可能な特徴が限られており,精度を高めるためには複雑な高コスト設定が必要であるため,制限されている。ビデオビジョントランスと高速イメージングを用いたレーザ-金属相互作用における溶融金属力学の時間的特徴を取り入れることで,これらの制約を緩和する。我々の手法は既存の商用機械で利用でき、効率的な欠陥と変数の定量化のためのその場プロセスマップを提供することができる。本手法の汎用性は, 組成や内在性熱流動特性の異なる合金に対して, クロスデータセット評価を行うことによって実証される。

Addressing the uncertainty and variability in the quality of 3D printed metals can further the widespread use of this technology. Process mapping for new alloys is crucial for determining optimal process parameters that consistently produce acceptable printing quality. Process mapping is typically performed with conventional methods that rely on design of experiments and ex situ characterization of printed parts. In situ approaches, on the other hand, are limited because they observe only a narrow set of features and require complex, high-cost setups to obtain the temperature measurements needed to boost accuracy. Our method relaxes these limitations by incorporating the temporal features of molten metal dynamics during laser-metal interactions using video vision transformers and high-speed imaging. Our approach can be used in existing commercial machines and can provide in situ process maps for efficient defect and variability quantification. The generalizability of the approach is demonstrated by performing cross-dataset evaluations on alloys with different compositions and intrinsic thermofluid properties.

翻訳日:2024-01-03 19:05:10 公開日:2023-12-29

# 排他性グラフによるハイブリッド因果構造の特徴付け

Characterizing Hybrid Causal Structures with the Exclusivity Graph Approach ( http://arxiv.org/abs/2401.00063v1 )

ライセンス: Link先を確認

Giovanni Rodari, Davide Poderini, Emanuele Polino, Alessia Suprano, Fabio Sciarrino, Rafael Chaves

(参考訳) 一般因果構造によって制約された相関集合の幾何解析は基礎的・量子的技術研究において最も重要なものである。この課題に対処することは一般的に困難であり、異なるシナリオのための多様な理論的手法の開発を促す。近年, 因果構造の異なる部分における異なる因果仮定を組み合わせた新たなハイブリッドシナリオが出現している。本研究では,古典的,量子的,非シグナリングな分布を,古典的因果制約や弱い非シグナリングが因果構造の異なるノードに使用されるハイブリッドシナリオにおいて探索するために,グラフ理論手法を拡張した。そのような因果関係を無向グラフにマッピングすることで、対応する分布の集合を特徴付け、それらの関係を分析することができる。特に本手法では,古典的,量子的,無信号的動作を同時に区別できるベル的不等式を最小化し,対応する境界を効率的に推定する方法を示す。この手法は量子ネットワークの研究や量子情報処理への応用のための強力なツールである。

Analyzing the geometry of correlation sets constrained by general causal structures is of paramount importance for foundational and quantum technology research. Addressing this task is generally challenging, prompting the development of diverse theoretical techniques for distinct scenarios. Recently, novel hybrid scenarios combining different causal assumptions within different parts of the causal structure have emerged. In this work, we extend a graph-theoretic technique to explore classical, quantum, and no-signaling distributions in hybrid scenarios, where classical causal constraints and weaker no-signaling ones are used for different nodes of the causal structure. By mapping such causal relationships into an undirected graph, we are able to characterize the associated sets of compatible distributions and analyze their relationships. In particular, we show how our method can construct minimal Bell-like inequalities capable of simultaneously distinguishing classical, quantum, and no-signaling behaviors, and how it can efficiently estimate the corresponding bounds. The demonstrated method could serve as a powerful tool for studying quantum networks and for applications in quantum information tasks.
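In the exclusivity-graph approach of Cabello, Severini and Winter, the maximum of a sum of probabilities of pairwise exclusive events has three bounds: classically, the independence number of the exclusivity graph; quantumly, its Lovász theta number; and for theories constrained only by exclusivity, its fractional packing number. The brute-force Python sketch below computes only the classical bound. It uses the textbook five-cycle (the KCBS pentagon), for which the three values are 2, √5 and 5/2. This is not the hybrid-scenario construction of this paper, and brute force only scales to small graphs.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def independence_number(n: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Size of the largest vertex set with no edge between any two of its members.

    Brute force over subsets, largest first, so only suitable for small graphs.
    """
    adjacent: Set[Tuple[int, int]] = set()
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all((u, v) not in adjacent for u, v in combinations(subset, 2)):
                return size
    return 0


pentagon = [(i, (i + 1) % 5) for i in range(5)]
classical_bound = independence_number(5, pentagon)  # 2
```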

翻訳日:2024-01-03 19:04:55 公開日:2023-12-29

# 組織効果のための意味コンピューティング--組織理論から意味論的モデリングまで

Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based Modelling ( http://arxiv.org/abs/2401.00062v1 )

ライセンス: Link先を確認

Mena Rizk, Daniela Rosu, Mark Fox

(参考訳) 組織の重要な機能は、その目的を達成するために必要な統合のレベル(調整と協力)を育むことである。協力するための調整とモチベーションの必要性は、組織のメンバとその作業との間の無数の依存関係から生まれます。したがって、協調と協力の問題に対する解決策を推論するには、基礎となる依存関係を含む堅牢な表現が必要である。このような表現が正式な組織モデルから欠落していることが分かっています。確立された組織研究と、北米最大の自治体との広範囲にわたるフィールドワークに基づいて、(1) 結果、報酬、疫学依存といった概念を運用するオントロジーを導入し、その統合リスクとの関連性、(2) 複雑な政府インフラプロジェクトにおける統合を分析・支援するためのこのオントロジーの現実的応用について述べる。オントロジーはZ3とOWLの両方で実装・検証されている。モデルの主な特徴は、推論可能な依存関係、説明可能な協調と協力のリスク、リスクを軽減するために組織内の依存関係構造をどのように変更できるかに関するアクション可能な洞察などです。インセンティブのミスアライメントやフリーライディング,サブゴール最適化といった現実的な課題を依存性構造の観点から概念化する上で,セマンティクスに基づくアプローチは,協調と協力をモデル化し,強化するための新しい手法である。意思決定支援システムに統合されたこのモデルは、組織設計と有効性に影響を及ぼす助けとなるかもしれない。より広範に、我々のアプローチは、既存の組織理論から有形で現実的な価値を導き出す意味論の変革の可能性を強調している。

A critical function of an organization is to foster the level of integration (coordination and cooperation) necessary to achieve its objectives. The need to coordinate and the motivation to cooperate emerge from the myriad dependencies between an organization's members and their work. Therefore, reasoning about solutions to coordination and cooperation problems requires a robust representation that includes the underlying dependencies. We find that such a representation remains missing from formal organizational models, and we leverage semantics to bridge this gap. Drawing on well-established organizational research and our extensive fieldwork with one of North America's largest municipalities, (1) we introduce an ontology, formalized in first-order logic, that operationalizes concepts like outcome, reward, and epistemic dependence, and their links to potential integration risks; and (2) we present real-world applications of this ontology to analyze and support integration in complex government infrastructure projects. Our ontology is implemented and validated in both Z3 and OWL. Key features of our model include inferable dependencies, explainable coordination and cooperation risks, and actionable insights on how dependency structures within an organization can be altered to mitigate the risks. By conceptualizing real-world challenges like incentive misalignment, free-riding, and subgoal optimization in terms of dependency structures, our semantics-based approach represents a novel method for modelling and enhancing coordination and cooperation. Integrated within a decision-support system, our model may serve as an impactful aid for organizational design and effectiveness. More broadly, our approach underscores the transformative potential of semantics in deriving tangible, real-world value from existing organization theory.

翻訳日:2024-01-03 19:04:37 公開日:2023-12-29

# 界面を透過する波動と粒子:可逆性とコヒーレンス

Transmission of waves and particles through the interface: reversibility and coherence ( http://arxiv.org/abs/2401.00059v1 )

ライセンス: Link先を確認

A.P. Meilakhs

(参考訳) 本稿では, 量子粒子(フォノン, 電子, 光子)の界面透過性について検討し, 様々な物理シナリオにおける普遍パターンを同定する。古典波動方程式から始め、それらを量子化し、運動方程式を導出する。これらは界面における粒子の分布関数のマッチング条件である。熱輸送のような不可逆過程を正確に記述するための重要な特徴である、導出方程式の時間的不可逆性に留意する。我々は, 波動方程式の時間対称性が乱れ, 入射波の非コヒーレンスを仮定して, 導出の分岐を同定する。その結果,非コヒーレント伝送が時間的不可逆性を示すことがわかった。我々はこの仮説を検証する実験を提案する。

We examine the transmission of quantum particles (phonons, electrons, and photons) across interfaces, identifying universal patterns in diverse physical scenarios. Starting with classical wave equations, we quantize them and derive kinetic equations, which take the form of matching conditions for the distribution functions of particles at the interface. We note the time irreversibility of the derived kinetic equations, an essential feature for accurately describing irreversible processes like heat transport. We identify the juncture in our derivation where the time symmetry of the wave equations is broken: the assumption that the incident waves are non-coherent. Consequently, we infer that non-coherent transmission through the interface exhibits time irreversibility. We propose an experiment to validate this hypothesis.

翻訳日:2024-01-03 19:04:06 公開日:2023-12-29

# 共有経済の言語を探る:ドイツ語と英語のAirbnbに対する信頼の構築とプライバシー上の懸念を減らす

Exploring the language of the sharing economy: Building trust and reducing privacy concern on Airbnb in German and English ( http://arxiv.org/abs/2401.00058v1 )

ライセンス: Link先を確認

Alex Zarifis, Richard Ingham and Julia Kroenung

(参考訳) イングランドで英語、ドイツでドイツ語でプロパティを提供する人のプロフィールにあるテキストは、信頼が構築されているかどうかを調査するために比較され、プライバシーに関する懸念も同様に減少する。信頼構築の方法は,(1)形式性,(2)距離と近接性,(3)動機づけとユーモア,(4)主張的かつ受動的であること,(5)プラットフォーム言語スタイルと用語に適合すること,(6)境界を設定すること,の6つである。プライバシーの懸念は通常、プラットフォームに残されているため、直接的に軽減されることはない。その結果,言語の影響は限定的であり,プラットフォーム規範や習慣が最大の影響を受けていることが示唆された。

The profile texts of hosts offering their properties in England (in English) and in Germany (in German) are compared to explore whether trust is built and privacy concerns are reduced in the same way. The landlords use six methods of building trust: (1) the level of formality, (2) distance and proximity, (3) emotiveness and humor, (4) being assertive and passive-aggressive, (5) conformity to the platform's language style and terminology, and (6) setting boundaries. Privacy concerns are not usually reduced directly, as this is left to the platform. The findings indicate that language has a limited influence, and that platform norms and habits are the biggest influence.

翻訳日:2024-01-03 19:03:55 公開日:2023-12-29

# コントラスト世界モデルの一般化特性

Generalization properties of contrastive world models ( http://arxiv.org/abs/2401.00057v1 )

ライセンス: Link先を確認

Kandan Ramakrishnan, R. James Cotton, Xaq Pitkow, Andreas S. Tolias

(参考訳) オブジェクト中心の世界モデルに関する最近の研究は、完全に教師なしまたは自己管理的な方法で、オブジェクトの観点で表現を分解することを目的としている。このような世界モデルは一般化問題に対処する重要な要素であると仮定されている。しかし、自己スーパービジョンでは性能が向上しているが、OODの一般化は体系的にも明示的にもテストされていない。本稿では、対照的世界モデルの一般化特性について広範な研究を行う。我々は、新しいオブジェクト属性への外挿、新しい結合や新しい属性の導入など、様々なOOD一般化シナリオの下で、モデルを体系的にテストする。実験の結果, 対照的な世界モデルでは, 異なるOODテストの下では一般化できず, サンプルがOODの程度によって性能が低下することがわかった。遷移の更新と畳み込みの特徴マップを視覚化すると、オブジェクトの属性の変化(以前は目に見えない色、形、色と形の組み合わせなど)が、オブジェクトの表現の分解を分解するのを観察する。我々の研究は、一般化のためのオブジェクト指向表現の重要性を強調し、現在のモデルは人間レベルの一般化に必要な表現を学ぶ能力に制限されている。

Recent work on object-centric world models aims to factorize representations in terms of objects in a completely unsupervised or self-supervised manner. Such world models are hypothesized to be a key component in addressing the generalization problem. However, while self-supervision has shown improved performance, out-of-distribution (OOD) generalization has not been systematically and explicitly tested. In this paper, we conduct an extensive study of the generalization properties of contrastive world models. We systematically test the model under a number of different OOD generalization scenarios, such as extrapolation to new object attributes and the introduction of new conjunctions or new attributes. Our experiments show that the contrastive world model fails to generalize under the different OOD tests, and that the drop in performance depends on the extent to which the samples are OOD. When visualizing the transition updates and convolutional feature maps, we observe that any change in object attributes (such as previously unseen colors, shapes, or conjunctions of color and shape) breaks down the factorization of object representations. Overall, our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.

翻訳日:2024-01-03 19:03:42 公開日:2023-12-29
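
Contrastive world models are often trained with an energy-based hinge loss of the kind used by C-SWM. The abstract does not name the exact objective, so that choice is an assumption. The sketch below uses hypothetical names and treats latents as plain lists. It pulls the predicted next state z_t + T(z_t, a_t) towards the encoded next state and pushes a negative sample at least a margin γ away:

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_hinge_loss(z_t, delta, z_next, z_neg, margin=1.0):
    """C-SWM-style loss for a single transition (illustrative, not the paper's code).

    z_t    : latent state at time t
    delta  : predicted change T(z_t, a_t) from the transition model
    z_next : encoded latent state at time t+1 (the positive)
    z_neg  : encoded latent state of a randomly drawn negative sample
    """
    pred = [z + d for z, d in zip(z_t, delta)]
    positive = 0.5 * sq_dist(pred, z_next)                      # pull prediction to the true next state
    negative = max(0.0, margin - 0.5 * sq_dist(z_neg, z_next))  # hinge: push negatives beyond the margin
    return positive + negative

print(contrastive_hinge_loss([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.5, 1.0]))
```

Factorized object-centric models apply such a loss per object slot. Nothing in the loss itself forces the factorization to survive unseen attribute values, which is the failure mode the study probes.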

# 集団行動によるオンラインアルゴリズムリコース

Online Algorithmic Recourse by Collective Action ( http://arxiv.org/abs/2401.00055v1 )

ライセンス: Link先を確認

Elliot Creager and Richard Zemel

(参考訳) アルゴリズムに関する研究は通常、固定された意思決定システムと対話する際に、個人が好ましくない自動決定を合理的に変更する方法を考える。本稿では,データ主体とのインタラクションに応じてシステムパラメータが動的に更新されるオンライン環境に着目した。一般的な個人レベルのリコースを超えて、オンライン設定では、パラメータ更新ルールを活用することで、グループがシステム決定を形作る新しい方法が開かれる。我々は,ユーザが協調して特徴の摂動を計算し,悪質な自動決定の緩和における集団的行動の重要性を強調することで,リコースが改善されることを示す。

Research on algorithmic recourse typically considers how an individual can reasonably change an unfavorable automated decision when interacting with a fixed decision-making system. This paper focuses instead on the online setting, where system parameters are updated dynamically according to interactions with data subjects. Beyond the typical individual-level recourse, the online setting opens up new ways for groups to shape system decisions by leveraging the parameter update rule. We show empirically that recourse can be improved when users coordinate by jointly computing their feature perturbations, underscoring the importance of collective action in mitigating adverse automated decisions.

翻訳日:2024-01-03 19:03:22 公開日:2023-12-29

# ChatEd:ChatGPTを活用した高等教育用チャットボット

ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education ( http://arxiv.org/abs/2401.00052v1 )

ライセンス: Link先を確認

Kevin Wang, Jason Ramos, Ramon Lawrence

(参考訳) 自然言語処理(NLP)の急速な進化に伴い、ChatGPTのような大規模言語モデル(LLM)は、様々な分野を変革できる強力なツールとして登場した。その膨大な知識ベースと動的相互作用能力は、パーソナライズされたアシスタントとして運営することで教育を改善する重要な可能性を示している。しかし,LLMを教育現場に展開する際には,誤った,偏見のある,あるいは役に立たない回答が生まれる可能性も大きな課題である。この研究は、ChatGPTの強みと従来の情報検索ベースのチャットボットフレームワークを組み合わせて、高等教育における学生支援を強化する革新的なアーキテクチャを導入する。私たちの経験的評価は、このアプローチの高い期待を裏付けています。

With the rapid evolution of Natural Language Processing (NLP), Large Language Models (LLMs) like ChatGPT have emerged as powerful tools capable of transforming various sectors. Their vast knowledge base and dynamic interaction capabilities give them significant potential to improve education by operating as personalized assistants. However, the possibility of generating incorrect, biased, or unhelpful answers is a key challenge to resolve when deploying LLMs in an educational context. This work introduces an innovative architecture that combines the strengths of ChatGPT with a traditional information-retrieval-based chatbot framework to offer enhanced student support in higher education. Our empirical evaluations underscore the high promise of this approach.

翻訳日:2024-01-03 19:03:11 公開日:2023-12-29

# 期待分配関数と連続最適化によるメッセンジャーRNAと非コーディングRNAの設計

Messenger and Non-Coding RNA Design via Expected Partition Function and Continuous Optimization ( http://arxiv.org/abs/2401.00037v1 )

ライセンス: Link先を確認

Ning Dai, Wei Yu Tang, Tianshuo Zhou, David H. Mathews, Liang Huang

(参考訳) メッセンジャーRNAと非コーディングRNAを設計するタスクは離散最適化の問題であり、これらの問題のいくつかのバージョンはNPハードである。一般的な局所探索法に代わるものとして,これらの問題を連続最適化として定式化し,「期待分配関数」という新しい概念に基づく最適化のための汎用フレームワークを開発する。基本的な考え方は、可能な全ての候補列にまたがる分布から始め、目的関数を系列から分布へと拡張することである。次に,勾配降下に基づく最適化法を用いて拡張された目的関数を改良し,分布は徐々にone-hot配列(すなわち単一の配列)へと縮小する。この枠組みにおける2つの重要なケーススタディとして、分配関数(すなわちアンサンブル自由エネルギー)を最適化するmRNA設計問題と、条件付き(すなわちボルツマン)確率を最適化する非コーディングRNA設計問題を考える。いずれの場合も,本手法は有望な予備結果を示す。コードは https://github.com/KuNyaa/RNA_Design_codebase で公開している。

The tasks of designing messenger RNAs and non-coding RNAs are discrete optimization problems, and several versions of these problems are NP-hard. As an alternative to commonly used local search methods, we formulate these problems as continuous optimization and develop a general framework for this optimization based on a new concept of "expected partition function". The basic idea is to start with a distribution over all possible candidate sequences and extend the objective function from a sequence to a distribution. We then use gradient-descent-based optimization methods to improve the extended objective function, and the distribution gradually shrinks towards a one-hot sequence (i.e., a single sequence). We consider two important case studies within this framework: the mRNA design problem, optimizing for the partition function (i.e., ensemble free energy), and the non-coding RNA design problem, optimizing for conditional (i.e., Boltzmann) probability. In both cases, our approach demonstrates promising preliminary results. We make our code available at https://github.com/KuNyaa/RNA_Design_codebase.

翻訳日:2024-01-03 19:03:00 公開日:2023-12-29
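
A toy problem can show the core idea: relax each position to a distribution, optimize the expected objective by gradient steps, and let the distribution collapse to a single sequence. This sketch is not the authors' method. Their objectives are the partition function and the Boltzmann probability, which need RNA folding algorithms. Here the objective is hypothetical: the expected number of Watson-Crick pairs (A-U, C-G) at the paired positions of a fixed target structure. It assumes independent positions parameterized by softmax logits:

```python
import math

NUC = "ACGU"
PARTNER = [3, 2, 1, 0]  # Watson-Crick partner index: A<->U, C<->G

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_pairs(probs, pairs):
    """Expected number of Watson-Crick pairs, positions treated as independent."""
    return sum(probs[i][a] * probs[j][PARTNER[a]] for i, j in pairs for a in range(4))

def design(length, pairs, steps=500, lr=2.0):
    # Small deterministic perturbation of the uniform distribution to break symmetry.
    logits = [[0.1 * ((i + k) % 4) for k in range(4)] for i in range(length)]
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Gradient of the expected objective w.r.t. each position's probabilities.
        grad_p = [[0.0] * 4 for _ in range(length)]
        for i, j in pairs:
            for a in range(4):
                grad_p[i][a] += probs[j][PARTNER[a]]
                grad_p[j][a] += probs[i][PARTNER[a]]
        # Chain rule through the softmax, then a gradient-ascent step on the logits.
        for pos in range(length):
            p, g = probs[pos], grad_p[pos]
            mean_g = sum(pk * gk for pk, gk in zip(p, g))
            for k in range(4):
                logits[pos][k] += lr * p[k] * (g[k] - mean_g)
    probs = [softmax(row) for row in logits]
    sequence = "".join(NUC[max(range(4), key=lambda k: row[k])] for row in probs)
    return sequence, expected_pairs(probs, pairs)

# Hypothetical hairpin target: positions 0-7, 1-6 and 2-5 should pair.
seq, score = design(8, [(0, 7), (1, 6), (2, 5)])
print(seq, round(score, 3))
```

As the logits grow, each paired position concentrates on one nucleotide. This mirrors, in miniature, the shrinking towards a one-hot sequence described above.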

# 離散分布ネットワーク

Discrete Distribution Networks ( http://arxiv.org/abs/2401.00036v1 )

ライセンス: Link先を確認

Lei Yang

(参考訳) 本稿では,階層的離散分布を用いたデータ分布を近似する新しい生成モデルである離散分布ネットワーク(ddn)を提案する。ネットワーク内の機能は本質的に分布情報を含むため,単一出力からネットワークを解放し,複数のサンプルを同時に生成することは極めて有効である。したがって、ddnは複数の離散的なサンプルポイントを生成して、連続的な分布を含む目標分布に適合する。 DDNは、ターゲットデータのより詳細な情報をキャプチャするために、第1層で生成された粗い結果から、GTに最も近い出力を選択する。この選択された出力は、第2層の条件としてネットワークにフィードバックされ、GTに類似した新しい出力を生成する。 DDN層の数が増加するにつれて、出力の表現空間は指数関数的に拡大し、生成したサンプルはGTに近づきつつある。この離散分布の階層的な出力パターンはDDNに2つの興味深い性質を与える:高度に圧縮された表現とより一般的なゼロショット条件生成である。 CIFAR-10 および FFHQ における実験により,DDN の有効性とこれらの興味深い特性を実証した。

We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates the data distribution using hierarchical discrete distributions. We posit that, since the features within a network inherently contain distributional information, freeing the network from producing a single output so that it concurrently generates multiple samples is highly effective. DDN therefore fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects, from the coarse results generated in the first layer, the output that is closest to the Ground Truth (GT). This selected output is then fed back into the network as a condition for the second layer, which generates new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with two intriguing properties: highly compressed representation and more general zero-shot conditional generation. We demonstrate the efficacy of DDN and these intriguing properties through experiments on CIFAR-10 and FFHQ.

翻訳日:2024-01-03 19:02:42 公開日:2023-12-29
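
The select-and-feed-back loop can be sketched with a scalar toy. Each hypothetical "layer" proposes k candidates by adding fixed offsets, which shrink with depth, to the current condition. The candidate closest to the ground truth becomes the next layer's condition. The offsets stand in for a trained network and are purely illustrative:

```python
def ddn_select_toward(target, num_layers=6, k=4, scale=1.0):
    """Toy hierarchical selection; returns the final output and the chosen index per layer."""
    condition = 0.0
    path = []
    for layer in range(num_layers):
        step = scale / (2 ** layer)                              # finer proposals in deeper layers
        offsets = [step * (i - (k - 1) / 2) for i in range(k)]   # k evenly spaced candidates
        candidates = [condition + o for o in offsets]
        best = min(range(k), key=lambda i: abs(candidates[i] - target))  # closest to the GT
        path.append(best)
        condition = candidates[best]                             # fed back as the next condition
    return condition, path

output, path = ddn_select_toward(0.3)
print(output, path)
```

With k = 4 and six layers, the path of chosen indices can take 4^6 = 4096 values. This is one way to read the claim that the output space grows exponentially with depth. The path itself is a compact code of 2 bits per layer in this toy.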

# 複雑力学系のモデルにおける構造誤差の学習

Learning About Structural Errors in Models of Complex Dynamical Systems ( http://arxiv.org/abs/2401.00035v1 )

ライセンス: Link先を確認

Jin-Long Wu, Matthew E. Levine, Tapio Schneider, Andrew Stuart

(参考訳) 複雑な力学系は、いくつかの自由度(例えば、小さなスケール)が計算的に解決できない、あるいは完全に理解されていないため、モデル化が難しいことが知られている。例えば、雲の動力学と液滴形成の小さなスケールは気候の制御に不可欠であるが、地球規模の気候モデルでは解決できない。未解決の自由度の影響に対する半経験的閉包モデルはしばしば存在し、重要なドメイン固有の知識をエンコードする。このようなクロージャモデルの構築と構造的エラーの学習による修正は、ドメイン知識とデータを融合する効果的な方法になり得る。ここでは,構造的エラーについて学ぶための一般的なアプローチ,原則,アルゴリズムについて述べる。このアプローチの鍵となるのは、例えば未解決スケールのクロージャモデルにおいて、複雑なシステムのモデル内に構造的エラーモデルを含めることです。構造誤差は通常非線形に観測可能なデータにマップされる。しかしながら、モデル出力とデータ間のミスマッチは、ラベル付き入力ペアの欠如と構造誤差モデルの出力不足により、構造誤差について間接的にのみ通知される。さらに、モデルの微分は存在せず、容易に利用することができる。構造的エラーモデルについて,デリバティブフリーなカルマン逆アルゴリズムと変種を用いて間接データから学習する方法,スパーシティ制約が「害を及ぼさない」原理をいかに強制するか,構造的エラーのモデル化方法について論じる。また,非局所的および確率的誤差モデルを用いるメリットについても考察する。さらに,データ同化技術が非エルゴディックシステムにおける構造的誤りの学習を支援することを示す。概念とアルゴリズムは、Lorenz-96システムとヒトグルコース-インスリンモデルに基づく2つの数値例で示される。

Complex dynamical systems are notoriously difficult to model because some degrees of freedom (e.g., small scales) may be computationally unresolvable or are incompletely understood, yet they are dynamically important. For example, the small scales of cloud dynamics and droplet formation are crucial for controlling climate, yet are unresolvable in global climate models. Semi-empirical closure models for the effects of unresolved degrees of freedom often exist and encode important domain-specific knowledge. Building on such closure models and correcting them by learning their structural errors can be an effective way of fusing data with domain knowledge. Here we describe a general approach, principles, and algorithms for learning about structural errors. Key to our approach is to include structural error models inside the models of complex systems, for example, in closure models for unresolved scales. The structural errors then map, usually nonlinearly, to observable data. As a result, however, mismatches between model output and data are only indirectly informative about structural errors, due to a lack of labeled pairs of inputs and outputs of structural error models. Additionally, derivatives of the model may not exist or be readily available. We discuss how structural error models can be learned from indirect data with derivative-free Kalman inversion algorithms and variants, how sparsity constraints enforce a "do no harm" principle, and various ways of modeling structural errors. We also discuss the merits of using non-local and/or stochastic error models. In addition, we demonstrate how data assimilation techniques can assist in learning about structural errors in non-ergodic systems. The concepts and algorithms are illustrated in two numerical examples based on the Lorenz-96 system and a human glucose-insulin model.

翻訳日:2024-01-03 19:02:23 公開日:2023-12-29
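
The sketch below illustrates derivative-free calibration with a basic deterministic ensemble Kalman inversion. The toy model is hypothetical: its single parameter θ plays the role of an unknown closure coefficient, and only the end state is observed. The model, parameter values and function names are all illustrative. The paper's structural-error models, sparsity constraints and Lorenz-96 experiments are not reproduced:

```python
import math

def forward(theta, x0=1.0, t_end=2.0):
    """Toy model dx/dt = -theta * x, observed at t_end (closed-form solution)."""
    return x0 * math.exp(-theta * t_end)

def eki(y_obs, ensemble, iterations=30, noise_var=1e-8):
    """Deterministic ensemble Kalman inversion for one scalar parameter.

    Only forward-model evaluations are used; no derivatives of `forward` are needed.
    """
    u = list(ensemble)
    n = len(u)
    for _ in range(iterations):
        g = [forward(v) for v in u]
        u_mean, g_mean = sum(u) / n, sum(g) / n
        # Ensemble cross-covariance between parameter and model output.
        c_ug = sum((a - u_mean) * (b - g_mean) for a, b in zip(u, g)) / n
        # Ensemble variance of the model output.
        c_gg = sum((b - g_mean) ** 2 for b in g) / n
        gain = c_ug / (c_gg + noise_var)
        # Move every member towards agreement with the observation.
        u = [v + gain * (y_obs - gv) for v, gv in zip(u, g)]
    return sum(u) / n

theta_true = 0.5                  # hypothetical "true" closure coefficient
y = forward(theta_true)           # noise-free synthetic observation
estimate = eki(y, [0.2, 0.4, 0.8, 1.0, 1.2])
print(round(estimate, 3))
```

The update uses ensemble covariances instead of gradients, so the same loop works when `forward` is a black-box simulator. Realistic settings add observation noise, vector-valued parameters and regularization, all of which this sketch omits.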

# バンド内遷移による強超高速消磁

Strong ultrafast demagnetization due to the intraband transitions ( http://arxiv.org/abs/2401.00099v1 )

ライセンス: Link先を確認

Mitsuko Murakami and G. P. Zhang

(参考訳) フェムト秒レーザーパルスによる強磁性遷移金属の脱磁は固体物理学における根本的な問題であり、スピントロニクスデバイスの開発にはその理解が不可欠である。速度ゲージにおける時間依存磁気モーメントのab initio計算は、実験で観測された大量の消磁を再現することには成功していない。本研究では,結晶運動量空間内の対流微分を通じてバンド内遷移を速度ゲージ内に組み込む手法を提案する。時間依存性の量子Liouville方程式に基づく遷移元素バルク結晶(bccFe,hcpCo,fccNi)に対する実験結果から,バンド内項の挿入後の非磁性化量の劇的な増大が得られた。また,各強磁性材料へのバンド内遷移の効果は,バンド構造とスピン特性の違いにより大きく異なることがわかった。我々の発見は超高速磁化の理解に大きく影響している。

Demagnetization in ferromagnetic transition metals driven by a femtosecond laser pulse is a fundamental problem in solid-state physics, and understanding it is essential to the development of spintronic devices. Ab initio calculations of the time-dependent magnetic moment in the velocity gauge have so far not succeeded in reproducing the large amount of demagnetization observed in experiments. In this work, we propose a method to incorporate intraband transitions within the velocity gauge through a convective derivative in crystal momentum space. Our results for transition-element bulk crystals (bcc Fe, hcp Co, and fcc Ni), based on the time-dependent quantum Liouville equation, show a dramatic enhancement in the amount of demagnetization after the inclusion of an intraband term, in agreement with experiments. We also find that the effect of intraband transitions differs distinctly among the ferromagnetic materials because of differences in their band structures and spin properties. Our finding has a far-reaching impact on the understanding of ultrafast demagnetization.

翻訳日:2024-01-03 18:52:53 公開日:2023-12-29

# ブラジルのシナリオにおける自動エッセイ採点

Automatic Essay Scoring in a Brazilian Scenario ( http://arxiv.org/abs/2401.00095v1 )

ライセンス: Link先を確認

Felipe Akio Matsuoka

(参考訳) 本稿では,ブラジルのExame Nacional do Ensino Médio(ENEM)のポルトガル語エッセイに合わせた,AES(Automatic Essay Scoring)アルゴリズムを提案する。提案手法は,高度な深層学習技術を活用して,学生エッセイの大量評価における効率性とスケーラビリティを目標とした,人間の評価基準に忠実に整合する。この研究はブラジルの教育アセスメントにおける手動採点の物流的および財政的な制約に応えるだけでなく、スコアリングの公平性と一貫性を高めることを約束しており、大規模な学術分野におけるAESの適用において大きな一歩を踏み出した。

This paper presents a novel Automatic Essay Scoring (AES) algorithm tailored for the Portuguese-language essays of Brazil's Exame Nacional do Ensino Médio (ENEM), addressing the challenges in traditional human grading systems. Our approach leverages advanced deep learning techniques to align closely with human grading criteria, targeting efficiency and scalability in evaluating large volumes of student essays. This research not only responds to the logistical and financial constraints of manual grading in Brazilian educational assessments but also promises to enhance fairness and consistency in scoring, marking a significant step forward in the application of AES in large-scale academic settings.

翻訳日:2024-01-03 18:52:35 公開日:2023-12-29

# 言語ベースの物体検出器を訓練するための強化された負例の生成

Generating Enhanced Negatives for Training Language-Based Object Detectors ( http://arxiv.org/abs/2401.00094v1 )

ライセンス: Link先を確認

Shiyu Zhao, Long Zhao, Vijay Kumar B.G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

(参考訳) 言語ベースのopen-vocabulary object detectionの最近の進歩は、フリーフォームのテキストアノテーションで大規模データを活用するより良い方法を見つけることに起因する。このようなモデルを識別的目的関数で訓練することは成功したが、良い正と負のサンプルを必要とする。しかし、自由形式の性質と対象記述の開語彙は、負の空間を極端に大きくする。事前の作業はランダムに負をサンプリングするか、ルールベースのテクニックを使って構築する。対照的に、我々は、現代の生成モデルに組み込まれた膨大な知識を活用して、元のデータにより関連性のあるネガティブを自動構築することを提案する。具体的には,大きな言語モデルを用いて負のテキスト記述を生成し,テキストから画像への拡散モデルを用いて対応する負のイメージを生成する。実験分析により,生成した負データとの関連性が確認され,言語ベースの検出器での利用により,2つの複雑なベンチマークの性能が向上した。

The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make the space of negatives extremely large. Prior works randomly sample negatives or use rule-based techniques to build them. In contrast, we propose to leverage the vast knowledge built into modern generative models to automatically build negatives that are more relevant to the original data. Specifically, we use large language models to generate negative text descriptions, and text-to-image diffusion models to generate the corresponding negative images. Our experimental analysis confirms the relevance of the generated negative data, and its use in language-based detectors improves performance on two complex benchmarks.

翻訳日:2024-01-03 18:52:19 公開日:2023-12-29

# 配車システムにおけるフェアネスエンハンシング車両のリバランス

Fairness-Enhancing Vehicle Rebalancing in the Ride-hailing System ( http://arxiv.org/abs/2401.00093v1 )

ライセンス: Link先を確認

Xiaotong Guo, Hanyong Xu, Dingyi Zhuang, Yunhan Zheng, Jinhua Zhao

(参考訳) 配車産業の急速な成長は、世界中の都市交通に革命をもたらした。利益があるにも拘わらず、未整備のコミュニティは安価な配車サービスへのアクセシビリティが限られているため、株式の懸念が生じる。この文脈で大きな問題は、アイドル車両が需要が予想される地域に移動される、車両のリバランス問題である。需要予測と再バランス戦略の公平なアプローチがなければ、これらのプラクティスは既存の不平等をさらに深めることができる。配車サービスの世界では、アルゴリズムの公正性、ドライバーへの公正性、ライダーへの公正性の3つの主な面が認識されている。本稿では,新しい車両リバランス法を用いて,アルゴリズムとライダーの公平性の向上に着目する。本稿では,需要予測のための社会認識型時空間グラフ畳み込みネットワーク(sa-stgcn)と,それに続く車両リバランスのための公平性統合マッチング統合車両リバランスモデル(mivr)を組み合わせたアプローチを提案する。本手法は, 予測の不一致を低減し, 多様な地域におけるサービス提供の適正化を図る。本システムの有効性を実世界の配車データに基づくシミュレーションを用いて評価する。提案手法は配車需要の予測における正確性と公平性を両立させ,その後の運転においてより公平な車両再バランスを実現することを示唆する。具体的には,本研究で開発したアルゴリズムにより,標準偏差と平均顧客待ち時間をそれぞれ6.48%,0.49%削減した。この成果は配車プラットフォームにとって有益な成果であり、運用効率と公平さのバランスを保っている。

The rapid growth of the ride-hailing industry has revolutionized urban transportation worldwide. Despite its benefits, equity concerns arise as underserved communities face limited accessibility to affordable ride-hailing services. A key issue in this context is the vehicle rebalancing problem, where idle vehicles are moved to areas with anticipated demand. Without equitable approaches in demand forecasting and rebalancing strategies, these practices can further deepen existing inequities. In the realm of ride-hailing, three main facets of fairness are recognized: algorithmic fairness, fairness to drivers, and fairness to riders. This paper focuses on enhancing both algorithmic and rider fairness through a novel vehicle rebalancing method. We introduce an approach that combines a Socio-Aware Spatial-Temporal Graph Convolutional Network (SA-STGCN) for refined demand prediction and a fairness-integrated Matching-Integrated Vehicle Rebalancing (MIVR) model for subsequent vehicle rebalancing. Our methodology is designed to reduce prediction discrepancies and ensure equitable service provision across diverse regions. The effectiveness of our system is evaluated using simulations based on real-world ride-hailing data. The results suggest that our proposed method enhances both accuracy and fairness in forecasting ride-hailing demand, ultimately resulting in more equitable vehicle rebalancing in subsequent operations. Specifically, the algorithm developed in this study reduces the standard deviation of customer wait times by 6.48% and the average customer wait time by 0.49%. This achievement signifies a beneficial outcome for ride-hailing platforms, striking a balance between operational efficiency and fairness.

翻訳日:2024-01-03 18:52:03 公開日:2023-12-29

# アクティブラーニングフレームワークにおける政策管理コストの定量化

Quantifying Policy Administration Cost in an Active Learning Framework ( http://arxiv.org/abs/2401.00086v1 )

ライセンス: Link先を確認

Si Zhang and Philip W. L. Fong

(参考訳) 本稿では,政策管理のための計算モデルを提案する。組織が進化するにつれて、新しいユーザとリソースはアクセス制御モデルの仲介の下に徐々に置かれる。こうした新たなエンティティを追加する度に、ポリシー管理者は、新しい現実を反映してアクセス制御ポリシーをどう修正するかを熟考しなければならない。適切に設計されたアクセス制御モデルは、組織の規模が大きくなると管理コストが禁じられないように、このような変化を予測しなければならない。残念ながら、過去のアクセス制御の研究は、政策管理のコストを定量化する公式な方法を提供していない。本研究では,現在進行中の政策管理を活発な学習フレームワークでモデル化することを提案する。管理コストはクエリの複雑さの観点から定量化できる。保護領域の進化に応用することで,このアプローチの有用性を実証する。また、さまざまな政策管理戦略をフレームワークでモデル化しました。これにより、ポリシーが進化するときにヒューリスティックな推論を使用することにより、ドメインベースのポリシーがアクセス制御行列よりもコスト面で有利であることを正式に示せるようになりました。我々の知る限り、これは、政策検討のコストを調査し、ヒューリスティックな政策管理のコスト優位性を示すために、アクティブな学習フレームワークを使用する最初の試みである。

This paper proposes a computational model for policy administration. As an organization evolves, new users and resources are gradually placed under the mediation of the access control model. Each time such new entities are added, the policy administrator must deliberate on how the access control policy should be revised to reflect the new reality. A well-designed access control model must anticipate such changes so that the administration cost does not become prohibitive when the organization scales up. Unfortunately, past access control research does not offer a formal way to quantify the cost of policy administration. In this work, we propose to model ongoing policy administration in an active learning framework. Administration cost can be quantified in terms of query complexity. We demonstrate the utility of this approach by applying it to the evolution of protection domains. We also modelled different policy administration strategies in our framework. This allowed us to formally demonstrate that domain-based policies have a cost advantage over access control matrices because of the use of heuristic reasoning when the policy evolves. To the best of our knowledge, this is the first work to employ an active learning framework to study the cost of policy deliberation and demonstrate the cost advantage of heuristic policy administration.

翻訳日:2024-01-03 18:51:34 公開日:2023-12-29
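
A toy count can make query complexity concrete. Assume, hypothetically, that the administrator answers yes/no queries of the form "may the new user access resource r?". Adding a user to an access control matrix then costs one query per resource. Under a domain-based policy with known protection domains, the learner only needs enough queries to tell the candidate domains apart. The sketch illustrates that contrast; it is not the paper's formal model:

```python
def learn_matrix_row(oracle, resources):
    """Access control matrix: one query per resource for the new user."""
    row = {r: oracle(r) for r in resources}
    return row, len(resources)

def learn_domain(oracle, domains):
    """Domain-based policy: query only resources that split the remaining candidate domains.

    Assumes the new user's permissions coincide with exactly one existing domain.
    """
    candidates = dict(domains)  # domain name -> set of resources it grants
    queries = 0
    while len(candidates) > 1:
        granted_somewhere = set().union(*candidates.values())
        splitting = [r for r in sorted(granted_somewhere)
                     if 0 < sum(r in g for g in candidates.values()) < len(candidates)]
        if not splitting:
            break  # the remaining domains grant identical resources
        r = splitting[0]
        answer = oracle(r)
        queries += 1
        candidates = {d: g for d, g in candidates.items() if (r in g) == answer}
    return next(iter(candidates)), queries

resources = [f"r{i}" for i in range(1, 9)]
domains = {"guest": {"r1"}, "staff": {"r1", "r2"},
           "finance": {"r1", "r2", "r3", "r4"}, "admin": set(resources)}

def new_user_oracle(resource):
    """Hypothetical administrator answer: the new user belongs to 'finance'."""
    return resource in domains["finance"]

print(learn_matrix_row(new_user_oracle, resources)[1])  # 8 queries
print(learn_domain(new_user_oracle, domains))           # ('finance', 3)
```

The domain learner needs at most one query per eliminated candidate, independent of the number of resources. This is one intuition for why heuristic, domain-based administration can scale more cheaply.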

# 物質波の交叉ウィグナー関数によるグーイ位相と量子干渉

Gouy phase and quantum interference with cross-Wigner functions for matter-waves ( http://arxiv.org/abs/2401.00083v1 )

ライセンス: Link先を確認

Lucas S. Marinho, Pedro R. Dieguez, Carlos H. S. Vieira and Irismar G. da Paz

(参考訳) グーイ位相は、古典電磁波から物質波、量子光学まで、様々な波動現象を正確に記述するのに必須である。本研究では,相互ウィグナー変換に基づく位相空間法を用いて,相関ガウス波パケットによって特徴付けられる物質波の進化における空間的および時間的干渉を分析する。第一に,初期関数の交叉と自由進化,第二に二重スリット配置による進化を考える。グローバルなグーイ位相を得る波動関数と異なり、クロスウィグナーは進化時間が異なるため、グーイ位相差を取得する。その結果, 時間的相同性相は時間的干渉の正確な説明に重要であることが示唆された。さらに,物質波を用いた二重スリット実験において,空間強度干渉項からクロスウィグナーを再構成するウィグナー関数に基づく手法を提案する。

The Gouy phase is essential for accurately describing various wave phenomena, ranging from classical electromagnetic waves to matter waves and quantum optics. In this work, we employ phase-space methods based on the cross-Wigner transformation to analyze spatial and temporal interference in the evolution of matter waves initially characterized by a correlated Gaussian wave packet. First, we consider the cross-Wigner function of the initial wave function and its free evolution; second, we consider the evolution through a double-slit arrangement. Unlike the wave function, which acquires a global Gouy phase, the cross-Wigner function acquires a Gouy phase difference due to the different evolution times. The results suggest that temporal Gouy-like phases are important for an accurate description of temporal interference. Furthermore, we propose a technique based on the Wigner function to reconstruct the cross-Wigner function from the spatial intensity interference term in a double-slit experiment with matter waves.

翻訳日:2024-01-03 18:51:17 公開日:2023-12-29
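
The standard definitions behind these statements can be written as follows. These are textbook conventions; the paper's normalization and notation may differ, and its correlated packet adds a position-momentum correlation term not shown here. Below are the cross-Wigner function of two states ψ and φ, and a freely evolving, uncorrelated one-dimensional Gaussian packet of initial width σ₀ and mass m:

```latex
\begin{aligned}
W(\psi,\phi)(x,p) &= \frac{1}{2\pi\hbar}\int_{-\infty}^{\infty}
  \psi\!\left(x+\tfrac{y}{2}\right)\phi^{*}\!\left(x-\tfrac{y}{2}\right)
  e^{-ipy/\hbar}\,dy, \\
\psi(x,t) &= \frac{(2\pi\sigma_0^{2})^{-1/4}}{\sqrt{1+it/\tau}}
  \exp\!\left[-\frac{x^{2}}{4\sigma_0^{2}\,(1+it/\tau)}\right],
  \qquad \tau = \frac{2m\sigma_0^{2}}{\hbar}, \\
\frac{1}{\sqrt{1+it/\tau}} &= \left(1+\frac{t^{2}}{\tau^{2}}\right)^{-1/4} e^{-i\mu(t)},
  \qquad \mu(t) = \frac{1}{2}\arctan\frac{t}{\tau}.
\end{aligned}
```

Setting φ = ψ recovers the ordinary Wigner function. W(ψ,φ) is linear in ψ and antilinear in φ, so global phases e^{-iμ₁} and e^{-iμ₂} on the two states appear in the cross term only through the factor e^{-i(μ₁-μ₂)}. This is how a Gouy phase difference between packets evolved for different times can enter.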

# 金融における合成データ応用

Synthetic Data Applications in Finance ( http://arxiv.org/abs/2401.00081v1 )

ライセンス: Link先を確認

Vamsi K. Potluru, Daniel Borrajo, Andrea Coletta, Niccolò Dalmasso, Yousef El-Laham, Elizabeth Fons, Mohsen Ghassemi, Sriram Gopalakrishnan, Vikesh Gosai, Eleonora Kreačić, Ganapathy Mani, Saheed Obitayo, Deepak Paramanand, Natraj Raman, Mikhail Solonin, Srijan Sood, Svitlana Vyetrenko, Haibei Zhu, Manuela Veloso, Tucker Balch

(参考訳) 合成データは、金融、ヘルスケア、バーチャルリアリティーなど、さまざまな商用環境で大きな進歩を遂げている。本稿では、金融セクターにおける合成データのプロトタイプ的応用について概観する。これらは、表表、時系列、イベントシリーズ、および市場および小売金融アプリケーションの両方から生じる非構造化を含む、さまざまなデータモダリティをカバーする。金融は高度に規制された産業であるため、合成データはプライバシー、公正性、説明可能性に関連する問題を扱うための潜在的アプローチである。これらのアプリケーションにおける我々のアプローチの品質と有効性を評価するために、様々な指標が利用されます。金融分野の文脈において,合成データのオープンな方向性で結論づける。

Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and, in particular, provide richer details for a few select ones. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both market and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are used to evaluate the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the context of the financial domain.

翻訳日:2024-01-03 18:51:03 公開日:2023-12-29

# スポーツシナリオにおける大規模再識別分析：臨界点到達の裏切り

A Large-Scale Re-identification Analysis in Sporting Scenarios: the Betrayal of Reaching a Critical Point ( http://arxiv.org/abs/2401.00080v1 )

ライセンス: Link先を確認

David Freire-Obregón, Javier Lorenzo-Navarro, Oliverio J. Santana, Daniel Hernández-Sosa, Modesto Castrillón-Santana

(参考訳) 遠距離ランニング競技の参加者を再特定することは、広範囲にわたる距離と絶えず変化する地形のために悩まされることがある。これらの課題を克服するために、ランナーの顔、バイブの数字、衣服を分析するコンピュータビジョン技術が開発されている。しかし,本研究では,様々な事前訓練されたヒト行動認識モデルと損失関数を活用することで,走者再識別(re-ID)のための歩行に基づく新しいアプローチを提案する。提案手法は,超遠距離競技におけるランナーの再識別に有望な結果をもたらすことを示す。さらに, 選手が持久限界に近づいているとき, 異なる人体運動の意義と, リid精度への影響について検討した。本研究は,ランナーの歩行の認識が,激しい疲労の瞬間として定義される競技の臨界点(cp)と,その位置から数km離れたフィニッシュラインが見えてくる地点によってどのように影響を受けるかを検討したものである。このCPがアスリートのリIDの精度をいかに向上させるかを検討することを目的とする。実験の結果,運動選手がアプローチするにつれて,歩行認識が著しく向上する(最大9%のマップ増加)ことが判明した。これは、遠距離競技や長距離監視タスクなど、現実世界のシナリオで歩行認識を利用する可能性を強調している。

Re-identifying participants in ultra-distance running competitions can be daunting due to the extensive distances and constantly changing terrain. To overcome these challenges, computer vision techniques have been developed to analyze runners' faces, the numbers on their bibs, and their clothing. However, our study presents a novel gait-based approach for runner re-identification (re-ID) by leveraging various pre-trained human action recognition (HAR) models and loss functions. Our results show that this approach is promising for re-identifying runners in ultra-distance competitions. Furthermore, we investigate the significance of distinct human body movements when athletes are approaching their endurance limits and their potential impact on re-ID accuracy. Our study examines how the recognition of a runner's gait is affected by a competition's critical point (CP), defined as a moment of severe fatigue and the point where the finish line comes into view, just a few kilometers away from this location. We aim to determine how this CP can improve the accuracy of athlete re-ID. Our experimental results demonstrate that gait recognition can be significantly enhanced (up to a 9% increase in mAP) as athletes approach this point. This highlights the potential of utilizing gait recognition in real-world scenarios, such as ultra-distance competitions or long-duration surveillance tasks.

翻訳日:2024-01-03 18:50:51 公開日:2023-12-29
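
Re-ID figures such as the mAP gain above are typically computed by ranking the gallery for each query and averaging the precision at every correct match. Below is a minimal sketch of that standard metric. The function names are hypothetical, and the abstract does not specify the paper's exact evaluation protocol:

```python
def average_precision(ranked_ids, query_id):
    """AP for one query: mean of precision@k over the ranks k where the identity matches."""
    hits, precisions = 0, []
    for k, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    """rankings: list of (query_id, gallery ids sorted by decreasing similarity)."""
    return sum(average_precision(ranked, q) for q, ranked in rankings) / len(rankings)

rankings = [("A", ["A", "B", "A", "C"]),  # matches at ranks 1 and 3 -> AP = (1 + 2/3) / 2
            ("B", ["A", "B", "C"])]       # match at rank 2 -> AP = 1/2
print(mean_average_precision(rankings))
```

AP rewards placing all correct matches early, not just the first one. This makes it a stricter summary than top-1 accuracy when a runner appears several times in the gallery.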

# 神経科学研究における操作の成熟度モデル

A Maturity Model for Operations in Neuroscience Research ( http://arxiv.org/abs/2401.00077v1 )

ライセンス: Link先を確認

Erik C. Johnson, Thinh T. Nguyen, Benjamin K. Dichter, Frank Zappulla, Montgomery Kosma, Kabilar Gunalan, Yaroslav O. Halchenko, Shay Q. Neufeld, Michael Schirner, Petra Ritter, Maryann E. Martone, Brock Wester, Franco Pestilli, Dimitri Yatsenko

(参考訳) 科学者は活動と目標を拡大するために新しいアプローチを採用しています。神経技術、人工知能、自動化、コラボレーションツールの進歩は、新たな発見を約束する。しかし、他の分野や産業と比較して、神経科学研究所はコラボレーション、再現性、自動化をサポートする主要な技術を採用するのが遅かった。様々な研究チームに対して,自動化された研究ワークフローを実現するためのロードマップを策定する。神経科学研究における5段階能力成熟モデルの構築を提案する。高いレベルの運用成熟を達成するには、新たなテクノロジ対応の方法論が必要です。成熟度モデルは、多分野の神経科学チームにおけるオペレーションの評価とアップグレードのためのガイドラインを提供する。

Scientists are adopting new approaches to scale up their activities and goals. Progress in neurotechnologies, artificial intelligence, automation, and tools for collaboration promises new bursts of discoveries. However, compared to other disciplines and the industry, neuroscience laboratories have been slow to adopt key technologies to support collaboration, reproducibility, and automation. Drawing on progress in other fields, we define a roadmap for implementing automated research workflows for diverse research teams. We propose establishing a five-level capability maturity model for operations in neuroscience research. Achieving higher levels of operational maturity requires new technology-enabled methodologies, which we describe as ``SciOps''. The maturity model provides guidelines for evaluating and upgrading operations in multidisciplinary neuroscience teams.

Translated: 2024-01-03 18:50:23 Published: 2023-12-29

# サイバーセキュリティにおける説明可能な機械学習のためのテンソルネットワーク

Tensor Networks for Explainable Machine Learning in Cybersecurity ( http://arxiv.org/abs/2401.00867v1 )

License: see link

Borja Aizpurua, Roman Orus

(参考訳) 本稿では,テンソルネットワークが機械学習アルゴリズムの解法開発にどのように役立つかを示す。具体的には,行列積状態(mps)に基づく教師なしクラスタリングアルゴリズムを開発し,敵生成脅威インテリジェンスの実際のユースケースに適用する。我々の調査は、MPSがオートエンコーダやGANといった従来のディープラーニングモデルと性能面で競合し、よりリッチなモデル解釈能力を提供することを示した。我々のアプローチは、機能的確率、フォン・ノイマンのエントロピー、および相互情報の抽出を自然に促進し、異常の分類のための説得力のある物語を提供し、前例のないレベルの透明性と解釈可能性を促進する。

In this paper, we show how tensor networks help in developing the explainability of machine learning algorithms. Specifically, we develop an unsupervised clustering algorithm based on Matrix Product States (MPS) and apply it to a real use case of adversary-generated threat intelligence. Our investigation shows that MPS rival traditional deep learning models such as autoencoders and GANs in terms of performance, while providing much richer model interpretability. Our approach naturally facilitates the extraction of feature-wise probabilities, von Neumann entropy, and mutual information. This offers a compelling narrative for the classification of anomalies and fosters an unprecedented level of transparency and interpretability, which is fundamental to understanding the rationale behind artificial intelligence decisions.
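
The von Neumann entropy mentioned above is computed from the Schmidt coefficients at a bipartition, which an MPS exposes on its bonds. The sketch below obtains those coefficients with an SVD of the reshaped state vector. It assumes a small dense pure state rather than the authors' trained MPS.

```python
import numpy as np


def bipartite_von_neumann_entropy(state: np.ndarray, dims_left: int, dims_right: int) -> float:
    """Entanglement (von Neumann) entropy across a left|right cut of a pure state.

    The state vector is reshaped into a dims_left x dims_right matrix. Its singular
    values are the Schmidt coefficients, i.e. the bond values an MPS stores at that cut.
    """
    psi = state.reshape(dims_left, dims_right).astype(float)
    psi = psi / np.linalg.norm(psi)                 # normalise the state
    schmidt = np.linalg.svd(psi, compute_uv=False)  # Schmidt coefficients
    p = schmidt ** 2                                # eigenvalues of the reduced density matrix
    p = p[p > 1e-12]                                # drop zeros to avoid log(0)
    return float(-np.sum(p * np.log(p)))
```

A product state gives zero entropy, and a two-qubit Bell state gives $\log 2$.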

Translated: 2024-01-03 15:36:38 Published: 2023-12-29

# リニアオーバーパラメータ化によるプレナードネットワークのブースティング

Boosting Pruned Networks with Linear Over-parameterization ( http://arxiv.org/abs/2204.11444v3 )

License: see link

Yu Qian, Jian Cao, Xiaoshuang Li, Jie Zhang, Hufei Li, Jue Chen

(参考訳) 構造化プルーニングは、高速な推論のためのチャネル(フィルタ)を減らし、実行時にフットプリントを低くすることでニューラルネットワークを圧縮する。プルーニング後の精度を回復するため、細調整は通常、プルーニングネットワークに適用される。しかし、刈り取られたネットワークに残されているパラメータが少なすぎると、精度を回復するための微調整が困難になる。この課題に対処するため,我々は,まず,刈り込みネットワーク内のコンパクト層を線形に過度にパラメータ化して,微調整パラメータの数を拡大し,さらに微調整後に元の層に再パラメータ化する手法を提案する。具体的には、現在の出力特徴写像を変更しない連続的な畳み込み/直線層を複数有する畳み込み/直線層を等価に拡張する。さらに, 類似性保存知識蒸留を利用して, 過パラメータ化ブロックが対応する高密度層の即時データ-データ類似性を学習し, 特徴学習能力を維持する。提案手法は,CIFAR-10とImageNetで総合的に評価され,バニラ微調整戦略,特に大きな刈り取り率に優れていた。

Structured pruning compresses neural networks by reducing channels (filters) for fast inference and a low run-time footprint. To restore accuracy after pruning, fine-tuning is usually applied to pruned networks. However, with so few parameters remaining in a pruned network, fine-tuning it to restore accuracy inevitably becomes very challenging. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters. After fine-tuning, it re-parameterizes them back to the original layers. Specifically, we equivalently expand each convolution/linear layer into several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation, which encourages the over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer so as to maintain its feature learning ability. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, where it significantly outperforms the vanilla fine-tuning strategy, especially for large pruning ratios.
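
Here is a minimal sketch of the expand-then-merge idea for a single linear layer. It assumes NumPy and a random full-column-rank factor `W1`. The helper names and the pseudo-inverse initialization are illustrative choices, not necessarily the paper's. The product `W2 @ W1` reproduces the original weight, so outputs are unchanged at the start. After fine-tuning, the two factors collapse back into one matrix because nothing nonlinear sits between them.

```python
import numpy as np


def expand_linear(W: np.ndarray, hidden: int, rng: np.random.Generator):
    """Split y = W x into y = W2 (W1 x) with a wider hidden size, without changing the output."""
    out_dim, in_dim = W.shape
    if hidden < in_dim:
        raise ValueError("hidden must be >= in_dim so that W1 has full column rank")
    W1 = rng.standard_normal((hidden, in_dim)) / np.sqrt(in_dim)  # (hidden, in)
    W2 = W @ np.linalg.pinv(W1)  # (out, hidden); pinv(W1) @ W1 = I for full column rank
    return W1, W2


def merge_linear(W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Re-parameterize: two consecutive linear maps are equivalent to their product."""
    return W2 @ W1


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))          # compact layer left after pruning
W1, W2 = expand_linear(W, hidden=8, rng=rng)
x = rng.standard_normal(6)
# ... fine-tune W1 and W2 here; a small random update stands in for training ...
W1 = W1 + 0.01 * rng.standard_normal(W1.shape)
W_merged = merge_linear(W1, W2)          # single layer used at inference time
```

For convolutions the same reasoning applies to consecutive linear convolutions, but that case is not shown here.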

Translated: 2024-01-03 03:35:10 Published: 2023-12-29

# 知識発見のための事前情報を用いた次元削減

Dimension Reduction with Prior Information for Knowledge Discovery ( http://arxiv.org/abs/2111.13646v4 )

License: see link

Anh Tuan Bui

(参考訳) 本稿では,高次元データを低次元空間にマッピングする問題を,他の既知の特徴の存在下で解決する。この問題は、ほとんどのアプリケーションによく制御可能/測定可能な機能があるため、科学や工学においてユビキタスである。この問題を解決するため,本稿では,条件付き多次元スケーリング (conditional multidimensional scaling, mds) と呼ばれる幅広い手法を提案する。また,条件付きMDSの目的関数を最適化するアルゴリズムを開発した。このアルゴリズムの収束は穏やかな仮定の下で証明される。条件付きMDSは、親族関係用語、表情、織物、カーブランド認識、シリンダー加工の例で説明される。これらの例は, 従来の次元減少に対する条件付きMDSの利点を示し, 次元削減空間の推定品質を改善し, 可視化と知識発見タスクを簡素化した。この作業用のコンピュータコードは、オープンソースcml Rパッケージで利用可能である。

This paper addresses the problem of mapping high-dimensional data to a low-dimensional space in the presence of other known features. This problem is ubiquitous in science and engineering, because most applications involve controllable or measurable features. To solve this problem, this paper proposes a broad class of methods, referred to as conditional multidimensional scaling (MDS). An algorithm for optimizing the objective function of conditional MDS is also developed, and its convergence is proven under mild assumptions. Conditional MDS is illustrated with kinship terms, facial expressions, textile fabrics, car-brand perception, and cylinder machining examples. These examples demonstrate the advantages of conditional MDS over conventional dimension reduction, both in improving the estimation quality of the reduced-dimension space and in simplifying visualization and knowledge discovery tasks. Computer code for this work is available in the open-source cml R package.
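
For orientation, the unconditional baseline is classical (Torgerson) MDS. It recovers coordinates from a distance matrix by double-centering followed by an eigendecomposition. The sketch below shows only that baseline. It does not implement the conditional objective or the optimization algorithm proposed in the paper.

```python
import numpy as np


def classical_mds(D: np.ndarray, k: int) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n points in k dimensions from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]         # indices of the top-k eigenpairs
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale              # n x k coordinates
```

When the distances are exactly Euclidean in k dimensions, the embedding reproduces them up to rotation and reflection.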

Translated: 2024-01-03 03:32:21 Published: 2023-12-29

# 線形計画のための確率的原始双対法の近似最適線形収束

Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming ( http://arxiv.org/abs/2111.05530v3 )

License: see link

Haihao Lu, Jinwen Yang

(参考訳) 近年,線形プログラミング(LP)における一階法への関心が高まっている。本稿では,lpのような鋭いプライマル・デュアル問題を解くために分散低減と再スタートを用いた確率的アルゴリズムを提案する。提案手法は,鋭いインスタンスを高い確率で解くための線形収束率を示す。さらに,非制約双線形問題に対する効率的な座標ベースの確率オラクルを提案し,これは反復コストが$\mathcal O(1)$であり,既存の決定論的および確率論的アルゴリズムの複雑さを改善する。最後に、得られた線形収束率は、幅広い確率的原始双対法に対してほぼ最適($\log$ 項まで)であることが示される。

There has been recent interest in first-order methods for linear programming (LP). In this paper, we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for solving sharp instances with high probability. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal O(1)$ per-iteration cost and improves the complexity of the existing deterministic and stochastic algorithms. Finally, we show that the obtained linear convergence rate is nearly optimal (up to $\log$ terms) for a wide class of stochastic primal-dual methods.
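
One common way to build a cheap, unbiased gradient oracle for a bilinear term $x^\top A y$ is to sample a single nonzero entry $A_{ij}$ with probability $p_{ij} \propto |A_{ij}|$ and return $(A_{ij} y_j / p_{ij})\, e_i$. Its expectation is $A y$. The sketch below illustrates that idea under this assumption; it is not claimed to be the exact oracle from the paper. Here each draw uses a binary search and so costs $O(\log \mathrm{nnz})$ rather than $O(1)$; an alias table would make it $O(1)$.

```python
import bisect
import itertools
import random
from typing import List, Tuple


class EntrySampler:
    """Samples one nonzero entry (i, j) of A with probability |A_ij| / sum |A|."""

    def __init__(self, A: List[List[float]]):
        self.A = A
        self.entries = [(i, j) for i, row in enumerate(A) for j, a in enumerate(row) if a != 0]
        weights = [abs(A[i][j]) for i, j in self.entries]
        self.total = sum(weights)
        self.cum = list(itertools.accumulate(weights))

    def estimate(self, k: int, y: List[float]) -> Tuple[int, float]:
        """One-entry estimate of A @ y from entry k: returns (i, value), meaning value * e_i."""
        i, j = self.entries[k]
        p = abs(self.A[i][j]) / self.total
        return i, self.A[i][j] * y[j] / p

    def sample_grad_x(self, y: List[float], rng: random.Random) -> Tuple[int, float]:
        """Draw one entry by binary search over cumulative weights and return its estimate."""
        u = rng.random() * self.total
        k = min(bisect.bisect_right(self.cum, u), len(self.entries) - 1)
        return self.estimate(k, y)

    def expected_grad_x(self, y: List[float]) -> List[float]:
        """Exact expectation of the estimator, useful for checking that it is unbiased."""
        out = [0.0] * len(self.A)
        for k, (i, j) in enumerate(self.entries):
            p = abs(self.A[i][j]) / self.total
            out[i] += p * self.estimate(k, y)[1]
        return out
```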

Translated: 2024-01-03 03:32:05 Published: 2023-12-29

# 課金するか、売るか? LSTM, CNN, オートエンコーダによるEVパックの有効寿命推定

To Charge or to Sell? EV Pack Useful Life Estimation via LSTMs, CNNs, and Autoencoders ( http://arxiv.org/abs/2110.03585v2 )

License: see link

Michael Bosello, Carlo Falcomer, Claudio Rossi, Giovanni Pau

(参考訳) 電気自動車(EV)は、より良い性能と快適性を提供することを約束しながら急速に普及している。彼らの成功にもかかわらず、そのコストは依然として課題である。リチウムイオン電池は最も高価なev部品の1つであり、様々な用途でエネルギー貯蔵の標準となっている。バッテリーパックの有効寿命(RUL)を正確に見積もると、再利用が促進され、EVのコストが削減され、持続可能性が改善される。電池パックの残留市場値を定量化するために、正しいRUL推定を用いることができる。そして、顧客は、その価値がまだあるとき、すなわち、ターゲットアプリケーションの寿命が終わる前に、バッテリーを売ることを決定できるので、安全性と信頼性を損なうことなく、第2のドメインで再利用することができる。本稿では,Liイオン電池のRUL(LSTMとオートエンコーダ対CNNとオートエンコーダ)を推定するための2つのディープラーニング手法を提案し,比較する。オートエンコーダは有用な特徴を抽出するために使用され、その後のネットワークはRULを推定するために使用される。これまでの文献で提案されているものと比較して,本手法が実際にデプロイされたアプリケーションに適用可能であることを保証するための対策を講じている。このような対策としては,(1)非可測変数を入力として使用するのを避けること,(2)広い変数と異なる条件の適切なデータセットを使用すること,(3)サイクル数ではなく残時間を予測すること,などがあげられる。その結果,提案手法は,ばらつきの多い多数の電池からなるデータセットを一般化できることがわかった。

Electric vehicles (EVs) are spreading fast because they promise better performance and comfort and, above all, help in facing climate change. Despite their success, their cost is still a challenge. Lithium-ion batteries are one of the most expensive EV components and have become the standard for energy storage in various applications. Precisely estimating the remaining useful life (RUL) of battery packs can encourage their reuse and thus help reduce the cost of EVs and improve sustainability. A correct RUL estimation can be used to quantify the residual market value of the battery pack. The customer can then decide to sell the battery while it still has value, i.e., before it reaches the end of life for the target application, so that it can still be reused in a second domain without compromising safety and reliability. This paper proposes and compares two deep learning approaches to estimate the RUL of Li-ion batteries: LSTM with autoencoders vs. CNN with autoencoders. The autoencoders are used to extract useful features, and the subsequent network then estimates the RUL. Compared with the approaches proposed so far in the literature, we employ measures to ensure the method's applicability in actual deployed applications. These measures include (1) avoiding non-measurable variables as input, (2) employing appropriate datasets with wide variability and different conditions, and (3) predicting the remaining ampere-hours instead of the number of cycles. The results show that the proposed methods can generalize to datasets consisting of numerous batteries with high variance.
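
Measure (3) changes the regression target. Below is a minimal sketch of how a remaining-ampere-hours label could be derived for each cycle. It assumes that end of life is the first cycle whose capacity drops below 80% of its initial value. The threshold, the function name and the input format are assumptions, not taken from the paper.

```python
from typing import List, Optional


def remaining_ah_targets(capacity_ah: List[float], throughput_ah: List[float],
                         eol_fraction: float = 0.8) -> List[Optional[float]]:
    """For each cycle, the ampere-hours still to be delivered before end of life (EOL).

    EOL is assumed to be the first cycle whose capacity falls below
    eol_fraction * initial capacity. Cycles at or after EOL get 0.0. If EOL is
    never reached in the data, the label is unknown and every target is None.
    """
    threshold = eol_fraction * capacity_ah[0]
    eol = next((k for k, c in enumerate(capacity_ah) if c < threshold), None)
    if eol is None:
        return [None] * len(capacity_ah)
    # Sum the Ah throughput of the cycles from k up to (but excluding) the EOL cycle.
    return [sum(throughput_ah[k:eol]) if k < eol else 0.0 for k in range(len(capacity_ah))]
```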

Translated: 2024-01-03 03:31:29 Published: 2023-12-29

# ベントニックAUV調査計画のための特徴空間探査

Feature Space Exploration For Planning Initial Benthic AUV Surveys ( http://arxiv.org/abs/2105.11598v2 )

License: see link

Jackson Shields, Oscar Pizarro, Stefan B. Williams

(参考訳) 特別目的自律水中車両(AUV)は、海底の光学画像を収集するベントニック(海底)調査に使用される。カメラのフットプリントが小さく、調査対象地域が広いため、これらのAUVは数万平方メートルを超える領域の完全なカバレッジ画像を収集できない。そのため,AUV経路のサンプル採取は少ないが,効果的に行う必要がある。広帯域の音響浴量測定データは広い範囲で容易に利用でき、しばしば海底覆いに先立って有用である。そのため、AUVデータ収集のガイドには、事前の浴量測定が使用できる。本研究は,多種多様な水浴場から試料を採取するために,水浴計の特徴空間表現を効率的に探索する初期auvサーベイの計画手法を提案する。これにより、AUVは独自の生息地を含む可能性があり、調査地域全体を代表する地域を訪問できる。本稿では,機能空間探索の報奨,フリーフォームパスの計画,サーベイテンプレートの配置を最適化するための情報収集プランナーを提案する。これらの手法のAUV調査計画への適合性は,特徴空間のカバレッジと,初期潜水時のベント性生息地の全クラスへの訪問能力に基づいて評価される。 RRT(Rapidly-Expanding Random Trees)とMCTS(Monte-Carlo Tree Search)に基づくインフォームティブプランナーが最も有効であることがわかった。これは、初期潜水の有用性を高めるため、AUV調査にとって貴重なツールである。また、音響浴量測定と視覚由来の海底分類の関係を学習するための総合的なトレーニングセットも提供する。

Special-purpose Autonomous Underwater Vehicles (AUVs) are utilised for benthic (seafloor) surveys, in which the vehicle collects optical imagery of the seafloor. Due to the small sensor footprint of the cameras and the vast areas to be surveyed, these AUVs cannot feasibly collect full-coverage imagery of areas larger than a few tens of thousands of square meters. Therefore, AUV paths must sample the survey areas sparsely, yet effectively. Broad-scale acoustic bathymetric data is readily available over large areas and is often a useful prior for seafloor cover. As such, prior bathymetry can be used to guide AUV data collection. This research proposes methods for planning initial AUV surveys that efficiently explore a feature space representation of the bathymetry, in order to sample from a diverse set of bathymetric terrain. This will enable the AUV to visit areas that likely contain unique habitats and are representative of the entire survey site. We propose several information-gathering planners that use a feature space exploration reward to plan freeform paths or to optimise the placement of a survey template. The suitability of these methods for planning AUV surveys is evaluated based on the coverage of the feature space and on the ability to visit all classes of benthic habitat on the initial dive. Informative planners based on Rapidly-exploring Random Trees (RRT) and Monte-Carlo Tree Search (MCTS) were found to be the most effective. This is a valuable tool for AUV surveys, as it increases the utility of initial dives. It also delivers a comprehensive training set for learning a relationship between acoustic bathymetry and visually-derived seafloor classifications.
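
To make the idea of a feature-space exploration reward concrete, here is a minimal sketch. It discretizes bathymetric feature vectors into grid cells and rewards a candidate path by the number of cells it newly covers. The binning, the function names and the greedy choice among candidate paths are illustrative assumptions. The RRT- and MCTS-based planners from the paper are not reproduced.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


def feature_cell(features: Sequence[float], bin_width: float) -> Cell:
    """Discretise a feature vector (e.g. hypothetical depth/slope/rugosity values) into a grid cell."""
    return tuple(int(f // bin_width) for f in features)


def exploration_reward(path_features: Iterable[Sequence[float]], visited: Set[Cell],
                       bin_width: float = 1.0) -> int:
    """Number of feature-space cells covered by a candidate path that were not covered before."""
    new_cells = {feature_cell(f, bin_width) for f in path_features} - visited
    return len(new_cells)


def best_path(candidates: List[List[Sequence[float]]], visited: Set[Cell],
              bin_width: float = 1.0) -> int:
    """Greedy choice: index of the candidate path with the largest exploration reward."""
    rewards = [exploration_reward(c, visited, bin_width) for c in candidates]
    return max(range(len(candidates)), key=rewards.__getitem__)
```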

Translated: 2024-01-03 03:30:59 Published: 2023-12-29

# ガウス雑音を持つ行列:特異部分空間摂動の最適推定

Matrices with Gaussian noise: optimal estimates for singular subspace perturbation ( http://arxiv.org/abs/1803.00679v3 )

License: see link

Sean O'Rourke and Van Vu and Ke Wang

(参考訳) Davis-Kahan-Wedin $\sin \Theta$定理は、行列の特異部分空間が小さな摂動を受けるとどのように変化するかを記述する。この古典的な結果は最悪のシナリオでは鋭い。本稿では,摂動がガウス確率行列である場合,davis-kahan-wedin $\sin \theta$ theorem の確率的バージョンを証明する。ある種の構造的仮定の下では、古典的なデービス=カーン=ヴェーディン$\sin \Theta$定理を著しく改善する最適境界を得る。私たちの重要なツールの1つは、特異値に対して束縛された新しい摂動です。

The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular subspaces of a matrix change when subjected to a small perturbation. This classic result is sharp in the worst-case scenario. In this paper, we prove a stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the perturbation is a Gaussian random matrix. Under certain structural assumptions, we obtain an optimal bound that significantly improves upon the classic Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new perturbation bound for the singular values, which may be of independent interest.
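
For context, a frequently quoted simplified form of the classical bound is $\sin \angle(v_1, v_1') \le 2\|E\| / \delta$. Here $v_1$ and $v_1'$ are the top right singular vectors of $A$ and $A + E$, and $\delta = \sigma_1 - \sigma_2$. The sketch below checks this deterministic bound numerically for a small Gaussian perturbation. It illustrates the classical statement only, not the improved stochastic bounds proven in the paper.

```python
import numpy as np


def sin_angle(u: np.ndarray, v: np.ndarray) -> float:
    """Sine of the angle between two unit vectors (invariant to the sign of either)."""
    c = min(1.0, abs(float(u @ v)))
    return float(np.sqrt(1.0 - c * c))


def check_wedin_bound(A: np.ndarray, E: np.ndarray) -> tuple:
    """Return (sin angle between top right singular vectors, bound 2||E|| / (sigma1 - sigma2))."""
    _, s, Vt = np.linalg.svd(A)
    _, _, Vt_pert = np.linalg.svd(A + E)
    lhs = sin_angle(Vt[0], Vt_pert[0])
    delta = s[0] - s[1]
    rhs = 2.0 * np.linalg.norm(E, 2) / delta
    return lhs, rhs


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
E = 0.1 * rng.standard_normal((6, 4))  # Gaussian perturbation
lhs, rhs = check_wedin_bound(A, E)
```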

Translated: 2024-01-03 03:30:18 Published: 2023-12-29

# FlowX: メッセージフローによる説明可能なグラフニューラルネットワークを目指して

FlowX: Towards Explainable Graph Neural Networks via Message Flows ( http://arxiv.org/abs/2206.12987v3 )

License: see link

Shurui Gui, Hao Yuan, Jie Wang, Qicheng Lao, Kang Li, Shuiwang Ji

(参考訳) グラフニューラルネットワーク(GNN)の動作メカニズム解明へのステップとして,その説明可能性について検討する。現在のほとんどの手法はグラフノード、エッジ、機能の説明に重点を置いているが、GNNの本質的な機能メカニズムとして、メッセージフローは説明可能性を実現する上でより自然なものである、と我々は主張する。そこで本研究では,重要なメッセージフローを識別してGNNを説明する新しい手法であるFlowXを提案する。フローの重要性を定量化するために,協調ゲーム理論からシェープリー値の哲学に従うことを提案する。連立の余分な貢献を計算することの複雑さに対処するために,シェープ値近似を更なるトレーニングの初期評価として計算するフローサンプリングスキームを提案する。次に,多様な説明対象に対してフロースコアを学習するための情報制御学習アルゴリズムを提案する。合成および実世界の両方のデータセットに関する実験的研究により、提案したFlowXとその変種がGNNの説明可能性の向上に繋がることを示した。コードはhttps://github.com/divelab/digで入手できる。

We investigate the explainability of graph neural networks (GNNs) as a step toward elucidating their working mechanisms. While most current methods focus on explaining graph nodes, edges, or features, we argue that message flows, as the inherent functional mechanism of GNNs, are a more natural unit of explanation. To this end, we propose a novel method, FlowX, that explains GNNs by identifying important message flows. To quantify the importance of flows, we follow the philosophy of Shapley values from cooperative game theory. To tackle the complexity of computing all coalitions' marginal contributions, we propose a flow sampling scheme that computes Shapley value approximations as initial assessments for further training. We then propose an information-controlled learning algorithm to train flow scores toward diverse explanation targets: necessary or sufficient explanations. Experimental studies on both synthetic and real-world datasets demonstrate that our proposed FlowX and its variants lead to improved explainability of GNNs. The code is available at https://github.com/divelab/DIG.
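
Permutation sampling is a standard Monte Carlo approximation of Shapley values: each sampled ordering credits every player with its marginal contribution. Below is a minimal sketch. The players would stand for message flows, and `value` for a hypothetical score of the GNN prediction when only a coalition of flows is kept. FlowX's own flow sampling scheme and its subsequent learning stage are not reproduced here.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List


def monte_carlo_shapley(players: List[Hashable], value: Callable[[FrozenSet], float],
                        num_permutations: int, seed: int = 0) -> Dict[Hashable, float]:
    """Permutation-sampling estimate of Shapley values.

    For each sampled ordering, every player is credited with its marginal contribution
    value(S | {p}) - value(S), where S is the set of players preceding it.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition: FrozenSet = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}
```

For an additive game, every ordering gives the same marginal contributions, so the estimate is exact.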

Translated: 2024-01-03 03:19:35 Published: 2023-12-29

# 言語間の生涯学習

Cross-lingual Lifelong Learning ( http://arxiv.org/abs/2205.11152v2 )

License: see link

Meryem M'hamdi, Xiang Ren, and Jonathan May

(参考訳) 多言語学習の長年の目標は、多言語データ分布の変化に耐えられる普遍的な言語横断モデルを開発することである。このような多言語モデルを、見当たらないターゲット言語に適応させる作業は、数多く行われてきた。しかし、この方向のほとんどの研究は、ソースからターゲット言語への標準のワンホップ転送学習パイプラインに焦点を当てているが、現実的なシナリオでは、新しい言語を逐次的に組み込むことができる。本稿では,言語間連続学習(ccl)の評価パラダイムを提案する。そこでは,異なる言語からの新たなデータに継続的に適応するためのアプローチのカテゴリを分析する。マルチリンガルなシーケンシャルな学習を特に難しいものにするための洞察を提供する。このような課題を克服するために,言語間連続学習アルゴリズムの代表的なセットをベンチマークし,注意深く収集されたデータストリームのベースラインと比較して,その知識の保存,蓄積,一般化能力を分析する。この分析の意味は、従来の転帰学習を超えて、異なる言語間連続学習のデシダラタを測り、バランスをとる方法のレシピを含む。

The longstanding goal of multilingual learning has been to develop a universal cross-lingual model that can withstand changes in multilingual data distributions. There has been a large amount of work on adapting such multilingual models to unseen target languages. However, most work in this direction focuses on the standard one-hop transfer learning pipeline from source to target languages, whereas in realistic scenarios, new languages can be incorporated sequentially at any time. In this paper, we present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm, in which we analyze different categories of approaches used to continually adapt to emerging data from different languages. We provide insights into what makes multilingual sequential learning particularly challenging. To surmount such challenges, we benchmark a representative set of cross-lingual continual learning algorithms and analyze their knowledge preservation, accumulation, and generalization capabilities compared to baselines on carefully curated data streams. The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata that go beyond conventional transfer learning.
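
Continual learning evaluations are often summarized from an accuracy matrix $R$. Here $R_{ij}$ is the accuracy on language $j$ after training on the $i$-th language in the stream. The sketch below computes two widely used summaries: final average accuracy and average forgetting. These are generic definitions assumed for illustration, not necessarily the exact metrics used in the paper.

```python
from typing import List


def average_final_accuracy(R: List[List[float]]) -> float:
    """Mean accuracy over all languages after training on the last language in the stream."""
    T = len(R)
    return sum(R[T - 1][j] for j in range(T)) / T


def average_forgetting(R: List[List[float]]) -> float:
    """Mean drop from the best accuracy reached before the final step to the final accuracy.

    Only languages 0..T-2 are included, because the last language cannot have been forgotten yet.
    """
    T = len(R)
    if T < 2:
        return 0.0
    drops = [max(R[i][j] for i in range(j, T - 1)) - R[T - 1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)
```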

Translated: 2024-01-03 03:18:55 Published: 2023-12-29

# 漢文書を現代朝鮮語・英語に翻訳する

Translating Hanja Historical Documents to Contemporary Korean and English ( http://arxiv.org/abs/2205.10019v5 )

License: see link

Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh

(参考訳) 朝鮮王朝の年代記(ajd)には朝鮮の近代国家に先立つ500年の王国である朝鮮の王の日々の記録が含まれている。アナル文字は、1968年から1993年まで朝鮮語に翻訳された古来の朝鮮語書記法「般若」で書かれていた。しかし、この翻訳は書き直しに過ぎず、多くの古語的な韓国語も含んでいたため、2012年に新しい専門的な翻訳作業が始まった。それ以来、わずか1人の王の記録は10年で完成している。並行して、専門家翻訳家は英語の翻訳にも取り組んでおり、そのペースは遅く、これまでのところ英語の王の記録は1つだけだった。そこで本稿では,ハンジャの歴史文書をより理解しやすい韓国語と英語に翻訳するニューラルマシン翻訳モデルh2keを提案する。 H2KEは多言語ニューラルマシン翻訳の上に構築され、時代遅れの朝鮮語翻訳の全データセットと、より最近になって翻訳された現代韓国語と英語の小さなデータセットから漢漢で書かれた歴史的文書の翻訳を学ぶ。提案手法を,漢書古文書の復元と翻訳を同時に学習する最近のモデルと,新たに翻訳されたコーパスのみに基づいて学習したトランスフォーマーベースモデルとを比較した。実験の結果,現代韓国語と英語の両翻訳のBLEUスコアにおいて,本手法が基調を著しく上回ることがわかった。我々はさらに、専門家と非専門家の韓国語話者による原語翻訳よりも翻訳が好ましいことを示す広範な人的評価を行っている。

The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. The resulting translation, however, was too literal and contained many archaic Korean words, so a new expert translation effort began in 2012. Since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. We therefore propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate historical documents written in Hanja using two datasets: a full dataset of outdated Korean translations and a small dataset of more recent contemporary Korean and English translations. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that both expert and non-expert Korean speakers prefer our translations over the original expert translations.
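
Since the comparison is reported in BLEU, here is a minimal corpus-level BLEU sketch. It uses clipped n-gram precision up to 4-grams, a geometric mean, a brevity penalty, one reference per sentence and no smoothing. Reported scores are usually computed with a standard tool and its own tokenizer, so this sketch, which takes pre-tokenized input, is for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Counts of all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per sentence, uniform n-gram weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipped counts
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match: the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```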

Translated: 2024-01-03 03:18:35 Published: 2023-12-29

# 協調推論誘導言語モデルによる数学単語問題の解法

Solving Math Word Problems via Cooperative Reasoning induced Language Models ( http://arxiv.org/abs/2210.16257v5 )

License: see link

Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang

(参考訳) 大規模事前学習言語モデル(PLM)は、特に数学語問題(MWP)のような高レベルの知性を必要とする問題に新たな機会をもたらす。しかしながら、既存のPLMをMWPに直接適用することは、生成プロセスが十分な監督を欠いているため、人間としての高速な適応性を欠いているため失敗する可能性がある。人間の推論には、即時反応系(システム1)と微妙な推論系(システム2)から構成される二重推論の枠組みがあることに気付く。これにより、協調推論(Cooperative Reasoning, CoRe)と呼ばれる、MWPを解くための協調推論によるPLMを開発することとなり、システム1をジェネレータとして、システム2をバリデーションとして、人間のような推論アーキテクチャを実現する。提案手法では, ジェネレータは推論経路の生成に責任を持ち, 検証器を用いて評価を監督し, ジェネレータに対する信頼性の高いフィードバックを得る。我々はCoReフレームワークをいくつかの数学的推論データセット上で評価し、最先端の手法よりも優れた改善を実現した。私たちのコードはhttps://github.com/TianHongZXY/CoReで利用可能です。

Large-scale pre-trained language models (PLMs) bring new opportunities to challenging problems, especially those that require high-level intelligence, such as math word problems (MWPs). However, directly applying existing PLMs to MWPs can fail because the generation process lacks sufficient supervision and thus lacks the fast adaptivity that humans show. We notice that human reasoning follows a dual reasoning framework that consists of an immediate reaction system (system 1) and a delicate reasoning system (system 2), where the entire reasoning is determined by their interaction. This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier. In our approach, the generator is responsible for generating reasoning paths, and the verifiers are used to supervise the evaluation in order to obtain reliable feedback for the generator. We evaluate our CoRe framework on several mathematical reasoning datasets and achieve decent improvements over state-of-the-art methods, up to a 9.6% increase over the best baselines. Our code is available at https://github.com/TianHongZXY/CoRe
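
The generator/verifier split can be illustrated with a simplified selection step. The generator proposes several (reasoning path, answer) candidates, a verifier scores each path, and the scores are aggregated per answer. This is a sketch under stated assumptions: the verifier here is a hypothetical scoring function, and CoRe's actual cooperative inference and training procedures are not reproduced.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def select_answer(candidates: List[Tuple[str, str]],
                  verifier: Callable[[str, str], float]) -> str:
    """Pick a final answer from generator candidates given as (reasoning_path, answer) pairs.

    Each path is scored by the verifier, scores are summed per distinct answer
    (verifier-weighted voting), and the answer with the largest total is returned.
    """
    score_by_answer: Dict[str, float] = defaultdict(float)
    for path, answer in candidates:
        score_by_answer[answer] += verifier(path, answer)
    return max(score_by_answer, key=score_by_answer.__getitem__)


# Toy example: plain majority voting would pick "6", but the verifier trusts the "5" path more.
toy_scores = {"2+3=5": 0.9, "2+3=6": 0.2, "2*3=6": 0.3}
toy_candidates = [("2+3=5", "5"), ("2+3=6", "6"), ("2*3=6", "6")]
chosen = select_answer(toy_candidates, lambda path, answer: toy_scores[path])
```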

Translated: 2024-01-03 03:11:03 Published: 2023-12-29

# 未学習ニューラルネットワークによる残留バックプロジェクション

Residual Back Projection With Untrained Neural Networks ( http://arxiv.org/abs/2210.14416v2 )

License: see link

Ziyu Shu and Alireza Entezari

(参考訳) 背景と目的: 画像処理タスクにおけるニューラルネットワークの成功は、CT(Computerd tomography)における画像再構成問題への彼らの応用を動機付けている。この分野では進歩が進んでいるが、安定性の欠如と精度の理論的保証、および特定の画像領域に対する高品質なトレーニングデータの不足は、多くのCTアプリケーションに課題をもたらす。本稿では,ニューラルネットワークの階層構造を利用したCTにおける反復的再構成(IR)の枠組みを,トレーニングを必要とせずに提案する。本フレームワークでは,この構造情報をDIP(Deep Image Prior)として組み込んで,リザーブ・バック・プロジェクション(RBP)接続を用いてイテレーションの基礎となる。方法: 目標関数を最小化し, 高精度な再構成を実現するために, 未訓練のu-netと新しい残差バックプロジェクションを併用して提案する。各イテレーションにおいて、トレーニングされていないU-netの重みを最適化し、現在のイテレーションにおけるU-netの出力を使用して、上記RBP接続を介して次のイテレーションにおけるU-netの入力を更新する。結果: 実験の結果, RBP-DIPフレームワークは, 従来のIR法と類似のネットワーク構造を持つ事前学習モデル, 未学習モデルに改善をもたらすことがわかった。これらの改善は、少数ビュー、限定アングル、低線量撮像構成において特に重要である。結論: パラレルビームX線撮影とファンビームX線撮影を併用すると, 複数の条件下での大きな改善が見られた。さらに、提案フレームワークは、トレーニングデータを必要としないため、異なる条件(ノイズレベル、幾何学、画像オブジェクトなど)に適応するために、オンデマンドで調整することができる。

Background and Objective: The success of neural networks in a number of image processing tasks has motivated their application to image reconstruction problems in computed tomography (CT). While progress has been made in this area, the lack of stability and theoretical guarantees for accuracy, together with the scarcity of high-quality training data for specific imaging domains, poses challenges for many CT applications. In this paper, we present a framework for iterative reconstruction (IR) in CT that leverages the hierarchical structure of neural networks without the need for training. Our framework incorporates this structural information as a deep image prior (DIP) and uses a novel residual back projection (RBP) connection that forms the basis for our iterations. Methods: We propose using an untrained U-net in conjunction with a novel residual back projection to minimize an objective function and achieve high-accuracy reconstruction. In each iteration, the weights of the untrained U-net are optimized, and the output of the U-net in the current iteration is used to update the input of the U-net in the next iteration through the aforementioned RBP connection. Results: Experimental results demonstrate that the RBP-DIP framework offers improvements over other state-of-the-art conventional IR methods, as well as over pre-trained and untrained models with similar network structures, under multiple conditions. These improvements are particularly significant in the few-view, limited-angle, and low-dose imaging configurations. Conclusions: Applied to both parallel-beam and fan-beam X-ray imaging, our framework shows significant improvement under multiple conditions. Furthermore, the proposed framework requires no training data and can be adjusted on demand to adapt to different conditions (e.g., noise level, geometry, and imaged object).
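
The classical building block behind residual back projection is the Landweber-type update $x \leftarrow x + \lambda A^\top (b - A x)$. It back-projects the residual in measurement space into image space. In RBP-DIP this residual feeds the U-net input at each iteration. The sketch below omits the network entirely and uses a small dense matrix as a stand-in for the CT system matrix.

```python
from typing import Optional

import numpy as np


def landweber(A: np.ndarray, b: np.ndarray, num_iters: int = 500,
              step: Optional[float] = None) -> np.ndarray:
    """Classical residual back projection: x <- x + step * A^T (b - A x), starting from zero."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # any step in (0, 2 / ||A||^2) converges
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        residual = b - A @ x            # mismatch in measurement (sinogram) space
        x = x + step * A.T @ residual   # back-project the residual into image space
    return x
```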

Translated: 2024-01-03 03:10:06 Published: 2023-12-29

# 特定ミスデータ機構を用いたガウス混合モデルのベイズ則推定法の解析

Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism ( http://arxiv.org/abs/2210.13785v2 )

License: see link

Ziyang Lyu

(参考訳) 半教師付き学習(SSL)アプローチは、幅広い工学と科学の分野でうまく適用されている。本稿では、Ahfock と McLachlan (2020) が導入した、未分類観測のための欠落機構を持つ生成モデルフレームワークについて検討する。一部分類されたサンプルでは、欠落データ機構を用いたベイズ規則を用いた分類器は、2クラス正規ホモシダモデルにおいて完全教師付き分類器を上回ることができ、特に中程度から低い重複率と欠落クラスラベルの割合で、あるいは重なりが大きいが欠落ラベルが少ない。また、重複領域や欠落したクラスラベルの比率に関わらず、欠落データ機構のない分類器を上回ります。シミュレーションにより不均等な共分散を持つ2成分および3成分の正規混合モデルの探索を行い, 以上の知見を裏付ける。最後に,ニューロン間および皮膚病変データセットに欠測データ機構を付加した分類器について述べる。

Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that in a partially classified sample, a classifier using Bayes' rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model. This holds especially when both the class overlap and the proportion of missing class labels are moderate to low, or when the overlap is large but few labels are missing. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
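
In the two-class normal homoscedastic model, Bayes' rule of allocation reduces to a linear discriminant because the quadratic terms cancel. A minimal sketch with known parameters follows. Estimating those parameters from a partially classified sample under a missing-data mechanism, which is the paper's subject, is not shown.

```python
import numpy as np


def bayes_rule_two_class(x, mu1, mu2, cov, pi1: float = 0.5) -> int:
    """Bayes rule of allocation for two normal classes sharing one covariance matrix.

    Assigns x to class 1 when log[pi1 * phi(x; mu1, cov)] >= log[pi2 * phi(x; mu2, cov)].
    This reduces to x @ beta + beta0 >= 0 with beta = cov^{-1} (mu1 - mu2).
    """
    x, mu1, mu2 = (np.asarray(v, dtype=float) for v in (x, mu1, mu2))
    beta = np.linalg.solve(np.asarray(cov, dtype=float), mu1 - mu2)
    beta0 = -0.5 * (mu1 + mu2) @ beta + np.log(pi1 / (1.0 - pi1))
    return 1 if x @ beta + beta0 >= 0 else 2
```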

Translated: 2024-01-03 03:09:02 Published: 2023-12-29

# フェルミオン量子シミュレーションのための誤り訂正符号

Error-correcting codes for fermionic quantum simulation ( http://arxiv.org/abs/2210.08411v5 )

License: see link

Yu-An Chen, Alexey V. Gorshkov, and Yijia Xu

(参考訳) パウリ安定化符号の文脈における$\mathbb{Z}_2$格子ゲージ理論の枠組みを利用して、2次元正方格子上の量子ビット系によるフェルミオンをシミュレートする手法を提案する。ローラン多項式環上のパウリ加群のシンプレクティック自己同型について検討する。これにより、エンコードされた論理フェルミオンと物理キュービットの間のレートを固定しながら、安定化符号の符号距離を体系的に増加させることができる。フェミオンシミュレーションに適した安定化符号群を同定し、$d=2,3,4,5,6,7$の符号距離を達成し、任意の$\lfloor \frac{d-1}{2} \rfloor$-qubitエラーの補正を可能にする。従来のコード連結手法とは対照的に、この手法は(フェルミオン)符号率を低下させることなくコード距離を増大させることができる。特に、コード距離が$d=3,4,5$のコードに対して、すべての安定化子と論理演算子を明示的に示す。すべてのPauliエラーに対するシンドロームを提供し、コード距離を数値的に計算するシンドロームマッチングアルゴリズムを考案する。

Utilizing the framework of $\mathbb{Z}_2$ lattice gauge theories in the context of Pauli stabilizer codes, we present methodologies for simulating fermions via qubit systems on a two-dimensional square lattice. We investigate the symplectic automorphisms of the Pauli module over the Laurent polynomial ring. This enables us to systematically increase the code distances of stabilizer codes while fixing the rate between encoded logical fermions and physical qubits. We identify a family of stabilizer codes suitable for fermion simulation, achieving code distances of $d=2,3,4,5,6,7$ and allowing correction of any $\lfloor \frac{d-1}{2} \rfloor$-qubit error. In contrast to the traditional code concatenation approach, our method can increase the code distances without decreasing the (fermionic) code rate. In particular, we explicitly show all stabilizers and logical operators for codes with code distances of $d=3,4,5$. We provide syndromes for all Pauli errors and devise a syndrome-matching algorithm to compute code distances numerically.
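
For a Pauli error, the syndrome records which stabilizers it anticommutes with. The sketch below computes syndromes for Pauli strings and the correctable weight $\lfloor (d-1)/2 \rfloor$, using a 3-qubit repetition code as a toy example. The paper's fermionic lattice codes and its syndrome-matching distance algorithm are not reproduced.

```python
from typing import List


def paulis_anticommute(p: str, q: str) -> bool:
    """Pauli strings such as 'XZI' anticommute iff the number of positions where both
    act nontrivially with different single-qubit Paulis is odd."""
    count = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return count % 2 == 1


def syndrome(error: str, stabilizers: List[str]) -> List[int]:
    """Syndrome bit k is 1 iff the error anticommutes with stabilizer k."""
    return [int(paulis_anticommute(error, s)) for s in stabilizers]


def correctable_weight(d: int) -> int:
    """A distance-d code corrects any error on at most floor((d - 1) / 2) qubits."""
    return (d - 1) // 2


repetition_code = ["ZZI", "IZZ"]  # toy bit-flip code, not a fermionic code from the paper
```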

Translated: 2024-01-03 03:08:10 Published: 2023-12-29

# CoreDeep: 幅確率によるき裂検出アルゴリズムの改善

CoreDeep: Improving Crack Detection Algorithms Using Width Stochasticity ( http://arxiv.org/abs/2209.04648v2 )

License: see link

Ram Krishna Pandey, Akshit Achara

(参考訳) 画像中のクラックの自動検出やセグメント化は、メンテナンスや運用のコスト削減に役立つ。背景から亀裂を分離する明確な境界が存在しないため,難易度分析のための亀裂の検出,測定,定量化は困難である。開発されたアルゴリズムは、データに関連する固有の課題を扱う必要がある。知覚的に注目すべき課題は、色、強度、深さ、ぼやけ、動きの青、方向、欠陥に対する異なる関心領域(ROI)、スケール、照明、複雑で困難な背景などである。これらのバリエーションは(crack interクラス)とイメージ(crack in-class変数)にまたがる。全体として、大きな背景(インター)と前景(イントラクラス)のばらつきがある。本研究では,これらの変化が背景シナリオの難易度に与える影響を低減しようと試みている。我々は,これらの変動の影響を低減するために,確率幅(SW)アプローチを提案している。提案手法は検出性を向上し,偽陽性と陰性を大幅に低減する。我々は,平均IoU,偽陽性,陰性,主観的品質の観点からアルゴリズムの性能を客観的に測定した。

Automatically detecting or segmenting cracks in images can help reduce the cost of maintenance or operations. Detecting, measuring, and quantifying cracks for distress analysis in challenging background scenarios is difficult because there is no clear boundary that separates cracks from the background. Developed algorithms should handle the inherent challenges associated with the data. Some of the perceptually noted challenges are color, intensity, depth, blur, motion blur, orientation, different regions of interest (ROI) for the defect, scale, illumination, and complex, challenging backgrounds. These variations occur both across images (crack inter-class variability) and within images (crack intra-class variability). Overall, there is significant background (inter-class) and foreground (intra-class) variability. In this work, we attempt to reduce the effect of these variations in challenging background scenarios by proposing a stochastic width (SW) approach. Our proposed approach improves detectability and significantly reduces false positives and false negatives. We measure the performance of our algorithm objectively in terms of mean IoU, false positives, and false negatives, and subjectively in terms of perceptual quality.
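
As a reference for the objective metrics, here is a minimal sketch that computes mean IoU together with pixel-level false-positive and false-negative counts for binary masks. The mean IoU is averaged over the crack and background classes, which is an assumed convention. The stochastic width approach itself is not sketched, since its details are not given here.

```python
from typing import Dict, List


def segmentation_scores(pred: List[List[int]], truth: List[List[int]]) -> Dict[str, float]:
    """Pixel-level scores for binary crack masks (1 = crack, 0 = background)."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1  # predicted crack on a background pixel
            elif t:
                fn += 1  # missed crack pixel
            else:
                tn += 1
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 1.0
    return {"mean_iou": (iou_crack + iou_background) / 2,
            "false_positives": fp, "false_negatives": fn}
```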

Translated: 2024-01-03 03:06:07 Published: 2023-12-29

# 予測誤差保証による分散オフラインポリシー評価

Distributional Offline Policy Evaluation with Predictive Error Guarantees ( http://arxiv.org/abs/2302.09456v3 )

License: see link

Runzhe Wu, Masatoshi Uehara, Wen Sun

(参考訳) 本研究では,ポリシから生成されていないオフラインデータセット,すなわち分散オフラインポリシ評価(OPE)を用いて,ポリシの戻り値の分布を推定する問題について検討する。本稿では,mle (maximum likelihood estimation) のシーケンスを実行し,mle を通じて訓練できる限り,任意の状態確率的生成モデルを統合する柔軟性を有する適応度推定 (adapted likelihood estimation, fle) というアルゴリズムを提案する。 FLEは、報酬が多次元ベクトルとなるような有限水平と無限水平の割引設定の両方に使うことができる。我々の理論的結果は、有限水平と無限水平の割引設定の両方において、FLEは総変分距離とワッサーシュタイン距離で基底真理に近い分布を学習できることを示している。我々の理論的結果は、オフラインデータがテストポリシーのトレースをカバーし、教師付き学習MLEが成功するという条件下で成り立つ。実験では,2つの生成モデル,ガウス混合モデルと拡散モデルを用いてFLEの性能を示す。多次元報酬設定では、拡散モデルを持つFLEは、テストポリシの戻りの複雑な分布を推定することができる。

We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE). FLE conducts a sequence of Maximum Likelihood Estimation (MLE) steps and can integrate any state-of-the-art probabilistic generative model, as long as that model can be trained via MLE. FLE can be used in both finite-horizon and infinite-horizon discounted settings, where rewards can be multi-dimensional vectors. Our theoretical results show that, for the finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively. Our theoretical results hold under the conditions that the offline data covers the test policy's traces and that the supervised-learning MLE procedures succeed. Experimentally, we demonstrate the performance of FLE with two generative models: Gaussian mixture models and diffusion models. In the multi-dimensional reward setting, FLE with diffusion models is capable of estimating the complicated distribution of a test policy's return.
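
Here is a minimal finite-horizon sketch of the FLE loop. It assumes a tabular setting and a per-(state, action) Gaussian as the generative model; the paper uses richer models such as Gaussian mixtures and diffusion models. Working backwards, targets are formed as $r + z$, where $z$ is drawn from the model fitted for the next step at $(s', \pi(s'))$. The Gaussian MLE is then just the sample mean and standard deviation.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Transition = Tuple[int, int, float, int]  # (state, action, reward, next_state)
Gaussian = Tuple[float, float]            # (mean, std)


def fitted_likelihood_estimation(data: List[List[Transition]], policy: Callable[[int], int],
                                 seed: int = 0) -> List[Dict[Tuple[int, int], Gaussian]]:
    """Finite-horizon FLE sketch; data[h] holds the offline transitions observed at step h."""
    rng = random.Random(seed)
    H = len(data)
    models: List[Dict[Tuple[int, int], Gaussian]] = [{} for _ in range(H)]
    for h in reversed(range(H)):
        targets: Dict[Tuple[int, int], List[float]] = {}
        for s, a, r, s_next in data[h]:
            if h == H - 1:
                z = 0.0  # no return after the final step
            else:
                # A KeyError here means the offline data does not cover (s', pi(s')).
                mean, std = models[h + 1][(s_next, policy(s_next))]
                z = rng.gauss(mean, std)
            targets.setdefault((s, a), []).append(r + z)
        # Gaussian MLE: sample mean and population (biased) standard deviation.
        models[h] = {sa: (statistics.fmean(v), statistics.pstdev(v)) for sa, v in targets.items()}
    return models
```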

Translated: 2024-01-03 02:57:08 Published: 2023-12-29

# Adversarial Online Collaborative Filtering

Adversarial Online Collaborative Filtering ( http://arxiv.org/abs/2302.05765v2 )

License: see link

Stephen Pasteris, Fabio Vitale, Mark Herbster, Claudio Gentile, Andre' Panisson

We investigate the problem of online collaborative filtering under no-repetition constraints, whereby users need to be served content in an online fashion and a given user cannot be recommended the same content item more than once. We start by designing and analyzing an algorithm that works under biclustering assumptions on the user-item preference matrix, and show that this algorithm exhibits an optimal regret guarantee while being fully adaptive, in that it requires no prior knowledge of the sequence of users, the universe of items, or the biclustering parameters of the preference matrix. We then propose a more robust version of this algorithm that operates with general matrices. This algorithm is also parameter-free, and we prove regret guarantees that scale with the amount by which the preference matrix deviates from a biclustered structure. To our knowledge, these are the first results on online collaborative filtering that hold at this level of generality and adaptivity under no-repetition constraints. Finally, we complement our theoretical findings with simple experiments on real-world datasets aimed at both validating the theory and empirically comparing to standard baselines. This comparison shows the competitive advantage of our approach over these baselines.

Translated: 2024-01-03 02:56:47, Published: 2023-12-29
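
The no-repetition protocol itself is easy to express in code. The sketch below simulates it with a deliberately naive placeholder learner that recommends the unseen item with the most observed likes. This is not the paper's biclustering-based algorithm. `prefs` stands in for the hidden preference matrix, and the function counts how many recommendations the users disliked.

```python
def run_no_repetition_protocol(user_seq, prefs, n_items):
    """Simulate online recommendation where no user is shown an item twice.

    user_seq: order in which users arrive (chosen by the environment)
    prefs:    prefs[u][i] = 1 if user u likes item i, else 0 (hidden from learner)
    returns:  number of recommendations the users disliked
    """
    likes = [0] * n_items  # learner state: positive feedback seen per item
    served = {u: set() for u in prefs}
    disliked = 0
    for u in user_seq:
        candidates = [i for i in range(n_items) if i not in served[u]]
        if not candidates:
            continue  # this user has already been shown every item
        # Placeholder rule (not the paper's algorithm): most-liked unseen item,
        # ties broken toward the lowest index.
        item = max(candidates, key=lambda i: (likes[i], -i))
        served[u].add(item)  # enforce the no-repetition constraint
        feedback = prefs[u][item]
        likes[item] += feedback
        disliked += 1 - feedback
    return disliked


prefs = {0: [1, 0, 1], 1: [1, 1, 0]}
print(run_no_repetition_protocol([0, 1, 0, 1], prefs, n_items=3))  # 1
```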

# Optimizing Prompts for Text-to-Image Generation

Optimizing Prompts for Text-to-Image Generation ( http://arxiv.org/abs/2212.09611v2 )

License: see link

Yaru Hao, Zewen Chi, Li Dong, Furu Wei

Well-designed prompts can guide text-to-image models to generate amazing images. However, performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.

Translated: 2024-01-03 02:54:47, Published: 2023-12-29
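
The reward described above balances two scores. The following minimal sketch assumes that both scores come from external models that are not shown, for example an aesthetic predictor and a CLIP-style relevance score between the generated image and the user's original input. The `threshold` and `weight` values are hypothetical, and the exact form used in the paper may differ. In this form, relevance only affects the reward once it drops below the threshold, so the policy is free to improve aesthetics as long as the image still matches the user's intent.

```python
def prompt_reward(aesthetic_new, aesthetic_orig, relevance, threshold=0.28, weight=1.0):
    """Hypothetical reward for an optimized prompt.

    aesthetic_new / aesthetic_orig: aesthetic scores of images generated from
        the optimized prompt and from the user's original prompt
    relevance: similarity between the generated image and the original input
    threshold, weight: illustrative hyperparameters (assumptions)
    """
    aesthetic_gain = aesthetic_new - aesthetic_orig
    # Penalize only when relevance falls below the threshold (intent drift).
    intent_penalty = min(relevance - threshold, 0.0)
    return aesthetic_gain + weight * intent_penalty


print(prompt_reward(6.0, 5.0, relevance=0.30))  # 1.0
```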

# FAIR AI Models in High Energy Physics

FAIR AI Models in High Energy Physics ( http://arxiv.org/abs/2212.05081v3 )

License: see link

Javier Duarte and Haoyang Li and Avik Roy and Ruike Zhu and E. A. Huerta and Daniel Diaz and Philip Harris and Raghav Kansal and Daniel S. Katz and Ishaan H. Kavoori and Volodymyr V. Kindratenko and Farouk Mokhtar and Mark S. Neubauer and Sang Eon Park and Melissa Quinnan and Roger Rusack and Zhizhen Zhao

The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models -- algorithms that have been trained on data without being explicitly programmed -- and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.

Translated: 2024-01-03 02:54:19, Published: 2023-12-29

# Explainability as statistical inference

Explainability as statistical inference ( http://arxiv.org/abs/2212.03131v3 )

License: see link

Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei

A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selections, which allow feature importance maps to be evaluated. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.

Translated: 2024-01-03 02:54:00, Published: 2023-12-29
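
After a selector has chosen which features to keep, the prediction has to be made from those features alone. Below is a minimal sketch of the multiple-imputation variant: the masked-out features are filled in several times by an imputer, and the predictor's outputs are averaged. Two parts are assumptions for illustration: the fixed `mask`, which stands in for the selector network's output, and the `imputer`, which here resamples each feature from a small hypothetical reference set.

```python
import random


def predict_with_selection(predictor, x, mask, imputer, n_imputations=8, seed=0):
    """Average predictor outputs over several imputations of unselected features.

    mask[j] is True if feature j was selected (kept), False if it must be imputed.
    imputer(j, rng) returns a plausible value for feature j.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_imputations):
        x_filled = [xj if keep else imputer(j, rng)
                    for j, (xj, keep) in enumerate(zip(x, mask))]
        total += predictor(x_filled)
    return total / n_imputations


# Hypothetical reference data used to resample unselected features.
reference = [[0.0, 1.0, 2.0], [0.5, 3.0, 1.0], [1.0, 2.0, 0.0]]


def resample_imputer(j, rng):
    return rng.choice(reference)[j]


x = [1.0, 2.0, 3.0]
print(predict_with_selection(sum, x, [True, True, True], resample_imputer))   # 6.0
print(predict_with_selection(sum, x, [True, False, True], resample_imputer))  # between 5.0 and 7.0
```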

# Variational Neural-Network Ansatz for Continuum Quantum Field Theory

Variational Neural-Network Ansatz for Continuum Quantum Field Theory ( http://arxiv.org/abs/2212.00782v4 )

License: see link

John M. Martyn, Khadijeh Najafi, Di Luo

Physicists dating back to Feynman have lamented the difficulties of applying the variational principle to quantum field theories. In non-relativistic quantum field theories, the challenge is to parameterize and optimize over the infinitely many $n$-particle wave functions comprising the state's Fock space representation. Here we approach this problem by introducing neural-network quantum field states, a deep learning ansatz that enables application of the variational principle to non-relativistic quantum field theories in the continuum. Our ansatz uses the Deep Sets neural network architecture to simultaneously parameterize all of the $n$-particle wave functions comprising a quantum field state. We employ our ansatz to approximate ground states of various field theories, including an inhomogeneous system and a system with long-range interactions, thus demonstrating a powerful new tool for probing quantum field theories.

Translated: 2024-01-03 02:53:47, Published: 2023-12-29
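
The Deep Sets idea behind the ansatz is that one pair of networks can score configurations with any particle number while respecting particle-exchange symmetry:

1. A function φ embeds each particle position.
2. The embeddings are summed.
3. A readout ρ maps the sum, together with n, to a log-amplitude.

The sketch below makes three simplifying assumptions for illustration. It uses fixed toy functions in place of trained networks, it works in one spatial dimension, and it yields a symmetric amplitude, as for identical bosons.

```python
import math


def phi(x):
    # Toy per-particle embedding; in the ansatz this is a trainable network.
    return [x, x * x, math.cos(x)]


def rho(pooled, n):
    # Toy readout to a log-amplitude; n enters so one model covers every sector.
    return -0.5 * pooled[1] / n + 0.1 * pooled[2] - 0.05 * n


def log_psi(positions):
    """Log-amplitude of the n-particle wave function psi_n(x_1, ..., x_n)."""
    n = len(positions)
    if n == 0:
        return 0.0  # vacuum sector; its weight is an arbitrary choice in this sketch
    pooled = [0.0, 0.0, 0.0]
    for x in positions:  # sum pooling makes the result order-independent
        for k, value in enumerate(phi(x)):
            pooled[k] += value
    return rho(pooled, n)


a = log_psi([0.1, 0.5, -0.3])
b = log_psi([-0.3, 0.1, 0.5])
print(math.isclose(a, b))  # True: invariant under permuting particles
```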

# Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation ( http://arxiv.org/abs/2304.12620v7 )

License: see link

Junde Wu and Wei Ji and Yuanpei Liu and Huazhu Fu and Min Xu and Yanwu Xu and Yueming Jin

The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation due to its impressive capabilities in various segmentation tasks and its prompt-based interface. However, recent studies and individual experiments have shown that SAM underperforms in medical image segmentation owing to its lack of medical domain knowledge. This raises the question of how to enhance SAM's segmentation capability for medical images. In this paper, instead of fine-tuning the SAM model, we propose the Medical SAM Adapter (Med-SA), which incorporates domain-specific medical knowledge into the segmentation model using a lightweight yet effective adaptation technique. In Med-SA, we propose Space-Depth Transpose (SD-Trans) to adapt 2D SAM to 3D medical images and Hyper-Prompting Adapter (HyP-Adpt) to achieve prompt-conditioned adaptation. We conduct comprehensive evaluation experiments on 17 medical image segmentation tasks across various image modalities. Med-SA outperforms several state-of-the-art (SOTA) medical image segmentation methods while updating only 2% of the parameters. Our code is released at https://github.com/KidsWithTokens/Medical-SAM-Adapter.

Translated: 2024-01-03 02:46:50, Published: 2023-12-29
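
Adapter-based tuning of this kind typically inserts a small bottleneck module into each frozen block and trains only that module. Below is a minimal sketch of a generic residual bottleneck adapter, consisting of a down-projection, a nonlinearity, an up-projection, and a skip connection. It is not Med-SA's SD-Trans or HyP-Adpt. The ReLU nonlinearity and the plain-list matrices are assumptions made to keep the sketch dependency-free.

```python
def adapter(x, w_down, w_up):
    """Residual bottleneck adapter: x + W_up @ act(W_down @ x).

    x:      feature vector of length d (output of a frozen backbone layer)
    w_down: r x d trainable matrix with r much smaller than d
    w_up:   d x r trainable matrix
    """
    # Down-projection followed by ReLU (the activation is an assumption).
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_down]
    # Up-projection back to dimension d.
    delta = [sum(w * hi for w, hi in zip(row, hidden)) for row in w_up]
    # Skip connection: the frozen features pass through unchanged plus the update.
    return [xi + di for xi, di in zip(x, delta)]


x = [1.0, 2.0]
print(adapter(x, w_down=[[1.0, 1.0]], w_up=[[0.0], [0.0]]))  # [1.0, 2.0]
print(adapter(x, w_down=[[1.0, 1.0]], w_up=[[0.5], [0.0]]))  # [2.5, 2.0]
```

Initializing `w_up` to zeros makes the adapter start as the identity, so the pretrained model's behavior is unchanged at the start of tuning. This is a common choice for adapters and is assumed here rather than taken from the paper.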

# Securing Neural Networks with Knapsack Optimization

Securing Neural Networks with Knapsack Optimization ( http://arxiv.org/abs/2304.10442v2 )

License: see link

Yakir Gorski, Amir Jevnisek, Shai Avidan

MLaaS Service Providers (SPs) holding a Neural Network would like to keep the Neural Network weights secret. On the other hand, users wish to utilize the SPs' Neural Network for inference without revealing their data. Multi-Party Computation (MPC) offers a solution to achieve this. Computations in MPC involve communication, as the parties send data back and forth. Non-linear operations are usually the main bottleneck requiring the bulk of communication bandwidth. In this paper, we focus on ResNets, which serve as the backbone for many Computer Vision tasks, and we aim to reduce their non-linear components, specifically, the number of ReLUs. Our key insight is that spatially close pixels exhibit correlated ReLU responses. Building on this insight, we replace the per-pixel ReLU operation with a ReLU operation per patch. We term this approach 'Block-ReLU'. Since different layers in a Neural Network correspond to different feature hierarchies, it makes sense to allow patch-size flexibility for the various layers of the Neural Network. We devise an algorithm to choose the optimal set of patch sizes through a novel reduction of the problem to the Knapsack Problem. We demonstrate our approach in the semi-honest secure 3-party setting for four problems: Classifying ImageNet using ResNet50 backbone, classifying CIFAR100 using ResNet18 backbone, Semantic Segmentation of ADE20K using MobileNetV2 backbone, and Semantic Segmentation of Pascal VOC 2012 using ResNet50 backbone. Our approach achieves competitive performance compared to a handful of competitors. Our source code is publicly available: https://github.com/yg320/secure_inference.

Translated: 2024-01-03 02:46:27, Published: 2023-12-29
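
Both ideas fit in a short sketch:

- **Block-ReLU.** `block_relu` makes one keep-or-zero decision per patch, so an h×w patch costs a single secure comparison instead of h·w. Using the sign of the patch sum as the decision rule is an assumption.
- **Patch-size selection.** `choose_patch_sizes` picks one patch size per layer with a multiple-choice knapsack dynamic program. Each option's cost is its ReLU count and its value is an accuracy-loss estimate. The program minimizes the total loss under a ReLU budget. The loss numbers in the usage example are hypothetical.

```python
def block_relu(x, ph, pw):
    """Apply one ReLU decision per ph x pw patch of a 2D feature map x."""
    height, width = len(x), len(x[0])
    out = [[0.0] * width for _ in range(height)]
    for i0 in range(0, height, ph):
        for j0 in range(0, width, pw):
            cells = [(i, j)
                     for i in range(i0, min(i0 + ph, height))
                     for j in range(j0, min(j0 + pw, width))]
            keep = sum(x[i][j] for i, j in cells) > 0  # one comparison per patch
            for i, j in cells:
                out[i][j] = x[i][j] if keep else 0.0
    return out


def choose_patch_sizes(options, budget):
    """Multiple-choice knapsack: pick one option per layer, total cost <= budget.

    options[l] = list of (relu_count, loss_estimate) for layer l's candidate
    patch sizes. Returns (total_loss, chosen_option_indices) or None.
    """
    best = {0: (0.0, [])}  # total ReLU cost -> (lowest loss, choices)
    for layer_options in options:
        nxt = {}
        for cost, (loss, picks) in best.items():
            for k, (cost_k, loss_k) in enumerate(layer_options):
                new_cost = cost + cost_k
                if new_cost > budget:
                    continue
                if new_cost not in nxt or loss + loss_k < nxt[new_cost][0]:
                    nxt[new_cost] = (loss + loss_k, picks + [k])
        best = nxt
    return min(best.values(), key=lambda v: v[0]) if best else None


fmap = [[1.0, -2.0], [3.0, -1.0]]
print(block_relu(fmap, 1, 1))  # [[1.0, 0.0], [3.0, 0.0]]  (ordinary ReLU)
print(block_relu(fmap, 2, 2))  # [[1.0, -2.0], [3.0, -1.0]] (patch sum 1 > 0)

# Hypothetical per-layer options: (ReLU count, estimated accuracy loss).
layers = [[(16, 0.0), (4, 0.1), (1, 0.3)], [(16, 0.0), (4, 0.05), (1, 0.2)]]
print(choose_patch_sizes(layers, budget=8)[1])  # [1, 1]
```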

# Earthquake Quantization

Earthquake Quantization ( http://arxiv.org/abs/2303.06158v4 )

License: see link

Benjamin Koch and Enrique Muñoz

In this homage to Einstein's 144th birthday we propose a novel quantization prescription, where the paths of a path-integral are not random, but rather solutions of a geodesic equation in a random background. We show that this change of perspective can be made mathematically equivalent to the usual formulations of non-relativistic quantum mechanics. To conclude, we comment on conceptual issues, such as quantum gravity coupled to matter and the quantum equivalence principle.

Translated: 2024-01-03 02:41:46, Published: 2023-12-29

# Image2SSM: Reimagining Statistical Shape Models from Images with Radial Basis Functions

Image2SSM: Reimagining Statistical Shape Models from Images with Radial Basis Functions ( http://arxiv.org/abs/2305.11946v2 )

License: see link

Hong Xu and Shireen Y. Elhabian

Statistical shape modeling (SSM) is an essential tool for analyzing variations in anatomical morphology. In a typical SSM pipeline, 3D anatomical images that have undergone segmentation and rigid registration are represented using lower-dimensional shape features, on which statistical analysis can be performed. Various methods for constructing compact shape representations have been proposed, but they involve laborious and costly steps. We propose Image2SSM, a novel deep-learning-based approach for SSM that leverages image-segmentation pairs to learn a radial-basis-function (RBF)-based representation of shapes directly from images. This RBF-based shape representation offers a rich self-supervised signal for the network to estimate a continuous, yet compact representation of the underlying surface that can adapt to complex geometries in a data-driven manner. Image2SSM can characterize populations of biological structures of interest by constructing statistical landmark-based shape models of ensembles of anatomical shapes while requiring minimal parameter tuning and no user assistance. Once trained, Image2SSM can be used to infer low-dimensional shape representations from new unsegmented images, paving the way toward scalable approaches for SSM, especially when dealing with large cohorts. Experiments on synthetic and real datasets show the efficacy of the proposed method compared to the state-of-the-art correspondence-based method for SSM.

Translated: 2024-01-03 02:31:56, Published: 2023-12-29
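
An RBF shape representation can be pictured as an implicit function f(x) = Σ_i w_i φ(‖x − c_i‖) whose zero level set is the surface. The sketch below fits the weights by interpolation in 2D: surface points get the value 0, one interior point gets −1, and a few exterior points get +1. The Gaussian kernel, the width `eps`, and the toy unit-circle "shape" are assumptions. Image2SSM learns its RBF representation from images with a network, which is not shown here.

```python
import math


def gaussian(r, eps=1.0):
    return math.exp(-(eps * r) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(centers, values):
    """Weights w such that f(c_i) = values[i] at every center c_i."""
    kernel = [[gaussian(math.dist(ci, cj)) for cj in centers] for ci in centers]
    return solve(kernel, values)


def implicit(x, centers, weights):
    # Reproduces the fitted constraints exactly; between them it interpolates.
    return sum(w * gaussian(math.dist(x, c)) for w, c in zip(weights, centers))


surface = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # toy shape: unit circle
inside, outside = [(0, 0)], [(2, 0), (0, 2), (-2, 0), (0, -2)]
centers = surface + inside + outside
weights = fit_rbf(centers, [0.0] * 4 + [-1.0] + [1.0] * 4)
print(round(implicit((0, 0), centers, weights), 6))  # -1.0
```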

# Quantum Simulations of SO(5) Many-Fermion Systems using Qudits

Quantum Simulations of SO(5) Many-Fermion Systems using Qudits ( http://arxiv.org/abs/2305.11941v2 )

License: see link

Marc Illa, Caroline E. P. Robin and Martin J. Savage

The structure and dynamics of quantum many-body systems are the result of a delicate interplay between underlying interactions, which leads to intricate entanglement structures. Despite this apparent complexity, symmetries emerge and have long been used to determine the relevant degrees of freedom and simplify classical descriptions of these systems. In this work, we explore the potential utility of quantum computers with arrays of qudits in simulating interacting fermionic systems, when the qudits can naturally map these relevant degrees of freedom. The Agassi model of fermions is based on an underlying $so(5)$ algebra, and the systems it describes can be partitioned into pairs of modes with five basis states, which naturally embed in arrays of $d=5$ qudits (qu5its). Classical noiseless simulations of the time evolution of systems of fermions embedded in up to twelve qu5its are performed using Google's cirq software. The resource requirements of the qu5it circuits are analyzed and compared with two different mappings to qubit systems, a physics-aware Jordan-Wigner mapping and a state-to-state mapping. We find advantages in using qudits, specifically in lowering the required quantum resources and reducing anticipated errors that take the simulation out of the physical space. A previously unrecognized sign problem has been identified from Trotterization errors in time evolving high-energy excitations. This has implications for quantum simulations in high-energy and nuclear physics, specifically of fragmentation and highly inelastic, multi-channel processes.

Translated: 2024-01-03 02:31:33, Published: 2023-12-29
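
A quick count shows why unphysical states appear when qu5its are replaced by qubits. Assume a state-to-state mapping that binary-encodes each five-state mode pair into ⌈log2 5⌉ = 3 qubits. Then 3 of the 8 basis states per pair are unphysical, and the physical fraction of the register's Hilbert space shrinks geometrically with the number of pairs. The helper below is illustrative arithmetic, not code from the paper.

```python
import math


def state_to_state_cost(n_pairs, d=5):
    """Qubits needed to binary-encode n_pairs d-level pairs, and the
    fraction of the qubit Hilbert space lying outside the physical space."""
    qubits_per_pair = math.ceil(math.log2(d))
    physical_fraction = (d / 2 ** qubits_per_pair) ** n_pairs
    return n_pairs * qubits_per_pair, 1 - physical_fraction


print(state_to_state_cost(1))   # (3, 0.375)
print(state_to_state_cost(12))  # 36 qubits; about 99.6% of basis states are unphysical
```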

# Quantifying quantumness in three-flavor neutrino oscillations

Quantifying quantumness in three-flavor neutrino oscillations ( http://arxiv.org/abs/2305.06095v3 )

License: see link

Victor Bittencourt, Massimo Blasone, Silvio De Siena, Cristina Matrella

We characterize quantum correlations encoded in a three-flavor oscillating neutrino system using both the plane-wave and the wave-packet approaches. By means of the Complete Complementarity Relations (CCR), we study the trade-off between predictability, local coherence, and nonlocal correlations in terms of the relevant parameters, chosen from recent neutrino experiments. Although the CCR describe the contributions associated with bipartite correlations very well, an attempt to extend these relations to include the genuine tripartite contributions in the pure-state case leads to a result that is not completely meaningful. However, we provide an analysis of the genuine tripartite contributions for both the pure and the mixed case, independently of the CCR.

Translated: 2024-01-03 02:30:18, Published: 2023-12-29

# WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia

WikiSQE: A Large-Scale Dataset for Sentence Quality Estimation in Wikipedia ( http://arxiv.org/abs/2305.05928v2 )

License: see link

Kenichiro Ando, Satoshi Sekine, Mamoru Komachi

Wikipedia can be edited by anyone and thus contains sentences of varying quality. Consequently, Wikipedia includes some poor-quality edits, which are often marked up by other editors. While editors' reviews enhance the credibility of Wikipedia, it is hard to check all edited text. Assisting in this process is very important, but a large and comprehensive dataset for studying it does not currently exist. Here, we propose WikiSQE, the first large-scale dataset for sentence quality estimation in Wikipedia. Each sentence is extracted from the entire revision history of English Wikipedia, and the target quality labels were carefully investigated and selected. WikiSQE has about 3.4M sentences with 153 quality labels. In experiments on automatic classification using competitive machine learning models, sentences with problems in citation, syntax/semantics, or propositions were found to be more difficult to detect. In addition, through human annotation, we found that the model we developed performed better than crowdsourced workers. WikiSQE is expected to be a valuable resource for other NLP tasks.

Translated: 2024-01-03 02:30:06, Published: 2023-12-29

# Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization ( http://arxiv.org/abs/2306.09012v3 )

License: see link

Dror Aiger, André Araujo, Simon Lynen

Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become a common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code: https://github.com/google-research/google-research/tree/master/cann

Translated: 2024-01-03 02:23:46, Published: 2023-12-29

# Correct-by-Construction Design of Contextual Robotic Missions Using Contracts

Correct-by-Construction Design of Contextual Robotic Missions Using Contracts ( http://arxiv.org/abs/2306.08144v3 )

License: see link

Piergiuseppe Mallozzi, Pierluigi Nuzzo, Nir Piterman, Gerardo Schneider, Patrizio Pelliccione

Effectively specifying and implementing robotic missions poses a set of challenges to software engineering for robotic systems. These challenges stem from the need to formalize and execute a robot's high-level tasks while considering various application scenarios and conditions, also known as contexts, in real-world operational environments. Writing correct mission specifications that explicitly account for multiple contexts can be tedious and error-prone. Furthermore, as the number of contexts, and consequently the complexity of the specification, increases, generating a correct-by-construction implementation (e.g., by using synthesis methods) can become intractable. A viable approach to address these issues is to decompose the mission specification into smaller, manageable sub-missions, with each sub-mission tailored to a specific context. Nevertheless, this compositional approach introduces its own set of challenges in ensuring the overall mission's correctness. In this paper, we propose a novel compositional framework for specifying and implementing contextual robotic missions using assume-guarantee contracts. The mission specification is structured in a hierarchical and modular fashion, allowing for each sub-mission to be synthesized as an independent robot controller. We address the problem of dynamically switching between sub-mission controllers while ensuring correctness under predefined conditions.

Translated: 2024-01-03 02:22:31, Published: 2023-12-29
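
Assume-guarantee contracts have a compact set-based algebra that makes compositional reasoning concrete. In the sketch below, behaviors are elements of a finite universe, and a contract is a pair (assumptions, guarantees). The sketch implements three operations from the contract-theory literature: saturation, parallel composition, and refinement. The finite-universe encoding and the helper names are assumptions for illustration. Real mission specifications are typically given symbolically (for example, in temporal logic) rather than as explicit sets of behaviors.

```python
def saturate(contract, universe):
    """Put a contract (A, G) in saturated form (A, G | not A)."""
    a, g = contract
    return a, g | (universe - a)


def compose(c1, c2, universe):
    """Parallel composition: G = G1 & G2, A = (A1 & A2) | not G."""
    a1, g1 = saturate(c1, universe)
    a2, g2 = saturate(c2, universe)
    g = g1 & g2
    a = (a1 & a2) | (universe - g)
    return saturate((a, g), universe)


def refines(c_new, c_old, universe):
    """c_new refines c_old if it assumes no more and guarantees no less."""
    a_new, g_new = saturate(c_new, universe)
    a_old, g_old = saturate(c_old, universe)
    return a_old <= a_new and g_new <= g_old


U = set(range(8))  # toy universe of 8 behaviors
c1 = ({0, 1, 2, 3}, {0, 1, 4, 5})
c2 = ({0, 1, 4, 5}, {0, 2, 4, 6})
a, g = compose(c1, c2, U)
print(sorted(a), sorted(g))  # [0, 1, 2, 3, 5] [0, 4, 6, 7]
```

In the mission setting, each context-specific sub-mission would play the role of one such contract.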

# VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models

VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models ( http://arxiv.org/abs/2306.06874v5 )

License: see link

Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs. Our code is available on GitHub: https://github.com/IBM/villandiffusion

Translated: 2024-01-03 02:21:47, Published: 2023-12-29

# Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC

Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC ( http://arxiv.org/abs/2306.04712v3 )

License: see link

Rohan Shenoy and Javier Duarte and Christian Herwig and James Hirschauer and Daniel Noonan and Maurizio Pierini and Nhan Tran and Cristina Mantilla Suarez

The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are either not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent. In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.

Translated: 2024-01-03 02:20:20, Published: 2023-12-29
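
The one-dimensional case gives intuition for why EMD is a more informative loss than mean squared error for energy deposits. In 1D, the exact EMD between two normalized histograms on equally spaced bins is the sum of the absolute differences of their cumulative sums. The sketch below covers only this 1D case; the paper works with 2D detector data and learns a CNN approximation instead. It shows that moving a unit of energy one bin or two bins away gives the same MSE but different EMD values.

```python
def emd_1d(p, q, bin_width=1.0):
    """Exact EMD between two 1D histograms on the same equally spaced bins."""
    total_p, total_q = sum(p), sum(q)
    cumulative, distance = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative += a / total_p - b / total_q  # net mass still to be moved
        distance += abs(cumulative)
    return distance * bin_width


def mse(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


source = [1.0, 0.0, 0.0]
near, far = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
print(mse(source, near) == mse(source, far))      # True
print(emd_1d(source, near), emd_1d(source, far))  # 1.0 2.0
```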

# CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation

CERT: Finding Performance Issues in Database Systems Through the Lens of Cardinality Estimation ( http://arxiv.org/abs/2306.00355v2 )

License: see link

Jinsheng Ba, Manuel Rigger

Database Management Systems (DBMSs) process a given query by creating a query plan, which is subsequently executed to compute the query's result. Deriving an efficient query plan is challenging, and both academia and industry have invested decades into researching query optimization. Despite this, DBMSs are prone to performance issues, where a DBMS produces an unexpectedly inefficient query plan that might lead to the slow execution of a query. Finding such issues is a longstanding problem and inherently difficult, because no ground truth information on an expected execution time exists. In this work, we propose Cardinality Estimation Restriction Testing (CERT), a novel technique that finds performance issues through the lens of cardinality estimation. Given a query on a database, CERT derives a more restrictive query (e.g., by replacing a LEFT JOIN with an INNER JOIN), whose estimated number of rows should not exceed the estimated number of rows for the original query. CERT tests cardinality estimation specifically, because it has been shown to be the most important part of query optimization; thus, we expect that finding and fixing such issues might result in the highest performance gains. In addition, we found that other kinds of query optimization issues can be exposed by unexpected estimated cardinalities, which can also be found by CERT. CERT is a black-box technique that does not require access to the source code; DBMSs expose query plans via the EXPLAIN statement. CERT eschews executing queries, which is costly and prone to performance fluctuations. We evaluated CERT on three widely used and mature DBMSs: MySQL, TiDB, and CockroachDB. CERT found 13 unique issues, of which 2 were fixed and 9 confirmed by the developers. We expect that this new angle on finding performance bugs will help DBMS developers improve DBMS performance.
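The core test oracle can be sketched as a monotonicity check on estimated row counts. In this minimal sketch the estimates are plain numbers you would read from a DBMS's EXPLAIN output; `QueryPair`, `restrict_left_join`, and the example estimates are hypothetical illustrations, not CERT's actual code. The rewrite rule relies on the fact that an INNER JOIN returns a subset of the rows of the corresponding LEFT JOIN.

```python
from dataclasses import dataclass

@dataclass
class QueryPair:
    original_sql: str
    restricted_sql: str
    original_estimate: float    # estimated rows for the original query (from EXPLAIN)
    restricted_estimate: float  # estimated rows for the more restrictive query

def restrict_left_join(sql: str) -> str:
    """One illustrative restriction rule: LEFT JOIN -> INNER JOIN.

    Naive textual rewrite for illustration; a real tool would transform
    the parsed query instead of the SQL string.
    """
    return sql.replace("LEFT JOIN", "INNER JOIN")

def violates_monotonicity(pair: QueryPair) -> bool:
    """Flag a potential performance issue: the restrictive query is
    estimated to return more rows than the original one."""
    return pair.restricted_estimate > pair.original_estimate

sql = "SELECT * FROM t0 LEFT JOIN t1 ON t0.c0 = t1.c0"
# Hypothetical estimates, as if read from two EXPLAIN plans.
pair = QueryPair(sql, restrict_left_join(sql), original_estimate=100, restricted_estimate=250)
print(violates_monotonicity(pair))  # prints True
```

Because only estimates are compared, no query is executed, which matches the abstract's point about avoiding costly and noisy execution.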

翻訳日:2024-01-03 02:20:05 公開日:2023-12-29

# BotArtist: Twitterのアカウント凍結に基づくTwitterボット検出機械学習モデル

BotArtist: Twitter bot detection Machine Learning model based on Twitter suspension ( http://arxiv.org/abs/2306.00037v3 )

ライセンス: Link先を確認

Alexander Shevtsov, Despoina Antonakaki, Ioannis Lamprou, Polyvios Pratikakis, Sotiris Ioannidis

(参考訳) Twitterは最も人気のあるソーシャルネットワークの1つで、コミュニケーションとオンライン会話のための手段を提供しているが、残念ながらボットや偽アカウントのターゲットであり、偽情報の操作と拡散につながっている。この目的に向けて、我々は、最近のロシア・ウクライナ戦争に関する900万人のユーザーから生まれた、Twitter上での困難で多言語的なソーシャル談話データセットを収集し、ボットアカウントとそれらに関わる会話を検出する。データセットの正解ラベルはTwitter APIの凍結アカウントコレクションから収集しており、約343Kのボットアカウントと8Mの一般ユーザが含まれている。さらに、Botometer-V3が提供するデータセット(Varol 1,777、ドイツ483、米国1,321アカウント)も使用する。公開データセットの他に、2022年のエネルギー危機と2022年の陰謀論に関する一般的な議論についての2つの独立したデータセットも収集した。どちらのデータセットも、Twitterの凍結メカニズムに従ってラベル付けされた。我々は最先端のXGBoostモデルを用いたボット検出のための新しいMLモデルを構築した。このモデルを、Twitterの凍結メカニズムに基づく正解ラベルに従ってラベル付けされた大量のツイートと組み合わせる。これに必要なのは限られたプロファイル特徴のみであり、Twitter APIとは独立しているため、収集時とは異なる期間のデータセットのラベル付けが可能である。Botometerと比較すると、本手法は2つの実シナリオデータセットにおいて平均11%高いROC-AUCスコアを達成する。

Twitter, as one of the most popular social networks, offers a means for communication and online discourse; unfortunately, it has been the target of bots and fake accounts, leading to the manipulation and spreading of false information. To this end, we gather a challenging, multilingual dataset of social discourse on Twitter, originating from 9M users and concerning the recent Russo-Ukrainian war, in order to detect the bot accounts and the conversations involving them. We collect the ground truth for our dataset through the Twitter API suspended accounts collection, containing approximately 343K bot accounts and 8M normal users. Additionally, we use a dataset provided by Botometer-V3 with 1,777 Varol, 483 German, and 1,321 US accounts. Besides the publicly available datasets, we also collect two independent datasets around popular discussion topics: the 2022 energy crisis and the 2022 conspiracy discussions. Both datasets were labeled according to the Twitter suspension mechanism. We build a novel ML model for bot detection using the state-of-the-art XGBoost model. We combine the model with a high volume of tweets labeled according to the Twitter suspension mechanism ground truth. This requires only a limited set of profile features, which allows the dataset to be labeled in time periods different from those of the collection, as it is independent of the Twitter API. In comparison with Botometer, our methodology achieves an average 11% higher ROC-AUC score over two real-case scenario datasets.

翻訳日:2024-01-03 02:19:08 公開日:2023-12-29

# 合成画像のオープンセットアーキテクチャ帰属のためのシャムネットワークに基づく検証システム

A Siamese-based Verification System for Open-set Architecture Attribution of Synthetic Images ( http://arxiv.org/abs/2307.09822v2 )

ライセンス: Link先を確認

Lydia Abady, Jun Wang, Benedetta Tondi, Mauro Barni

(参考訳) 合成画像の帰属のための様々な手法が開発されているが、そのほとんどはトレーニングセットに含まれるモデルやアーキテクチャによって生成された画像しか帰属できず、未知のアーキテクチャでは動作しないため、現実のシナリオにおける適用性が妨げられている。本稿では、合成画像をそれを生成したアーキテクチャへ帰属させるオープンセット問題に対処するために、シャムネットワークを利用する検証フレームワークを提案する。私たちは2つの異なる設定を考える。第1の設定では、システムは2つの画像が同じ生成アーキテクチャで作成されたか否かを判定する。第2の設定では、システムは、主張されたアーキテクチャによって生成された1つまたは複数の参照画像を利用して、合成画像の生成に使用されたアーキテクチャに関する主張を検証する。提案システムの主な強みは、クローズドセットとオープンセットの両方のシナリオで動作可能であり、入力画像(クエリ画像と参照画像の両方)が、トレーニング中に考慮されたアーキテクチャに属していても属していなくてもよいことである。GAN、拡散モデル、トランスフォーマなどの様々な生成アーキテクチャを包含し、合成顔画像生成に着目した実験評価により、クローズドセットとオープンセットの両方の設定における優れた性能と強力な汎化性能が確認された。

Despite the wide variety of methods developed for synthetic image attribution, most of them can only attribute images generated by models or architectures included in the training set and do not work with unknown architectures, hindering their applicability in real-world scenarios. In this paper, we propose a verification framework that relies on a Siamese network to address the problem of open-set attribution of synthetic images to the architecture that generated them. We consider two different settings. In the first setting, the system determines whether two images have been produced by the same generative architecture or not. In the second setting, the system verifies a claim about the architecture used to generate a synthetic image, utilizing one or multiple reference images generated by the claimed architecture. The main strength of the proposed system is its ability to operate in both closed- and open-set scenarios, so that the input images, both the query and reference images, may or may not belong to the architectures considered during training. Experimental evaluations encompassing various generative architectures such as GANs, diffusion models, and transformers, and focusing on synthetic face image generation, confirm the excellent performance of our method in both closed and open-set settings, as well as its strong generalization capabilities.
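The two verification settings can be sketched as threshold decisions on embedding similarity. In this minimal sketch the embeddings are assumed to come from the shared (Siamese) encoder, which is not shown; cosine similarity, the 0.8 threshold, and averaging over references are illustrative assumptions, not necessarily the paper's choices. Because the decision depends only on similarity, it does not require the generator to have been seen in training.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def same_architecture(query_emb, reference_emb, threshold: float = 0.8) -> bool:
    """Setting 1: were both images produced by the same generator?"""
    return cosine_similarity(query_emb, reference_emb) >= threshold

def verify_claim(query_emb, reference_embs, threshold: float = 0.8) -> bool:
    """Setting 2: verify a claimed architecture using one or more reference
    images generated by it (mean similarity compared against a threshold)."""
    scores = [cosine_similarity(query_emb, r) for r in reference_embs]
    return sum(scores) / len(scores) >= threshold
```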

翻訳日:2024-01-03 02:11:13 公開日:2023-12-29

# Vision-Language Modelsは良い推測者になれるか? 時間と位置推論のためのVLMの探索

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning ( http://arxiv.org/abs/2307.06166v2 )

ライセンス: Link先を確認

Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp

(参考訳) 視覚言語モデル(VLM)は、人間のように常識的な知識を用いて推論できることが期待されている。一つの例は、人間が知識に基づいて画像がどこでいつ撮影されたのかを推論できるということである。このことから、大規模な画像テキストリソースで事前訓練された視覚言語モデルが、視覚的な手がかりに基づいて、時間と位置の推論において人間の能力に到達し、さらには上回ることができるのかという疑問が生じる。そこで本研究では、識別的および生成的VLMに適用する、認識と推論の2段階のプロービングタスクを提案し、VLMが時間や位置に関連する特徴を認識し、さらにそれについて推論できるかどうかを明らかにする。この調査を容易にするために、豊富な社会文化的手がかりを含む画像で構成された、よく精選された画像データセットWikiTiLoを導入する。広範な実験的研究において、VLMは視覚エンコーダにおいて関連する特徴を効果的に保持できるものの、依然として完全な推論はできないことが判明した。将来の研究を促進するために、データセットとコードを公開する予定である。

Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings do. One example is that humans can reason about where and when an image was taken based on their knowledge. This makes us wonder whether, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can match or even outperform human capability in reasoning about time and location. To address this question, we propose a two-stage recognition and reasoning probing task, applied to discriminative and generative VLMs, to uncover whether VLMs can recognize time- and location-relevant features and further reason about them. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In extensive experimental studies, we find that although VLMs can effectively retain relevant features in visual encoders, they still fail to reason perfectly. We will release our dataset and code to facilitate future studies.

翻訳日:2024-01-03 02:10:01 公開日:2023-12-29

# 大規模言語モデルの評価に関する調査

A Survey on Evaluation of Large Language Models ( http://arxiv.org/abs/2307.03109v9 )

ライセンス: Link先を確認

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie

(参考訳) 大規模言語モデル(LLM)は、様々なアプリケーションにおける前例のない性能のため、学術と産業の両方で人気が高まっている。LLMは研究と日常利用の両方において重要な役割を担い続けており、その評価はタスクレベルだけでなく、潜在的なリスクをよりよく理解するために社会レベルでもますます重要になっている。過去数年間、様々な観点からLLMを調べるための重要な努力が続けられてきた。本稿では、これらのLLMの評価手法を総合的に検討し、何を評価するか、どこで評価するか、どのように評価するかの3つの重要な側面に着目する。まず、一般的な自然言語処理タスク、推論、医療利用、倫理、教育、自然科学、社会科学、エージェント応用など、評価タスクの観点から概観する。第2に、LLMの性能評価において重要な要素である評価手法とベンチマークを掘り下げることで、「どこで」と「どのように」の問いに答える。次に、異なるタスクにおけるLLMの成功事例と失敗事例を要約する。最後に、LLM評価の先にあるいくつかの将来の課題に光を当てる。我々の目的は、LLM評価の領域における研究者に貴重な洞察を提供し、それによってより熟練したLLMの開発を支援することである。我々の要点は、LLMの開発をよりよく支援するために、評価を必須の学問分野として扱うべきだということである。関連するオープンソース資料は https://github.com/MLGroupJLU/LLM-eval-survey で継続的に保守している。

Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the 'where' and 'how' questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLM evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLM evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey.

翻訳日:2024-01-03 02:09:41 公開日:2023-12-29

# 核多体系における多体絡み合いと情報再配置

Multi-Body Entanglement and Information Rearrangement in Nuclear Many-Body Systems ( http://arxiv.org/abs/2306.16535v2 )

ライセンス: Link先を確認

S. Momme Hengstenberg, Caroline E. P. Robin, Martin J. Savage

(参考訳) 核多体系の有効モデル空間(EMS)計算が多粒子エンタングルメントをどのように再配置し収束させるかを調べる。一般化リプキン・メシュコフ・グリック(LMG)モデルは、核の絡み合い駆動記述の将来の発展の動機付けと洞察を提供するために用いられる。有効アプローチはヒルベルト空間の切り捨てと、関連する基本自由度を構成する量子ビット(スピン)の変分回転に基づいている。回転と切り捨ての非可換性により、モデル空間の大部分でエネルギー収束が指数関数的に改善される。本分析では、相関と絡み合いの尺度を調べ、カットオフの増加に伴うその収束を定量化する。多体の絡み合いを推定するために、1スピンおよび2スピンの絡み合いエントロピー、相互情報、および$n=2,4$の$n$-tangleに焦点を当てる。有効記述は回転したスピンのエントロピーや相互情報を強く抑制する一方で、低いカットオフで厳密な結果を広範囲に回復することができる。一方、素のハミルトニアンの単純な切り捨ては、これらの尺度を人為的に過小評価する。本モデルにおける$n$-tangleは、$n$粒子の絡み合いの基底に依存しない尺度を提供する。EMS記述ではこれらを捉えるのがより難しいが、素のハミルトニアンの切り捨てと比較した収束の改善は著しく劇的である。多体系における低励起オブザーバブルに対して予測能力をもたらす低エネルギーEMS手法は、LMGモデルにおける量子相関や多体絡み合いに対しても同様の有効性を示すと結論づけ、これは核多体系や高エネルギー物理学・核物理学に関連する有効場理論における今後の研究の動機となる。

We examine how effective-model-space (EMS) calculations of nuclear many-body systems rearrange and converge multi-particle entanglement. The generalized Lipkin-Meshkov-Glick (LMG) model is used to motivate and provide insight for future developments of entanglement-driven descriptions of nuclei. The effective approach is based on a truncation of the Hilbert space together with a variational rotation of the qubits (spins), which constitute the relevant elementary degrees of freedom. The non-commutativity of the rotation and truncation allows for an exponential improvement of the energy convergence throughout much of the model space. Our analysis examines measures of correlations and entanglement, and quantifies their convergence with increasing cut-off. We focus on one- and two-spin entanglement entropies, mutual information, and $n$-tangles for $n=2,4$ to estimate multi-body entanglement. The effective description strongly suppresses entropies and mutual information of the rotated spins, while being able to recover the exact results to a large extent with low cut-offs. Naive truncations of the bare Hamiltonian, on the other hand, artificially underestimate these measures. The $n$-tangles in the present model provide a basis-independent measure of $n$-particle entanglement. While these are more difficult to capture with the EMS description, the improvement in convergence, compared to truncations of the bare Hamiltonian, is significantly more dramatic. We conclude that the low-energy EMS techniques, which successfully provide predictive capabilities for low-lying observables in many-body systems, exhibit analogous efficacy for quantum correlations and multi-body entanglement in the LMG model, motivating future studies in nuclear many-body systems and effective field theories relevant to high-energy physics and nuclear physics.
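For intuition about two of the measures named above, here is a minimal sketch for the simplest possible case: a normalized pure two-qubit state $a|00\rangle + b|01\rangle + c|10\rangle + d|11\rangle$. Its 2-tangle is $\tau_2 = C^2 = 4|ad - bc|^2$, and the one-spin reduced density matrix has eigenvalues $(1 \pm \sqrt{1 - \tau_2})/2$, from which the one-spin entanglement entropy follows. This is an illustration only: in the many-spin LMG model the reduced two-spin states are generally mixed, which requires the more general Wootters concurrence.

```python
import math

def two_qubit_measures(a, b, c, d):
    """One-spin entanglement entropy (bits) and 2-tangle of the pure state
    a|00> + b|01> + c|10> + d|11> (amplitudes may be complex)."""
    norm = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    if abs(norm - 1) > 1e-9:
        raise ValueError("state must be normalized")
    tangle = 4 * abs(a * d - b * c) ** 2       # tau_2 = C^2 with C = 2|ad - bc|
    disc = math.sqrt(max(0.0, 1 - tangle))    # clamp tiny negative round-off
    eigs = ((1 + disc) / 2, (1 - disc) / 2)   # spectrum of the one-spin reduced state
    entropy = -sum(p * math.log2(p) for p in eigs if p > 0)
    return entropy, tangle

# Bell state: maximal entanglement, entropy 1 bit and tangle 1.
print(two_qubit_measures(1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)))
```

For a pure two-spin state the mutual information between the spins is simply twice this one-spin entropy.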

翻訳日:2024-01-03 02:08:51 公開日:2023-12-29

# NeuroCLIP: CLIP と SNNによるニューロモルフィックデータ理解

NeuroCLIP: Neuromorphic Data Understanding by CLIP and SNN ( http://arxiv.org/abs/2306.12073v2 )

ライセンス: Link先を確認

Yufei Guo and Yuanpei Chen and Zhe Ma

(参考訳) 近年、ニューロモルフィック視覚センサがますます注目されている。しかし、ニューロモルフィックデータは非同期のイベントスパイクで構成されるため、強力な汎用ニューラルネットワークモデルを訓練するための大規模なベンチマークを構築することが難しく、深層学習による「未知」の物体に対するニューロモルフィックデータ理解が制限されている。一方、フレーム画像については、訓練データを容易に入手できるため、2Dの大規模な画像テキストペアで事前訓練された大規模なCLIP(Contrastive Language-Image Pre-training)モデルによる「未知」タスクのゼロショット学習や少数ショット学習が、示唆に富む性能を示している。そこで、「未知」の問題に対処するために、CLIPをニューロモルフィックデータ認識に転用できるのではないかと考えた。この目的のために、本論文ではこのアイデアをNeuroCLIPとして具体化する。NeuroCLIPは2D CLIPと、ニューロモルフィックデータ理解のために特別に設計された2つのモジュールで構成されている。第1に、イベントスパイクを単純な識別戦略で連続したフレーム画像に変換するイベントフレームモジュール。第2に、CLIPの視覚エンコーダから得られる逐次的な特徴に対する、スパイキングニューラルネットワーク(SNN)に基づく単純な微調整アダプタであるタイムステップ間アダプタであり、少数ショット性能を向上させる。N-MNIST、CIFAR10-DVS、ES-ImageNetなどのニューロモルフィックデータセットに関する様々な実験により、NeuroCLIPの有効性が実証されている。コードは https://github.com/yfguo91/NeuroCLIP.git でオープンソースとして公開されている。

Recently, the neuromorphic vision sensor has received increasing interest. However, neuromorphic data consist of asynchronous event spikes, which makes it difficult to construct a large benchmark to train a powerful general-purpose neural network model, thus limiting deep-learning-based understanding of neuromorphic data for "unseen" objects. For frame images, by contrast, training data can be obtained easily, and zero-shot and few-shot learning on "unseen" tasks via the large Contrastive Language-Image Pre-training (CLIP) model, which is pre-trained on large-scale 2D image-text pairs, has shown inspiring performance. We wonder whether CLIP could be transferred to neuromorphic data recognition to handle the "unseen" problem. To this end, we materialize this idea with NeuroCLIP in this paper. NeuroCLIP consists of 2D CLIP and two specially designed modules for neuromorphic data understanding. First, an event-frame module converts the event spikes into sequential frame images with a simple discrimination strategy. Second, an inter-timestep adapter, a simple fine-tuned adapter based on a spiking neural network (SNN), processes the sequential features coming from CLIP's visual encoder to improve few-shot performance. Various experiments on neuromorphic datasets including N-MNIST, CIFAR10-DVS, and ES-ImageNet demonstrate the effectiveness of NeuroCLIP. Our code is open-sourced at https://github.com/yfguo91/NeuroCLIP.git.
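The event-frame step can be illustrated with the most common conversion: split the recording into equal time bins and accumulate each event's polarity at its pixel. This is a minimal sketch under that assumption; the abstract does not specify NeuroCLIP's "simple discrimination strategy", so the binning rule and the `(t, x, y, polarity)` event layout here are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, int, int, int]  # (timestamp, x, y, polarity in {+1, -1})

def events_to_frames(events: List[Event], num_steps: int, height: int, width: int,
                     t_start: float, t_end: float) -> List[List[List[int]]]:
    """Accumulate events into num_steps frames of shape (height, width).

    Each pixel stores the signed sum of polarities that fell into its time bin;
    events at exactly t_end go into the last frame.
    """
    frames = [[[0] * width for _ in range(height)] for _ in range(num_steps)]
    duration = t_end - t_start
    for t, x, y, p in events:
        if not (t_start <= t <= t_end):
            continue  # ignore events outside the window
        step = min(int((t - t_start) / duration * num_steps), num_steps - 1)
        frames[step][y][x] += p
    return frames
```

The resulting sequence of frames is what a frame-based image encoder such as CLIP's can consume, one time step at a time.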

翻訳日:2024-01-03 02:06:17 公開日:2023-12-29

# テキスト認識のための自己蒸留正規化コネクショニスト時間的分類損失:単純かつ効果的なアプローチ

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach ( http://arxiv.org/abs/2308.08806v4 )

ライセンス: Link先を確認

Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang and Wei Peng

(参考訳) テキスト認識手法は急速に発展している。強力なモジュール、言語モデル、教師なし・半教師あり学習スキームなど、いくつかの高度な技術が公開ベンチマークの性能を次々と押し上げてきた。しかし、損失関数の観点から、テキスト認識モデルをいかによりよく最適化するかという問題は概ね見過ごされている。CTCに基づく手法は、性能と推論速度のバランスが良いため実際に広く用いられているが、依然として精度の低下に苦慮している。これは、CTC損失がシーケンス全体のターゲットの最適化を重視する一方で、個々の文字の学習を無視するためである。本稿では、この問題に対処するため、CTCベースのモデルのための自己蒸留方式を提案する。フレーム単位の正則化項をCTC損失に組み込んで個々の監督を重視し、潜在アライメントの最大事後確率(MAP)推定を活用して、CTCベースのモデル間の蒸留で生じる不整合問題を解決する。正則化したCTC損失を蒸留コネクショニスト時間分類(DCTC)損失と呼ぶ。DCTC損失はモジュールを必要とせず、余分なパラメータ、推論遅延の増加、追加のトレーニングデータやフェーズを一切必要としない。公開ベンチマークでの大規模な実験により、DCTCはこれらの欠点を一切伴わずに、テキスト認識モデルの精度を最大2.6%向上させられることが示された。

Text recognition methods are developing rapidly. Several advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, have successively pushed the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire target sequence while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term in the CTC loss to emphasize individual supervision, and leverages the maximum a posteriori (MAP) latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free, requiring no extra parameters, no longer inference lag, and no additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks.
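The "MAP latent alignment" idea can be sketched with Viterbi decoding over the CTC topology: find the single most probable frame-level path (labels interleaved with blanks) that collapses to the target, then use it as framewise pseudo-labels for a cross-entropy term added to the CTC loss. This is a simplified sketch, not the paper's exact regularizer: blank index 0, hard (argmax-path) targets, and the helper names are assumptions, and the CTC loss itself is not implemented here.

```python
import math
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol

def viterbi_ctc_alignment(log_probs: Sequence[Sequence[float]],
                          labels: Sequence[int]) -> List[int]:
    """Most probable frame-level CTC path that collapses to `labels`.

    log_probs[t][k] is the log-probability of class k at frame t.
    Returns one class index (possibly BLANK) per frame.
    """
    ext = [BLANK]  # labels interleaved with blanks: b, l1, b, l2, ..., b
    for c in labels:
        ext += [c, BLANK]
    T, S = len(log_probs), len(ext)
    neg_inf = float("-inf")
    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        prev = score[t - 1]
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            # between two different labels.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda j: prev[j])
            if prev[best] == neg_inf:
                continue
            score[t][s] = prev[best] + log_probs[t][ext[s]]
            back[t][s] = best
    ends = [S - 1] if S == 1 else [S - 1, S - 2]
    s = max(ends, key=lambda j: score[T - 1][j])
    if score[T - 1][s] == neg_inf:
        raise ValueError("too few frames to emit the label sequence")
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]

def framewise_loss(student_log_probs: Sequence[Sequence[float]],
                   alignment: Sequence[int]) -> float:
    """Mean negative log-likelihood of the student at the aligned classes."""
    total = sum(student_log_probs[t][k] for t, k in enumerate(alignment))
    return -total / len(alignment)

probs = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
lp = [[math.log(p) for p in row] for row in probs]
print(viterbi_ctc_alignment(lp, [1, 2]))  # prints [1, 2, 0]
```

A combined objective would then look like `ctc_loss + lam * framewise_loss(...)`, with the weight `lam` a hypothetical hyperparameter.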

翻訳日:2024-01-03 01:59:26 公開日:2023-12-29

# 量子計測理論における正準占有状態(マクロ)のエントロピー

Entropy of the Canonical Occupancy (Macro) State in the Quantum Measurement Theory ( http://arxiv.org/abs/2308.04472v6 )

ライセンス: Link先を確認

Arnaldo Spalvieri

(参考訳) 本稿では、任意の数の非相互作用ボソンからなる平衡状態の系について、占有数の確率分布とエントロピーを解析する。確率分布は、環境と注目系の合成系のボソン固有状態から環境をトレースアウトする方法(経験的アプローチ)と、環境と注目系の合成系の混合状態から環境をトレースアウトする方法(ベイズ的アプローチ)の両方によって導かれる。熱力学的極限では、この2つは一致し、多項分布に等しくなる。さらに、ボソン系の物理的エントロピーを占有数のシャノンエントロピーと同一視することを提案し、熱力学的エントロピーの古典的解析において生じるある種の矛盾を解消する。最後に、多項分布のエントロピーと多変量超幾何分布のエントロピーとの間の情報理論的不等式を利用して、ベイズ主義と経験主義を共通の「情報力学」の枠組みに統合する。

The paper analyzes the probability distribution of the occupancy numbers and the entropy of a system at equilibrium composed of an arbitrary number of non-interacting bosons. The probability distribution is derived both by tracing out the environment from a bosonic eigenstate of the union of the environment and the system of interest (the empirical approach) and by tracing out the environment from the mixed state of that union (the Bayesian approach). In the thermodynamic limit, the two coincide and are equal to the multinomial distribution. Furthermore, the paper proposes to identify the physical entropy of the bosonic system with the Shannon entropy of the occupancy numbers, fixing certain contradictions that arise in the classical analysis of thermodynamic entropy. Finally, by leveraging an information-theoretic inequality between the entropy of the multinomial distribution and the entropy of the multivariate hypergeometric distribution, Bayesianism and empiricism are integrated into a common "infomechanical" framework.
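The proposed entropy, the Shannon entropy of the occupancy numbers when they follow a multinomial distribution, can be computed exactly for small systems by enumerating every occupancy vector. In this minimal sketch `n` particles are distributed over `len(probs)` states with hypothetical single-state probabilities `probs`; the enumeration grows combinatorially, so it is only meant for toy sizes.

```python
import math
from typing import Iterator, Sequence, Tuple

def compositions(n: int, k: int) -> Iterator[Tuple[int, ...]]:
    """All occupancy vectors: k-tuples of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)  # every partial quotient is an integer
    return coeff * math.prod(p ** c for p, c in zip(probs, counts))

def multinomial_entropy(n: int, probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of the occupancy vector ~ Multinomial(n, probs)."""
    h = 0.0
    for counts in compositions(n, len(probs)):
        p = multinomial_pmf(counts, probs)
        if p > 0:
            h -= p * math.log(p)
    return h
```

For example, two particles over two equiprobable states give occupancy vectors (0,2), (1,1), (2,0) with probabilities 1/4, 1/2, 1/4, so the entropy is $\tfrac{3}{2}\ln 2$.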

翻訳日:2024-01-03 01:57:10 公開日:2023-12-29

# 基本となるパターンを明らかにする:データセットの類似性、パフォーマンス、一般化

Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization ( http://arxiv.org/abs/2308.03580v3 )

ライセンス: Link先を確認

Akshit Achara, Ram Krishna Pandey

(参考訳) 教師付きディープラーニングモデルは、特定のタスクで許容可能な性能を達成するために、大量のラベル付きデータを必要とする。しかし、未知のデータでテストすると、モデルはうまく機能しないことがある。したがって、汎化を改善するためには、追加の多様なラベル付きデータでモデルを訓練する必要がある。本研究の目的は、モデルとその性能、汎化を理解することである。モデルの挙動に関する洞察を得るために、画像間、データセット間、画像・データセット間の距離を確立する。提案する距離指標とモデル性能を組み合わせることで、候補アーキテクチャのプールから適切なモデル/アーキテクチャを選択することができる。少数の未知の画像(例えば1、3、7枚)をトレーニングセットに追加するだけで、これらのモデルの汎化を改善できることを示した。提案手法は、動的環境における未知のデータに対するモデル性能の推定を提供しつつ、トレーニングとアノテーションのコストを削減する。

Supervised deep learning models require a significant amount of labeled data to achieve acceptable performance on a specific task. However, when tested on unseen data, the models may not perform well. Therefore, the models need to be trained with additional and more varied labeled data to improve generalization. In this work, our goal is to understand the models, their performance, and their generalization. We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the models' behavior. Our proposed distance metrics, when combined with model performance, can help in selecting an appropriate model/architecture from a pool of candidate architectures. We show that the generalization of these models can be improved by adding only a small number of unseen images (say 1, 3, or 7) to the training set. Our proposed approach reduces training and annotation costs while providing an estimate of model performance on unseen data in dynamic environments.
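The three distance levels can be illustrated on feature vectors. This sketch assumes images are already embedded as vectors (for example by a pretrained backbone, not shown) and uses Euclidean distance to a dataset centroid; these definitions, and the `most_novel` selection heuristic for picking a few unseen images to add, are hypothetical stand-ins, since the abstract does not give the paper's exact metrics.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def image_image_distance(a: Vector, b: Vector) -> float:
    return math.dist(a, b)

def centroid(dataset: List[Vector]) -> List[float]:
    n = len(dataset)
    return [sum(col) / n for col in zip(*dataset)]

def image_dataset_distance(image: Vector, dataset: List[Vector]) -> float:
    return math.dist(image, centroid(dataset))

def dataset_dataset_distance(d1: List[Vector], d2: List[Vector]) -> float:
    return math.dist(centroid(d1), centroid(d2))

def most_novel(candidates: List[Vector], train: List[Vector], k: int) -> List[Vector]:
    """Hypothetical heuristic: the k unseen images farthest from the training set."""
    return sorted(candidates, key=lambda x: image_dataset_distance(x, train), reverse=True)[:k]
```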

翻訳日:2024-01-03 01:56:15 公開日:2023-12-29

# 高速なスレート方策最適化: Plackett-Luceを超えて

Fast Slate Policy Optimization: Going Beyond Plackett-Luce ( http://arxiv.org/abs/2308.01566v2 )

ライセンス: Link先を確認

Otmane Sakhi, David Rohde, Nicolas Chopin

(参考訳) 大規模機械学習システムのますます重要になっている構成要素は、スレート、すなわちクエリに対するアイテムの順序付きリストを返すことに基づいている。この技術の応用には、検索、情報検索、推薦システムが含まれる。行動空間が大きい場合、オンラインクエリを迅速に完了するために、意思決定システムは特定の構造に制限される。本稿では、任意の報酬関数が与えられた場合のこのような大規模意思決定システムの最適化について述べる。我々は、この学習問題を方策最適化の枠組みで定式化し、決定関数の新たな緩和から生まれた新しい方策のクラスを提案する。これにより、巨大な行動空間にスケールする単純かつ効率的な学習アルゴリズムが実現される。提案手法を一般に採用されているPlackett-Luce方策クラスと比較し、数百万規模の行動空間を持つ問題に対するアプローチの有効性を示す。

An increasingly important building block of large-scale machine learning systems is based on returning slates: ordered lists of items given a query. Applications of this technology include search, information retrieval, and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large-scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes on the order of millions.
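For reference, here is a minimal sketch of the Plackett-Luce baseline the paper compares against (not the paper's new policy class). Sampling a slate of length k can be done with the Gumbel-top-k trick: add independent Gumbel noise to every item's logit and take the k largest. Computing the slate's log-probability, which a policy-gradient update needs, requires a normalizer over all remaining items at each position, i.e. O(k·P) work for P items. That cost is what becomes painful with millions of actions.

```python
import heapq
import math
import random
from typing import List, Sequence

def sample_plackett_luce_slate(scores: Sequence[float], k: int,
                               rng: random.Random) -> List[int]:
    """Sample an ordered slate of k item indices from a Plackett-Luce policy
    with logits `scores`, using the Gumbel-top-k trick."""
    perturbed = []
    for i, s in enumerate(scores):
        u = rng.random()
        while u == 0.0:          # avoid log(0)
            u = rng.random()
        perturbed.append((s - math.log(-math.log(u)), i))  # logit + Gumbel(0, 1)
    return [i for _, i in heapq.nlargest(k, perturbed)]

def plackett_luce_log_prob(scores: Sequence[float], slate: Sequence[int]) -> float:
    """log P(slate): at each position, softmax over the items not yet chosen."""
    remaining = set(range(len(scores)))
    logp = 0.0
    for i in slate:
        log_z = math.log(sum(math.exp(scores[j]) for j in remaining))
        logp += scores[i] - log_z
        remaining.remove(i)
    return logp
```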

翻訳日:2024-01-03 01:55:28 公開日:2023-12-29

# 生成AIのセキュリティリスクの特定と軽減

Identifying and Mitigating the Security Risks of Generative AI ( http://arxiv.org/abs/2308.14840v4 )

ライセンス: Link先を確認

Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang

(参考訳) あらゆる主要な技術発明は二重用途のジレンマを再浮上させる。新しい技術は善にも害にも使える可能性がある。大規模言語モデル(LLM)や拡散モデルのような生成AI(GenAI)技術は、顕著な能力(例えば、文脈内学習、コード補完、テキストから画像への生成と編集)を示している。しかし、GenAIは攻撃者によっても同様に、新しい攻撃を生成し、既存の攻撃の速度と効果を高めるために使われうる。本稿は、Google(スタンフォード大学とウィスコンシン大学マディソン校との共催)で開催された、GenAIがもたらす二重用途のジレンマに関するワークショップの成果を報告する。本論文は包括的なものではなく、ワークショップで得られた興味深い知見のいくつかを統合する試みである。この話題について、コミュニティの短期的、長期的目標について論じる。この論文が、この重要なトピックに関する議論の出発点と、研究コミュニティが取り組むべき興味深い問題の両方を提供することを期待している。

Every major technical invention resurfaces the dual-use dilemma: the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. This paper is not meant to be comprehensive, but is rather an attempt to synthesize some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. We hope this paper provides both a launching point for a discussion on this important topic and interesting problems that the research community can work to address.

翻訳日:2024-01-03 01:44:21 公開日:2023-12-29

# CNN分類性能向上のための多段階特徴無相関化制約

Multi-stage feature decorrelation constraints for improving CNN classification performance ( http://arxiv.org/abs/2308.12880v2 )

ライセンス: Link先を確認

Qiuyu Zhu and Hao Wang and Xuewen Zu and Chengfei Liu

(参考訳) パターン分類に使用される畳み込みニューラルネットワーク(CNN)では、ネットワークパラメータに対するいくつかの正則化制約を除き、トレーニング損失関数は通常ネットワークの最終出力に適用される。しかし、ネットワーク層数が増えるにつれて、前段の層に対する損失関数の影響は徐々に弱まり、ネットワークパラメータは局所最適解に陥りやすくなる。同時に、訓練されたネットワークは特徴のすべての段階で顕著な情報冗長性を有しており、これが全ての段階における特徴写像の有効性を低下させ、ネットワークの後続パラメータが最適な方向へ変化することを妨げることがわかった。したがって、前段の特徴を制約し、その情報冗長性を排除する損失関数を設計することで、ネットワークのより最適な解を得て、分類精度をさらに向上させることができる。本稿では、CNNに対して、全ての段階における特徴の相関を制約することで有効な特徴を洗練し情報冗長性を解消する多段階特徴無相関化損失(MFD Loss)を提案する。CNNには多数の層があることを考慮し、実験的比較と分析に基づいて、MFD LossをCNNの複数の前段層に作用させて各層・各チャネルの出力特徴を制約し、ネットワークのトレーニング中に分類損失関数と共同で教師あり学習を行う。Softmax Lossのみによる教師あり学習と比較して、いくつかの典型的なCNNとよく使われる複数のデータセットでの実験により、Softmax Loss+MFD Lossの分類性能が著しく優れていることが示された。また、MFD Lossを他の典型的な損失関数と組み合わせる前後の比較実験により、その高い汎用性が確認された。

For the convolutional neural network (CNN) used for pattern classification, the training loss function is usually applied to the final output of the network, except for some regularization constraints on the network parameters. However, as the number of network layers increases, the influence of the loss function on the front layers of the network gradually decreases, and the network parameters tend to fall into local optima. At the same time, we find that the trained network has significant information redundancy at all stages of features, which reduces the effectiveness of feature mapping at all stages and hinders the subsequent parameters of the network from moving toward the optimum. Therefore, by designing a loss function that constrains the front-stage features and eliminates their information redundancy, it is possible to obtain a better-optimized solution for the network and further improve its classification accuracy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Because CNNs have many layers, and based on experimental comparison and analysis, MFD Loss acts on multiple front layers of the CNN, constrains the output features of each layer and each channel, and is trained jointly with the classification loss function during network training. Compared with supervised learning using Softmax Loss alone, experiments on several commonly used datasets with several typical CNNs show that the classification performance of Softmax Loss+MFD Loss is significantly better. Meanwhile, comparison experiments before and after combining MFD Loss with other typical loss functions verify its broad applicability.
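One common way to "constrain the correlation of features" is to penalize the off-diagonal entries of the channel correlation matrix of a layer's output. The sketch below computes the mean squared off-diagonal Pearson correlation for a batch of pooled channel activations; this specific form is an assumption for illustration, and the paper's MFD Loss may be defined differently. In training, such a term would be summed over the chosen front layers and added to the classification loss with a weighting factor.

```python
import math
from typing import Sequence

def decorrelation_loss(features: Sequence[Sequence[float]], eps: float = 1e-8) -> float:
    """Mean squared off-diagonal Pearson correlation between channels.

    features[n][c] is the activation of channel c for sample n
    (e.g. after spatial average pooling). Requires at least two channels.
    """
    n, c = len(features), len(features[0])
    if c < 2:
        raise ValueError("need at least two channels")
    means = [sum(row[j] for row in features) / n for j in range(c)]
    centered = [[row[j] - means[j] for j in range(c)] for row in features]
    stds = [math.sqrt(sum(r[j] ** 2 for r in centered) / n) + eps for j in range(c)]
    total = 0.0
    for i in range(c):
        for j in range(c):
            if i != j:
                cov = sum(r[i] * r[j] for r in centered) / n
                total += (cov / (stds[i] * stds[j])) ** 2
    return total / (c * (c - 1))
```

Perfectly redundant channels (one a scaled copy of the other) give a loss close to 1, while uncorrelated channels give 0.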

翻訳日:2024-01-03 01:43:37 公開日:2023-12-29

# 生涯多エージェント経路探索のための交通流最適化

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding ( http://arxiv.org/abs/2308.11234v4 )

ライセンス: Link先を確認

Zhe Chen, Daniel Harabor, Jiaoyang Li, Peter J. Stuckey

(参考訳) Multi-Agent Path Finding (MAPF)は、共有マップ上を移動するエージェントのチームについて衝突のない経路を計算することを求める、ロボット工学の基本的問題である。この話題には多くの研究があるが、エージェントの数が増えるにつれて、現在のアルゴリズムはすべて苦戦する。主な理由は、既存のアプローチが通常、渋滞を引き起こす自由フロー最適経路を計画しているからである。この問題に取り組むため、我々は、エージェントが混雑回避経路をたどって目的地へ誘導されるMAPFの新しい手法を提案する。各エージェントがひとつの目的地を持つワンショットMAPFと、エージェントに新しいタスクが継続的に割り当てられる生涯MAPFの2つの大規模設定でこのアイデアを評価する。ワンショットMAPFでは、我々のアプローチが解の品質を大幅に改善することを示す。生涯MAPFでは、全体のスループットが大きく改善されることを報告する。

Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics that asks us to compute collision-free paths for a team of agents, all moving across a shared map. Although there is a large body of work on this topic, all current algorithms struggle as the number of agents grows. The principal reason is that existing approaches typically plan free-flow optimal paths, which creates congestion. To tackle this issue, we propose a new approach for MAPF where agents are guided to their destination by following congestion-avoiding paths. We evaluate the idea in two large-scale settings: one-shot MAPF, where each agent has a single destination, and lifelong MAPF, where agents are continuously assigned new tasks. For one-shot MAPF, we show that our approach substantially improves solution quality. For lifelong MAPF, we report large improvements in overall throughput.
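The difference between free-flow and congestion-avoiding guidance can be sketched with a congestion-weighted shortest path. Agents are planned one after another with Dijkstra on a grid, and entering a cell costs `1 + penalty * usage`, where `usage` counts earlier agents whose paths pass through it. This is a simplified, hypothetical illustration: it produces guide paths only, ignores time and collisions, and is not the paper's actual traffic-flow optimisation.

```python
import heapq
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def congestion_aware_path(grid: List[str], start: Cell, goal: Cell,
                          usage: Dict[Cell, int], penalty: float) -> List[Cell]:
    """Dijkstra on a 4-connected grid ('#' marks obstacles).

    Entering a cell costs 1 + penalty * usage[cell], where usage counts how
    many previously planned agents pass through that cell.
    """
    rows, cols = len(grid), len(grid[0])
    dist: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                nd = d + 1.0 + penalty * usage.get(nxt, 0)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    parent[nxt] = cell
                    heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        raise ValueError("goal is unreachable")
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]

def plan_agents(grid: List[str], tasks: List[Tuple[Cell, Cell]],
                penalty: float = 0.5) -> List[List[Cell]]:
    """Plan agents sequentially, making cells used by earlier agents pricier."""
    usage: Dict[Cell, int] = {}
    paths = []
    for start, goal in tasks:
        path = congestion_aware_path(grid, start, goal, usage, penalty)
        for cell in path:
            usage[cell] = usage.get(cell, 0) + 1
        paths.append(path)
    return paths

grid = ["...", "...", "..."]
paths = plan_agents(grid, [((0, 0), (0, 2)), ((0, 0), (0, 2))], penalty=5.0)
print(paths[1])  # prints [(0, 0), (1, 0), (1, 1), (1, 2), (0, 2)]
```

With a large penalty the second agent detours through the middle row instead of sharing the first agent's free-flow optimal path.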

翻訳日:2024-01-03 01:42:47 公開日:2023-12-29

# 最新のノーリファレンス画像・映像品質メトリクスの敵対的攻撃に対する頑健性の比較

Comparing the robustness of modern no-reference image- and video-quality metrics to adversarial attacks ( http://arxiv.org/abs/2310.06958v3 )

ライセンス: Link先を確認

Anastasia Antsiferova, Khaled Abud, Aleksandr Gushchin, Ekaterina Shumitskaya, Sergey Lavrushkin, Dmitriy Vatolin

(参考訳) 現在、ニューラルネットワークベースの画像およびビデオ品質指標は、従来の方法よりも優れた性能を示している。しかし、視覚的品質を改善することなくメトリクスのスコアを上げる敵対的攻撃に対しては、より脆弱にもなった。既存の品質指標のベンチマークは、主観的品質との相関と計算時間の観点から性能を比較している。しかし、画像品質指標の敵対的ロバスト性も研究に値する分野である。本稿では、様々な敵対的攻撃に対する最新のメトリクスの堅牢性を分析する。コンピュータビジョンタスクの敵対的攻撃を採用し、15のノーリファレンス画像/ビデオ品質指標に対する攻撃の効率を比較した。いくつかのメトリクスは敵対的攻撃に対して高い耐性を示し、脆弱なメトリクスよりもベンチマークで安全に使用できる。このベンチマークは、攻撃に対してメトリクスをより堅牢にしたい研究者や、自分の用途に合うそのようなメトリクスを見つけたい研究者のために、新しいメトリクスの提出を受け付けている。`pip install robustness-benchmark` でベンチマークを試すことができる。

Nowadays, neural-network-based image- and video-quality metrics outperform traditional methods. However, they have also become more vulnerable to adversarial attacks that increase metrics' scores without improving visual quality. Existing benchmarks of quality metrics compare their performance in terms of correlation with subjective quality and calculation time. However, the adversarial robustness of image-quality metrics is also an area worth researching. In this paper, we analyse modern metrics' robustness to different adversarial attacks. We adopted adversarial attacks from computer vision tasks and compared the attacks' efficiency against 15 no-reference image/video-quality metrics. Some metrics showed high resistance to adversarial attacks, which makes their use in benchmarks safer than that of vulnerable metrics. The benchmark accepts new metric submissions from researchers who want to make their metrics more robust to attacks or to find such metrics for their needs. Try our benchmark using `pip install robustness-benchmark`.
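The kind of attack described above, raising a metric's score without improving the image, can be sketched with a single FGSM-style step (one common computer-vision attack; we do not claim it is among the exact attacks used in the paper). The metric here is a hypothetical linear toy, chosen so that its gradient with respect to the pixels is simply its weight vector. A real no-reference metric would need automatic differentiation, and nothing here uses the `robustness-benchmark` package's API.

```python
from typing import List, Sequence

def toy_metric(pixels: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical differentiable no-reference 'quality' score (linear)."""
    return sum(w * p for w, p in zip(weights, pixels))

def fgsm_attack(pixels: Sequence[float], weights: Sequence[float], eps: float) -> List[float]:
    """One FGSM step that increases the score: x' = clip(x + eps * sign(grad), 0, 1).

    For the linear toy metric the gradient w.r.t. the pixels is `weights`,
    and each pixel changes by at most eps (an imperceptibility budget).
    """
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)
    return [min(1.0, max(0.0, p + eps * sign(w))) for p, w in zip(pixels, weights)]

weights = [1.0, -2.0, 0.5]
clean = [0.5, 0.5, 0.5]
attacked = fgsm_attack(clean, weights, eps=0.1)
print(toy_metric(clean, weights), toy_metric(attacked, weights))
```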

翻訳日:2024-01-03 01:36:53 公開日:2023-12-29
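The abstract above describes attacks that raise a metric's score without improving visual quality. One-step FGSM is a classic attack of the kind "adopted from computer vision tasks". The sketch below applies it to a toy stand-in metric. The linear score and its weights are hypothetical assumptions. Its gradient is simply `w`, so no autodiff is needed. A real no-reference metric is a neural network whose gradient comes from backpropagation.

```python
import numpy as np


def toy_metric(img, w):
    """Hypothetical differentiable 'quality' score: a weighted pixel sum."""
    return float(np.sum(w * img))


def fgsm_increase_score(img, w, eps):
    """One FGSM step that pushes the toy score up by at most eps per pixel."""
    grad = w  # d(toy_metric)/d(img) for the linear toy metric
    adv = img + eps * np.sign(grad)
    return np.clip(adv, 0.0, 1.0)  # keep a valid [0, 1] image


rng = np.random.default_rng(0)
img = rng.uniform(0.2, 0.8, size=(8, 8))
w = rng.normal(size=(8, 8))
adv = fgsm_increase_score(img, w, eps=2 / 255)
print(toy_metric(img, w), toy_metric(adv, w))  # score rises; pixels barely move
```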

# 変分反実仮想推論を用いたオフライン模倣学習

Offline Imitation Learning with Variational Counterfactual Reasoning ( http://arxiv.org/abs/2310.04706v4 )

ライセンス: Link先を確認

Bowei He, Zexu Sun, Jinxin Liu, Shuai Zhang, Xu Chen, Chen Ma

(参考訳) オフライン模倣学習（IL）では、エージェントは追加のオンライン環境との相互作用なしに、最適なエキスパートの行動方策を学習することを目指す。しかし、ロボット操作など現実世界の多くのシナリオでは、オフラインデータセットは報酬のない準最適な行動から収集される。エキスパートデータが乏しいため、エージェントは質の低い軌跡を単に記憶してしまいがちで、環境の変化に弱く、新しい環境へ汎化する能力に欠ける。高品質なエキスパートデータを自動的に生成し、エージェントの汎化能力を向上させるために、本稿では反実仮想推論を行うことで、\underline{O}ffline \underline{I}mitation \underline{L}earning with \underline{C}ounterfactual data \underline{A}ugmentation (OILCA) と名付けたフレームワークを提案する。具体的には、識別可能な変分オートエンコーダを利用して、エキスパートデータ拡張のための \textit{counterfactual}（反実仮想）サンプルを生成する。生成したエキスパートデータの影響と汎化性能の改善を理論的に分析する。さらに、分布内性能を評価する \textsc{DeepMind Control Suite} ベンチマークと分布外汎化を評価する \textsc{CausalWorld} ベンチマークの両方で、本手法が様々なベースラインを大きく上回ることを広範な実験で示す。コードは \url{https://github.com/ZexuSun/OILCA-NeurIPS23} で公開されている。

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named \underline{O}ffline \underline{I}mitation \underline{L}earning with \underline{C}ounterfactual data \underline{A}ugmentation (OILCA) by doing counterfactual inference. In particular, we leverage identifiable variational autoencoder to generate \textit{counterfactual} samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both \textsc{DeepMind Control Suite} benchmark for in-distribution performance and \textsc{CausalWorld} benchmark for out-of-distribution generalization. Our code is available at \url{https://github.com/ZexuSun/OILCA-NeurIPS23}.

翻訳日:2024-01-03 01:35:45 公開日:2023-12-29

# 敵対的特徴脱感作によるロバスト性強化アップリフトモデリング

Robustness-enhanced Uplift Modeling with Adversarial Feature Desensitization ( http://arxiv.org/abs/2310.04693v3 )

ライセンス: Link先を確認

Zexu Sun, Bowei He, Ming Ma, Jiakai Tang, Yuchen Wang, Chen Ma, Dugang Liu

(参考訳) アップリフトモデリングは、オンラインマーケティングにおいて非常に有望な結果を示している。しかし、既存研究の多くは、実際の応用においてロバスト性の課題を抱えがちである。本稿では、まずこの現象について考えられる説明を提示する。様々な実世界のデータセットを用いて、オンラインマーケティングには特徴感度の問題が存在することを検証した。すなわち、いくつかの重要な特徴の摂動がアップリフトモデルの性能に深刻な影響を与え、逆の傾向さえ引き起こしうる。この問題を解決するために、敵対的特徴脱感作を用いた新しいロバスト性強化アップリフトモデリングフレームワーク（RUAD）を提案する。具体的には、RUADは2つの専用モジュールによってアップリフトモデルの特徴感度をより効果的に緩和する。1つは、マルチラベルの同時モデリングによって入力特徴から重要な部分集合を特定する特徴選択モジュールである。もう1つは、選択された特徴の部分集合に対するモデルのロバスト性を高めるために、敵対的訓練とソフト補間操作を用いる敵対的特徴脱感作モジュールである。最後に、オンラインマーケティングにおけるRUADの有効性を検証するため、公開データセットと実際の製品データセットで広範な実験を行う。さらに、特徴感度に対するRUADのロバスト性と、様々なアップリフトモデルとの互換性も示す。

Uplift modeling has shown very promising results in online marketing. However, most existing works are prone to the robustness challenge in some practical applications. In this paper, we first present a possible explanation for the above phenomenon. We verify that there is a feature sensitivity problem in online marketing using different real-world datasets, where the perturbation of some key features will seriously affect the performance of the uplift model and even cause the opposite trend. To solve the above problem, we propose a novel robustness-enhanced uplift modeling framework with adversarial feature desensitization (RUAD). Specifically, our RUAD can more effectively alleviate the feature sensitivity of the uplift model through two customized modules, including a feature selection module with joint multi-label modeling to identify a key subset from the input features and an adversarial feature desensitization module using adversarial training and soft interpolation operations to enhance the robustness of the model against this selected subset of features. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of our RUAD in online marketing. In addition, we also demonstrate the robustness of our RUAD to the feature sensitivity, as well as the compatibility with different uplift models.

翻訳日:2024-01-03 01:35:12 公開日:2023-12-29

# PixArt-$\alpha$: フォトリアリスティックなテキストからの画像合成のための拡散Transformerの高速訓練

PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ( http://arxiv.org/abs/2310.00426v3 )

ライセンス: Link先を確認

Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

(参考訳) 最先端のテキストから画像への生成（T2I）モデルは膨大な訓練コスト（例えば数百万GPU時間）を必要とし、AIGCコミュニティにおける根本的なイノベーションを著しく妨げるとともに、CO2排出量を増大させている。本稿では、TransformerベースのT2I拡散モデル PIXART-$\alpha$ を紹介する。その画像生成品質は最新の画像生成器（Imagen、SDXL、さらにはMidjourneyなど）に匹敵し、商用利用に近い水準に達している。さらに、図1および図2に示すように、低い訓練コストで最大1024px解像度の高解像度画像合成をサポートする。この目標を達成するために、3つの中核的な設計を提案する。(1) 訓練戦略の分解：ピクセル依存性、テキストと画像の整合、画像の美的品質をそれぞれ個別に最適化する3つの異なる訓練ステップを考案する。(2) 効率的なT2I Transformer：Diffusion Transformer (DiT) にクロスアテンションモジュールを組み込んでテキスト条件を注入し、計算負荷の高いクラス条件分岐を簡素化する。(3) 情報量の多いデータ：テキスト・画像ペアにおける概念密度の重要性を強調し、大規模な視覚言語モデルを活用して高密度な疑似キャプションを自動ラベル付けし、テキストと画像の整合の学習を支援する。その結果、PIXART-$\alpha$ の訓練速度は既存の大規模T2Iモデルを大きく上回る。例えば、PIXART-$\alpha$ の訓練時間は Stable Diffusion v1.5 の10.8%（675対6,250 A100 GPU日）にすぎず、30万ドル近く（26,000ドル対320,000ドル）を節約し、CO2排出量を90%削減する。さらに、より大規模なSOTAモデルである RAPHAEL と比較すると、訓練コストはわずか1%である。大規模な実験により、PIXART-$\alpha$ は画質、芸術性、意味的制御に優れていることが示された。PIXART-$\alpha$ がAIGCコミュニティやスタートアップに新たな知見を提供し、高品質かつ低コストな生成モデルをスクラッチから構築する取り組みを加速できれば幸いである。

The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces PIXART-$\alpha$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figure 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-$\alpha$'s training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-$\alpha$ only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly \$300,000 (\$26,000 vs. \$320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-$\alpha$ excels in image quality, artistry, and semantic control. We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

翻訳日:2024-01-03 01:33:54 公開日:2023-12-29
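The savings quoted in the PixArt-$\alpha$ abstract can be checked directly from the numbers it gives. The GPU-day ratio reproduces the stated 10.8%. The dollar figures imply a somewhat lower cost ratio, and the abstract does not say how the dollar costs were computed.

$$
\frac{675}{6250} = 0.108 = 10.8\%, \qquad
320{,}000 - 26{,}000 = 294{,}000 \approx 300{,}000\ \text{USD}, \qquad
\frac{26{,}000}{320{,}000} \approx 8.1\%
$$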

# 非可換位相空間におけるディラック方程式に対するエレンフェストの定理

Ehrenfest's Theorem for the Dirac Equation in Noncommutative Phase-Space ( http://arxiv.org/abs/2309.16360v2 )

ライセンス: Link先を確認

Ilyas Haouam

(参考訳) 本稿では、非可換位相空間におけるディラック方程式からエレンフェストの定理を考察する。そこでは、電磁場と相互作用するディラック粒子について、非可換な設定の下で位置演算子と運動学的運動量演算子の時間微分を計算する。これにより、位相空間の非可換性がエレンフェストの定理に及ぼす影響を調べることができる。なお、非可換性は線形Boppシフトと Moyal-Weyl 積の両方によって導入される。

In this article, we investigate Ehrenfest's theorem from the Dirac equation in a noncommutative phase-space, where we calculate the time derivatives of the position and kinetic momentum operators for Dirac particles interacting with an electromagnetic field within a noncommutative setting. This allows examining the effect of the phase-space noncommutativity on Ehrenfest's theorem. The noncommutativity is introduced via both the linear Bopp shift and the Moyal-Weyl product.

翻訳日:2024-01-03 01:33:15 公開日:2023-12-29
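The abstract names the two standard ways of inserting noncommutativity without writing them out. For reference, here is one common convention; signs and factors vary between papers, and the article's own normalization may differ. The noncommutative algebra, with antisymmetric $\theta_{ij}$ and $\eta_{ij}$, is

$$
[\hat x_i,\hat x_j]=i\theta_{ij},\qquad [\hat p_i,\hat p_j]=i\eta_{ij},\qquad [\hat x_i,\hat p_j]=i\hbar_{\mathrm{eff}}\,\delta_{ij}.
$$

The linear Bopp shift realizes it with ordinary canonical operators $x_i, p_i$:

$$
\hat x_i = x_i - \frac{1}{2\hbar}\,\theta_{ij}\,p_j,\qquad
\hat p_i = p_i + \frac{1}{2\hbar}\,\eta_{ij}\,x_j,
$$

which gives $\hbar_{\mathrm{eff}}=\hbar\left(1+\frac{\theta\eta}{4\hbar^{2}}\right)$ in two dimensions with $\theta_{ij}=\theta\varepsilon_{ij}$ and $\eta_{ij}=\eta\varepsilon_{ij}$. Equivalently, products of phase-space functions are replaced by the Moyal-Weyl star product:

$$
(f\star g)(x,p)=f(x,p)\,\exp\!\Big[\tfrac{i}{2}\big(\theta_{ij}\,\overleftarrow{\partial}_{x_i}\overrightarrow{\partial}_{x_j}+\eta_{ij}\,\overleftarrow{\partial}_{p_i}\overrightarrow{\partial}_{p_j}\big)\Big]\,g(x,p),
$$

so that $x_i\star x_j - x_j\star x_i = i\theta_{ij}$.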

# 測度輸送による密度推定:生物科学への応用への展望

Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences ( http://arxiv.org/abs/2309.15366v2 )

ライセンス: Link先を確認

Vanessa Lopez-Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo

(参考訳) 測度輸送手法の利点の1つは、幅広いクラスの確率測度に従って分布するデータの処理と解析のための統一的な枠組みを可能にすることである。この文脈において、生物科学の研究を支援することを目的としたワークフローの一部として、測度輸送手法、特に三角輸送写像の利用の可能性を評価するための計算研究の結果を示す。放射線生物学のような分野で一般的な、データが乏しいシナリオに特に関心がある。データが乏しい場合には、疎な輸送写像が有利であることがわかった。特に、利用可能なデータサンプル集合からランダムに選んだ一連の部分集合で訓練した、一連の（疎な）適応輸送写像から集めた統計量により、データに隠れた情報を明らかにできる。その結果、ここで扱う放射線生物学への応用において、本手法は放射線被曝下での遺伝子間の関係とその動態に関する仮説を生成するためのツールを提供する。

One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scarce data scenarios, which are common in domains such as radiation biology, are of particular interest. We find that when data is scarce, sparse transport maps are advantageous. In particular, statistics gathered from computing series of (sparse) adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, lead to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.

翻訳日:2024-01-03 01:32:39 公開日:2023-12-29
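"Triangular" means that output $k$ of the map depends only on inputs $1,\dots,k$. The simplest instance is the linear Gaussian case. A Cholesky factor $L$ pushes standard normal samples $z$ to $x = Lz$ with covariance $LL^\top$. The sketch below shows only this special case with a made-up covariance. The maps in the study are nonlinear and adaptively sparsified. Still, the structural zeros in the last row hint at how sparsity in a triangular map can mirror independence in the data.

```python
import numpy as np


def linear_triangular_map(cov):
    """Lower-triangular map T(z) = L z pushing N(0, I) to N(0, cov)."""
    return np.linalg.cholesky(cov)  # L is lower-triangular and L @ L.T == cov


# Hypothetical covariance: x1 and x2 are correlated, x3 is independent of both.
cov = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
L = linear_triangular_map(cov)
print(L)  # last row has zeros in the first two columns: a sparse triangular map

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 200_000))  # samples from the reference N(0, I)
x = L @ z                              # pushed-forward samples ~ N(0, cov)
print(np.cov(x))                       # close to cov
```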

# 心理測定学を用いた汎用AIの評価

Evaluating General-Purpose AI with Psychometrics ( http://arxiv.org/abs/2310.16379v2 )

ライセンス: Link先を確認

Xiting Wang, Liming Jiang, Jose Hernandez-Orallo, David Stillwell, Luning Sun, Fang Luo, Xing Xie

(参考訳) 大規模言語モデルのような汎用AIシステムを包括的かつ正確に評価することで、そのリスクを効果的に軽減し、その能力をより深く理解できるようになる。現在の評価手法は主に特定のタスクのベンチマークに基づいている。しかし現在の技術には、未知のタスクでの性能を予測し、特定のタスク項目やユーザ入力ごとの性能のばらつきを説明するための科学的基盤が欠けているため、こうした多用途なAIシステムを適切に評価できていない。さらに、特定タスクの既存ベンチマークについては、その信頼性と妥当性に対する懸念が高まっている。これらの課題に対処するため、タスク指向の評価から構成概念指向の評価への移行を提案する。心理学的測定の科学である心理測定学は、複数のタスクにわたる性能の根底にある潜在的な構成概念を特定し測定するための厳密な方法論を提供する。その利点を論じ、潜在的な落とし穴について警告するとともに、それを実践するための枠組みを提案する。最後に、心理測定学を汎用AIシステムの評価と統合する将来の機会を探る。

Comprehensive and accurate evaluation of general-purpose AI systems such as large language models allows for effective mitigation of their risks and deepened understanding of their capabilities. Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems, as present techniques lack a scientific foundation for predicting their performance on unforeseen tasks and explaining their varying performance on specific task items or user inputs. Moreover, existing benchmarks of specific tasks raise growing concerns about their reliability and validity. To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation. Psychometrics, the science of psychological measurement, provides a rigorous methodology for identifying and measuring the latent constructs that underlie performance across multiple tasks. We discuss its merits, warn against potential pitfalls, and propose a framework to put it into practice. Finally, we explore future opportunities of integrating psychometrics with the evaluation of general-purpose AI systems.

翻訳日:2024-01-03 01:25:19 公開日:2023-12-29
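To make "latent constructs that underlie performance across multiple tasks" concrete, here is the two-parameter logistic (2PL) item response model, a classic psychometric model. The abstract does not say the proposed framework uses it, so treat this only as an illustration. A single latent ability `theta` predicts success on every item, and each item is described by its discrimination `a` and difficulty `b`. The item parameters and abilities below are made up.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL item response function.

    Probability that a test-taker with latent ability `theta` answers an item
    with discrimination `a` and difficulty `b` correctly.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Three hypothetical items of rising difficulty, as (a, b) pairs.
items = [(1.2, -1.0), (1.2, 0.0), (1.2, 1.5)]
# Two hypothetical AI systems that differ only in latent ability.
for theta in (-0.5, 1.0):
    print(theta, [round(p_correct_2pl(theta, a, b), 2) for a, b in items])
```

Once item parameters are estimated, the same `theta` yields predictions for items the system has not yet been tested on. That is the kind of predictive power the abstract says task-specific benchmarks lack.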

# NoteChat: 臨床ノートを条件とした合成医師・患者会話のデータセット

NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes ( http://arxiv.org/abs/2310.15959v2 )

ライセンス: Link先を確認

Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, Hong Yu

(参考訳) 本稿では、大規模言語モデル（LLM）を活用して患者と医師の対話を生成する、新しい協調型マルチエージェントフレームワーク NoteChat を紹介する。NoteChat は、構造化されたロールプレイと戦略的なプロンプトを通じて、役割ごとに特化したLLMのアンサンブルが割り当てられた役割をより効果的に遂行できるという原則を体現している。これらのロールプレイを行うLLM間の相乗効果により、一貫性があり効率的な対話生成が実現される。患者・医師の対話とノートのペアからなるベンチマークデータセット MTS-dialogue で評価した。その結果、NoteChat によって拡張された合成患者・医師対話で訓練したモデルが、臨床ノート生成において他の最先端モデルを上回った。包括的な自動評価および人手評価によれば、臨床ノートに基づいて質の高い合成患者・医師対話を生成する点で、NoteChat は ChatGPT や GPT-4 などの最先端モデルを、ドメイン専門家の評価で最大22.78%上回った。NoteChat は、患者と直接対話したり、医師の燃え尽きの主要な原因である臨床文書作成を支援したりする可能性を持つ。

We introduce NoteChat, a novel cooperative multi-agent framework leveraging Large Language Models (LLMs) to generate patient-physician dialogues. NoteChat embodies the principle that an ensemble of role-specific LLMs, through structured role-play and strategic prompting, can perform their assigned roles more effectively. The synergy among these role-playing LLMs results in cohesive and efficient dialogue generation. Evaluation on MTS-dialogue, a benchmark dataset of patient-physician dialogue-note pairs, shows that models trained with synthetic patient-physician dialogues augmented by NoteChat outperform other state-of-the-art models for generating clinical notes. Our comprehensive automatic and human evaluation demonstrates that NoteChat substantially surpasses state-of-the-art models like ChatGPT and GPT-4, by up to 22.78% as judged by domain experts, in generating superior synthetic patient-physician dialogues based on clinical notes. NoteChat has the potential to engage patients directly and to help with clinical documentation, a leading cause of physician burnout.

翻訳日:2024-01-03 01:24:22 公開日:2023-12-29

# 誘電体膜を介した反跳注入によるダイヤモンド中の色中心の生成

Creation of color centers in diamond by recoil implantation through dielectric films ( http://arxiv.org/abs/2310.12484v2 )

ライセンス: Link先を確認

Yuyang Han, Christian Pederson, Bethany E. Matthews, Nicholas S. Yama, Maxwell F. Parsons, Kai-Mei C. Fu

(参考訳) 量子技術ではダイヤモンドの表面近傍の色中心が必要とされるため、特定の外因性不純物を結晶格子に制御してドーピングすることが求められている。最近の実験では、イオン注入によって表面の前駆体から運動量を移行させる「反跳注入」と呼ばれる手法により、これが実現できることが示されている。本研究では、この手法を拡張し、ダイヤモンド中に窒素空孔（NV）中心とシリコン空孔（SiV）中心を生成するための誘電体前駆体を組み込む。具体的には、ダイヤモンド表面上の窒化ケイ素または二酸化ケイ素の薄層にガリウム集束イオンビームを照射すると、外因性不純物と炭素空孔の両方が導入されることを実証する。これらの欠陥は、その後のアニール後に、望ましい光学特性を持つ表面近傍のNV中心およびSiV中心を生じさせる。

The need for near-surface color centers in diamond for quantum technologies motivates the controlled doping of specific extrinsic impurities into the crystal lattice. Recent experiments have shown that this can be achieved by momentum transfer from a surface precursor via ion implantation, an approach known as ``recoil implantation.'' Here, we extend this technique to incorporate dielectric precursors for creating nitrogen-vacancy (NV) and silicon-vacancy (SiV) centers in diamond. Specifically, we demonstrate that gallium focused-ion-beam exposure to a thin layer of silicon nitride or silicon dioxide on the diamond surface results in the introduction of both extrinsic impurities and carbon vacancies. These defects subsequently give rise to near-surface NV and SiV centers with desirable optical properties after annealing.

翻訳日:2024-01-03 01:23:44 公開日:2023-12-29

# スコアベース生成事前分布を用いた証明可能な確率的イメージング

Provable Probabilistic Imaging using Score-Based Generative Priors ( http://arxiv.org/abs/2310.10835v2 )

ライセンス: Link先を確認

Yu Sun, Zihui Wu, Yifan Chen, Berthy T. Feng, Katherine L. Bouman

(参考訳) 不確かさを定量化しつつ高品質な画像を推定することは、不良設定逆問題を解く画像再構成アルゴリズムにとって望ましい2つの特徴である。本稿では、一般的な逆問題の解となりうる空間を特徴づけるための原理的な枠組みとして、プラグアンドプレイ・モンテカルロ（PMC）を提案する。PMCは、表現力の高いスコアベース生成事前分布を組み込んで高品質な画像再構成を行うと同時に、事後サンプリングによる不確かさの定量化も行うことができる。特に、従来のプラグアンドプレイ事前分布（PnP）およびデノイジングによる正則化（RED）アルゴリズムのサンプリング版とみなせる2つのPMCアルゴリズムを導入する。また、PMCアルゴリズムの収束を特徴づける理論解析も確立する。この解析は、非対数凹の尤度や不完全なスコアネットワークが存在する場合でも、両アルゴリズムについて非漸近的な定常性の保証を与える。線形および非線形の順モデルを持つ複数の代表的な逆問題でPMCアルゴリズムの性能を示す。実験結果は、PMCが再構成品質を大幅に向上させ、高忠実度の不確かさ定量化を可能にすることを示している。

Estimating high-quality images while also quantifying their uncertainty are two desired features in an image reconstruction algorithm for solving ill-posed inverse problems. In this paper, we propose plug-and-play Monte Carlo (PMC) as a principled framework for characterizing the space of possible solutions to a general inverse problem. PMC is able to incorporate expressive score-based generative priors for high-quality image reconstruction while also performing uncertainty quantification via posterior sampling. In particular, we introduce two PMC algorithms which can be viewed as the sampling analogues of the traditional plug-and-play priors (PnP) and regularization by denoising (RED) algorithms. We also establish a theoretical analysis for characterizing the convergence of the PMC algorithms. Our analysis provides non-asymptotic stationarity guarantees for both algorithms, even in the presence of non-log-concave likelihoods and imperfect score networks. We demonstrate the performance of the PMC algorithms on multiple representative inverse problems with both linear and nonlinear forward models. Experimental results show that PMC significantly improves reconstruction quality and enables high-fidelity uncertainty quantification.

翻訳日:2024-01-03 01:22:23 公開日:2023-12-29
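The abstract does not spell out PMC's update rules. As a generic illustration of posterior sampling with a plug-in prior score, here is the unadjusted Langevin algorithm in a 1-D Gaussian toy. The exact score of a $\mathcal{N}(0,1)$ prior stands in for a trained score network. This is an assumption for illustration, not the PnP- or RED-style PMC algorithms themselves. With $y = x + \text{noise}$, unit noise variance and $y = 2$, the exact posterior is $\mathcal{N}(1, 0.5)$. The chains should approach it up to a small step-size bias.

```python
import numpy as np


def langevin_posterior_samples(y, grad_loglik, prior_score, step=0.01,
                               n_steps=2000, n_chains=5000, seed=0):
    """Unadjusted Langevin: x <- x + step * (grad log p(y|x) + score(x)) + sqrt(2 step) * noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)  # arbitrary initialization
    for _ in range(n_steps):
        drift = grad_loglik(x, y) + prior_score(x)  # gradient of log posterior
        x = x + step * drift + np.sqrt(2.0 * step) * rng.standard_normal(n_chains)
    return x


def grad_loglik(x, y, sigma2=1.0):
    """Gradient of log N(y; x, sigma2) with respect to x (data-fidelity term)."""
    return (y - x) / sigma2


def prior_score(x):
    """Exact score of a N(0, 1) prior; a learned score network would go here."""
    return -x


samples = langevin_posterior_samples(2.0, grad_loglik, prior_score)
print(samples.mean(), samples.var())  # near the exact posterior N(1, 0.5)
```

The spread of the samples is what provides the uncertainty quantification the abstract mentions. Here it can be checked against the closed-form variance.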

# 構成能力は乗法的に創発する：合成タスクにおける拡散モデルの探究

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task ( http://arxiv.org/abs/2310.09336v3 )

ライセンス: Link先を確認

Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

(参考訳) 現代の生成モデルは、極めて写実的なデータを生成する前例のない能力を示している。しかし、現実世界は本質的に構成的である。そのため、これらのモデルを実用的な応用で信頼して使用するには、新しい概念の組を構成して訓練データセットにない出力を生成する能力を示す必要がある。先行研究は、最近の拡散モデルが興味深い構成的汎化能力を示す一方で、予測不能な形で失敗することも示している。これを動機として、本研究では合成環境において条件付き拡散モデルの構成的汎化を理解するための統制された研究を行う。訓練データの様々な属性を変化させ、分布外のサンプルを生成するモデルの能力を測定する。結果は次のことを示している。(i) ある概念からサンプルを生成する能力と、それらを構成する能力が現れる順序は、基礎となるデータ生成過程の構造によって支配される。(ii) 構成タスクにおける性能は、構成要素となるタスクの性能に乗法的に依存するため、突然の「創発」を示し、これが生成モデルに見られる創発現象を部分的に説明する。(iii) 訓練データ中の出現頻度が低い概念を構成して分布外サンプルを生成するには、分布内サンプルを生成する場合よりもかなり多くの最適化ステップが必要となる。全体として、本研究はデータ中心の観点から、生成モデルにおける能力と構成性を理解するための基礎を築くものである。

Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden "emergence" due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.

翻訳日:2024-01-03 01:20:36 公開日:2023-12-29
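Finding (ii) above, sudden "emergence" from multiplicative reliance on constituent tasks, can be seen with a toy calculation. Suppose each concept's accuracy follows a smooth, hypothetical sigmoid learning curve, and suppose a composition succeeds only when both concepts are rendered correctly, with independent errors. Then composite accuracy is the product of the two. It stays near zero longer and rises more abruptly than either constituent. The curves and their midpoints are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def constituent_acc(step, midpoint, rate=0.01):
    """Hypothetical smooth learning curve for generating one concept."""
    return sigmoid(rate * (step - midpoint))


def composite_acc(step):
    """Composition needs both concepts right; assuming independence, accuracies multiply."""
    return constituent_acc(step, 400) * constituent_acc(step, 600)


for step in (0, 300, 500, 700, 1000):
    print(step,
          round(constituent_acc(step, 400), 3),
          round(constituent_acc(step, 600), 3),
          round(composite_acc(step), 3))
```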

# 確率的保証と実践による連続POMDP計画における複雑な観測モデルの簡略化

Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice ( http://arxiv.org/abs/2311.07745v3 )

ライセンス: Link先を確認

Idan Lev-Yehudi, Moran Barenboim, Vadim Indelman

(参考訳) カメラ画像のような高次元かつ連続的な観測を持つ部分観測マルコフ決定過程（POMDP）を解くことは、多くの実世界のロボティクスや計画問題で必要とされる。近年の研究では、観測モデルとして機械学習による確率モデルが提案されているが、現状ではオンラインで展開するには計算コストが高すぎる。本研究では、解の品質に関する形式的な保証を保ちつつ、簡略化した観測モデルを計画に用いることが何を意味するのかという問題を扱う。主な貢献は、簡略化モデルの統計的全変動距離に基づく新しい確率的境界である。最近の粒子信念MDPの集中不等式の結果を一般化することで、この境界が、元のモデルに関する理論的なPOMDP価値を、簡略化モデルによる経験的な計画価値から抑えることを示す。計算はオフライン部分とオンライン部分に分離でき、計画中にコストの高いモデルに一切アクセスすることなく形式的な保証が得られる。これも新しい結果である。最後に、既存の連続オンラインPOMDPソルバーのルーチンにこの境界を組み込む方法をシミュレーションで示す。

Solving partially observable Markov decision processes (POMDPs) with high-dimensional and continuous observations, such as camera images, is required for many real-life robotics and planning problems. Recent research has suggested machine-learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We address the question of what the implications are of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. By generalizing recent results on particle-belief MDP concentration bounds, we show that it bounds the theoretical POMDP value with respect to the original model from the empirical planned value with the simplified model. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.

翻訳日:2024-01-03 01:13:40 公開日:2023-12-29
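The bound in the POMDP abstract is built on the total variation (TV) distance between the original and the simplified observation model. The paper's exact expression, including how the distance is averaged over states, is not given in the abstract. The sketch shows only the basic quantity for discrete distributions. The two observation likelihoods for a single fixed state are hypothetical.

```python
def total_variation(p, q):
    """TV distance between two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical observation likelihoods Z(o | x) at one state x,
# for the costly model and a cheaper simplified model.
original = [0.70, 0.20, 0.10]
simplified = [0.60, 0.30, 0.10]
print(total_variation(original, simplified))  # roughly 0.1
```

A small TV distance means that the two models assign nearly the same probability to every set of observations. This is why it can control the gap between planned and true values.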

# GLaMM: ピクセルグラウンディングを行う大規模マルチモーダルモデル

GLaMM: Pixel Grounding Large Multimodal Model ( http://arxiv.org/abs/2311.03356v2 )

ライセンス: Link先を確認

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan

(参考訳) 大規模マルチモーダルモデル（LMM）は、大規模言語モデルを視覚領域へ拡張する。初期のLMMは、画像全体とテキストプロンプトを用いて、グラウンディングされていないテキスト応答を生成していた。最近では、視覚的にグラウンディングされた応答を生成するために領域レベルのLMMが用いられている。しかしそれらには、一度に1つのオブジェクトカテゴリしか参照できない、ユーザが領域を指定する必要がある、あるいは高密度なピクセル単位のオブジェクトグラウンディングを提供できない、といった制約がある。本研究では、対応するオブジェクトのセグメンテーションマスクとシームレスに結びついた自然言語応答を生成できる最初のモデルである Grounding LMM（GLaMM）を提案する。GLaMMは会話に現れるオブジェクトをグラウンディングするだけでなく、テキストと任意の視覚的プロンプト（関心領域）の両方を入力として受け付ける柔軟性も備えている。これによりユーザは、テキストと視覚の両方の領域において、様々な粒度でモデルと対話できる。視覚的にグラウンディングされた会話生成（GCG）という新しい設定には標準的なベンチマークが存在しない。そこで、我々が整備したグラウンディング済み会話を用いた包括的な評価プロトコルを導入する。提案するGCGタスクでは、自然なシーンにおいて大規模かつ高密度にグラウンディングされた概念が必要となる。このため、我々が提案する自動アノテーションパイプラインを用いて、高密度にアノテーションされた Grounding-anything Dataset（GranD）を構築する。GranDは、セグメンテーションマスク付きの合計8億1,000万の領域にグラウンディングされた750万のユニークな概念を含む。GCGに加えて、GLaMMは参照表現セグメンテーション、画像および領域レベルのキャプション生成、視覚言語会話など、いくつかの下流タスクでも効果的に機能する。

Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to only referring to a single object category at a time, require users to specify the regions, or cannot offer dense pixel-wise object grounding. In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks. GLaMM not only grounds objects appearing in the conversations but is flexible enough to accept both textual and optional visual prompts (region of interest) as input. This empowers users to interact with the model at various levels of granularity, both in textual and visual domains. Due to the lack of standard benchmarks for the novel setting of visually Grounded Conversation Generation (GCG), we introduce a comprehensive evaluation protocol with our curated grounded conversations. Our proposed GCG task requires densely grounded concepts in natural scenes at a large-scale. To this end, we propose a densely annotated Grounding-anything Dataset (GranD) using our proposed automated annotation pipeline that encompasses 7.5M unique concepts grounded in a total of 810M regions available with segmentation masks. Besides GCG, GLaMM also performs effectively on several downstream tasks, e.g., referring expression segmentation, image and region-level captioning and vision-language conversations.

翻訳日:2024-01-03 01:12:07 公開日:2023-12-29

# LLM4Drive: 自動運転のための大規模言語モデルの調査

LLM4Drive: A Survey of Large Language Models for Autonomous Driving ( http://arxiv.org/abs/2311.01043v3 )

ライセンス: Link先を確認

Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan

(参考訳) 交通と都市のモビリティに革命をもたらす触媒である自動運転技術は、ルールベースのシステムからデータ駆動型の戦略へと移行する傾向にある。従来のモジュールベースのシステムは、カスケード接続されたモジュール間で累積する誤差や、柔軟性のない事前設定ルールによって制約されている。対照的に、エンドツーエンドの自動運転システムは、完全にデータ駆動型の訓練プロセスにより誤差の蓄積を回避できる可能性がある。ただし、その「ブラックボックス」的な性質のために透明性に欠けることが多く、判断の検証や追跡を難しくしている。近年、大規模言語モデル（LLM）は、文脈の理解、論理的推論、回答の生成といった能力を示している。自然な発想は、これらの能力を自動運転に活用することである。LLMと視覚基盤モデルを組み合わせることで、現在の自動運転システムに欠けているオープンワールドの理解、推論、少数ショット学習への扉が開かれる可能性がある。本稿では、\textit{Large Language Models for Autonomous Driving (LLM4AD)}（自動運転のための大規模言語モデル）に関する研究の流れを体系的にレビューする。本研究は技術的進歩の現状を評価し、この分野の主要な課題と今後の方向性を明確に概説する。学術界と産業界の研究者の便宜のため、この分野の最新の進歩と関連するオープンソースリソースを、指定のリンク https://github.com/Thinklab-SJTU/Awesome-LLM4AD でリアルタイムに更新して提供する。

Autonomous driving technology, a catalyst for revolutionizing transportation and urban mobility, tends to transition from rule-based systems to data-driven strategies. Traditional module-based systems are constrained by cumulative errors among cascaded modules and inflexible pre-set rules. In contrast, end-to-end autonomous driving systems have the potential to avoid error accumulation due to their fully data-driven training process, although they often lack transparency due to their "black box" nature, complicating the validation and traceability of decisions. Recently, large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers. A natural thought is to utilize these abilities to empower autonomous driving. Combining LLMs with foundation vision models could open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems are lacking. In this paper, we systematically review a line of research on \textit{Large Language Models for Autonomous Driving (LLM4AD)}. This study evaluates the current state of technological advancements, distinctly outlining the principal challenges and prospective directions for the field. For the convenience of researchers in academia and industry, we provide real-time updates on the latest advances in the field as well as relevant open-source resources via the designated link: https://github.com/Thinklab-SJTU/Awesome-LLM4AD.

翻訳日:2024-01-03 01:11:20 公開日:2023-12-29

# 頭部・顔・目の時空間相互作用コンテキストの捕捉によるエンドツーエンド映像視線推定

End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context ( http://arxiv.org/abs/2310.18131v3 )

ライセンス: Link先を確認

Yiran Guan, Zhuoguang Chen, Wenzheng Zeng, Zhiguo Cao, and Yang Xiao

(参考訳) 本レターでは、新しい手法 Multi-Clue Gaze（MCGaze）を提案する。MCGazeは、これまで十分に注目されてこなかった頭部・顔・目の間の時空間相互作用コンテキストをエンドツーエンドの学習方法で捉え、映像の視線推定を容易にする。MCGazeの主な利点は、頭部・顔・目の手がかりの位置特定タスクを、視線推定と合わせて1ステップで同時に解くことができ、最適な性能を求めて共同最適化できる点である。この過程で、頭部・顔・目の手がかり間で時空間的なコンテキストの交換が行われる。したがって、様々なクエリの特徴を融合して得られる最終的な視線は、頭部と顔からの大域的な手がかりと目からの局所的な手がかりを同時に考慮でき、これが本質的に性能を高める。一方、1ステップで実行する方式により、高い実行効率も確保される。難易度の高い Gaze360 データセットでの実験により、提案手法の優位性を検証した。ソースコードは https://github.com/zgchen33/MCGaze で公開予定である。

In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning way, which has not yet received much attention. The main advantage of MCGaze is that the tasks of clue localization of head, face, and eye can be solved jointly for gaze estimation in a one-step way, with joint optimization to seek optimal performance. During this, spatial-temporal context exchange happens among the clues on the head, face, and eye. Accordingly, the final gazes obtained by fusing features from various queries can be aware of global clues from heads and faces, and local clues from eyes simultaneously, which essentially boosts performance. Meanwhile, the one-step running way also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our proposition. The source code will be released at https://github.com/zgchen33/MCGaze.

翻訳日:2024-01-03 01:10:15 公開日:2023-12-29

# プリプロセッシング時間を改善するサブリニア時間スペクトルクラスタリングオラクル

A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time ( http://arxiv.org/abs/2310.17878v2 )

ライセンス: Link先を確認

Ranran Shen, Pan Peng

(参考訳) 本稿では、強いクラスタ性を示すグラフに対して、サブ線形時間のスペクトルクラスタリングオラクルを設計する問題を扱う。このようなグラフは$k$個の潜在クラスタを含み、各クラスタは大きな内部コンダクタンス（少なくとも$\varphi$）と小さな外部コンダクタンス（高々$\varepsilon$）によって特徴づけられる。我々の目的は、グラフを前処理してクラスタ所属クエリを可能にすることである。重要な要件は、前処理とクエリ応答の両方をサブ線形時間で行い、かつ得られる分割が真のクラスタリングに近い$k$分割と一致することである。従来のオラクルは、内部コンダクタンスと外部コンダクタンスの間の$\textrm{poly}(k)\log n$のギャップか、（$k/\varepsilon$について）指数的な前処理時間のいずれかに依存していた。我々のアルゴリズムは、誤分類率がわずかに高くなる代償のもとで、これらの仮定を緩和する。また、このクラスタリングオラクルが少数のランダムな辺削除に対して頑健であることも示す。理論的な境界を検証するため、合成ネットワークで実験を行った。

We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.

翻訳日:2024-01-03 01:09:54 公開日:2023-12-29
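The clusterability assumption in the spectral clustering abstract is stated in terms of conductance. Below is a small sketch of the standard cut-over-volume definition, $\phi(S) = |E(S, V\setminus S)| / \mathrm{vol}(S)$, where $\mathrm{vol}(S)$ is the sum of degrees in $S$. Some papers normalize by $\min(\mathrm{vol}(S), \mathrm{vol}(V\setminus S))$ instead, and the paper's exact convention is not stated in the abstract. The toy graph of two triangles joined by one edge is an assumption for illustration.

```python
def conductance(adj, S):
    """phi(S) = (edges leaving S) / vol(S) for an undirected graph.

    `adj` maps each vertex to the set of its neighbours; vol(S) is the sum
    of degrees of the vertices in S.
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    return cut / vol


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))  # outer conductance of one cluster: 1/7
print(conductance(adj, {0}))        # a single vertex inside a cluster: 1.0
```

Each triangle has small outer conductance (1/7), while subsets inside a triangle are well connected. This is the "large inner, small outer conductance" pattern the oracle relies on.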

# ビデオグラウンド化のための拡散モデルによる反復的リファインメントの探索

Exploring Iterative Refinement with Diffusion Models for Video Grounding ( http://arxiv.org/abs/2310.17189v2 )

ライセンス: Link先を確認

Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang

(参考訳) ビデオグラウンディングは、与えられた文クエリに対応する目標区間を、トリミングされていない動画内で特定することを目的とする。既存の手法は通常、事前に定義された候補の集合から最良の予測を選ぶか、目標区間を単発で直接回帰するため、体系的な予測改善のプロセスを欠いている。本稿では、ビデオグラウンディングを条件付き生成タスクとして定式化する、拡散モデルを用いた新しいフレームワーク DiffusionVG を提案する。ここでは、目標区間はガウスノイズ入力から生成され、逆拡散過程で反復的に洗練される。訓練時、DiffusionVG は固定された順拡散過程によって目標区間に徐々にノイズを加え、逆拡散過程で目標区間を復元することを学習する。推論時には、DiffusionVG は動画と文の表現を条件とする学習済みの逆拡散過程により、ガウスノイズ入力から目標区間を生成できる。特別な工夫を加えることなく、DiffusionVG は主要な Charades-STA、ActivityNet Captions、TACoS ベンチマークにおいて、精巧に設計された既存モデルと比べて優れた性能を示す。

Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single-shot manner, resulting in the absence of a systematic prediction refinement process. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. In inference, DiffusionVG can generate the target span from Gaussian noise inputs by the learned reverse diffusion process conditioned on the video-sentence representations. Without bells and whistles, our DiffusionVG demonstrates superior performance compared to existing well-crafted models on mainstream Charades-STA, ActivityNet Captions and TACoS benchmarks.

Translated: 2024-01-03 01:09:07, Published: 2023-12-29

# On the Interplay Between Stepsize Tuning and Progressive Sharpening

On the Interplay Between Stepsize Tuning and Progressive Sharpening ( http://arxiv.org/abs/2312.00209v3 )

License: see link

Vincent Roulet, Atish Agarwala, Fabian Pedregosa

Recent empirical work has revealed an intriguing property of deep learning models: given a fixed stepsize, the sharpness (the largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability (Cohen et al., 2022). We investigate empirically how the sharpness evolves when using stepsize tuners, namely the Armijo line search and Polyak stepsizes, which adapt the stepsize along the iterations to local quantities such as, implicitly, the sharpness itself. We find that the surprisingly poor performance of a classical Armijo line search in the deterministic setting may be well explained by its tendency to ever-increase the sharpness of the objective. On the other hand, we observe that Polyak stepsizes generally operate at the edge of stability or even slightly beyond, outperforming their Armijo and constant-stepsize counterparts in the deterministic setting. We conclude with an analysis suggesting that unlocking stepsize tuners requires an understanding of the joint dynamics of the stepsize and the sharpness.
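
To make the two stepsize tuners concrete, here is a minimal sketch on a hypothetical 2-D quadratic $f(x) = \tfrac12(x_1^2 + 10x_2^2)$, whose sharpness is constant at 10. Because the Hessian never changes, this example cannot show progressive sharpening. It only shows how each rule picks a step and how those steps compare with the classical gradient-descent stability threshold $2/\lambda_{\max}$. The Armijo constants (`eta0=1.0`, `c=0.5`, `shrink=0.5`) are assumptions, and the Polyak rule assumes the optimal value $f^\ast = 0$ is known.

```python
def f(x):
    # Hypothetical ill-conditioned quadratic: Hessian diag(1, 10), sharpness 10.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


def sq_norm(v):
    return sum(vi * vi for vi in v)


def armijo_step(x, g, eta0=1.0, c=0.5, shrink=0.5):
    """Backtrack from eta0 until f(x - eta g) <= f(x) - c * eta * ||g||^2."""
    eta, fx, gg = eta0, f(x), sq_norm(g)
    while f([xi - eta * gi for xi, gi in zip(x, g)]) > fx - c * eta * gg:
        eta *= shrink
    return eta


def polyak_step(x, g, f_star=0.0):
    """Polyak stepsize (f(x) - f*) / ||g||^2; requires the optimal value f*."""
    return (f(x) - f_star) / sq_norm(g)


def run(step_rule, x0=(1.0, 1.0), iters=2000):
    """Gradient descent with a stepsize chosen by `step_rule` at every iteration."""
    x, etas = list(x0), []
    for _ in range(iters):
        g = grad(x)
        if sq_norm(g) < 1e-30:  # numerically at the minimizer
            break
        eta = step_rule(x, g)
        etas.append(eta)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x, etas


sharpness = 10.0
for name, rule in [("armijo", armijo_step), ("polyak", polyak_step)]:
    x, etas = run(rule)
    # Constant-step GD on this quadratic is stable only if eta < 2 / sharpness.
    above = sum(eta * sharpness > 2.0 for eta in etas)
    print(name, f(x), "steps with eta > 2/sharpness:", above)
```

Counting steps with $\eta\,\lambda_{\max} > 2$ is a crude proxy for "operating beyond the edge of stability"; in a real network the sharpness itself moves in response to the chosen steps, which is the interaction the paper studies.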

Translated: 2024-01-03 01:02:10, Published: 2023-12-29

# MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation ( http://arxiv.org/abs/2311.18829v2 )

License: see link

Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy that divides text-to-video generation into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two significant advantages. a) It allows us to take full advantage of recent advances in text-to-image models, such as Stable Diffusion, Midjourney, and DALL-E, to generate photorealistic and highly detailed images. b) Leveraging the generated image, the model can allocate less focus to fine-grained appearance details, prioritizing the efficient learning of motion dynamics. To implement this strategy effectively, we introduce two core designs. First, we propose the Appearance Injection Network, which enhances the preservation of the appearance of the given image. Second, we introduce the Appearance Noise Prior, a novel mechanism aimed at maintaining the capabilities of pre-trained 2D diffusion models. These design elements enable MicroCinema to generate high-quality videos with precise motion, guided by the provided text prompts. Extensive experiments demonstrate the superiority of the proposed framework. Concretely, MicroCinema achieves a state-of-the-art zero-shot FVD of 342.86 on UCF-101 and 377.40 on MSR-VTT. See https://wangyanhui666.github.io/MicroCinema.github.io/ for video samples.

Translated: 2024-01-03 01:01:50, Published: 2023-12-29

# A Lightweight Clustering Framework for Unsupervised Semantic Segmentation

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation ( http://arxiv.org/abs/2311.18628v2 )

License: see link

Yau Shing Jonathan Cheung, Xi Chen, Lihe Yang, Hengshuang Zhao

Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data. It is a widely researched area because obtaining labeled datasets is expensive. While previous works in the field have demonstrated a gradual improvement in model accuracy, most required neural network training. This made segmentation equally expensive, especially when dealing with large-scale datasets. We thus propose a lightweight clustering framework for unsupervised semantic segmentation. We discovered that attention features of the self-supervised Vision Transformer exhibit strong foreground-background differentiability. Therefore, clustering can be employed to effectively separate foreground and background image patches. In our framework, we first perform multilevel clustering at the dataset, category, and image levels, and maintain consistency throughout. Then, the extracted binary patch-level pseudo-masks are upsampled, refined, and finally labeled. Furthermore, we provide a comprehensive analysis of the self-supervised Vision Transformer features and a detailed comparison between DINO and DINOv2 to justify our claims. Our framework demonstrates great promise in unsupervised semantic segmentation and achieves state-of-the-art results on the PASCAL VOC and MS COCO datasets.
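
The key observation, that self-supervised ViT attention features make foreground and background patches separable by clustering, can be illustrated with plain 2-means (Lloyd's algorithm). This is a minimal sketch under stated assumptions. The 2-D `patches` vectors are hypothetical stand-ins for per-patch ViT attention features, and clustering runs at the image level only. The paper's dataset- and category-level clustering, and its upsampling, refinement, and labeling steps, are omitted. k-means only returns two groups, so deciding which one is foreground needs an extra rule.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def two_means(features, iters=20):
    """Lloyd's algorithm with k = 2, returning a 0/1 label per patch.

    Deterministic initialization: the first patch and the patch farthest from it."""
    c0 = features[0]
    c1 = max(features, key=lambda p: sq_dist(p, c0))
    centers = [list(c0), list(c1)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each patch goes to its nearest center.
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in features]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(features, labels) if lab == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = mean(members)
    return labels


# Hypothetical 2-D "attention features" for six patches of one image.
patches = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],   # object-like patches
           [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]   # background-like patches
print(two_means(patches))  # [0, 0, 0, 1, 1, 1]
```

The resulting per-patch labels play the role of a binary patch-level pseudo-mask, which the framework then upsamples and refines to pixel resolution.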

Translated: 2024-01-03 01:01:27, Published: 2023-12-29

# DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation

DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation ( http://arxiv.org/abs/2311.17812v4 )

License: see link

Ting Liu, Yue Hu, Wansen Wu, Youkai Wang, Kai Xu, Quanjun Yin

Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With strong representation capabilities, pretrained vision-and-language models are widely used in vision-and-language navigation (VLN). However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when they are used for VLN tasks. To address this problem, we propose a novel, model-agnostic domain-aware prompt learning (DAP) framework. To equip the pretrained models with specific object-level and scene-level cross-modal alignment in VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. Then we introduce soft visual prompts in the input space of the visual encoder in a pretrained model. In this way, DAP efficiently injects in-domain visual knowledge into the visual encoder of the pretrained model. Experimental results on both R2R and REVERIE show the superiority of DAP over existing state-of-the-art methods.

Translated: 2024-01-03 01:01:10, Published: 2023-12-29

# Event Detection in Time Series: Universal Deep Learning Approach

Event Detection in Time Series: Universal Deep Learning Approach ( http://arxiv.org/abs/2311.15654v2 )

License: see link

Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André

Event detection in time series is a challenging task due to the prevalence of imbalanced datasets, rare events, and time interval-defined events. Traditional supervised deep learning methods primarily employ binary classification, where each time step is assigned a binary label indicating the presence or absence of an event. However, these methods struggle to handle these specific scenarios effectively. To address these limitations, we propose a novel supervised regression-based deep learning approach that offers several advantages over classification-based methods. Our approach, with a limited number of parameters, can effectively handle various types of events within a unified framework, including rare events and imbalanced datasets. We provide theoretical justifications for its universality and precision and demonstrate its superior performance across diverse domains, particularly for rare events and imbalanced datasets.

Translated: 2024-01-03 01:00:15, Published: 2023-12-29

# Fully Authentic Visual Question Answering Dataset from Online Communities

Fully Authentic Visual Question Answering Dataset from Online Communities ( http://arxiv.org/abs/2311.15562v2 )

License: see link

Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari

Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Because it is sourced from online question answering community forums, we call it VQAonline. We then characterize our dataset and how it relates to eight other VQA datasets. Observing that answers in our dataset tend to be much longer (a mean of 173 words) and are thus incompatible with standard VQA evaluation metrics, we next analyze which of six popular metrics for longer text evaluation align best with human judgments. We then use the best-suited metrics to evaluate six state-of-the-art vision and language foundation models on VQAonline and reveal where they struggle most. The dataset is publicly available at https://vqaonline.github.io/.

Translated: 2024-01-03 01:00:00, Published: 2023-12-29

# Quantum Mechanics on a background modulo observation

Quantum Mechanics on a background modulo observation ( http://arxiv.org/abs/2311.12493v2 )

License: see link

Jose A. Pereira Frugone

In this work we answer the following question: what remains of Quantum Mechanics when we transform the background space-time into a space modularized by observation or measurement regions? This new moduli space is constructed by identifying regions of space-time where quantum phase comparison (observation, measurement) is implied. We call it Observation Modular space (OM-space). In addition, in QM statements we replace the Planck constant (h) by the quantity $4\pi^2 \zeta_0$ (where $\zeta_0$ is the Planck length) or, alternatively, replace $P_0$ (the Planck momentum) by $4\pi^2$. This maps Quantum Mechanics into a very rich dual Number Theory, which we call Observation Modular Quantum Mechanics (OM-QM). We find the OM-duals of the Dirac equation, the quantum wave function and a free particle's mass. The OM-QM counterpart of the energy turns out to be a simple function of the zeros of the Riemann zeta function. We also find the OM-QM correspondents of the electron spin, the electron charge, the electric field and the fine-structure constant. We also find the OM-QM correspondents of the Heisenberg uncertainty relation and of Einstein's General Relativity field equation, which emerge as certain limits of a unique OM-QM equation. We also obtain the OM-QM correspondents of the gravitational constant and the cosmological constant. We find an analog of holography on the OM-QM side and obtain an interpretation of spin as a high-dimensional curvature. An interpretation of the OM-QM correspondence is proposed as giving the part of QM information that does not depend on measurement or observation. Some potential future applications of this correspondence are discussed.

Translated: 2024-01-03 00:58:31, Published: 2023-12-29

# CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-modal Multi-label Emotion Recognition

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-modal Multi-label Emotion Recognition ( http://arxiv.org/abs/2312.10201v2 )

License: see link

Cheng Peng, Ke Chen, Lidan Shou, Gang Chen

Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities. The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data. Recent studies are mainly devoted to exploring various fusion strategies to integrate multi-modal information into a unified representation for all labels. However, such a learning scheme not only overlooks the specificity of each modality but also fails to capture individual discriminative features for different labels. Moreover, dependencies of labels and modalities cannot be effectively modeled. To address these issues, this paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task. Specifically, we devise a reconstruction-based fusion mechanism to better model fine-grained modality-to-label dependencies by contrastively learning modal-separated and label-specific features. To further exploit the modality complementarity, we introduce a shuffle-based aggregation strategy to enrich co-occurrence collaboration among labels. Experiments on two benchmark datasets CMU-MOSEI and M3ED demonstrate the effectiveness of CARAT over state-of-the-art methods. Code is available at https://github.com/chengzju/CARAT.

Translated: 2024-01-03 00:52:36, Published: 2023-12-29

# The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation

The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation ( http://arxiv.org/abs/2312.09085v3 )

License: see link

Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu

Large Language Models (LLMs) encapsulate vast amounts of knowledge but remain vulnerable to external misinformation. Existing research has mainly studied this susceptibility in a single-turn setting. However, beliefs can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.

Translated: 2024-01-03 00:51:04, Published: 2023-12-29

# Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems

Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems ( http://arxiv.org/abs/2312.06261v2 )

License: see link

Ruonan Liu, Quanhu Zhang, Te Han

Industrial Cyber-Physical Systems (ICPS) integrate the disciplines of computer science, communication technology, and engineering, and have emerged as integral components of contemporary manufacturing and industry. However, ICPS encounter various challenges in long-term operation, including equipment failures, performance degradation, and security threats. To achieve efficient maintenance and management, prognostics and health management (PHM) is widely applied in ICPS for critical tasks, including failure prediction, health monitoring, and maintenance decision-making. The emergence of large-scale foundation models (LFMs) like BERT and GPT signifies a significant advancement in AI technology, and ChatGPT stands as a remarkable accomplishment within this research paradigm, harboring potential for General Artificial Intelligence. Considering the ongoing enhancement in data acquisition technology and data processing capability, LFMs are anticipated to assume a crucial role in the PHM domain of ICPS. However, at present, a consensus is lacking regarding the application of LFMs to PHM in ICPS, necessitating systematic reviews and roadmaps to elucidate future directions. To bridge this gap, this paper elucidates the key components and recent advances in the underlying models. A comprehensive examination and comprehension of the latest advances in large models for PHM in ICPS can offer valuable references for decision makers and researchers in the industrial field while facilitating further enhancements in the reliability, availability, and safety of ICPS.

Translated: 2024-01-03 00:49:55, Published: 2023-12-29

# Decoupling SQL Query Hardness Parsing for Text-to-SQL

Decoupling SQL Query Hardness Parsing for Text-to-SQL ( http://arxiv.org/abs/2312.06172v2 )

License: see link

Jiawen Yi and Guo Chen

The fundamental goal of the Text-to-SQL task is to translate natural language questions into SQL queries. Current research primarily emphasizes the information coupling between natural language questions and schemas, and significant progress has been made in this area. As the primary source of task requirements, a natural language question determines the hardness of the corresponding SQL query, yet the correlation between the two has always been ignored. Decoupling this correlation between questions and queries, however, may simplify the task. In this paper, we introduce an innovative framework for Text-to-SQL based on decoupling SQL query hardness parsing. By analyzing questions and schemas, this framework decouples the Text-to-SQL task according to query hardness, simplifying the multi-hardness task into a single-hardness challenge. This greatly reduces the parsing pressure on the language model. We evaluate our proposed framework and achieve new state-of-the-art performance among fine-tuning methods on the Spider development set.

Translated: 2024-01-03 00:49:27, Published: 2023-12-29

# Graph Metanetworks for Processing Diverse Neural Architectures

Graph Metanetworks for Processing Diverse Neural Architectures ( http://arxiv.org/abs/2312.04501v2 )

License: see link

Derek Lim, Haggai Maron, Marc T. Law, Jonathan Lorraine, James Lucas

Neural networks efficiently encode learned information within their parameters. Consequently, many tasks can be unified by treating neural networks themselves as input data. When doing so, recent studies demonstrated the importance of accounting for the symmetries and geometry of parameter spaces. However, those works developed architectures tailored to specific networks such as MLPs and CNNs without normalization layers, and generalizing such architectures to other types of networks can be challenging. In this work, we overcome these challenges by building new metanetworks - neural networks that take weights from other neural networks as input. Put simply, we carefully build graphs representing the input neural networks and process the graphs using graph neural networks. Our approach, Graph Metanetworks (GMNs), generalizes to neural architectures where competing methods struggle, such as multi-head attention layers, normalization layers, convolutional layers, ResNet blocks, and group-equivariant linear layers. We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network functions unchanged. We validate the effectiveness of our method on several metanetwork tasks over diverse neural network architectures.
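
The parameter-graph idea can be sketched for the simplest case, a one-hidden-layer ReLU MLP, where neurons become nodes (with biases as node features) and weights become edge features. The sketch also checks the symmetry the abstract mentions: permuting hidden neurons, together with the matching rows and columns of the weight matrices, leaves the network function unchanged, and the two graphs carry the same multisets of node and edge features. This is a simplified, hypothetical construction (`parameter_graph` and `permute_hidden` are illustrative names, not the paper's code). GMNs' handling of attention, normalization, and convolutional layers, and the graph neural network that consumes the graph, are not shown. Matching feature multisets is a necessary condition for graph isomorphism, not a proof of it.

```python
def mlp_forward(W1, b1, W2, b2, x):
    """One-hidden-layer ReLU MLP; W1[j][i] connects input i to hidden neuron j."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]


def parameter_graph(W1, b1, W2, b2):
    """Nodes are neurons (bias as node feature); edges carry the weights."""
    n_in, n_hid, n_out = len(W1[0]), len(W1), len(W2)
    nodes = [("in", i, 0.0) for i in range(n_in)]
    nodes += [("hid", j, b1[j]) for j in range(n_hid)]
    nodes += [("out", k, b2[k]) for k in range(n_out)]
    edges = [(("in", i), ("hid", j), W1[j][i]) for j in range(n_hid) for i in range(n_in)]
    edges += [(("hid", j), ("out", k), W2[k][j]) for k in range(n_out) for j in range(n_hid)]
    return nodes, edges


def permute_hidden(W1, b1, W2, b2, perm):
    """Reorder hidden neurons so that new neuron j is old neuron perm[j]."""
    W1p = [W1[p] for p in perm]
    b1p = [b1[p] for p in perm]
    W2p = [[row[p] for p in perm] for row in W2]
    return W1p, b1p, W2p, b2


W1 = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.8]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -0.5, 2.0]]
b2 = [0.05]
params_p = permute_hidden(W1, b1, W2, b2, [2, 0, 1])

x = [0.3, -1.2]
# Same output up to floating-point rounding (the summation order changes).
print(mlp_forward(W1, b1, W2, b2, x), mlp_forward(*params_p, x))

g, gp = parameter_graph(W1, b1, W2, b2), parameter_graph(*params_p)
print(sorted(n[2] for n in g[0]) == sorted(n[2] for n in gp[0]),
      sorted(e[2] for e in g[1]) == sorted(e[2] for e in gp[1]))
```

A GNN that aggregates over neighbors without depending on node order produces the same output for both graphs. This is the sense in which processing the graph respects the permutation symmetry by construction.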

Translated: 2024-01-03 00:48:11, Published: 2023-12-29

# MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment ( http://arxiv.org/abs/2312.03644v2 )

License: see link

Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interactions with an environment are prohibited. In this paper, we propose a new framework, Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate seamlessly with various offline MARL methods. Theoretically, we prove that, under the setting of the offline dataset, the underlying causal structure and the function generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.

Translated: 2024-01-03 00:47:53, Published: 2023-12-29

# Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction

Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction ( http://arxiv.org/abs/2312.03022v2 )

License: see link

Hongbin Ye, Honghao Gui, Aijia Zhang, Tong Liu, Wei Hua, Weiqiang Jia

Knowledge graph construction (KGC) is a multifaceted undertaking involving the extraction of entities, relations, and events. Traditionally, large language models (LLMs) have been viewed as solitary task-solving agents in this complex landscape. However, this paper challenges this paradigm by introducing a novel framework, CooperKGC. Departing from the conventional approach, CooperKGC establishes a collaborative processing network, assembling a KGC collaboration team capable of concurrently addressing entity, relation, and event extraction tasks. Our experiments unequivocally demonstrate that fostering collaboration and information interaction among diverse agents within CooperKGC yields superior results compared to individual cognitive processes operating in isolation. Importantly, our findings reveal that the collaboration facilitated by CooperKGC enhances knowledge selection, correction, and aggregation capabilities across multiple rounds of interactions.

Translated: 2024-01-03 00:47:02, Published: 2023-12-29

# ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation

ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation ( http://arxiv.org/abs/2312.01728v2 )

License: see link

Tong Nie, Guoyang Qin, Wei Ma, Yuewen Mei, and Jian Sun

Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem has motivated many studies proposing machine learning solutions. Existing imputation solutions mainly include low-rank models and deep learning models. On the one hand, low-rank models assume general structural priors but have limited model capacity. On the other hand, deep learning models are highly expressive but lack prior knowledge of the spatiotemporal process. Leveraging the strengths of both paradigms, we present a low-rankness-induced Transformer model that achieves a balance between strong inductive bias and high model expressivity. Exploiting the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it versatile for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and generality on heterogeneous datasets, including traffic speed, traffic volume, solar energy, smart metering, and air quality. Comprehensive case studies are performed to further strengthen interpretability. The promising empirical results strongly suggest that incorporating time series primitives, such as low-rank properties, can substantially facilitate the development of a generalizable model for a wide range of spatiotemporal imputation problems.
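
The structural prior that low-rank models bring can be seen in a classic baseline. Fit a rank-1 factorization $M \approx uv^\top$ to the observed entries by alternating least squares, then read the missing entries off the product. This is a minimal sketch of that generic low-rank technique, not ImputeFormer. The 3×4 sensor-by-time matrix with `None` for missing readings is hypothetical, and `rank1_als` is an illustrative name. Every row and column must contain at least one observed entry, otherwise the update divides by zero.

```python
def rank1_als(M, iters=500):
    """Complete M (None marks a missing entry) with a rank-1 model u v^T
    fitted to the observed entries by alternating least squares."""
    n, m = len(M), len(M[0])
    obs = [[M[i][j] is not None for j in range(m)] for i in range(n)]
    v = [1.0] * m
    for _ in range(iters):
        # Closed-form least-squares update of u with v fixed, then of v with u fixed.
        u = [sum(M[i][j] * v[j] for j in range(m) if obs[i][j])
             / sum(v[j] ** 2 for j in range(m) if obs[i][j]) for i in range(n)]
        v = [sum(M[i][j] * u[i] for i in range(n) if obs[i][j])
             / sum(u[i] ** 2 for i in range(n) if obs[i][j]) for j in range(m)]
    return [[u[i] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical readings: 3 sensors x 4 time steps, two readings missing.
readings = [[1.0, 2.0, 3.0, None],
            [2.0, 4.0, 6.0, 8.0],
            [None, 6.0, 9.0, 12.0]]
completed = rank1_als(readings)
print(round(completed[0][3], 6), round(completed[2][0], 6))  # 4.0 3.0
```

Recovery is exact here only because the toy data are exactly rank 1. Real spatiotemporal data are noisy and only approximately low-rank, which is where the limited capacity of such models shows up.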

Translated: 2024-01-03 00:46:49, Published: 2023-12-29

# AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model

AccidentGPT: Accident Analysis and Prevention from V2X Environmental Perception with Multi-modal Large Model ( http://arxiv.org/abs/2312.13156v3 )

License: see link

Lening Wang, Yilong Ren, Han Jiang, Pinlong Cai, Daocheng Fu, Tianqi Wang, Zhiyong Cui, Haiyang Yu, Xuesong Wang, Hanchu Zhou, Helai Huang, Yinhai Wang

Traffic accidents, being a significant contributor to both human casualties and property damage, have long been a focal point of research for many scholars in the field of traffic safety. However, previous studies, whether focusing on static environmental assessments or dynamic driving analyses, as well as pre-accident predictions or post-accident rule analyses, have typically been conducted in isolation. An effective framework for developing a comprehensive understanding and application of traffic safety has been lacking. To address this gap, this paper introduces AccidentGPT, a comprehensive multi-modal large model for accident analysis and prevention. AccidentGPT establishes a multi-modal information interaction framework grounded in multi-sensor perception, thereby enabling a holistic approach to accident analysis and prevention in the field of traffic safety. Specifically, its capabilities can be categorized as follows. For autonomous driving vehicles, we provide comprehensive environmental perception and understanding to control the vehicle and avoid collisions. For human-driven vehicles, we offer proactive long-range safety warnings and blind-spot alerts while also providing safe driving recommendations and behavioral norms through human-machine dialogue and interaction. Additionally, for traffic police and management agencies, our framework supports intelligent and real-time analysis of traffic safety, encompassing pedestrians, vehicles, roads, and the environment through collaborative perception from multiple vehicles and road testing devices. The system is also capable of providing a thorough analysis of accident causes and liability after vehicle collisions. Our framework stands as the first large model to integrate comprehensive scene understanding into traffic safety studies. Project page: https://accidentgpt.github.io

翻訳日:2024-01-02 21:04:59 公開日:2023-12-29

# サドル支配スクランブルにおけるスプレッド複雑性

Spread complexity in saddle-dominated scrambling ( http://arxiv.org/abs/2312.12593v2 )

ライセンス: Link先を確認

Kyoung-Bum Huh, Hyun-Sik Jeong, Juan F. Pedraza

(参考訳) 近年、量子系の複雑性とカオス性の尺度として、状態に対するKrylov複雑性であるスプレッド複雑性の概念が導入された。本稿では、サドル支配スクランブルを示す可積分（integrable）系における熱場二重状態のスプレッド複雑性を調べる。具体的には、サドル支配スクランブルを特徴とする量子力学系の代表例として、Lipkin-Meshkov-Glickモデルと逆調和振動子に着目する。Lanczosアルゴリズムを適用した数値解析により、これらの系におけるスプレッド複雑性はカオス（chaotic）系を思わせる特徴を示し、特徴的なランプ・ピーク・スロープ・プラトーのパターンを呈することが明らかになった。この結果は、スプレッド複雑性は有用なプローブではあるものの、真の量子カオスを正確に診断するには一般に追加の物理的入力が必要であることを示している。また、スプレッド複雑性、スペクトル形状因子、Krylov空間内の遷移確率の間の関係についても検討する。数値結果を解析的に確認し、複雑性のEhrenfest定理を検証するとともに、スプレッド複雑性の初期時間領域における特徴的な2次の振る舞いを同定する。

Recently, the concept of spread complexity, Krylov complexity for states, has been introduced as a measure of the complexity and chaoticity of quantum systems. In this paper, we study the spread complexity of the thermofield double state within \emph{integrable} systems that exhibit saddle-dominated scrambling. Specifically, we focus on the Lipkin-Meshkov-Glick model and the inverted harmonic oscillator as representative examples of quantum mechanical systems featuring saddle-dominated scrambling. Applying the Lanczos algorithm, our numerical investigation reveals that the spread complexity in these systems exhibits features reminiscent of \emph{chaotic} systems, displaying a distinctive ramp-peak-slope-plateau pattern. Our results indicate that, although spread complexity serves as a valuable probe, accurately diagnosing true quantum chaos generally necessitates additional physical input. We also explore the relationship between spread complexity, the spectral form factor, and the transition probability within the Krylov space. We provide analytical confirmation of our numerical results, validating the Ehrenfest theorem of complexity and identifying a distinct quadratic behavior in the early-time regime of spread complexity.
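The Lanczos step mentioned above can be illustrated with a small sketch. It assumes a toy spectrum of random energies, not the Lipkin-Meshkov-Glick model or the inverted oscillator. It evolves a single-copy stand-in for the thermofield double, whose amplitudes in the energy basis are proportional to $e^{-\beta E_n/2}$, so the survival amplitude is $Z(\beta+it)/Z(\beta)$. The helper names `lanczos` and `spread_complexity` are hypothetical.

```python
import numpy as np


def lanczos(H, psi0, max_steps):
    """Build an orthonormal Krylov basis |K_n> and Lanczos coefficients a_n, b_n.

    Uses full reorthogonalization, which is affordable for small toy matrices.
    """
    v = psi0 / np.linalg.norm(psi0)
    basis, a, b = [v], [], []
    v_prev, b_prev = np.zeros_like(v), 0.0
    for n in range(max_steps):
        w = H @ v
        a_n = np.vdot(v, w).real
        a.append(a_n)
        w = w - a_n * v - b_prev * v_prev
        for u in basis:  # remove numerical drift out of the Krylov space
            w = w - np.vdot(u, w) * u
        b_n = np.linalg.norm(w)
        if b_n < 1e-10 or n == max_steps - 1:
            break  # Krylov space exhausted or step budget reached
        b.append(b_n)
        v_prev, v, b_prev = v, w / b_n, b_n
        basis.append(v)
    return np.array(basis), np.array(a), np.array(b)


def spread_complexity(E, beta, times):
    """C(t) = sum_n n |<K_n|psi(t)>|^2 for H = diag(E) and a TFD-like initial state."""
    psi0 = np.exp(-beta * E / 2)
    psi0 = psi0 / np.linalg.norm(psi0)
    K, _, _ = lanczos(np.diag(E), psi0, max_steps=len(E))
    n = np.arange(len(K))
    values = []
    for t in times:
        psi_t = np.exp(-1j * E * t) * psi0  # exact evolution, since H is diagonal
        phi = K.conj() @ psi_t  # amplitudes on the Krylov basis
        values.append(float(np.sum(n * np.abs(phi) ** 2)))
    return np.array(values)


rng = np.random.default_rng(0)
E = np.sort(rng.normal(size=20))  # hypothetical toy spectrum
C = spread_complexity(E, beta=1.0, times=np.linspace(0.0, 20.0, 201))
```

$C(0) = 0$ because the initial state is the first Krylov vector. This toy spectrum is not meant to reproduce the ramp-peak-slope-plateau curves of the paper.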

翻訳日:2024-01-02 21:03:36 公開日:2023-12-29

# 編集できますか? 大規模言語モデルのコード編集指示への追従能力の評価

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions ( http://arxiv.org/abs/2312.12450v3 )

ライセンス: Link先を確認

Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

(参考訳) 様々なコード合成タスクのための大規模言語モデルの開発と評価に、かなりの量の研究が集中している。これには、自然言語指示からのコード合成、コードからのテストの合成、コードの説明の合成が含まれる。対照的に、LLMによる指示に基づくコード編集の振る舞いは十分に研究されていない。これは、プロンプトで与えられたコードブロックを更新するようモデルに指示するタスクである。編集指示は、機能の追加や削除、バグの説明とその修正、別の種類の解法の要求など、一般的なコード編集タスクの多くを含みうる。我々はコード編集タスクのベンチマークを慎重に作成し、それを用いていくつかの最先端LLMを評価した。評価の結果、最先端のオープンモデルとクローズドモデルの能力の間に大きなギャップがあることが明らかになった。例えば、GPT-3.5-Turboでさえ、コード編集において最良のオープンモデルより8.8%優れている。また、自然言語指示と組み合わせたコード編集の、新しく慎重にキュレートされた、寛容なライセンスの訓練セットも導入する。この訓練セットを用いてオープンなCode LLMを微調整し、コード編集能力を大幅に向上できることを示す。

A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language instructions, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is instructed to update a block of code provided in a prompt. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, ask for a different kind of solution, or many other common code editing tasks. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is 8.8% better than the best open model at editing code. We also introduce a new, carefully curated, permissively licensed training set of code edits coupled with natural language instructions. Using this training set, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities.

翻訳日:2024-01-02 21:03:15 公開日:2023-12-29

# 大規模言語モデルのための検索型生成:調査

Retrieval-Augmented Generation for Large Language Models: A Survey ( http://arxiv.org/abs/2312.10997v2 )

ライセンス: Link先を確認

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang and Haofen Wang

(参考訳) 大規模言語モデル（LLM）は優れた能力を示すが、幻覚、知識の陳腐化、不透明で追跡不可能な推論過程といった課題に直面している。検索拡張生成（Retrieval-Augmented Generation, RAG）は、外部データベースからのリアルタイムデータをLLMの応答に組み込むことで、これらの問題に対する有望な解決策として登場した。これにより、特に知識集約型タスクにおいてモデルの正確性と信頼性が向上し、継続的な知識更新とドメイン固有情報の統合が可能になる。RAGは、LLMに内在する知識と外部データベースの巨大で動的なリポジトリを相乗的に統合する。本サーベイでは、RAGの進化を詳細に分析し、Naive RAG、Advanced RAG、Modular RAGの3つのパラダイムに着目する。RAGシステムの3つの基本コンポーネント（検索器、生成器、拡張手法）を体系的に検討し、各コンポーネントにおける最先端技術を明らかにする。さらに、RAGモデルを評価するための新しい指標と能力、および最新の評価フレームワークを紹介する。最後に、今後の課題、モダリティの拡張、RAGの技術スタックとエコシステムの発展という3つの観点から、今後の研究の方向性を概説する。

Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution to these issues by incorporating real-time data from external databases into LLM responses. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This survey paper provides an in-depth analysis of the evolution of RAG, focusing on three key paradigms: Naive RAG, Advanced RAG, and Modular RAG. It methodically examines the three fundamental components of RAG systems: the retriever, the generator, and the augmentation methods, underscoring the cutting-edge technologies within each component. Additionally, the paper introduces novel metrics and capabilities for evaluating RAG models, as well as the most recent evaluation framework. Finally, the paper outlines future research directions from three perspectives: future challenges, modality extension, and the development of the RAG technical stack and ecosystem.
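Below is a minimal sketch of the retrieve-then-generate loop that the survey calls Naive RAG. It makes several assumptions:

- A toy bag-of-words cosine retriever stands in for a dense embedding model.
- The sketch stops at building the augmented prompt. The generator (LLM) call is omitted.
- All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase words with simple punctuation stripped."""
    words = (w.strip(".,?!;:").lower() for w in text.split())
    return [w for w in words if w]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query, docs, k=2):
    """Augment the query with retrieved context; the LLM call itself is omitted."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


docs = [
    "Paris is the capital of France.",
    "The Krebs cycle produces ATP.",
    "Berlin is the capital of Germany.",
]
prompt = build_prompt("What is the capital of France?", docs, k=1)
```

Advanced and Modular RAG, as described in the abstract, would add stages such as query rewriting or reranking around these same two steps.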

翻訳日:2024-01-02 21:01:58 公開日:2023-12-29

# 差分強度検出とパリティ検出に基づくマッハ・ツェンダー干渉計の最適非ガウス演算

Optimal non-Gaussian operations in difference-intensity detection and parity detection-based Mach-Zehnder interferometer ( http://arxiv.org/abs/2312.10774v2 )

ライセンス: Link先を確認

Manali Verma, Chandan Kumar, Karunesh K. Mishra, and Prasanta K. Panigrahi

(参考訳) 差分強度検出およびパリティ検出に基づくマッハ・ツェンダー干渉計（MZI）を用いた位相推定において、確率的な非ガウス操作がもたらす利点を調べる。単一モードスクイーズド真空（SSV）状態に対して、光子減算（PS）、光子付加（PA）、光子触媒（PC）の3種類の非ガウス操作を行う、実験的に実装可能なモデルを考える。差分強度検出に基づくMZIでは2つのPC操作が最適であり、一方パリティ検出に基づくMZIでは2つのPA操作が最適な処理となることがわかった。また、最良の性能を与えるスクイージングパラメータと透過率パラメータも示しており、本研究は実験研究者にとっても有用である。さらに、モーメント母関数の一般式を導出した。これはホモダイン検出や2次ホモダイン検出など、他の検出方式の検討に役立つ。

We investigate the benefits of probabilistic non-Gaussian operations in phase estimation using difference-intensity and parity detection-based Mach-Zehnder interferometers (MZI). We consider an experimentally implementable model to perform three different non-Gaussian operations, namely photon subtraction (PS), photon addition (PA), and photon catalysis (PC) on a single-mode squeezed vacuum (SSV) state. In difference-intensity detection-based MZI, the two-PC operation is found to be optimal, while for parity detection-based MZI, the two-PA operation emerges as the optimal process. We have also provided the corresponding squeezing and transmissivity parameters at best performance, making our study relevant for experimentalists. Further, we have derived the general expression of the moment-generating function, which should be useful in exploring other detection schemes such as homodyne detection and quadratic homodyne detection.
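Phase sensitivity in such interferometers is commonly evaluated with the standard error-propagation formula below. This may help readers compare the two detection schemes. Here $\hat O$ is the measured observable, for example the photon-number difference $\hat a^\dagger\hat a - \hat b^\dagger\hat b$ of the two output ports, or the parity $\hat\Pi = (-1)^{\hat b^\dagger\hat b}$ of one port. This is a textbook relation, not a result specific to this paper, and the port labels $\hat a$, $\hat b$ are assumptions.

$$
\Delta\phi = \frac{\sqrt{\langle \hat O^{2}\rangle - \langle \hat O\rangle^{2}}}{\left|\partial \langle \hat O\rangle / \partial \phi\right|}
$$

Because $\hat\Pi^{2} = \hat 1$, the parity case reduces to

$$
\Delta\phi_{\Pi} = \frac{\sqrt{1 - \langle \hat\Pi\rangle^{2}}}{\left|\partial \langle \hat\Pi\rangle / \partial \phi\right|}.
$$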

翻訳日:2024-01-02 21:01:10 公開日:2023-12-29

# 実用ファイバを用いた限定的盗聴下における連続変数量子鍵配送の安全性解析

The Security Analysis of Continuous-Variable Quantum Key Distribution under Limited Eavesdropping with Practical Fiber ( http://arxiv.org/abs/2312.16206v2 )

ライセンス: Link先を確認

Sheng Liu, Lu Fan, Zhengyu Li, Qiang Zhou, Yunbo Li, Dong Wang, Dechao Zhang, Yichen Zhang, and Han Li

(参考訳) 実用的な条件下での最適な盗聴モデルの研究は、安全な情報伝送に量子鍵配送（QKD）システムを用いる際の現実的なリスクの評価に役立つ。直感的には、ファイバ損失により光エネルギーは盗聴者に取得されるのではなく環境へ漏れ出すため、盗聴能力が制限され、実用上のQKDシステムの性能が向上する。しかし、通信路は正規の通信者の制御外にあり、漏洩した信号は検出できないため、損失のあるファイバが存在する場合の最適な盗聴モデルを定義することは難しい。本稿では、2つの遠隔局と共有もつれ源を必要とするテレポーテーションに基づく集団攻撃モデルに基づいて、ファイバ損失が盗聴能力に与える影響を調べる。実際の損失により分配されるもつれが制限される場合、最適な攻撃は、2つのテレポーテーション局を1つに統合して送信側の近くに配置したときに生じることがわかった。これはもつれクローニング攻撃と同様に振る舞うが、盗聴比は低下する。Eveが利用可能な最良の中空コアファイバを用いると仮定すると、実用環境での秘密鍵レートは理想的な盗聴下よりも20%〜40%高くなりうる。一方、もつれ蒸留技術が十分に成熟して高品質なもつれを分配できるようになれば、盗聴性能を高めるために2つのテレポーテーション局を遠く離して配置すべきであり、その場合、盗聴は最適な集団攻撃に近づくことさえある。現在のもつれ浄化技術の水準では、避けられないファイバ損失によって盗聴能力は依然として大きく制限され、現実のシステムの秘密鍵レートと伝送距離が向上する。これは実用的な応用シナリオにおけるQKDシステムの発展を促進するものである。

Research on optimal eavesdropping models under practical conditions will help to evaluate realistic risk when employing a quantum key distribution (QKD) system for secure information transmission. Intuitively, fiber loss will lead to the optical energy leaking to the environment, rather than being harvested by the eavesdropper, which also limits the eavesdropping ability while improving the QKD system performance in practical use. However, defining the optimal eavesdropping model in the presence of lossy fiber is difficult because the channel is beyond the control of legitimate partners and the leaked signal is undetectable. Here we investigate how the fiber loss influences the eavesdropping ability based on a teleportation-based collective attack model that requires two distant stations and a shared entanglement source. We find that if the distributed entanglement is limited due to the practical loss, the optimal attack occurs when the two teleportation stations are merged into one and placed close to the transmitter site, which performs similarly to the entangling-cloning attack but with a reduced wiretapping ratio. Assuming Eve uses the best available hollow-core fiber, the secret key rate in the practical environment can be 20%-40% higher than that under ideal eavesdropping. If, however, entanglement distillation technology becomes mature enough to provide high-quality distributed entanglement, the two teleportation stations should be distantly separated for better eavesdropping performance, in which case the eavesdropping can even approach the optimal collective attack. Under the current level of entanglement purification technology, the unavoidable fiber loss can still greatly limit the eavesdropping ability as well as enhance the secret key rate and transmission distance of the realistic system, which promotes the development of QKD systems in practical application scenarios.

翻訳日:2024-01-02 20:27:08 公開日:2023-12-29

# SymmPI: グループ対称性を持つデータの予測推論

SymmPI: Predictive Inference for Data with Group Symmetries ( http://arxiv.org/abs/2312.16160v2 )

ライセンス: Link先を確認

Edgar Dobriban, Mengxin Yu

(参考訳) 予測の不確実性の定量化は、現代統計学の中心的な問題である。予測推論の手法は様々な仮定の下で開発されており、例えば標準的な共形予測のように、置換群などの特別な変換群の下でのデータ分布の不変性に依存することが多い。さらに、既存の予測推論手法の多くは、特徴量と結果の組からなる観測列における未観測の結果を予測することを目的としている。一方で、より一般的な観測モデル（例えば、部分的に観測された特徴量）の下での予測推論や、より一般的な分布対称性を満たすデータ（例えば、物理学における回転不変な観測や座標に依存しない観測）に対する予測推論にも関心が集まっている。本稿では、任意の観測モデルにおいてデータ分布が一般的な群対称性を持つ場合の予測推論手法であるSymmPIを提案する。本手法は、データの分布不変性を保ちながらデータを処理する、分布同変変換という新しい概念を利用する。SymmPIが分布不変性の下で妥当な被覆率を持つことを示し、分布シフト下での性能を特徴づけ、最近の結果を特殊な場合として再現する。SymmPIを、ネットワーク構造を変えない再ラベル付けの下で分布が不変であるネットワークにおいて、頂点に関連付けられた未観測値の予測に適用する。2層階層モデルにおけるいくつかのシミュレーションと実データ解析の例において、SymmPIは既存手法と比べて良好な性能を示す。

Quantifying the uncertainty of predictions is a core problem in modern statistics. Methods for predictive inference have been developed under a variety of assumptions, often -- for instance, in standard conformal prediction -- relying on the invariance of the distribution of the data under special groups of transformations such as permutation groups. Moreover, many existing methods for predictive inference aim to predict unobserved outcomes in sequences of feature-outcome observations. Meanwhile, there is interest in predictive inference under more general observation models (e.g., for partially observed features) and for data satisfying more general distributional symmetries (e.g., rotationally invariant or coordinate-independent observations in physics). Here we propose SymmPI, a methodology for predictive inference when data distributions have general group symmetries in arbitrary observation models. Our methods leverage the novel notion of distributional equivariant transformations, which process the data while preserving their distributional invariances. We show that SymmPI has valid coverage under distributional invariance and characterize its performance under distribution shift, recovering recent results as special cases. We apply SymmPI to predict unobserved values associated with vertices in a network, where the distribution is unchanged under relabelings that keep the network structure unchanged. In several simulations in a two-layer hierarchical model, and in an empirical data analysis example, SymmPI performs favorably compared to existing methods.
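For reference, the permutation-invariant baseline that SymmPI generalizes is standard split conformal prediction. The sketch below implements that baseline, not SymmPI. It assumes exchangeable calibration residuals and absolute-residual scores, and the function names are hypothetical.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])


def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Interval y_pred +/- q; marginal coverage >= 1 - alpha under exchangeability."""
    q = conformal_quantile(np.abs(cal_residuals), alpha)
    return y_pred - q, y_pred + q


# Toy check: the "model" predicts 0, so residuals are the raw N(0, 1) outcomes.
rng = np.random.default_rng(1)
cal_residuals = rng.normal(size=1000)
y_test = rng.normal(size=10000)
lo, hi = split_conformal_interval(cal_residuals, np.zeros_like(y_test), alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
```

The guarantee rests on the calibration and test scores being exchangeable, that is, on invariance under the permutation group. According to the abstract, SymmPI replaces this with general group symmetries acting on an arbitrary observation model.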

翻訳日:2024-01-02 20:26:36 公開日:2023-12-29

# ローゼン・モース散乱状態に対するルジャンドル関数の一般化

Generalization of Legendre functions applied to Rosen-Morse scattering states ( http://arxiv.org/abs/2312.15652v2 )

ライセンス: Link先を確認

F. L. Freitas

(参考訳) ルジャンドル陪関数の一般化を提案し、それを用いてローゼン・モースポテンシャルの散乱状態を記述する。これらの関数について超幾何関数を用いた明示的な式を与え、その漸近的振る舞いを調べて、全反射領域と部分反射領域における状態の要件を満たすことを示す。反射係数と透過係数の初等的な表式を与え、一般化ルジャンドル関数に関する積分恒等式を証明することで、散乱状態に対して誘導される積分変換のスペクトル測度を計算できるようにする。これらの手法は、経路積分法を必要とせずに、このポテンシャルに対する完全な古典的解を与える。

A generalization of associated Legendre functions is proposed and used to describe the scattering states of the Rosen-Morse potential. The functions are then given explicit formulas in terms of the hypergeometric function, their asymptotic behavior is examined and shown to match the requirements for states in the regions of total and partial reflection. Elementary expressions are given for reflection and transmission coefficients, and an integral identity for the generalized Legendre functions is proven, allowing the calculation of the spectral measure of the induced integral transform for the scattering states. These methods provide a complete classical solution to the potential, without need of path integral techniques.
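To illustrate the kind of hypergeometric representation mentioned above, here is the standard (not the generalized) Ferrers function of the first kind for $-1<x<1$ and $\mu \ge 0$:

$P_\nu^{-\mu}(x) = \left(\tfrac{1-x}{1+x}\right)^{\mu/2} {}_2F_1\!\left(\nu+1, -\nu; 1+\mu; \tfrac{1-x}{2}\right) / \Gamma(1+\mu)$

This is DLMF 14.3.1 with $\mu \to -\mu$, evaluated through SciPy's `hyp2f1`. The paper's generalization is not reproduced here, and the helper name `ferrers_p` is hypothetical.

```python
import math

from scipy.special import eval_legendre, gamma, hyp2f1


def ferrers_p(nu, mu, x):
    """Ferrers function P_nu^{-mu}(x) for -1 < x < 1 and mu >= 0 (DLMF 14.3.1, mu -> -mu)."""
    prefactor = ((1 - x) / (1 + x)) ** (mu / 2)
    return prefactor * hyp2f1(nu + 1, -nu, 1 + mu, (1 - x) / 2) / gamma(1 + mu)


# For integer degree and mu = 0 this reduces to the Legendre polynomial P_2(x) = (3x^2 - 1) / 2.
value = ferrers_p(2, 0, 0.5)
reference = eval_legendre(2, 0.5)
# Non-integer degrees are allowed, e.g. P_{2.5}(0.3), which equals P_{-3.5}(0.3).
fractional = ferrers_p(2.5, 0, 0.3)
```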

翻訳日:2024-01-02 20:25:58 公開日:2023-12-29

# SOLAR 10.7B: 単純かつ効果的な深さアップスケーリングによる大規模言語モデルのスケーリング

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling ( http://arxiv.org/abs/2312.15166v2 )

ライセンス: Link先を確認

Dahyun Kim, Chanjun Park, Sanghoon Kim, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim

(参考訳) 107億個のパラメータを持つ大規模言語モデル（LLM）であるSOLAR 10.7Bを紹介する。SOLAR 10.7Bは、様々な自然言語処理（NLP）タスクにおいて優れた性能を示す。LLMを効率的にアップスケールする近年の取り組みに着想を得て、深さ方向のスケーリングと継続事前学習からなる、深さアップスケーリング（depth up-scaling, DUS）と呼ばれるLLMのスケーリング手法を提案する。mixture-of-expertsを用いる他のLLMアップスケーリング手法とは異なり、DUSは効率的な学習と推論のために複雑な変更を必要としない。実験により、DUSは単純でありながら、小規模なモデルから高性能なLLMへスケールアップする上で効果的であることを示す。さらに、DUSモデルに基づき、指示追従能力のために微調整した派生モデルSOLAR 10.7B-Instructを提示する。これはMixtral-8x7B-Instructを上回る。SOLAR 10.7BはApache 2.0ライセンスの下で公開されており、LLM分野における幅広い利用と応用を促進する。

We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion parameters, demonstrating superior performance in various natural language processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which encompasses depthwise scaling and continued pretraining. In contrast to other LLM up-scaling methods that use mixture-of-experts, DUS does not require complex changes for efficient training and inference. We show experimentally that DUS is simple yet effective in scaling up high-performance LLMs from small ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct, a variant fine-tuned for instruction-following capabilities, surpassing Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0 license, promoting broad access and application in the LLM field.
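The depthwise-scaling half of DUS can be sketched on a list of layers. The following reflects our reading of the full paper; the abstract above does not state it. SOLAR starts from a 32-layer base model. It keeps the first 24 layers of one copy and the last 24 layers of a second copy, dropping m = 8 from each, and concatenates them into 48 layers before continued pretraining. The function below is a hypothetical illustration on layer labels, and continued pretraining is not shown.

```python
import copy


def depth_up_scale(layers, m):
    """Depthwise scaling: drop the last m layers of one copy and the first m of another, then stack.

    The second copy is deep-copied so that, in a real model, the duplicated
    layers would not share parameters during continued pretraining.
    """
    n = len(layers)
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < n")
    bottom = list(layers[: n - m])
    top = copy.deepcopy(list(layers[m:]))
    return bottom + top


scaled = depth_up_scale(list(range(32)), m=8)  # layer indices stand in for layers
```

The resulting depth is 2(n - m), so n = 32 and m = 8 give 48 layers. Layers 8 to 23 of the base model appear twice.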

翻訳日:2024-01-02 20:25:47 公開日:2023-12-29

# SC-GS: 編集可能な動的シーンのためのスパース制御ガウススプラッティング

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes ( http://arxiv.org/abs/2312.14937v2 )

ライセンス: Link先を確認

Yi-Hua Huang and Yang-Tian Sun and Ziyi Yang and Xiaoyang Lyu and Yan-Pei Cao and Xiaojuan Qi

(参考訳) 動的シーンの新規ビュー合成は、コンピュータビジョンとグラフィックスにおいて依然として困難な問題である。近年、ガウススプラッティングは、静的シーンを表現し、高品質かつリアルタイムな新規ビュー合成を可能にする頑健な手法として登場した。この手法に基づき、動的シーンの動きと外観を、それぞれスパースな制御点と密なガウシアンへ明示的に分解する新しい表現を提案する。鍵となるアイデアは、ガウシアンよりもはるかに少ない数のスパースな制御点を用いてコンパクトな6自由度（6 DoF）変換基底を学習し、それを学習された補間重みによって局所的に補間することで3次元ガウシアンの運動場を得ることである。変形MLPを用いて各制御点の時間変化する6 DoF変換を予測することで、学習の複雑さを低減し、学習能力を高め、時間的・空間的に一貫した運動パターンを得やすくする。次に、3次元ガウシアン、制御点の正準空間での位置、変形MLPを同時に学習し、3次元シーンの外観、幾何、ダイナミクスを再構成する。学習中は、領域ごとに異なる運動の複雑さに対応するため制御点の位置と数を適応的に調整し、できるだけ剛体的に（as rigid as possible）という原理に従うARAP損失を導入して、学習された運動の空間的連続性と局所的剛性を課す。最後に、明示的なスパース運動表現と外観からの分離により、高忠実度の外観を保ったままユーザ制御による運動編集を可能にする。広範な実験により、本手法は高いレンダリング速度で新規ビュー合成において既存手法を上回り、外観を保持した新しい運動編集アプリケーションを可能にすることが示された。プロジェクトページ: https://yihua7.github.io/SC-GS-web/

Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporally and spatially coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the principle of as rigid as possible is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. Project page: https://yihua7.github.io/SC-GS-web/
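The control-point idea can be sketched as linear blend skinning: each point moves by a weighted average of the rigid transforms of its k nearest control points. This is an illustration under stated assumptions:

- The weights are fixed Gaussian radial basis functions of distance. SC-GS instead learns its interpolation weights.
- Only point positions are moved. The Gaussians' rotations and covariances are ignored.
- `k`, `sigma`, and the function name are hypothetical.

```python
import numpy as np


def blend_control_transforms(points, ctrl_pos, ctrl_R, ctrl_t, k=4, sigma=0.5):
    """Deform points by blending rigid transforms of their k nearest control points.

    points: (P, 3); ctrl_pos: (C, 3); ctrl_R: (C, 3, 3) rotation matrices; ctrl_t: (C, 3).
    Control point c maps x to R_c (x - p_c) + p_c + t_c.
    """
    d2 = np.sum((points[:, None, :] - ctrl_pos[None, :, :]) ** 2, axis=-1)  # (P, C)
    k = min(k, ctrl_pos.shape[0])
    nn = np.argsort(d2, axis=1)[:, :k]  # (P, k), nearest first
    d2_nn = np.take_along_axis(d2, nn, axis=1)
    # Gaussian RBF weights; subtracting the nearest distance avoids underflow.
    w = np.exp(-(d2_nn - d2_nn[:, :1]) / (2 * sigma**2))
    w = w / w.sum(axis=1, keepdims=True)  # each row sums to 1
    local = points[:, None, :] - ctrl_pos[nn]  # (P, k, 3)
    rotated = np.einsum("pkij,pkj->pki", ctrl_R[nn], local)
    moved = rotated + ctrl_pos[nn] + ctrl_t[nn]  # (P, k, 3)
    return np.einsum("pk,pki->pi", w, moved)


rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))  # stand-ins for Gaussian centers
ctrl = rng.normal(size=(8, 3))  # sparse control points
R_id = np.tile(np.eye(3), (8, 1, 1))
shifted = blend_control_transforms(pts, ctrl, R_id, np.tile([1.0, 0.0, 0.0], (8, 1)))
```

Because the weights sum to one, identical translations on every control point shift all points rigidly. Differing per-control transforms produce a smooth, spatially varying motion field.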

翻訳日:2024-01-02 20:24:22 公開日:2023-12-29

# 量子実時間発展のためのテンソル繰り込み群法

Tensor Renormalization Group Methods for Quantum Real-time Evolution ( http://arxiv.org/abs/2312.14825v2 )

ライセンス: Link先を確認

Michael Hite and Yannick Meurice

(参考訳) 格子ゲージ理論における実時間発展の第一原理計算は、非常に興味深い応用の可能性を持つが、計算面で困難な課題を伴う。ユークリッド時間格子場理論の文脈で開発されたテンソル繰り込み群法が、実時間におけるトロッター化された時間発展演算子の計算に適用できることを示す。様々な観測量に対する切断手順の最適化について議論する。この数値手法を、秩序相において外部横磁場を加えた1次元量子イジングモデルに適用し、$N_{s}=4$および8サイトについて汎用量子計算と比較する。

Ab-initio calculations of real-time evolution for lattice gauge theory have very interesting potential applications but present challenging computational aspects. We show that tensor renormalization group methods developed in the context of Euclidean-time lattice field theory can be applied to calculation of Trotterized evolution operators at real time. We discuss the optimization of truncation procedures for various observables. We apply the numerical methods to the 1D Quantum Ising Model with an external transverse field in the ordered phase and compare with universal quantum computing for $N_{s}=4$ and 8 sites.
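The object that the tensor methods above approximate is the Trotterized evolution operator. For a chain as small as $N_s = 4$ it can be built by brute force, which gives a useful exact reference. The sketch assumes:

- an open chain with $H = -J\sum_i Z_i Z_{i+1} - h\sum_i X_i$,
- a first-order Trotter split,
- $h < J$ (the ordered phase).

It does not implement any tensor renormalization step, and the helper names are hypothetical.

```python
from functools import reduce

import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-site chain."""
    return reduce(np.kron, [op if j == i else I2 for j in range(n)])


def ising_terms(n, J=1.0, h=1.0):
    """Split H = -J sum Z_i Z_{i+1} - h sum X_i (open chain) into its two non-commuting parts."""
    dim = 2**n
    zero = np.zeros((dim, dim), dtype=complex)
    H_zz = sum((-J * site_op(Z, i, n) @ site_op(Z, i + 1, n) for i in range(n - 1)), zero)
    H_x = sum((-h * site_op(X, i, n) for i in range(n)), zero)
    return H_zz, H_x


def trotter_evolution(H_zz, H_x, t, steps):
    """First-order Trotter approximation of exp(-i (H_zz + H_x) t)."""
    dt = t / steps
    step = expm(-1j * H_zz * dt) @ expm(-1j * H_x * dt)
    return np.linalg.matrix_power(step, steps)


# N_s = 4 sites in the ordered phase (h < J); the exact propagator is the reference.
H_zz, H_x = ising_terms(4, J=1.0, h=0.5)
U_exact = expm(-1j * (H_zz + H_x) * 1.0)
errors = {s: np.linalg.norm(trotter_evolution(H_zz, H_x, 1.0, s) - U_exact, 2) for s in (10, 40)}
```

The Trotter error shrinks as the number of steps grows. The 2^N_s matrix size is what makes tensor-network contractions necessary for longer chains.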

翻訳日:2024-01-02 20:23:49 公開日:2023-12-29

# Large Language Model (LLM) Bias Index -- LLMBI

Large Language Model (LLM) Bias Index -- LLMBI ( http://arxiv.org/abs/2312.14769v3 )

ライセンス: Link先を確認

Abiodun Finbarrs Oketunji, Muhammad Anas, Deepthi Saina

(参考訳) 大規模言語モデルバイアス指標（LLMBI）は、GPT-4などの大規模言語モデル（LLM）に内在するバイアスを定量化し対処するために設計された先駆的なアプローチである。我々は、多様な分野においてLLMの普及と影響が増大していることを認識している。本研究では、モデルの応答を歪めうるバイアスを体系的に測定・緩和するための新しい指標LLMBIを導入する。LLMBIは、年齢、性別、人種的バイアスを含む（ただしこれらに限らない）複数のバイアス次元を組み込んだ複合スコアリングシステムとして定式化した。この指標を運用するため、LLMの応答の収集と注釈付け、バイアス検出のための高度な自然言語処理（NLP）技術の適用、特別に設計した数式によるLLMBIスコアの計算からなる多段階のプロセスを実施した。この数式は、様々なバイアス次元の加重平均、データセットの多様性不足に対するペナルティ、感情バイアスに対する補正を統合する。OpenAIのAPIからの応答を用いた実証分析では、バイアス検出の代表的な手法として高度な感情分析を用いている。本研究により、LLMはテキスト生成において印象的な能力を示す一方で、異なる次元にわたって程度の異なるバイアスを示すことが明らかになった。LLMBIは、モデル間および時間経過に伴うバイアスを比較するための定量的な尺度を提供し、LLMの公平性と信頼性を高める上で、システムエンジニア、研究者、規制当局にとって重要なツールとなる。また、偏りのない人間らしい応答を模倣するLLMの可能性も示している。さらに、進化する社会規範や倫理基準に沿うよう、こうしたモデルを継続的に監視し再調整する必要性を強調している。

The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.
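The abstract describes the LLMBI score as a combination of three terms: weighted bias dimensions, a dataset-diversity penalty, and a sentiment correction. It does not give the formula itself. The sketch below is a hypothetical composite in that spirit. The linear penalty and correction terms, their scales, and all names are assumptions, not the paper's definition.

```python
def composite_bias_index(bias_scores, weights, diversity, sentiment_bias,
                         penalty_scale=0.1, sentiment_scale=0.1):
    """Hypothetical LLMBI-style score; higher means more bias.

    bias_scores:    per-dimension bias in [0, 1], e.g. {"age": 0.2, "gender": 0.4}
    weights:        non-negative weight per dimension (same keys as bias_scores)
    diversity:      dataset diversity in [0, 1]; low diversity adds a penalty
    sentiment_bias: signed sentiment skew, added as a simple linear correction
    """
    if set(bias_scores) != set(weights):
        raise ValueError("bias_scores and weights must have the same dimensions")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[d] * bias_scores[d] for d in weights) / total
    diversity_penalty = penalty_scale * (1.0 - diversity)
    sentiment_correction = sentiment_scale * sentiment_bias
    return weighted + diversity_penalty + sentiment_correction


score = composite_bias_index({"age": 0.2, "gender": 0.4, "race": 0.3},
                             {"age": 1.0, "gender": 2.0, "race": 1.0},
                             diversity=0.8, sentiment_bias=-0.05)
```

Here the weighted average is 0.325, the diversity penalty adds 0.02, and the sentiment correction subtracts 0.005, for a score of about 0.34.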

翻訳日:2024-01-02 20:23:39 公開日:2023-12-29

# 強化学習に基づく列生成のための複数カラム選択戦略

A Reinforcement-Learning-Based Multiple-Column Selection Strategy for Column Generation ( http://arxiv.org/abs/2312.14213v2 )

ライセンス: Link先を確認

Haofeng Yuan, Lichang Fang, Shiji Song

(参考訳) 列生成（CG）は、大規模な線形計画（LP）問題を解く上で最も成功した手法の一つである。変数（すなわち列）の数が膨大なLPが与えられたとき、CGの考え方は、列の部分集合のみを明示的に考慮し、目的値を改善する可能性のある列を反復的に追加することである。被約費用が最も負となる列を追加すればCGの収束は保証されるが、反復ごとに単一の列ではなく複数の列を追加する方が収束が速くなりうることが示されている。しかし、多数の候補列から最も有望な列を選ぶ複数列選択戦略を設計することは依然として課題である。本稿では、強化学習（RL）に基づく新しい複数列選択戦略を提案する。我々の知る限り、これはCGに対する初のRLベースの複数列選択戦略である。本手法の有効性を、カッティングストック問題とグラフ彩色問題という2種類の問題で評価する。広く用いられているいくつかの単一列および複数列選択戦略と比較して、RLに基づく複数列選択戦略は収束を速め、CGの反復回数と実行時間を大幅に削減する。

Column generation (CG) is one of the most successful approaches for solving large-scale linear programming (LP) problems. Given an LP with a prohibitively large number of variables (i.e., columns), the idea of CG is to explicitly consider only a subset of columns and iteratively add potential columns to improve the objective value. While adding the column with the most negative reduced cost can guarantee the convergence of CG, it has been shown that adding multiple columns per iteration rather than a single column can lead to faster convergence. However, it remains a challenge to design a multiple-column selection strategy to select the most promising columns from a large number of candidate columns. In this paper, we propose a novel reinforcement-learning-based (RL) multiple-column selection strategy. To the best of our knowledge, it is the first RL-based multiple-column selection strategy for CG. The effectiveness of our approach is evaluated on two sets of problems: the cutting stock problem and the graph coloring problem. Compared to several widely used single-column and multiple-column selection strategies, our RL-based multiple-column selection strategy leads to faster convergence and achieves remarkable reductions in the number of CG iterations and runtime.
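A common non-learned baseline for multiple-column selection scores candidates by reduced cost. The sketch shows a simple top-k rule of that kind for a minimization master problem. Given the duals $\pi$ of the restricted master LP, it keeps up to k candidates whose reduced cost $c_j - \pi^\top a_j$ is negative. It is a heuristic baseline, not the paper's RL policy, and the function name and tolerance are hypothetical.

```python
import numpy as np


def select_columns_topk(costs, A, duals, k, tol=1e-9):
    """Return up to k candidate column indices with the most negative reduced cost.

    costs: (N,) objective coefficients c_j; A: (M, N), column j is a_j;
    duals: (M,) dual values pi of the restricted master problem.
    """
    reduced = np.asarray(costs) - np.asarray(duals) @ np.asarray(A)
    order = np.argsort(reduced)  # most negative first
    chosen = [int(j) for j in order[:k] if reduced[j] < -tol]
    return chosen, reduced


costs = np.array([1.0, 1.0, 1.0, 1.0])
A = np.array([[1, 0, 1, 2],
              [0, 1, 1, 0]], dtype=float)
duals = np.array([0.6, 0.7])
chosen, reduced = select_columns_topk(costs, A, duals, k=3)  # chosen == [2, 3]
```

If the candidate set contains every column and the returned list is empty, no column prices out, so the current master LP solution is optimal.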

翻訳日:2024-01-02 20:22:22 公開日:2023-12-29

# デュアルユニタリ回路の基本電荷

Fundamental charges for dual-unitary circuits ( http://arxiv.org/abs/2312.14148v2 )

ライセンス: Link先を確認

Tom Holden-Dye, Lluis Masanes, Arijeet Pal

(参考訳) デュアルユニタリ量子回路は、多体量子ダイナミクスの解析的に扱いやすいモデルとして近年注目を集めている。「ブリックワーク」パターンに配置された2キューディットゲートの1+1次元格子からなるこれらのモデルは、空間と時間の役割を入れ替えても各ゲートがユニタリのままでなければならないという制約によって定義される。このデュアルユニタリ性は、これらの回路における局所演算子のダイナミクスを制限する。すなわち、そのような演算子の台は、回路の幾何によって定まる因果的光円錐の一方または両方の端に沿って、系の有効光速で成長しなければならない。この性質を用いて、1+1次元デュアルユニタリ回路では、幅$w$の保存密度（$w$個の連続するサイト上に台を持つ演算子から構成される）の集合が、幅$w$のソリトン（乗法的位相を除いて、デュアルユニタリなダイナミクスにより有効光速で単に空間的に並進される演算子）の集合と一対一に対応することを示す。次に、これらの多体ソリトンを構成するいくつかの方法を（局所ヒルベルト空間次元が$d=2$の場合に明示的に）示す。第一に、より小さな構成要素のソリトンの積を用いる単純な構成である。第二に、より小さなソリトンの積としては単純に理解できないが、ヨルダン・ウィグナー変換の下でのフェルミオンの積として明快に解釈できる構成である。これにより、（量子ビット上のデュアルユニタリ回路における）複雑な多体ソリトンの微視的構造の特徴づけに向けた部分的な進展が得られるとともに、フェルミオン模型とデュアルユニタリ回路の間のつながりが確立され、この枠組みでどのような物理を探究できるかについての理解が深まる。

Dual-unitary quantum circuits have recently attracted attention as an analytically tractable model of many-body quantum dynamics. Consisting of a 1+1D lattice of 2-qudit gates arranged in a 'brickwork' pattern, these models are defined by the constraint that each gate must remain unitary under swapping the roles of space and time. This dual-unitarity restricts the dynamics of local operators in these circuits: the support of any such operator must grow at the effective speed of light of the system, along one or both of the edges of a causal light cone set by the geometry of the circuit. Using this property, it is shown here that for 1+1D dual-unitary circuits the set of width-$w$ conserved densities (constructed from operators supported over $w$ consecutive sites) is in one-to-one correspondence with the set of width-$w$ solitons - operators which, up to a multiplicative phase, are simply spatially translated at the effective speed of light by the dual-unitary dynamics. A number of ways to construct these many-body solitons (explicitly in the case where the local Hilbert space dimension $d=2$) are then demonstrated: firstly, via a simple construction involving products of smaller, constituent solitons; and secondly, via a construction which cannot be understood as simply in terms of products of smaller solitons, but which does have a neat interpretation in terms of products of fermions under a Jordan-Wigner transformation. This provides partial progress towards a characterisation of the microscopic structure of complex many-body solitons (in dual-unitary circuits on qubits), whilst also establishing a link between fermionic models and dual-unitary circuits, advancing our understanding of what kinds of physics can be explored in this framework.
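The dual-unitarity condition described above can be checked numerically for small gates. The sketch uses the convention $\langle k\,l|\tilde U|i\,j\rangle = \langle j\,l|U|i\,k\rangle$ for the space-time dual of a two-qudit gate; other papers may index this reshuffling differently. SWAP is dual-unitary and the identity is not. Dressing SWAP with single-qubit unitaries preserves dual-unitarity. The function names are hypothetical.

```python
import numpy as np


def space_time_dual(U, d=2):
    """Reshuffled gate with <k l| U_dual |i j> = <j l| U |i k>."""
    U4 = U.reshape(d, d, d, d)  # indices: (out_1, out_2, in_1, in_2)
    return np.einsum("jlik->klij", U4).reshape(d * d, d * d)


def is_unitary(M, atol=1e-10):
    return np.allclose(M.conj().T @ M, np.eye(M.shape[0]), atol=atol)


def is_dual_unitary(U, d=2):
    """Unitary in time and also unitary after swapping the roles of space and time."""
    return is_unitary(U) and is_unitary(space_time_dual(U, d))


SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)
Hd = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
S = np.diag([1, 1j])  # phase gate
gate = SWAP @ np.kron(Hd, S)  # SWAP dressed with single-qubit gates
```

For the identity, the reshuffled matrix is $d\,|\Phi\rangle\langle\Phi|$ with $|\Phi\rangle$ the normalized maximally entangled state. This matrix has rank one, so the identity gate fails the test.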

翻訳日:2024-01-02 20:22:03 公開日:2023-12-29

# DiffusionGAN3D: 3D GANとDiffusion Priorを併用したテキスト誘導3D生成とドメイン適応

DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors ( http://arxiv.org/abs/2312.16837v2 )

ライセンス: Link先を確認

Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

(参考訳) テキスト誘導型のドメイン適応と3D認識ポートレートの生成は、様々な分野で多くの応用がある。しかし、訓練データの不足と、幾何と外観の高い多様性を扱うことの難しさから、これらのタスクに対する既存手法は、柔軟性の欠如、不安定性、低い忠実度といった問題を抱えている。本稿では、3D GANと拡散事前分布を組み合わせることで、テキスト誘導型の3Dドメイン適応と生成を強化する新しいフレームワークDiffusionGAN3Dを提案する。具体的には、事前学習済みの3D生成モデル（例えばEG3D）とテキストから画像への拡散モデルを統合する。前者は、テキストからの安定した高品質なアバター生成のための強固な基盤を提供する。一方、拡散モデルは強力な事前分布を提供し、有益な方向へ3D生成器の微調整を導くことで、柔軟かつ効率的なテキスト誘導型ドメイン適応を実現する。ドメイン適応における多様性とテキストからアバターへの生成能力を高めるため、それぞれ相対距離損失とケース固有の学習可能なトライプレーンを導入する。さらに、上記の両タスクのテクスチャ品質を向上させるため、段階的なテクスチャ精緻化モジュールを設計する。広範な実験により、提案フレームワークはドメイン適応とテキストからアバターへのタスクの両方で優れた結果を達成し、生成品質と効率の点で既存手法を上回ることが示された。プロジェクトのホームページは https://younglbw.github.io/DiffusionGAN3D-homepage/ にある。

Text-guided domain adaption and generation of 3D-aware portraits find many applications in various fields. However, due to the lack of training data and the challenges in handling the high variety of geometry and appearance, the existing methods for these tasks suffer from issues like inflexibility, instability, and low fidelity. In this paper, we propose a novel framework DiffusionGAN3D, which boosts text-guided 3D domain adaption and generation by combining 3D GANs and diffusion priors. Specifically, we integrate the pre-trained 3D generative models (e.g., EG3D) and text-to-image diffusion models. The former provides a strong foundation for stable and high-quality avatar generation from text. And the diffusion models in turn offer powerful priors and guide the 3D generator finetuning with informative direction to achieve flexible and efficient text-guided domain adaption. To enhance the diversity in domain adaption and the generation capability in text-to-avatar, we introduce the relative distance loss and case-specific learnable triplane respectively. Besides, we design a progressive texture refinement module to improve the texture quality for both tasks above. Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaption and text-to-avatar tasks, outperforming existing methods in terms of generation quality and efficiency. The project homepage is at https://younglbw.github.io/DiffusionGAN3D-homepage/.

翻訳日:2024-01-02 19:56:23 公開日:2023-12-29

# DarkShot:低解像度で高画質で暗い画像を照らす

DarkShot: Lighting Dark Images with Low-Compute and High-Quality ( http://arxiv.org/abs/2312.16805v2 )

ライセンス: Link先を確認

Jiazhang Zheng, Lei Li, Qiuping Liao, Cheng Li, Li Li, Yangxing Liu

(参考訳) 夜間の撮影は極端に低照度で、主に極低信号対雑音比に起因する困難に遭遇する。現実のデプロイメントでは、実用的なソリューションは視覚的に魅力的な結果を生み出すだけでなく、最小限の計算も必要です。しかし、既存のほとんどの手法は修復性能の改善に焦点を当てているか、品質の犠牲で軽量モデルを採用するかのどちらかである。本稿では,計算量を最小限に抑えつつ,低照度化タスクにおける既存のSOTA手法よりも優れた軽量ネットワークを提案する。提案ネットワークは,Siamese Self-Attention Block (SSAB) と Skip-Channel Attention (SCA) モジュールを組み込んで,グローバルな情報を集約するモデルの能力を高め,高解像度画像に適している。また,低照度画像復元プロセスの解析に基づいて,優れた結果を得るための2段階フレームワークを提案する。我々のモデルは、SOTA復元の品質を維持しながら、最小限の計算でUHD 4K解像度画像を復元することができる。

Nighttime photography encounters escalating challenges in extremely low-light conditions, primarily attributable to the ultra-low signal-to-noise ratio. For real-world deployment, a practical solution must not only produce visually appealing results but also require minimal computation. However, most existing methods are either focused on improving restoration performance or employ lightweight models at the cost of quality. This paper proposes a lightweight network that outperforms existing state-of-the-art (SOTA) methods in low-light enhancement tasks while minimizing computation. The proposed network incorporates Siamese Self-Attention Block (SSAB) and Skip-Channel Attention (SCA) modules, which enhance the model's capacity to aggregate global information and are well-suited for high-resolution images. Additionally, based on our analysis of the low-light image restoration process, we propose a Two-Stage Framework that achieves superior results. Our model can restore a UHD 4K resolution image with minimal computation while keeping SOTA restoration quality.

翻訳日:2024-01-02 19:55:59 公開日:2023-12-29

# 正規および不規則時系列補完のための連続時間オートエンコーダ

Continuous-time Autoencoders for Regular and Irregular Time Series Imputation ( http://arxiv.org/abs/2312.16581v2 )

ライセンス: Link先を確認

Hyowon Wi, Yehjin Shin, Noseong Park

(参考訳) 時系列補完（imputation）は、時系列における最も基本的なタスクの一つである。実世界の時系列データセットはしばしば不完全（あるいは観測の欠損を伴う不規則なもの）であり、その場合は補完が強く求められる。これまでに多くの時系列補完手法が提案されてきた。近年の自己注意に基づく手法は最先端の補完性能を示している。しかし、連続時間リカレントニューラルネットワーク（RNN）、すなわちニューラル制御微分方程式（NCDE）に基づく補完手法の設計は、長い間見過ごされてきた。そこで我々は、NCDEに基づいて時系列（変分）オートエンコーダを再設計する。連続時間オートエンコーダ（CTA）と呼ぶ本手法は、入力時系列サンプルを（隠れベクトルではなく）連続的な隠れ経路に符号化し、それを復号して入力を再構成・補完する。4つのデータセットと19のベースラインを用いた実験において、本手法はほぼすべての場合で最良の補完性能を示す。

Time series imputation is one of the most fundamental tasks for time series. Real-world time series datasets are frequently incomplete (or irregular with missing observations), in which case imputation is strongly required. Many different time series imputation methods have been proposed, and recent self-attention-based methods show state-of-the-art imputation performance. However, designing an imputation method based on continuous-time recurrent neural networks (RNNs), i.e., neural controlled differential equations (NCDEs), has long been overlooked. To this end, we redesign time series (variational) autoencoders based on NCDEs. Our method, called continuous-time autoencoder (CTA), encodes an input time series sample into a continuous hidden path (rather than a hidden vector) and decodes it to reconstruct and impute the input. In our experiments with 4 datasets and 19 baselines, our method shows the best imputation performance in almost all cases.
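
A neural CDE evolves a hidden state according to dz(t) = f(z(t)) dX(t), where X is a continuous path built from the observations. The sketch below uses a forward-Euler discretization of that equation over irregularly spaced samples. It makes several assumptions: X is the piecewise-linear interpolation of the observations, time is appended as an extra channel, and f is a hypothetical single tanh layer with random weights. It is not the CTA encoder or decoder.

```python
import numpy as np

def euler_ncde(times, values, z0, W, b):
    """Forward-Euler solve of dz = f(z) dX over irregularly spaced samples.

    times:  (N,) observation times (irregular spacing allowed)
    values: (N, C) observations
    z0:     (H,) initial hidden state
    W, b:   parameters of a toy vector field f(z) = tanh(W z + b), reshaped to (H, C+1)
    Returns the hidden path at each observation time, shape (N, H).
    """
    # Control path: observations with time appended as an extra channel.
    X = np.column_stack([times, values])          # (N, C+1)
    H, D = z0.shape[0], X.shape[1]
    z = z0
    path = [z0]
    for k in range(len(times) - 1):
        f = np.tanh(W @ z + b).reshape(H, D)      # vector field evaluated at z: (H, C+1)
        dX = X[k + 1] - X[k]                      # increment of the piecewise-linear path
        z = z + f @ dX                            # one Euler step of the CDE
        path.append(z)
    return np.stack(path)

# Toy usage: 5 irregularly spaced observations of a 2-channel series, hidden size 4.
rng = np.random.default_rng(0)
times = np.array([0.0, 0.3, 1.1, 1.2, 2.0])
values = rng.normal(size=(5, 2))
H, D = 4, values.shape[1] + 1
W = rng.normal(scale=0.1, size=(H * D, H))
b = rng.normal(scale=0.1, size=H * D)
path = euler_ncde(times, values, np.zeros(H), W, b)
print(path.shape)  # (5, 4)
```

The state changes only through dX. That is why a CDE handles irregular sampling naturally: a long gap between two observations simply produces a larger increment.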

Translated: 2024-01-02 19:55:19 Published: 2023-12-29

# Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation ( http://arxiv.org/abs/2312.16578v2 )

License: see link

Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu

3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed that aim to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB) and is further refined by normalizing the classifier weights, which alleviates the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state of the art by about 4% to 6% mIoU. Code is released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
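
The abstract attributes part of the long-tail mitigation to normalizing the classifier weights. A commonly reported effect is that frequent classes learn larger weight norms, and this biases raw dot-product scores toward them. The minimal sketch below divides each class weight vector by its norm so that only its direction matters. How the paper feeds these scores into the affinity computation is not described here, so this sketch shows only the normalization step.

```python
import numpy as np

def normalized_classifier_logits(features, weights, eps=1e-12):
    """Class scores from a classifier whose per-class weight vectors are L2-normalized.

    features: (N, D) point features
    weights:  (K, D) one weight vector per class
    Dividing each class vector by its norm removes any advantage a class
    gains purely from a larger weight norm.
    """
    norms = np.linalg.norm(weights, axis=1, keepdims=True)   # (K, 1)
    return features @ (weights / (norms + eps)).T            # (N, K)
```

Because of the normalization, rescaling any single class's weight vector leaves every score unchanged.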

Translated: 2024-01-02 19:54:36 Published: 2023-12-29

# GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection

GRSDet: Learning to Generate Local Reverse Samples for Few-shot Object Detection ( http://arxiv.org/abs/2312.16571v2 )

License: see link

Hefei Mei, Taijin Zhao, Shiyuan Tang, Heqian Qiu, Lanxiao Wang, Minjian Zhang, Fanman Meng, Hongliang Li

Few-shot object detection (FSOD) aims to achieve object detection using only a few training samples of novel classes. Most existing methods adopt a transfer-learning strategy that constructs the novel class distribution by transferring base class knowledge. However, this direct approach easily confuses the novel class with other similar categories in the decision space. To address this problem, we propose generating local reverse samples (LRSamples) in Prototype Reference Frames to adaptively adjust the center position and boundary range of the novel class distribution, so as to learn more discriminative novel class samples for FSOD. First, we propose a Center Calibration Variance Augmentation (CCVA) module, which contains the selection rule of LRSamples, the generator of LRSamples, and augmentation on the calibrated distribution centers. Specifically, we design an intra-class feature converter (IFC) as the generator of CCVA to learn the selection rule. By transferring the knowledge of IFC from base training to fine-tuning, the IFC generates plentiful novel samples to calibrate the novel class distribution. Moreover, we propose a Feature Density Boundary Optimization (FDBO) module to adaptively adjust the importance of samples depending on their distance from the decision boundary. It emphasizes the high-density area of the similar class (the area closer to the decision boundary) and reduces the weight of the low-density area of the similar class (the area farther from the decision boundary), thus optimizing a clearer decision boundary for each category. We conduct extensive experiments to demonstrate the effectiveness of our proposed method, which achieves consistent improvements on the Pascal VOC and MS COCO datasets when built on the DeFRCN and MFDC baselines.

Translated: 2024-01-02 19:54:10 Published: 2023-12-29

# PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion ( http://arxiv.org/abs/2312.16486v2 )

License: see link

Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

Current large-scale diffusion models represent a giant leap forward in conditional image synthesis and can interpret diverse cues such as text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, integrating existing diffusion models, each specialized for different controls and operating in its own latent space, is challenging because of incompatible image resolutions and latent space embedding structures, which hinders their joint use. To address these constraints, we present "PanGu-Draw", a novel latent diffusion model designed for resource-efficient text-to-image synthesis that adeptly accommodates multiple control signals. We first propose a resource-efficient Time-Decoupling Training Strategy, which splits the monolithic text-to-image model into structure and texture generators. Each generator is trained using a regimen that maximizes data utilization and computational efficiency, cutting data preparation by 48% and reducing training resources by 51%. Second, we introduce "Coop-Diffusion", an algorithm that enables the cooperative use of various pre-trained diffusion models with different latent spaces and predefined resolutions within a unified denoising process. This allows multi-control image synthesis at arbitrary resolutions without additional data or retraining. Empirical validations of PanGu-Draw show its exceptional prowess in text-to-image and multi-control image generation, suggesting a promising direction for future model training efficiency and generation versatility. The largest 5B T2I PanGu-Draw model is released on the Ascend platform. Project page: https://pangu-draw.github.io

Translated: 2024-01-02 19:53:39 Published: 2023-12-29

# LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis

LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis ( http://arxiv.org/abs/2312.16374v2 )

License: see link

Jinwen He, Yujia Gong, Kai Chen, Zijin Lin, Chengan Wei, Yue Zhao

Large Language Models (LLMs) have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.

Translated: 2024-01-02 19:51:56 Published: 2023-12-29

# Entanglement dynamics of accelerated atoms interacting with the Electromagnetic Field

Entanglement dynamics of accelerated atoms interacting with the Electromagnetic Field ( http://arxiv.org/abs/2312.16342v2 )

License: see link

M. S. Soares, N. F. Svaiter and G. Menezes

We study the effects of acceleration on entanglement dynamics using the theory of open quantum systems. In this scenario, we consider two atoms moving along different hyperbolic trajectories with different proper times. The generalized master equation is used for a pair of dipoles interacting with the electromagnetic field. We observe that the proper acceleration plays an essential role in the entanglement harvesting and sudden death phenomena, and we study how the polarization of the atoms affects these results.
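
Entanglement sudden death means that the entanglement between the two atoms drops to exactly zero in finite time instead of decaying only asymptotically. One standard way to quantify two-qubit entanglement is the Wootters concurrence. The abstract does not say which measure the paper uses, so treat the choice below as an assumption. The sketch computes C(ρ) = max(0, λ1 − λ2 − λ3 − λ4), where the λi are the square roots of the eigenvalues of ρ ρ̃ in decreasing order, and ρ̃ = (σy ⊗ σy) ρ* (σy ⊗ σy).

```python
import numpy as np

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix (4x4, basis |00>,|01>,|10>,|11>)."""
    sy = np.array([[0, -1j], [1j, 0]])
    yy = np.kron(sy, sy)
    rho_tilde = yy @ rho.conj() @ yy                 # spin-flipped state
    eig = np.linalg.eigvals(rho @ rho_tilde)         # real and non-negative up to round-off
    lam = np.sort(np.sqrt(np.abs(eig.real)))[::-1]   # decreasing order
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
print(round(concurrence(np.outer(bell, bell.conj())), 6))  # 1.0
```

For a Werner state p|Φ+⟩⟨Φ+| + (1 − p)I/4, the concurrence is max(0, (3p − 1)/2). It therefore reaches zero at a finite p = 1/3, which is the same kind of abrupt vanishing that "sudden death" refers to in the time domain.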

Translated: 2024-01-02 19:51:31 Published: 2023-12-29

# DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision ( http://arxiv.org/abs/2312.16256v2 )

License: see link

Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.
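
Novel view synthesis benchmarks commonly compare a rendered view against a held-out ground-truth image using peak signal-to-noise ratio (PSNR), usually alongside SSIM and LPIPS. The abstract does not list the metrics the DL3DV-10K benchmark reports, so their use here is an assumption. A minimal PSNR sketch for images scaled to [0, max_val]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered and a ground-truth image."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 on [0, 1] images gives an MSE of 0.01 and a PSNR of 20 dB.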

Translated: 2024-01-02 19:51:21 Published: 2023-12-29

# Resilient Constrained Reinforcement Learning

Resilient Constrained Reinforcement Learning ( http://arxiv.org/abs/2312.17194v2 )

License: see link

Dongsheng Ding and Zhengyan Huan and Alejandro Ribeiro

We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. Identifying appropriate constraint specifications is challenging because the trade-off between the reward maximization objective and constraint satisfaction is undefined, a situation that is ubiquitous in constrained decision-making. To tackle this issue, we propose a new constrained RL approach that searches for the policy and the constraint specifications together. The method adaptively relaxes the constraints according to a relaxation cost introduced in the learning objective. Since this feature mimics how ecological systems adapt to disruptions by altering their operation, we term our approach resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance constraint satisfaction and reward maximization via the notion of resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and advocate two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and effectiveness of our approach in computational experiments.
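
The sketch below illustrates the idea of treating the constraint relaxation u as a decision variable with a cost h(u). It runs primal-dual gradient steps on a deliberately tiny, hypothetical problem: a scalar x instead of a policy, f(x) = −(x − 3)², the constraint x ≤ 1 + u, and h(u) = (α/2)u². In our reading of the resilient-constraint literature, the equilibrium is the point where the marginal relaxation cost matches the multiplier, h'(u) = λ. That characterization is an assumption, not a quote from this paper, and the sketch is not either of the paper's algorithms.

```python
def resilient_primal_dual(alpha=1.0, lr=0.01, steps=20000):
    """Primal-dual iterations on a toy resilient constrained problem (hypothetical).

    maximize over x, u:  f(x) - (alpha / 2) * u**2,   f(x) = -(x - 3)**2
    subject to:          x <= 1 + u                   (u relaxes the constraint)
    Lagrangian: L = f(x) - (alpha/2) u^2 + lam * (1 + u - x), with lam >= 0.
    """
    x, u, lam = 0.0, 0.0, 0.0
    for _ in range(steps):
        slack = 1.0 + u - x                      # >= 0 means the relaxed constraint holds
        grad_x = -2.0 * (x - 3.0) - lam          # dL/dx
        grad_u = -alpha * u + lam                # dL/du: relaxation cost vs. multiplier
        x += lr * grad_x                         # primal ascent
        u += lr * grad_u
        lam = max(0.0, lam - lr * slack)         # dual descent, projected onto lam >= 0
    return x, u, lam

x, u, lam = resilient_primal_dual(alpha=1.0)
print(round(x, 4), round(u, 4), round(lam, 4))  # 2.3333 1.3333 1.3333
```

Solving the stationarity conditions by hand gives λ = 4α/(α + 2) and u = λ/α. A cheap relaxation (small α) therefore lets the constraint loosen a lot, and an expensive one (large α) keeps it close to the original threshold.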

Translated: 2024-01-02 19:08:28 Published: 2023-12-29

# DreamGaussian4D: Generative 4D Gaussian Splatting

DreamGaussian4D: Generative 4D Gaussian Splatting ( http://arxiv.org/abs/2312.17142v2 )

License: see link

Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on the 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting than implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines.
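
As background on the explicit representation the abstract refers to, here is a simplified sketch of how Gaussian Splatting shades one pixel. Already-projected 2D Gaussians are sorted by depth and alpha-composited front to back: C = Σ c_i α_i Π_{j<i}(1 − α_j). The sketch omits the 3D-to-2D projection step. The 0.99 opacity clamp and the transmittance cutoff follow common practice but are assumptions here. This is not DreamGaussian4D's code; in a 4D setting, the per-frame transform would move the means before this step.

```python
import numpy as np

def composite_pixel(pixel, means, covs, colors, opacities, depths):
    """Front-to-back alpha compositing of projected 2D Gaussians at one pixel.

    pixel: (2,) pixel coordinate; means: (G, 2); covs: (G, 2, 2) 2D covariances
    colors: (G, 3); opacities: (G,); depths: (G,) camera-space depths
    Returns (rgb, remaining transmittance).
    """
    color = np.zeros(3)
    transmittance = 1.0
    for i in np.argsort(depths):                       # nearest Gaussian first
        d = pixel - means[i]
        g = np.exp(-0.5 * d @ np.linalg.solve(covs[i], d))   # Gaussian falloff
        alpha = min(0.99, opacities[i] * g)
        color += transmittance * alpha * colors[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:                       # early termination
            break
    return color, transmittance
```

With two half-transparent Gaussians centered on the pixel, the nearer one contributes half of its color and the farther one a quarter, because only 50% of the light passes the first.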

Translated: 2024-01-02 19:07:46 Published: 2023-12-29

# ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe

ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe ( http://arxiv.org/abs/2312.17133v2 )

License: see link

Yifan Bai, Zeyang Zhao, Yihong Gong, Xing Wei

We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe the target object (appearance analysis) across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework that "reads out" the object's trajectory and "retells" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity: it avoids the less efficient intra-frame autoregression and the hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating a remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5% on GOT-10k and an AUC of 86.1% on TrackingNet while being 3.6 times faster than ARTrack. The code will be released.
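
To "read out" a trajectory autoregressively, a tracker needs a discrete vocabulary for box coordinates. Trackers in this family commonly quantize each coordinate into uniform bins. The abstract does not spell out ARTrackV2's exact scheme, so the uniform binning and the bin count of 400 below are hypothetical choices. The sketch shows only the quantize/dequantize round trip.

```python
import numpy as np

def box_to_tokens(box, img_size, n_bins=400):
    """Quantize (x1, y1, x2, y2) pixel coordinates to integer tokens in [0, n_bins - 1]."""
    v = np.asarray(box, dtype=np.float64) / img_size            # normalize to [0, 1]
    return np.clip(np.round(v * (n_bins - 1)), 0, n_bins - 1).astype(int)

def tokens_to_box(tokens, img_size, n_bins=400):
    """Map tokens back to pixel coordinates (bin centers on a uniform grid)."""
    return np.asarray(tokens, dtype=np.float64) / (n_bins - 1) * img_size

tok = box_to_tokens([10.3, 20.7, 100.2, 200.9], 256.0)
print(tokens_to_box(tok, 256.0))
```

The round-trip error is at most half a bin, img_size / (2 (n_bins − 1)). Coordinates that fall outside the frame are clipped to the first or last token.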

Translated: 2024-01-02 19:07:35 Published: 2023-12-29

# Large Language Model for Causal Decision Making

Large Language Model for Causal Decision Making ( http://arxiv.org/abs/2312.17122v2 )

License: see link

Haitao Jiang, Lin Ge, Yuhe Gao, Jianian Wang, Rui Song

Large Language Models (LLMs) have shown success in language understanding and reasoning on general topics. However, their ability to perform inference based on user-specified structured data and knowledge in corpus-rare concepts, such as causal decision-making, is still limited. In this work, we explore the possibility of fine-tuning an open-source LLM into LLM4Causal, which can identify the causal task, execute a corresponding function, and interpret its numerical results based on users' queries and the provided dataset. Meanwhile, we propose a data generation process for more controllable GPT prompting and present two instruction-tuning datasets: (1) Causal-Retrieval-Bench, for causal problem identification and input parameter extraction for causal function calling, and (2) Causal-Interpret-Bench, for in-context causal interpretation. Through three case studies, we show that LLM4Causal can deliver end-to-end solutions for causal problems and provide easy-to-understand answers. Numerical studies also reveal that it has a remarkable ability to identify the correct causal task given a query.

Translated: 2024-01-02 19:07:15 Published: 2023-12-29

# Fully Sparse 3D Panoptic Occupancy Prediction

Fully Sparse 3D Panoptic Occupancy Prediction ( http://arxiv.org/abs/2312.17118v2 )

License: see link

Haisong Liu, Haiguang Wang, Yang Chen, Zetong Yang, Jia Zeng, Li Chen, Limin Wang

Occupancy prediction plays a pivotal role in autonomous driving. Previous methods typically construct a dense 3D volume, neglecting the inherent sparsity of the scene, which results in a high computational cost. Furthermore, these methods are limited to semantic occupancy and fail to differentiate between distinct instances. To exploit the sparsity property and ensure instance-awareness, we introduce a novel fully sparse panoptic occupancy network, termed SparseOcc. SparseOcc first reconstructs a sparse 3D representation from visual inputs. It then employs sparse instance queries to predict each object instance from the sparse 3D representation. These instance queries interact with 2D features via mask-guided sparse sampling, thereby circumventing the need for costly dense features or global attention. Additionally, we have established the first-ever vision-centric panoptic occupancy benchmark. SparseOcc demonstrates its efficacy on the Occ3D-nus dataset by achieving a mean Intersection over Union (mIoU) of 26.0 while maintaining a real-time inference speed of 25.4 FPS. By incorporating temporal modeling from the preceding 8 frames, SparseOcc further improves its performance, achieving 30.9 mIoU without bells and whistles. Code will be made available.
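
The results above are reported as mean Intersection over Union (mIoU). The per-class IoU is TP / (TP + FP + FN), computed here from a confusion matrix over voxels (or points) and averaged over classes that occur in either the prediction or the ground truth. This is a generic sketch. Benchmark-specific rules, such as which labels are ignored or how camera-visibility masks are applied in Occ3D, are not modeled beyond a single optional `ignore_index`.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=None):
    """mIoU from integer label arrays of equal shape."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    # conf[t, p] counts elements with ground truth t predicted as p.
    conf = np.bincount(num_classes * target + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp     # TP + FP + FN per class
    valid = union > 0                                    # skip classes absent everywhere
    return float(np.mean(tp[valid] / union[valid]))
```

For pred = [0, 0, 1, 1] and target = [0, 1, 1, 1], the class IoUs are 1/2 and 2/3, so the mIoU is 7/12.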

Translated: 2024-01-02 19:06:53 Published: 2023-12-29

# Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding

Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding ( http://arxiv.org/abs/2312.17044v2 )

License: see link

Liang Zhao, Xiaocheng Feng, Xiachong Feng, Bing Qin, Ting Liu

Transformer has taken the natural language processing (NLP) field by storm since its introduction, owing to its superior ability to model complex dependencies in sequences. Despite the great success of Transformer-based pretrained language models (PLMs) across almost all NLP tasks, they all suffer from a preset length limit and thus can hardly extend this success to sequences longer than those seen in training, which is known as the length extrapolation problem. Length extrapolation has aroused great interest among researchers, as it is a core feature of human language capacity. To enhance the length extrapolation of Transformers, a plethora of methods have been proposed, mostly focusing on extrapolatable position encodings. In this article, we provide an organized and systematic review of these research efforts in a unified notation from a position encoding perspective, aiming to give the reader a deep understanding of existing methods and to provide stimuli for future research.
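
Many of the position encodings such a survey covers are relative: the attention score depends on the offset between tokens rather than on their absolute positions. Rotary position embedding (RoPE) is one widely used example. It rotates each pair of query/key dimensions by an angle proportional to the position, so ⟨R(m)q, R(n)k⟩ depends only on m − n. The sketch below uses the common base of 10000 and interleaved dimension pairs. It is an illustration, not code from the survey.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a 1-D vector x (even length) at position pos."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)    # one rotation frequency per pair
    angle = pos * inv_freq
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[0::2], x[1::2]                       # (even, odd) dimension pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                 # 2-D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Same offset (2) at different absolute positions gives the same score.
print(np.isclose(rope(q, 3) @ rope(k, 1), rope(q, 10) @ rope(k, 8)))  # True
```

Rotations preserve vector norms, so RoPE changes only the angles between queries and keys. Whether the scores stay well behaved at offsets never seen in training is exactly the extrapolation question that the surveyed methods address.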

Translated: 2024-01-02 19:06:29 Published: 2023-12-29

# Experiential Co-Learning of Software-Developing Agents

Experiential Co-Learning of Software-Developing Agents ( http://arxiv.org/abs/2312.17025v2 )

License: see link

Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun

Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents. These agents are now capable of collaborating seamlessly, splitting tasks and enhancing accuracy, thus minimizing the need for human involvement. However, these agents often approach a diverse range of tasks in isolation, without benefiting from past experiences. This isolation can lead to repeated mistakes and inefficient trials in task solving. To this end, this paper introduces Experiential Co-Learning, a novel framework in which instructor and assistant agents gather shortcut-oriented experiences from their historical trajectories and use these past experiences for mutual reasoning. This paradigm, enriched with previous experiences, equips agents to more effectively address unseen tasks.

Translated: 2024-01-02 19:06:09 Published: 2023-12-29

# FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information

FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information ( http://arxiv.org/abs/2312.16963v2 )

License: see link

Yichong Xia, Yujun Huang, Bin Chen, Haoqian Wang, Yaowei Wang

Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications. Interestingly, Distributed Source Coding (DSC) theory suggests that efficient compression of correlated sources can be achieved through independent encoding and joint decoding. This has motivated the deep distributed SIC methods that have developed rapidly in recent years. However, these approaches neglect the unique characteristics of stereo-imaging tasks and incur high decoding latency. To address this limitation, we propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information at the decoder. FFCA adopts a coarse-to-fine cascaded alignment approach. In the initial stage, FFCA uses a feature-domain patch-matching module based on stereo priors. This module reduces redundancy in the search space of trivial matching methods and further mitigates the introduction of noise. In the subsequent stage, we use an hourglass-based sparse stereo refinement network to further align inter-image features at reduced computational cost. Furthermore, we have devised a lightweight yet high-performance feature fusion network, called the Fast Feature Fusion network (FFF), to decode the aligned features. Experimental results on the InStereo2K, KITTI, and Cityscapes datasets demonstrate the significant superiority of our approach over traditional and learning-based SIC methods. In particular, our approach decodes 3 to 10 times faster than other methods.
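
The first FFCA stage matches patches using stereo priors. The classic pixel-domain form of that prior is block matching on a rectified pair: a left-image pixel is searched for only along the same row of the right image and only within a bounded disparity range. This is the kind of search-space reduction the abstract alludes to. Below is a brute-force sum-of-absolute-differences (SAD) sketch. The window radius and disparity range are illustrative, and the sketch is not the paper's learned feature-domain module.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=16, radius=2):
    """Brute-force SAD block matching on a rectified grayscale stereo pair.

    For each left pixel (y, x), only candidates (y, x - d) with
    0 <= d < max_disp are tested: same row, bounded horizontal shift.
    """
    h, w = left.shape
    L = np.pad(left.astype(np.float64), radius, mode="edge")
    R = np.pad(right.astype(np.float64), radius, mode="edge")
    k = 2 * radius + 1                                  # window size
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            patch = L[y:y + k, x:x + k]                 # window centered on (y, x)
            best_d, best_cost = 0, np.inf
            for d in range(min(max_disp, x + 1)):       # stay inside the right image
                cand = R[y:y + k, x - d:x - d + k]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y, x] = best_d
    return disp
```

A learned module replaces the raw intensities with features and the exhaustive loop with something far cheaper, but the geometric constraint being exploited is the same.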

Translated: 2024-01-02 19:05:55 Published: 2023-12-29

# Automatic Scoring of Cognition Drawings: Assessing the Quality of Machine-Based Scores Against a Gold Standard

Automatic Scoring of Cognition Drawings: Assessing the Quality of Machine-Based Scores Against a Gold Standard ( http://arxiv.org/abs/2312.16887v2 )

License: see link

Arne Bethmann, Marina Aoki, Charlotte Hunsicker, Claudia Weileder

Figure drawing is often used as part of dementia screening protocols. The Survey of Health Aging and Retirement in Europe (SHARE) has adopted three drawing tests from Addenbrooke's Cognitive Examination III as part of its questionnaire module on cognition. While the drawings are usually scored by trained clinicians, SHARE uses the face-to-face interviewers who conduct the interviews to score the drawings during fieldwork. This may pose a risk to data quality, as interviewers may be less consistent in their scoring and more likely to make errors due to their lack of clinical training. This paper therefore reports a first proof of concept and evaluates the feasibility of automating scoring using deep learning. We train several different convolutional neural network (CNN) models using about 2,000 drawings from the 8th wave of the SHARE panel in Germany and the corresponding interviewer scores, as well as self-developed 'gold standard' scores. The results suggest that this approach is indeed feasible. Compared to training on interviewer scores, models trained on the gold standard data improve prediction accuracy by about 10 percentage points. The best performing model, ConvNeXt Base, achieves an accuracy of about 85%, which is 5 percentage points higher than the accuracy of the interviewers. While this is a promising result, the models still struggle to score partially correct drawings, which are also problematic for interviewers. This suggests that more and better training data is needed to achieve production-level prediction accuracy. We therefore discuss possible next steps to improve the quality and quantity of training examples.

Translated: 2024-01-02 19:05:31 Published: 2023-12-29

# ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation

ClST: A Convolutional Transformer Framework for Automatic Modulation Recognition by Knowledge Distillation ( http://arxiv.org/abs/2312.17446v1 )

License: see link

Dongbin Hou, Lixin Li, Wensheng Lin, Junli Liang, Zhu Han

With the rapid development of deep learning (DL) in recent years, automatic modulation recognition (AMR) with DL has achieved high accuracy. However, insufficient training signal data in complicated channel environments and large-scale DL models are critical factors that make DL methods difficult to deploy in practice. To address these problems, we propose a novel neural network named convolution-linked signal transformer (ClST) and a novel knowledge distillation method named signal knowledge distillation (SKD). The ClST is built on three primary modifications: a transformer hierarchy that incorporates convolution, a novel attention mechanism named the parallel spatial-channel attention (PSCA) mechanism, and a novel convolutional transformer block named convolution-transformer projection (CTP) that leverages a convolutional projection. The SKD is a knowledge distillation method that effectively reduces the parameters and complexity of neural networks. We use the SKD algorithm to train two lightweight neural networks, KD-CNN and KD-MobileNet, so that the networks can be deployed on miniaturized devices. The simulation results demonstrate that the ClST outperforms advanced neural networks on all datasets. Moreover, both KD-CNN and KD-MobileNet obtain higher recognition accuracy with lower network complexity, which is very beneficial for deploying AMR on miniaturized communication devices.
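
The abstract does not spell out the SKD objective. For orientation only, the sketch below shows the widely used soft-target distillation loss, in which a small student (such as a KD-CNN) is trained against a larger teacher's temperature-softened outputs. The temperature, the weight `alpha`, and the example logits are assumptions, and SKD may differ from this baseline.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Classic soft-target distillation loss (not necessarily SKD):
    alpha * CE(student, hard label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard_ce = -math.log(softmax(student_logits)[label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Scaling by T^2 keeps the soft-target gradients on the same scale as the hard-label term.
    return alpha * hard_ce + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits over four modulation classes (e.g. BPSK, QPSK, 8PSK, 16QAM).
teacher = [3.0, 1.0, 0.2, -1.0]
student = [1.5, 1.2, 0.1, -0.5]
loss = kd_loss(student, teacher, label=0)
```

Raising `temperature` flattens both distributions, so the student also learns how the teacher ranks the incorrect classes, information that hard labels alone do not carry.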

Translated: 2024-01-02 14:07:25 Published: 2023-12-29

# SMoT: Think in State Machine

SMoT: Think in State Machine ( http://arxiv.org/abs/2312.17445v1 )

License: see link

Jia Liu, Jie Shuai

Current prompting approaches for language model inference mainly rely on the autonomous exploration of reasoning paths by large language models (LLMs), which must inevitably retrace their steps when an erroneous route is encountered and then pursue an alternative reasoning path. Humans, however, are adept at abstracting optimal solutions from problems, which enables swift and precise reasoning when resolving similar problems. In light of this, we explore the potential of harnessing expert knowledge to enhance problem-solving within LLMs. We introduce a novel paradigm, the State Machine of Thought (SMoT), which employs predefined state machines to furnish LLMs with efficient reasoning paths, thereby eliminating fruitless exploration. Furthermore, we propose a multi-agent mechanism that assigns different objectives to agents, aiming to enhance the accuracy of SMoT reasoning. Experimental results on an array reasoning task show that SMoT achieves an accuracy of 95%, surpassing the performance of state-of-the-art baselines.
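
To make the idea of a predefined reasoning path concrete, here is a minimal sketch assuming a toy array task (finding the index of the largest element) and plain Python functions in place of LLM calls. The state names, `MACHINE`, and `run` are hypothetical and not taken from the paper; the point is only that every transition is fixed in advance, so the run never backtracks.

```python
from typing import Callable, Dict

# Hypothetical task: return the index of the largest element of an array.
# Each action stands in for one constrained LLM call: it updates a shared
# context and names the next state, so no backtracking is ever needed.


def init(ctx: dict) -> str:
    ctx["i"], ctx["best"] = 1, 0
    return "compare" if len(ctx["array"]) > 1 else "done"


def compare(ctx: dict) -> str:
    arr, i = ctx["array"], ctx["i"]
    if arr[i] > arr[ctx["best"]]:
        ctx["best"] = i
    ctx["i"] = i + 1
    return "compare" if ctx["i"] < len(arr) else "done"


MACHINE: Dict[str, Callable[[dict], str]] = {"init": init, "compare": compare}


def run(machine, ctx, start="init", max_steps=1_000):
    """Follow the predefined transitions until the terminal state 'done'."""
    state, trace = start, [start]
    for _ in range(max_steps):
        if state == "done":
            return ctx, trace
        state = machine[state](ctx)
        trace.append(state)
    raise RuntimeError("state machine did not reach 'done'")


ctx, trace = run(MACHINE, {"array": [3, 9, 2, 9, 5]})
```

In a full system each action would prompt the LLM with only the sub-goal of its state; the multi-agent mechanism is not modelled here.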

Translated: 2024-01-02 14:06:59 Published: 2023-12-29

# Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems

Break Out of a Pigeonhole: A Unified Framework for Examining Miscalibration, Bias, and Stereotype in Recommender Systems ( http://arxiv.org/abs/2312.17443v1 )

License: see link

Yongsu Ahn and Yu-Ru Lin

Despite the benefits of tailoring items and information to users' needs, recommender systems have been found to introduce biases that favor popular items, certain categories of items, and dominant user groups. In this study, we aim to characterize the systematic errors of a recommendation system and how they manifest in various accountability issues, such as stereotypes, biases, and miscalibration. We propose a unified framework that decomposes prediction errors by source into a set of key measures quantifying various types of system-induced effects at both the individual and collective levels. Based on this measurement framework, we examine the most widely adopted algorithms in the context of movie recommendation. Our research reveals three important findings: (1) Differences between algorithms: recommendations generated by simpler algorithms tend to be more stereotypical but less biased than those generated by more complex algorithms. (2) Disparate impact on groups and individuals: system-induced biases and stereotypes have a disproportionate effect on atypical users and minority groups (e.g., women and older users). (3) Mitigation opportunity: using structural equation modeling, we identify the interactions between user characteristics (typicality and diversity), system-induced effects, and miscalibration. We further investigate the possibility of mitigating system-induced effects by oversampling underrepresented groups and individuals, which we found to be effective in reducing stereotypes and improving recommendation quality. Our research is the first systematic examination of not only system-induced effects and miscalibration but also the stereotyping issue in recommender systems.
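
Miscalibration is commonly quantified as the Kullback-Leibler divergence between the genre distribution of a user's history and that of their recommendations, with a small mixing weight `alpha` that keeps the divergence finite. The sketch below implements that common formulation, assuming items carry genre labels; the paper's own measures for bias and stereotype are not reproduced, and the item and genre names are made up.

```python
import math
from collections import Counter


def genre_distribution(items, item_genres):
    """Share of each genre over a list of items; multi-genre items split their weight."""
    counts = Counter()
    for item in items:
        genres = item_genres[item]
        for g in genres:
            counts[g] += 1 / len(genres)
    total = sum(counts.values()) or 1.0
    return {g: c / total for g, c in counts.items()}


def miscalibration(history, recommended, item_genres, alpha=0.01):
    """KL(p || q~), where p is the history distribution, q the recommendation
    distribution, and q~ = (1 - alpha) * q + alpha * p keeps the log finite."""
    p = genre_distribution(history, item_genres)
    q = genre_distribution(recommended, item_genres)
    kl = 0.0
    for g, pg in p.items():
        q_smooth = (1 - alpha) * q.get(g, 0.0) + alpha * pg
        kl += pg * math.log(pg / q_smooth)
    return kl


# Hypothetical catalogue.
ITEM_GENRES = {"m1": ["drama"], "m2": ["comedy"], "m3": ["action"], "m4": ["drama", "comedy"]}
calibrated = miscalibration(["m1", "m2"], ["m4"], ITEM_GENRES)
skewed = miscalibration(["m1", "m2"], ["m3"], ITEM_GENRES)
```

A user who watches drama and comedy but is recommended only action titles gets a large value, which is the per-user signal that can then be aggregated by group.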

Translated: 2024-01-02 14:06:40 Published: 2023-12-29

# On Wigdersons' approach to the uncertainty principle

On Wigdersons' approach to the uncertainty principle ( http://arxiv.org/abs/2312.17438v1 )

License: see link

Nuno Costa Dias, Franz Luef and João Nuno Prata

We revisit the uncertainty principle from the point of view suggested by A. Wigderson and Y. Wigderson. This approach is based on a primary uncertainty principle from which one can derive several inequalities expressing the impossibility of a simultaneous sharp localization in time and frequency. Moreover, it requires no specific properties of the Fourier transform and can therefore be easily applied to all operators satisfying the primary uncertainty principle. A. Wigderson and Y. Wigderson also suggested many generalizations to higher dimensions and stated several conjectures which we address in the present paper. We argue that we have to consider a more general primary uncertainty principle to prove the results suggested by the authors. As a by-product we obtain some new inequalities akin to the Cowling-Price uncertainty principle and derive the entropic uncertainty principle from the primary uncertainty principles.
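
For orientation, the simplest form of a primary uncertainty principle can be written out in two lines. The sketch assumes the normalization $\hat f(\xi)=\int_{\mathbb{R}^d} f(x)\,e^{-2\pi i x\cdot\xi}\,dx$, a nonzero $f\in L^1$ with $\hat f\in L^1$, and Fourier inversion; the more general primary principle used in the paper may be stated differently.

```latex
\[
\|\hat f\|_\infty \le \|f\|_1, \qquad \|f\|_\infty \le \|\hat f\|_1
\;\Longrightarrow\;
\|f\|_\infty\,\|\hat f\|_\infty \le \|f\|_1\,\|\hat f\|_1
\;\Longleftrightarrow\;
\frac{\|f\|_1}{\|f\|_\infty}\cdot\frac{\|\hat f\|_1}{\|\hat f\|_\infty} \ge 1 .
\]
```

The ratio $\|f\|_1/\|f\|_\infty$ behaves like a soft measure of the size of the support of $f$, so the product bound says that $f$ and $\hat f$ cannot both be sharply concentrated. Only the two $L^1\to L^\infty$ bounds are used, which is why the same argument carries over to any operator whose action and inverse both have norm at most one from $L^1$ to $L^\infty$.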

Translated: 2024-01-02 14:06:09 Published: 2023-12-29

# Video Understanding with Large Language Models: A Survey

Video Understanding with Large Language Models: A Survey ( http://arxiv.org/abs/2312.17432v1 )

License: see link

Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly. With Large Language Models (LLMs) showcasing remarkable capabilities in key language tasks, this survey provides a detailed overview of the recent advancements in video understanding harnessing the power of LLMs (Vid-LLMs). The emergent capabilities of Vid-LLMs are surprisingly advanced, particularly their ability for open-ended spatial-temporal reasoning combined with commonsense knowledge, suggesting a promising path for future video understanding. We examine the unique characteristics and capabilities of Vid-LLMs, categorizing the approaches into four main types: LLM-based Video Agents, Vid-LLMs Pretraining, Vid-LLMs Instruction Tuning, and Hybrid Methods. Furthermore, this survey also presents a comprehensive study of the tasks and datasets for Vid-LLMs, along with the methodologies employed for evaluation. Additionally, the survey explores the expansive applications of Vid-LLMs across various domains, thereby showcasing their remarkable scalability and versatility in addressing challenges in real-world video understanding. Finally, the survey summarizes the limitations of existing Vid-LLMs and the directions for future research. For more information, we recommend readers visit the repository at https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding.

Translated: 2024-01-02 14:05:53 Published: 2023-12-29

# MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World

MVPatch: More Vivid Patch for Adversarial Camouflaged Attacks on Object Detectors in the Physical World ( http://arxiv.org/abs/2312.17431v1 )

License: see link

Zheng Zhou, Hongbo Zhao, Ju Liu, Qiaosheng Zhang, Guangbiao Wang, Chunlei Wang and Wenquan Feng

Recent research has shown that adversarial patches can manipulate outputs from object detection models. However, the conspicuous patterns on these patches may draw more attention and raise suspicions among humans. Moreover, existing works have primarily focused on the attack performance of individual models and have neglected the generation of adversarial patches for ensemble attacks on multiple object detection models. To tackle these concerns, we propose a novel approach referred to as the More Vivid Patch (MVPatch), which aims to improve the transferability and stealthiness of adversarial patches while considering the limitations observed in prior paradigms, such as easy identification and poor transferability. Our approach incorporates an attack algorithm that decreases the object confidence scores of multiple object detectors by using an ensemble attack loss function, thereby enhancing the transferability of adversarial patches. Additionally, we propose a lightweight visual similarity measurement algorithm realized by the Compared Specified Image Similarity (CSS) loss function, which allows natural and stealthy adversarial patches to be generated without relying on additional generative models. Extensive experiments demonstrate that the proposed MVPatch algorithm achieves superior attack transferability compared to similar algorithms in both digital and physical domains, while also exhibiting a more natural appearance. These findings emphasize the remarkable stealthiness and transferability of the proposed MVPatch attack algorithm.
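
The abstract describes the ensemble attack loss only at a high level. One plausible reading, sketched below as an assumption rather than the paper's definition, scores each detector by the highest confidence it still assigns to the target object and averages those maxima, so that minimizing the loss over the patch pixels suppresses the strongest detection of every model at once. The nested-list input format is hypothetical, and the gradient step through real detectors is omitted.

```python
from statistics import fmean


def ensemble_attack_loss(detector_scores):
    """Hypothetical ensemble objective over several detectors.

    detector_scores[i] holds the confidence scores that detector i assigns to
    boxes of the target class on the patched image. Each detector contributes
    its strongest remaining detection (0.0 if it finds nothing), and the
    contributions are averaged, so lowering the loss pushes every detector's
    best box down.
    """
    per_detector = [max(scores) if scores else 0.0 for scores in detector_scores]
    return fmean(per_detector)


# Made-up scores from three detectors on one patched image.
loss = ensemble_attack_loss([[0.91, 0.35], [0.42], []])
```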

Translated: 2024-01-02 14:05:26 Published: 2023-12-29

# LEFL: Low Entropy Client Sampling in Federated Learning

LEFL: Low Entropy Client Sampling in Federated Learning ( http://arxiv.org/abs/2312.17430v1 )

License: see link

Waqwoya Abebe, Pablo Munoz, Ali Jannesari

Federated learning (FL) is a machine learning paradigm in which multiple clients collaborate to optimize a single global model using their private data. The global model is maintained by a central server that orchestrates the FL training process through a series of training rounds. In each round, the server samples clients from a client pool and sends them its latest global model parameters for further optimization. Naive sampling strategies implement random client sampling and, for privacy reasons, do not factor in client data distributions. We therefore propose an alternative sampling strategy that performs a one-time clustering of clients based on the high-level features learned by their models, while respecting data privacy. This enables the server to perform stratified client sampling across clusters in every round. We show that the datasets of clients sampled with this approach yield a low relative entropy with respect to the global data distribution. Consequently, FL training becomes less noisy, and the convergence of the global model improves significantly, by as much as 7.4% in some experiments. Furthermore, the approach significantly reduces the number of communication rounds required to achieve a target accuracy.
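
The sampling idea can be sketched in a few lines, assuming the one-time clustering has already produced `clusters` (lists of client IDs) and that, for evaluation only, each client's label list is visible. `label_distribution`, `kl_divergence`, and `stratified_sample` are hypothetical helpers, and measuring KL(sample || global) is a choice made here. With equally sized, label-homogeneous clusters, drawing one client per cluster reproduces the global label mix exactly.

```python
import math
import random
from collections import Counter


def label_distribution(clients, client_labels):
    """Pooled label distribution over a set of clients (for evaluation only)."""
    counts = Counter()
    for client in clients:
        counts.update(client_labels[client])
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """Relative entropy KL(p || q); eps guards against labels missing from q."""
    return sum(pk * math.log(pk / max(q.get(k, 0.0), eps)) for k, pk in p.items() if pk > 0)


def stratified_sample(clusters, per_cluster, rng):
    """Draw the same number of clients from every cluster."""
    return [client for members in clusters for client in rng.sample(members, per_cluster)]


# Toy pool: three clusters of three clients; every client in cluster k only holds label k.
clusters = [[f"c{k}_{j}" for j in range(3)] for k in range(3)]
client_labels = {c: [k] * 10 for k, members in enumerate(clusters) for c in members}
global_dist = label_distribution([c for members in clusters for c in members], client_labels)

rng = random.Random(0)
chosen = stratified_sample(clusters, per_cluster=1, rng=rng)
stratified_kl = kl_divergence(label_distribution(chosen, client_labels), global_dist)
```

Uniform random sampling from the same pool can easily draw several clients from one cluster, which is one source of the round-to-round noise the abstract mentions.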

Translated: 2024-01-02 14:05:01 Published: 2023-12-29

# Commonsense for Zero-Shot Natural Language Video Localization

Commonsense for Zero-Shot Natural Language Video Localization ( http://arxiv.org/abs/2312.17429v1 )

License: see link

Meghana Holla, Ismini Lourentzou

Zero-shot Natural Language-Video Localization (NLVL) methods have exhibited promising results in training NLVL models exclusively with raw video data by dynamically generating video segments and pseudo-query annotations. However, existing pseudo-queries often lack grounding in the source video, resulting in unstructured and disjointed content. In this paper, we investigate the effectiveness of commonsense reasoning in zero-shot NLVL. Specifically, we present CORONET, a zero-shot NLVL framework that leverages commonsense to bridge the gap between videos and generated pseudo-queries via a commonsense enhancement module. CORONET employs Graph Convolution Networks (GCN) to encode commonsense information extracted from a knowledge graph, conditioned on the video, and cross-attention mechanisms to enhance the encoded video and pseudo-query representations prior to localization. Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements of up to 32.13% across various recall thresholds and of up to 6.33% in mIoU. These results underscore the significance of leveraging commonsense reasoning for zero-shot NLVL.
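
Results such as recall at various thresholds and mIoU in natural language video localization are usually computed from the temporal IoU between a predicted and a ground-truth segment. A minimal sketch of those metrics follows, assuming one top-1 prediction per query and segments given as `(start, end)` in seconds. The abstract does not state which thresholds the paper reports (0.3, 0.5, and 0.7 are common choices in this literature), and the segments below are made up.

```python
def temporal_iou(pred, gt):
    """IoU of two time segments given as (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(preds, gts, threshold):
    """R@1 at an IoU threshold: share of queries whose top prediction reaches it."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def mean_iou(preds, gts):
    """mIoU: temporal IoU averaged over all queries."""
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(gts)


# Hypothetical top-1 predictions and ground truth for two queries (seconds).
preds = [(0.0, 10.0), (20.0, 30.0)]
gts = [(5.0, 15.0), (20.0, 30.0)]
scores = {t: recall_at(preds, gts, t) for t in (0.3, 0.5, 0.7)}
```

Here the first query overlaps its ground truth with IoU 1/3, so it counts as a hit at 0.3 but not at 0.5 or 0.7.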

Translated: 2024-01-02 14:04:48 Published: 2023-12-29

# ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset

ChangeNet: Multi-Temporal Asymmetric Change Detection Dataset ( http://arxiv.org/abs/2312.17428v1 )

License: see link

Deyi Ji, Siqi Gao, Mingyuan Tao, Hongtao Lu, Feng Zhao

Change Detection (CD) has attracted extensive interest with the availability of bi-temporal datasets. However, due to the huge cost of acquiring and labeling multi-temporal images, existing change detection datasets are small in quantity, short in temporal span, and low in practicability. Therefore, a large-scale, practice-oriented dataset covering wide temporal phases is urgently needed to facilitate the community. To this end, the ChangeNet dataset is presented specifically for multi-temporal change detection, along with the new task of "Asymmetric Change Detection". Specifically, ChangeNet consists of 31,000 multi-temporal image pairs, a wide range of complex scenes from 100 cities, and 6 pixel-level annotated categories, which is far superior to all existing change detection datasets, including LEVIR-CD and WHU Building CD. In addition, ChangeNet contains a substantial amount of real-world perspective distortion between different temporal phases of the same areas, which can promote the practical application of change detection algorithms. The ChangeNet dataset is suitable for both binary change detection (BCD) and semantic change detection (SCD) tasks. Accordingly, we benchmark the ChangeNet dataset on six BCD methods and two SCD methods, and extensive experiments demonstrate its challenges and great significance. The dataset is available at https://github.com/jankyee/ChangeNet.

Translated: 2024-01-02 14:04:26 Published: 2023-12-29

# Context-based Transfer and Efficient Iterative Learning for Unbiased Scene Graph Generation

Context-based Transfer and Efficient Iterative Learning for Unbiased Scene Graph Generation ( http://arxiv.org/abs/2312.17425v1 )

License: see link

Qishen Chen, Xinyu Lyu, Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song

Unbiased Scene Graph Generation (USGG) aims to address biased predictions in SGG. To that end, data transfer methods are designed to convert coarse-grained predicates into fine-grained ones, mitigating the imbalanced distribution. However, they overlook the contextual relevance between transferred labels and subject-object pairs, such as the unsuitability of 'eating' for 'woman-table'. Furthermore, they typically involve a two-stage process with significant computational costs: a model is first pre-trained for data transfer, and a model is then trained from scratch using the transferred labels. We therefore introduce a plug-and-play method named CITrans, which iteratively trains SGG models with progressively enhanced data. First, we introduce Context-Restricted Transfer (CRT), which imposes subject-object constraints within the predicates' semantic space to achieve fine-grained data transfer. Subsequently, Efficient Iterative Learning (EIL) iteratively trains models and progressively generates enhanced labels that are consistent with the model's learning state, thereby accelerating the training process. Finally, extensive experiments show that CITrans achieves state-of-the-art results with high efficiency.
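
Context-Restricted Transfer constrains transfers within the predicate semantic space. As a much simpler, hypothetical stand-in, the sketch below restricts a coarse-to-fine transfer to predicates that have been observed for the same subject-object pair, which is enough to block 'eating' for 'woman-table'. The predicate hierarchy, scores, and `ALLOWED` triples are invented for illustration.

```python
# Hypothetical coarse-to-fine predicate candidates and model scores.
FINE_CANDIDATES = {"near": ["eating", "sitting at", "standing near"]}
SCORES = {"eating": 0.9, "sitting at": 0.6, "standing near": 0.3}
# Hypothetical (subject, predicate, object) triples observed in training data.
ALLOWED = {
    ("woman", "sitting at", "table"),
    ("woman", "standing near", "table"),
    ("woman", "eating", "pizza"),
}


def context_restricted_transfer(subject, obj, coarse, candidates=FINE_CANDIDATES,
                                scores=SCORES, allowed=ALLOWED):
    """Transfer a coarse predicate to the best-scoring fine-grained one that is
    plausible for this subject-object pair; keep the coarse label otherwise."""
    valid = [p for p in candidates.get(coarse, []) if (subject, p, obj) in allowed]
    if not valid:
        return coarse
    return max(valid, key=lambda p: scores.get(p, 0.0))
```

Without the constraint, 'eating' would win for 'woman-table' on score alone; with it, the transfer falls back to the best plausible predicate.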

Translated: 2024-01-02 14:04:03 Published: 2023-12-29

# Exact Consistency Tests for Gaussian Mixture Filters using Normalized Deviation Squared Statistics

Exact Consistency Tests for Gaussian Mixture Filters using Normalized Deviation Squared Statistics ( http://arxiv.org/abs/2312.17420v1 )

License: see link

Nisar Ahmed, Luke Burks, Kailah Cabral, Alyssa Bekai Rose

We consider the problem of evaluating dynamic consistency in discrete time probabilistic filters that approximate stochastic system state densities with Gaussian mixtures. Dynamic consistency means that the estimated probability distributions correctly describe the actual uncertainties. As such, the problem of consistency testing naturally arises in applications with regards to estimator tuning and validation. However, due to the general complexity of the density functions involved, straightforward approaches for consistency testing of mixture-based estimators have remained challenging to define and implement. This paper derives a new exact result for Gaussian mixture consistency testing within the framework of normalized deviation squared (NDS) statistics. It is shown that NDS test statistics for generic multivariate Gaussian mixture models exactly follow mixtures of generalized chi-square distributions, for which efficient computational tools are available. The accuracy and utility of the resulting consistency tests are numerically demonstrated on static and dynamic mixture estimation examples.
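
For a single Gaussian estimate $\mathcal{N}(\mu,\Sigma)$ of an $n$-dimensional state, the NDS statistic $\varepsilon=(x-\mu)^\top\Sigma^{-1}(x-\mu)$ is chi-square with $n$ degrees of freedom when the filter is consistent. If $\mu$ and $\Sigma$ are instead the overall mean and covariance of a Gaussian mixture, $\varepsilon$ is no longer chi-square (the abstract states it follows a mixture of generalized chi-square distributions), but its expectation is still $\operatorname{tr}(\Sigma^{-1}\operatorname{Cov}[x])=n$. The Monte Carlo sketch below checks that property for a made-up 1-D mixture; it is a sanity check, not the paper's exact test.

```python
import random


def mixture_moments(weights, means, variances):
    """Overall mean and variance of a 1-D Gaussian mixture (law of total variance)."""
    mu = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (v + m * m) for w, m, v in zip(weights, means, variances))
    return mu, second_moment - mu * mu


def sample_mixture(weights, means, variances, rng):
    """Pick a component by weight, then draw from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], variances[k] ** 0.5)


def nds(x, mu, var):
    """Normalized deviation squared for a scalar state."""
    return (x - mu) ** 2 / var


def mean_nds(weights, means, variances, n_samples=100_000, seed=0):
    """Monte Carlo average of the NDS when the truth is drawn from the same
    mixture the filter reports, i.e. the consistent case."""
    rng = random.Random(seed)
    mu, var = mixture_moments(weights, means, variances)
    total = sum(nds(sample_mixture(weights, means, variances, rng), mu, var)
                for _ in range(n_samples))
    return total / n_samples


# Made-up two-component mixture: weights, means, variances.
W, M, V = [0.3, 0.7], [-1.0, 2.0], [0.25, 1.0]
avg = mean_nds(W, M, V)
```

Matching the mean alone cannot distinguish a consistent mixture filter from an inconsistent one with the right second moment, which is why the exact distribution derived in the paper matters for a proper test.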

Translated: 2024-01-02 14:03:43 Published: 2023-12-29

# Benchmarking of Universal Qutrit Gates

Benchmarking of Universal Qutrit Gates ( http://arxiv.org/abs/2312.17418v1 )

License: see link

David Amaro-Alcalá, Barry C. Sanders, Hubert de Guise

We introduce a characterisation scheme for a universal qutrit gate set. Motivated by rising interest in qutrit systems, we apply our criteria to establish that our hyperdihedral group underpins a scheme to characterise the performance of a qutrit T gate. Our resulting qutrit scheme is feasible, as it requires resources and data analysis techniques similar to those employed for qutrit Clifford randomised benchmarking. Combining our T gate benchmarking procedure for qutrits with known qutrit Clifford-gate benchmarking enables complete characterisation of a universal qutrit gate set.
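
Randomised benchmarking data are usually summarised by fitting the survival probability to $P(m)=A\,p^m+B$ over sequence length $m$ and converting the decay $p$ into an average error rate $r=(d-1)(1-p)/d$, with $d=3$ for a qutrit. The sketch below performs that standard single-exponential analysis on noiseless synthetic data, assuming $B=1/d$ (a fully depolarised asymptote) so that a log-linear least-squares fit suffices; the decay model of the hyperdihedral T gate scheme itself may differ.

```python
import math


def fit_rb_decay(lengths, survival, d):
    """Fit P(m) = A * p**m + B with B fixed to 1/d.

    Taking log(P(m) - B) = log(A) + m * log(p) turns the fit into ordinary
    least squares on a straight line; returns (A, p).
    """
    b = 1.0 / d
    xs = list(lengths)
    ys = [math.log(s - b) for s in survival]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), math.exp(slope)


def average_error_rate(p, d):
    """Standard RB conversion from decay parameter to average gate error."""
    return (d - 1) * (1 - p) / d


# Noiseless synthetic qutrit data (d = 3) with made-up A = 0.6 and p = 0.98.
d = 3
lengths = [1, 5, 10, 20, 50, 100]
survival = [0.6 * 0.98 ** m + 1 / d for m in lengths]
A_fit, p_fit = fit_rb_decay(lengths, survival, d)
r = average_error_rate(p_fit, d)
```

With measured (noisy) data, a nonlinear fit that also estimates $B$ is the more robust choice; fixing $B$ here only keeps the sketch dependency-free.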

Translated: 2024-01-02 14:03:29 Published: 2023-12-29

# Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation

Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation ( http://arxiv.org/abs/2312.17411v1 )

License: see link

Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter

In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.

Translated: 2024-01-02 14:03:15 Published: 2023-12-29

# Comparing roughness descriptors for distinct terrain surfaces in point cloud data

Comparing roughness descriptors for distinct terrain surfaces in point cloud data ( http://arxiv.org/abs/2312.17407v1 )

License: see link

Lei Fan and Yang Zhao

Terrain surface roughness, often described abstractly, is challenging to characterise quantitatively, as reflected in the variety of descriptors found in the literature. This study compares five commonly used roughness descriptors, exploring correlations among their quantified terrain surface roughness maps across three terrains with distinct spatial variations. Additionally, the study investigates the impacts of spatial scales and interpolation methods on these correlations. Dense point cloud data obtained through the Light Detection and Ranging (LiDAR) technique are used in this study. The findings highlight both global pattern similarities and local pattern distinctions in the derived roughness maps, emphasizing the significance of incorporating multiple descriptors in studies where local roughness values play a crucial role in subsequent analyses. The spatial scales were found to have a smaller impact on rougher terrain, while interpolation methods had minimal influence on roughness maps derived from different descriptors.
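
Many roughness descriptors are window statistics of elevation. The sketch below computes two simple ones (standard deviation and range of heights in non-overlapping windows of a gridded elevation model) and the Pearson correlation between the resulting maps, which is the kind of comparison the study reports. The two descriptors, the window size, and the grid are assumptions, not the five descriptors used in the paper; working directly on LiDAR point neighbourhoods or detrending by a fitted plane are common refinements not shown here.

```python
from statistics import fmean, pstdev


def window_descriptors(grid, size):
    """Two simple roughness maps from a gridded elevation model, computed over
    non-overlapping size x size windows: height standard deviation and height range."""
    rows, cols = len(grid), len(grid[0])
    std_map, range_map = [], []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            heights = [grid[i][j] for i in range(r, r + size) for j in range(c, c + size)]
            std_map.append(pstdev(heights))
            range_map.append(max(heights) - min(heights))
    return std_map, range_map


def pearson(a, b):
    """Pearson correlation between two descriptor maps (flattened to lists)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical 4 x 4 elevation grid (metres): flat top-left, rougher elsewhere.
GRID = [
    [0, 0, 1, 3],
    [0, 0, 2, 0],
    [5, 1, 0, 4],
    [2, 0, 4, 0],
]
std_map, range_map = window_descriptors(GRID, size=2)
corr = pearson(std_map, range_map)
```

A high global correlation can still hide windows where the two descriptors rank roughness differently, which is the local-versus-global distinction the abstract emphasises.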

Translated: 2024-01-02 14:03:02 Published: 2023-12-29

# Parameter Optimization with Conscious Allocation (POCA)

Parameter Optimization with Conscious Allocation (POCA) ( http://arxiv.org/abs/2312.17404v1 )

License: see link

Joshua Inman, Tanmay Khandait, Giulia Pedrielli, and Lalitha Sankar

The performance of modern machine learning algorithms depends upon the selection of a set of hyperparameters. Common examples of hyperparameters are the learning rate and the number of layers in a dense neural network. Auto-ML is a branch of optimization that has produced important contributions in this area. Within Auto-ML, hyperband-based approaches, which eliminate poorly-performing configurations after evaluating them at low budgets, are among the most effective. However, the performance of these algorithms strongly depends on how effectively they allocate the computational budget to various hyperparameter configurations. We present Parameter Optimization with Conscious Allocation (POCA), a new hyperband-based algorithm that adaptively allocates the input budget to the hyperparameter configurations it generates following a Bayesian sampling scheme. We compare POCA to its nearest competitor at optimizing the hyperparameters of an artificial toy function and a deep neural network and find that POCA finds strong configurations faster in both settings.
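
Hyperband-based methods are built on successive halving: evaluate many configurations on a small budget, keep the best fraction, and repeat with a larger budget. A minimal sketch of one bracket follows, assuming uniform (log-scale) random sampling in place of POCA's Bayesian sampling and a made-up objective in which extra budget simply lowers the loss. `successive_halving`, `eta`, and `toy_loss` are illustrative names, not POCA's API.

```python
import math
import random


def successive_halving(sample_config, evaluate, n_configs=27, min_budget=1, eta=3, rng=None):
    """One bracket of successive halving, the building block of Hyperband.

    Evaluates every surviving configuration at the current budget, keeps the
    best 1/eta of them (lower loss is better), multiplies the budget by eta,
    and repeats until one configuration is left.
    """
    rng = rng or random.Random(0)
    configs = [sample_config(rng) for _ in range(n_configs)]
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda cfg: evaluate(cfg, budget))
        configs = ranked[: max(1, len(configs) // eta)]
        budget *= eta
    return configs[0]


def sample_learning_rate(rng):
    """Log-uniform learning rate in [1e-5, 1]; a stand-in for POCA's Bayesian sampler."""
    return 10 ** rng.uniform(-5, 0)


def toy_loss(lr, budget):
    """Made-up objective: best near lr = 1e-2; extra budget lowers the loss uniformly."""
    return (math.log10(lr) + 2) ** 2 + 1.0 / budget


best_lr = successive_halving(sample_learning_rate, toy_loss)
```

The fixed `n_configs` and `eta` are exactly the allocation decisions that POCA makes adaptively; Hyperband proper also runs several brackets with different trade-offs between the number of configurations and the starting budget.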

Translated: 2024-01-02 14:02:46 Published: 2023-12-29

# The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model

The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model ( http://arxiv.org/abs/2312.17485v1 )

License: see link

Zelin Zhao, Zhaogui Xu, Jialong Zhu, Peng Di, Yuan Yao, Xiaoxing Ma

Automatic program repair (APR) techniques have the potential to reduce manual efforts in uncovering and repairing program defects during the code review (CR) process. However, the limited accuracy and considerable time costs associated with existing APR approaches hinder their adoption in industrial practice. One key factor is the under-utilization of review comments, which provide valuable insights into defects and potential fixes. Recent advancements in Large Language Models (LLMs) have enhanced their ability to comprehend natural and programming languages, enabling them to generate patches based on review comments. This paper conducts a comprehensive investigation into the effective utilization of LLMs for repairing CR defects. In this study, various prompts are designed and compared across mainstream LLMs using two distinct datasets from human reviewers and automated checkers. Experimental results demonstrate a remarkable repair rate of 72.97% with the best prompt, highlighting a substantial improvement in the effectiveness and practicality of automatic repair techniques.
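
The key ingredient in this line of work is presenting the reviewer's comment to the model together with the code it refers to. The template below is a hypothetical prompt builder, not one of the prompts evaluated in the paper; the function name, its parameters, and the wording are assumptions.

```python
def build_repair_prompt(code, review_comment, language="Java", line=None):
    """Put the reviewer's comment next to the code it refers to.

    The wording is a hypothetical template, not one of the prompts from the paper.
    """
    location = f" at line {line}" if line is not None else ""
    return (
        f"A reviewer reported a defect in the following {language} code{location}.\n"
        f"Review comment: {review_comment}\n\n"
        f"Code:\n{code}\n\n"
        "Rewrite the code so that the review comment is resolved. "
        "Return only the corrected code."
    )


prompt = build_repair_prompt(
    'if (name == "admin") { grant(); }',
    "Compare strings with equals(), not ==.",
    line=42,
)
```

Keeping the comment, its location, and the code in fixed slots makes it easy to vary one part of the prompt at a time when comparing prompt designs across models.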

Translated: 2024-01-02 13:44:25 Published: 2023-12-29

# Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning

Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ( http://arxiv.org/abs/2312.17484v1 )

License: see link

Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu

Despite the great success of large language models (LLMs) in various tasks, they are prone to generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique that considers an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. With this approach, we improved the truthfulness of Llama-2-7B on TruthfulQA from 40.8% to 74.5%. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset. Code: https://github.com/jongjyh/trfr
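
Two mechanics in the abstract lend themselves to a small sketch: making several probe directions mutually orthogonal, and shifting a hidden state along those directions at inference time without tuning any weights. The code below assumes the raw probe directions are already available as plain vectors and uses Gram-Schmidt as a post-hoc substitute for the training-time orthogonality constraint; how the probes are trained, which layers or heads are intervened on, and the strength `alpha` are assumptions not taken from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize probe directions in order; directions that are
    (numerically) linear combinations of earlier ones are dropped."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def intervene(hidden, directions, alpha):
    """Inference-time edit: add alpha times each unit direction to a hidden state."""
    out = [float(h) for h in hidden]
    for d in directions:
        out = [h + alpha * di for h, di in zip(out, d)]
    return out


# Hypothetical 3-D "hidden state" and raw probe directions; the third is redundant.
probes = gram_schmidt([[1, 1, 0], [1, 0, 1], [2, 1, 1]])
shifted = intervene([0.2, -0.1, 0.4], probes, alpha=0.5)
```

Orthogonality ensures each direction contributes a distinct component, so adding more probes does not simply re-apply the same shift several times.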

Translated: 2024-01-02 13:44:08 Published: 2023-12-29

# Maximizing the Yield of Bucket Brigade Quantum Random Access Memory using Redundancy Repair

Maximizing the Yield of Bucket Brigade Quantum Random Access Memory using Redundancy Repair ( http://arxiv.org/abs/2312.17483v1 )

License: see link

Dongmin Kim, Sovanmonynuth Heng, Sengthai Heng and Youngsun Han

Quantum Random Access Memory (qRAM) is an essential computing element for running oracle-based quantum algorithms. qRAM exploits the principle of quantum superposition to access all data stored in the memory cells simultaneously and guarantees the superior performance of quantum algorithms. A qRAM memory cell comprises logical qubits encoded with quantum error correction (QEC) so that qRAM operates successfully despite various quantum noises. In addition to quantum noise, low technology nodes based on silicon technology can increase the qubit density but may also introduce defective qubits. Because qRAM comprises many qubits, defective qubits reduce its yield, and these qubits must be handled using a QEC scheme. However, a QEC scheme requires numerous physical qubits, which imposes a heavy resource overhead. To resolve this overhead problem, we propose a quantum memory architecture that compensates for defective qubits by introducing redundant qubits. We also analyze the yield improvement offered by our proposed architecture by varying the ideal fabrication error rate from 0.5% to 1% for different numbers of logical qubits in the qRAM. For a qRAM comprising 1,024 logical qubits, eight redundant logical qubits improved the yield by 95.92% relative to a qRAM that does not employ the redundancy repair scheme.
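The yield argument can be illustrated with a deliberately simplified model: assume each logical qubit is defective independently with probability p, and that any spare can replace any defective qubit. Without repair the yield is (1 − p)^N; with r spares it is the probability that at most r of the N + r qubits are defective. The defect probability below is hypothetical, and the paper's model (built on physical-qubit fabrication error rates) is more detailed, so these numbers are not the paper's.

```python
from math import comb

def yield_without_repair(n_logical: int, p_defect: float) -> float:
    # Every one of the n logical qubits must be defect-free.
    return (1.0 - p_defect) ** n_logical

def yield_with_repair(n_logical: int, n_spare: int, p_defect: float) -> float:
    # The chip works if at most n_spare of the n_logical + n_spare qubits are
    # defective, assuming any spare can replace any defective qubit.
    total = n_logical + n_spare
    return sum(
        comb(total, k) * p_defect**k * (1.0 - p_defect) ** (total - k)
        for k in range(n_spare + 1)
    )

p = 0.001  # hypothetical per-logical-qubit defect probability
base = yield_without_repair(1024, p)
repaired = yield_with_repair(1024, 8, p)
print(f"yield without spares: {base:.3f}, with 8 spares: {repaired:.5f}")
```

With zero spares the two functions coincide, which is a quick sanity check on the binomial sum.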

Translated: 2024-01-02 13:43:52 Published: 2023-12-29

# MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining ( http://arxiv.org/abs/2312.17482v1 )

License: see link

Jacob Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle

Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the past half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT. Here, we introduce MosaicBERT, a BERT-style encoder architecture and training recipe that is empirically optimized for fast pretraining. This efficient architecture incorporates FlashAttention, Attention with Linear Biases (ALiBi), Gated Linear Units (GLU), a module to dynamically remove padded tokens, and low precision LayerNorm into the classic transformer encoder block. The training recipe includes a 30% masking ratio for the Masked Language Modeling (MLM) objective, bfloat16 precision, and a vocabulary size optimized for GPU throughput, in addition to best practices from RoBERTa and other encoder models. When pretrained from scratch on the C4 dataset, this base model achieves a downstream average GLUE (dev) score of 79.6 in 1.13 hours on 8 A100 80 GB GPUs at a cost of roughly $20. We plot extensive accuracy vs. pretraining speed Pareto curves and show that MosaicBERT base and large are consistently Pareto optimal when compared to a competitive BERT base and large. This empirical speedup in pretraining enables researchers and engineers to pretrain custom BERT-style models at low cost instead of finetuning existing generic models. We open source our model weights and code.
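ALiBi, one of the components listed above, drops position embeddings and instead adds a head-specific linear penalty on token distance to the attention logits. The NumPy sketch below builds that bias; the geometric slopes follow the original ALiBi recipe for a power-of-two head count, while using the symmetric distance |i − j| for a bidirectional encoder is an assumption here, not MosaicBERT's actual code.

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Geometric slopes 2^(-8h/n) for h = 1..n, as in the original ALiBi paper
    # (this closed form assumes n_heads is a power of two).
    h = np.arange(1, n_heads + 1)
    return 2.0 ** (-8.0 * h / n_heads)

def alibi_bias(seq_len: int, n_heads: int) -> np.ndarray:
    # Symmetric distance penalty -m_h * |i - j| for a bidirectional encoder,
    # shape (n_heads, seq_len, seq_len); it is added to attention logits before softmax.
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    return -alibi_slopes(n_heads)[:, None, None] * dist[None, :, :]

slopes = alibi_slopes(8)                 # 0.5, 0.25, ..., 2**-8
bias = alibi_bias(seq_len=4, n_heads=8)  # head 0 loses 0.5 per token of distance
```

Because the bias depends only on relative distance, it can be computed for any sequence length at inference time.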

Translated: 2024-01-02 13:43:36 Published: 2023-12-29

# Quantum logarithmic multifractality

Quantum logarithmic multifractality ( http://arxiv.org/abs/2312.17481v1 )

License: see link

Weitao Chen, Olivier Giraud, Jiangbin Gong, Gabriel Lemarié

Through a combination of rigorous analytical derivations and extensive numerical simulations, this work reports an exotic multifractal behavior, dubbed "logarithmic multifractality", in effectively infinite-dimensional systems undergoing the Anderson transition. In marked contrast to conventional multifractal critical properties observed at finite-dimensional Anderson transitions or scale-invariant second-order phase transitions, in the presence of logarithmic multifractality, eigenstate statistics, spatial correlations, and wave packet dynamics can all exhibit scaling laws which are algebraic in the logarithm of system size or time. Our findings offer crucial insights into strong finite-size effects and slow dynamics in complex systems undergoing the Anderson transition, such as the many-body localization transition.
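To make "algebraic in the logarithm of system size" concrete: a conventional critical quantity scales as a power of N, whereas here it would scale as a power of ln N, so plotting log Q against log(ln N) gives a straight line. The sketch below is a generic least-squares fit on synthetic data generated with a known exponent; it is not the paper's analysis, and the exponent and prefactor are hypothetical.

```python
import numpy as np

def fit_log_algebraic_exponent(sizes, values):
    """Fit values ~ C * (ln sizes)**a and return (a, C).

    Linear least squares on log(values) versus log(ln(sizes)); sizes must exceed e."""
    x = np.log(np.log(np.asarray(sizes, dtype=float)))
    y = np.log(np.asarray(values, dtype=float))
    a, log_c = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
    return a, np.exp(log_c)

# Synthetic check: data generated with exponent -1.5 and prefactor 3 (hypothetical).
sizes = 2.0 ** np.arange(6, 20)
values = 3.0 * np.log(sizes) ** -1.5
a, c = fit_log_algebraic_exponent(sizes, values)
```

Fitting the same data against log N instead would show visible curvature, which is one practical way to distinguish the two scaling forms.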

Translated: 2024-01-02 13:43:06 Published: 2023-12-29

# Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning

Culturally-Attuned Moral Machines: Implicit Learning of Human Value Systems by AI through Inverse Reinforcement Learning ( http://arxiv.org/abs/2312.17479v1 )

License: see link

Nigini Oliveira, Jasmine Li, Koosha Khalvati, Rodolfo Cortes Barragan, Katharina Reinecke, Andrew N. Meltzoff, and Rajesh P. N. Rao

Constructing a universal moral code for artificial intelligence (AI) is difficult or even impossible, given that different human cultures have different definitions of morality and different societal norms. We therefore argue that the value system of an AI should be culturally attuned: just as a child raised in a particular culture learns the specific values and norms of that culture, we propose that an AI agent operating in a particular human community should acquire that community's moral, ethical, and cultural codes. How AI systems might acquire such codes from human observation and interaction has remained an open question. Here, we propose using inverse reinforcement learning (IRL) as a method for AI agents to acquire a culturally-attuned value system implicitly. We test our approach using an experimental paradigm in which AI agents use IRL to learn different reward functions, which govern the agents' moral values, by observing the behavior of different cultural groups in an online virtual world requiring real-time decision making. We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior, and this learned value system can generalize to new scenarios requiring altruistic judgments. Our results provide, to our knowledge, the first demonstration that AI agents could potentially be endowed with the ability to continually learn their values and norms from observing and interacting with humans, thereby becoming attuned to the culture they are operating in.
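As a toy illustration of the IRL idea, consider one-step choices between options described by two hypothetical features: own gain and gain to another player. Maximum-entropy IRL then reduces to fitting a conditional-logit model in which a demonstrator picks option j with probability proportional to exp(w · φ_j). A positive weight on the second feature reads as altruism. Everything here (features, demonstrator, learning rate) is an assumption for illustration, and it is far simpler than the paper's real-time virtual world.

```python
import numpy as np

def fit_reward_weights(features, choices, lr=0.1, steps=2000):
    """Max-ent IRL for one-step choices (a conditional-logit model).

    features: (n_decisions, n_options, n_features) feature vectors phi of each option
    choices:  (n_decisions,) index of the option the demonstrator picked
    Returns weights w of a linear reward r(option) = w . phi(option)."""
    n, _, d = features.shape
    w = np.zeros(d)
    chosen = features[np.arange(n), choices]            # (n, d) features of chosen options
    for _ in range(steps):
        logits = features @ w                           # (n, n_options)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)               # softmax choice probabilities
        expected = np.einsum("no,nod->nd", p, features)  # model's expected features
        w += lr * (chosen - expected).mean(axis=0)      # log-likelihood gradient ascent
    return w

# Hypothetical demonstrator: each option is (own gain, gain to another player).
rng = np.random.default_rng(0)
phi = rng.uniform(0, 1, size=(500, 2, 2))
true_w = np.array([1.0, 2.0])                           # a demonstrator who values helping
choices = (phi @ true_w).argmax(axis=1)                 # reward-maximizing picks
w_hat = fit_reward_weights(phi, choices)
```

Since these toy choices are deterministic, the fitted weights keep growing with more steps; their signs and relative size, not their scale, carry the learned "values".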

Translated: 2024-01-02 13:42:50 Published: 2023-12-29

# Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters

Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters ( http://arxiv.org/abs/2312.17476v1 )

License: see link

Manikanta Loya, Divya Anand Sinha, Richard Futrell

The advancement of Large Language Models (LLMs) has led to their widespread use across a broad spectrum of tasks, including decision making. Prior studies have compared the decision-making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and variations in the prompt. In this study, we examine LLMs' performance on the Horizon decision-making task studied by Binz and Schulz (2023), analyzing how LLMs respond to variations in prompts and hyperparameters. By experimenting with three OpenAI language models possessing different capabilities, we observe that their decision-making abilities fluctuate with the input prompt and the temperature setting. Contrary to previous findings, the language models display a human-like exploration-exploitation tradeoff after simple adjustments to the prompt.
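Temperature, one of the hyperparameters varied in the study, typically rescales a model's logits before sampling: p_i ∝ exp(z_i / T). The toy below uses hypothetical logits for two arms of a bandit-style choice to show how lowering T makes the choice more exploitative and raising it makes choices more exploratory. It is a general illustration, not the paper's prompting setup.

```python
import numpy as np

def choice_probabilities(logits, temperature):
    """Softmax over option logits at a given sampling temperature (T > 0)."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Two bandit arms whose (hypothetical) logits slightly favour arm 0.
logits = [1.0, 0.5]
for t in (0.1, 1.0, 2.0):
    print(t, choice_probabilities(logits, t).round(3))
```

At low temperature almost all probability goes to the preferred arm, while higher temperatures move the choice toward a coin flip, which is one mechanism by which temperature can shift apparent exploration.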

Translated: 2024-01-02 13:42:19 Published: 2023-12-29

# EHR Interaction Between Patients and AI: NoteAid EHR Interaction

EHR Interaction Between Patients and AI: NoteAid EHR Interaction ( http://arxiv.org/abs/2312.17475v1 )

License: see link

Xiaocheng Zhang, Zonghai Yao, Hong Yu

With the rapid advancement of Large Language Models (LLMs) and their outstanding performance in semantic and contextual comprehension, the potential of LLMs in specialized domains warrants exploration. This paper introduces the NoteAid EHR Interaction Pipeline, an innovative approach that uses generative LLMs to assist in patient education, a task stemming from the need to help patients understand their Electronic Health Records (EHRs). Building upon the NoteAid work, we designed two novel tasks from the patient's perspective: providing explanations for EHR content that patients may not understand, and answering questions posed by patients after reading their EHRs. We extracted datasets containing 10,000 instances from MIMIC Discharge Summaries and 876 instances from the MADE medical notes collection, and executed the two tasks on these data through the NoteAid EHR Interaction Pipeline. Performance data of LLMs on these tasks were collected and compiled into the corresponding NoteAid EHR Interaction Dataset. Through a comprehensive evaluation of the entire dataset using LLM assessment and a rigorous manual evaluation of 64 instances, we showcase the potential of LLMs in patient education. Moreover, the results provide valuable data support for future exploration and applications in this domain, while also supplying high-quality synthetic datasets for in-house system training.

Translated: 2024-01-02 13:42:08 Published: 2023-12-29

# FerKD: Surgical Label Adaptation for Efficient Distillation

FerKD: Surgical Label Adaptation for Efficient Distillation ( http://arxiv.org/abs/2312.17473v1 )

License: see link

Zhiqiang Shen

We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation and intuition that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are utilized equally through their predictive probabilities derived from pretrained teacher models. However, merely relying on prediction values from a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to be the context using softened hard ground-truth labels. Our approach involves hard-region mining and calibration. We demonstrate empirically that this method can dramatically improve the convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and corresponding soft labels by mixing similar regions within the same image. FerKD is an intuitive and well-designed learning system that eliminates several heuristics and hyperparameters of the earlier FKD solution. More importantly, it achieves remarkable improvements on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by large margins. Leveraging better pre-trained weights and larger architectures, our finetuned ViT-G14 even achieves 89.9%. Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.
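One simplified reading of calibrating less-confident regions with softened hard labels is a confidence-thresholded target. Keep the teacher's soft label for a crop when the teacher assigns the ground-truth class enough probability. Otherwise, fall back to a label-smoothed one-hot ground truth. The threshold, smoothing value and exact rule below are assumptions for illustration; FerKD's actual calibration may treat such regions differently.

```python
import numpy as np

def smoothed_one_hot(label: int, n_classes: int, eps: float = 0.1) -> np.ndarray:
    """Label-smoothed ('softened') hard label: 1 - eps on the true class."""
    t = np.full(n_classes, eps / (n_classes - 1))
    t[label] = 1.0 - eps
    return t

def calibrated_targets(teacher_probs, gt_label, threshold=0.3, eps=0.1):
    """Per-crop distillation targets for random crops of one image.

    teacher_probs: (n_crops, n_classes) teacher soft labels.
    Crops where the teacher gives the ground-truth class less than `threshold`
    (a hypothetical value) receive the softened ground-truth label instead."""
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    n_classes = teacher_probs.shape[1]
    confident = teacher_probs[:, gt_label] >= threshold
    fallback = smoothed_one_hot(gt_label, n_classes, eps)
    return np.where(confident[:, None], teacher_probs, fallback[None, :])

crops = [[0.7, 0.2, 0.1],   # teacher is confident about true class 0: keep its soft label
         [0.1, 0.6, 0.3]]   # teacher is unsure of class 0: use the softened hard label
targets = calibrated_targets(crops, gt_label=0)
```

Every row of the result remains a valid probability distribution, so it can be fed directly into a soft-label cross-entropy or KL loss.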

Translated: 2024-01-02 13:41:43 Published: 2023-12-29

# Demonstration of a low loss, highly stable and re-useable edge coupler for SOI correlated photon pair sources

Demonstration of a low loss, highly stable and re-useable edge coupler for SOI correlated photon pair sources ( http://arxiv.org/abs/2312.17464v1 )

License: see link

Jinyi Du, George F.R. Chen, Hongwei Gao, James A. Grieve, Dawn T.H. Tan, Alexander Ling

We report a stable, low-loss method for coupling light from silicon-on-insulator (SOI) photonic chips into optical fibers. The technique is realized using an on-chip tapered waveguide and a cleaved small-core optical fiber. The on-chip taper is monolithic and does not require a patterned cladding, thus simplifying the chip fabrication process. The optical fiber segment is composed of a centimeter-long small-core fiber (UHNA7) which is spliced to SMF-28 fiber with less than -0.1 dB loss. We observe an overall coupling loss of -0.64 dB with this design. The chip edge and fiber tip can be butt-coupled without damaging the on-chip taper or fiber. Friction between the surfaces maintains alignment, leading to a coupling fluctuation of ±0.1 dB during a ten-day continuous measurement without the use of any adhesive. This technique minimizes the potential for generating Raman noise in the fiber, and has good stability compared to coupling strategies based on longer UHNA fibers or fragile lensed fibers. We also applied the edge coupler to a correlated photon pair source and observed a raw coincidence count rate of 1.21 million cps and a heralding efficiency of 21.3%. We achieved an autocorrelation function g_H^(2)(0) as low as 0.0004 in the low pump power regime.
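The losses above are quoted in dB. Converting them to fractions of transmitted power makes them easier to read: a loss of L dB transmits T = 10^(−|L|/10). The helper below is a generic conversion written for illustration. Applying the ±0.1 dB fluctuation to the -0.64 dB figure gives the range of coupled power.

```python
def db_loss_to_transmission(loss_db: float) -> float:
    """Convert a loss in dB to a power transmission fraction.

    Losses are quoted as negative dB values (e.g. -0.64 dB), so the magnitude
    is used; either sign convention therefore gives the same fraction."""
    return 10 ** (-abs(loss_db) / 10)

coupling = db_loss_to_transmission(-0.64)   # about 0.863, i.e. roughly 86% of the light
splice = db_loss_to_transmission(-0.1)      # splice loss bound: about 0.977
low = db_loss_to_transmission(-0.74)        # -0.64 dB minus 0.1 dB of fluctuation
high = db_loss_to_transmission(-0.54)       # -0.64 dB plus 0.1 dB of fluctuation
```

So the ±0.1 dB drift corresponds to the coupled power varying between roughly 84% and 88%.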

Translated: 2024-01-02 13:41:12 Published: 2023-12-29

# Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift ( http://arxiv.org/abs/2312.17463v1 )

License: see link

Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel

Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research. However, out-of-distribution generalization for regression (the analogous problem for modeling continuous targets) remains relatively unexplored. To tackle this problem, we return to first principles and analyze how the closed-form solution for Ordinary Least Squares (OLS) regression is sensitive to covariate shift. We characterize the out-of-distribution risk of the OLS model in terms of the eigenspectrum decomposition of the source and target data. We then use this insight to propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
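A standard fixed-design result, given as background rather than as the paper's exact derivation, shows where the eigenspectrum enters. Take y = Xw* + ε with noise variance σ², and write XᵀX = Σ_i s_i u_i u_iᵀ. The OLS estimate's expected excess risk on target inputs with second-moment matrix Σ_t is then σ² tr(Σ_t (XᵀX)⁻¹) = σ² Σ_i (u_iᵀ Σ_t u_i) / s_i. Directions where the source data barely varies (small s_i) but the target does vary dominate the risk. The function names and toy data below are hypothetical.

```python
import numpy as np

def ols_variance_risk(X_src, Sigma_tgt, noise_var=1.0):
    """Expected excess target risk of OLS under fixed-design noise:
    noise_var * tr(Sigma_tgt @ inv(X^T X)), computed directly."""
    return noise_var * np.trace(Sigma_tgt @ np.linalg.inv(X_src.T @ X_src))

def ols_variance_risk_spectral(X_src, Sigma_tgt, noise_var=1.0):
    """Same quantity via the eigendecomposition X^T X = sum_i s_i u_i u_i^T:
    each source direction u_i contributes (target variance along u_i) / s_i."""
    s, U = np.linalg.eigh(X_src.T @ X_src)               # eigenvalues in ascending order
    tgt_var = np.einsum("di,de,ei->i", U, Sigma_tgt, U)  # u_i^T Sigma_tgt u_i
    per_direction = noise_var * tgt_var / s
    return per_direction.sum(), per_direction

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.05])  # source barely varies in dim 3
Sigma_shift = np.eye(3)                                      # target varies in all dims
total, parts = ols_variance_risk_spectral(X, Sigma_shift)
```

Here almost all of the risk comes from the single low-variance source direction, which is the kind of direction a last-layer spectral adaptation would target.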

Translated: 2024-01-02 13:40:49 Published: 2023-12-29

# Balancing error budget for fermionic k-RDM estimation

Balancing error budget for fermionic k-RDM estimation ( http://arxiv.org/abs/2312.17452v1 )

License: see link

Nayuta Takemori, Yusuke Teranishi, Wataru Mizukami, and Nobuyuki Yoshioka

The reduced density matrix (RDM) is crucial in quantum many-body systems for understanding physical properties, as it contains all information on local physical quantities. This study aims to minimize the various error sources that make the estimation of higher-order RDMs challenging in quantum computing. We identify the optimal balance between statistical and systematic errors in higher-order RDM estimation, in particular when the cumulant expansion is used to suppress the sample complexity. Furthermore, through numerical demonstrations of quantum subspace methods for the one- and two-dimensional Fermi-Hubbard model, we show that biased yet efficient estimators better suppress hardware noise in excited-state calculations. Our work paves a path towards cost-efficient practical quantum computing, which in reality is constrained by multiple sources of error.
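In the cumulant expansion, the fermionic 2-RDM splits into an antisymmetrized product of 1-RDMs plus a connected two-body cumulant: D_{pq,rs} = γ_pr γ_qs − γ_ps γ_qr + λ_{pq,rs}. Neglecting or approximating λ is one way to trade a systematic error for fewer measurements. The sketch computes the product part under the convention D_{pq,rs} = ⟨a_p† a_q† a_s a_r⟩, γ_pr = ⟨a_p† a_r⟩ (conventions vary between references); it is not the paper's estimator.

```python
import numpy as np

def two_rdm_from_one_rdm(gamma: np.ndarray) -> np.ndarray:
    """Antisymmetrized product of 1-RDMs: D[p,q,r,s] = g[p,r] g[q,s] - g[p,s] g[q,r].

    Convention: D[p,q,r,s] = <a_p^dag a_q^dag a_s a_r>, gamma[p,r] = <a_p^dag a_r>.
    The two-body cumulant is dropped, so this is exact only for Slater determinants."""
    return (np.einsum("pr,qs->pqrs", gamma, gamma)
            - np.einsum("ps,qr->pqrs", gamma, gamma))

def two_body_cumulant(D: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Connected part lambda = D - (antisymmetrized 1-RDM product)."""
    return D - two_rdm_from_one_rdm(gamma)

# Slater determinant with 2 of 4 spin-orbitals occupied: gamma is a projector.
gamma = np.diag([1.0, 1.0, 0.0, 0.0])
D = two_rdm_from_one_rdm(gamma)
```

For this example, the trace sum over D[p,q,p,q] equals N(N − 1) = 2, and two_body_cumulant(D, gamma) is identically zero, as expected for a single determinant.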

Translated: 2024-01-02 13:40:35 Published: 2023-12-29

# FedLED: Label-Free Equipment Fault Diagnosis with Vertical Federated Transfer Learning

FedLED: Label-Free Equipment Fault Diagnosis with Vertical Federated Transfer Learning ( http://arxiv.org/abs/2312.17451v1 )

License: see link

Jie Shen, Shusen Yang, Cong Zhao, Xuebin Ren, Peng Zhao, Yuqian Yang, Qing Han, Shuaijun Wu

Intelligent equipment fault diagnosis based on Federated Transfer Learning (FTL) attracts considerable attention from both academia and industry. It allows real-world industrial agents with limited samples to construct a fault diagnosis model without jeopardizing their raw data privacy. Existing approaches, however, can address neither the intense sample heterogeneity caused by the different working conditions of practical agents, nor the extreme scarcity (even complete absence) of fault labels for newly deployed equipment. To address these issues, we present FedLED, the first unsupervised vertical FTL equipment fault diagnosis method, in which knowledge of the unlabeled target domain is further exploited for effective unsupervised model transfer. Results of extensive experiments using data from real equipment monitoring demonstrate that FedLED clearly outperforms SOTA approaches in terms of both diagnosis accuracy (up to 4.13 times) and generality. We expect our work to inspire further study of label-free equipment fault diagnosis systematically enhanced by target domain knowledge.

Translated: 2024-01-02 13:40:21 Published: 2023-12-29

# Information Fragility or Robustness Under Quantum Channels

Information Fragility or Robustness Under Quantum Channels ( http://arxiv.org/abs/2312.17450v1 )

License: see link

Nicholas Laracuente, Graeme Smith

Quantum states naturally decay under noise. Many earlier works have quantified and demonstrated lower bounds on the decay rate, showing exponential decay in a wide variety of contexts. Here we study the converse question: are there uniform upper bounds on the ratio of post-noise to initial information quantities when noise is sufficiently weak? In several scenarios, including classical, we find multiplicative converse bounds. However, this is not always the case. Even for simple noise such as qubit dephasing or depolarizing, mutual information may fall by an unbounded factor under arbitrarily weak noise. As an application, we find families of channels with non-zero private capacity despite arbitrarily high probability of transmitting an arbitrarily good copy of the input to the environment.

Translated: 2024-01-02 13:40:03 Published: 2023-12-29

# Tracking with Human-Intent Reasoning

Tracking with Human-Intent Reasoning ( http://arxiv.org/abs/2312.17448v1 )

License: see link

Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie

Advances in perception modeling have significantly improved the performance of object tracking. However, current methods specify the target object in the initial frame either 1) with a box or mask template, or 2) with an explicit language description. These approaches are cumbersome and do not give the tracker any self-reasoning ability. Therefore, this work proposes a new tracking task, Instruction Tracking, in which the tracker is given implicit tracking instructions and must perform tracking automatically in video frames. To achieve this, we investigate integrating the knowledge and reasoning capabilities of a Large Vision-Language Model (LVLM) into object tracking. Specifically, we propose a tracker called TrackGPT, which is capable of complex reasoning-based tracking. TrackGPT first uses the LVLM to understand tracking instructions and condense the cues about which target to track into referring embeddings. The perception component then generates the tracking results based on these embeddings. To evaluate the performance of TrackGPT, we construct an instruction tracking benchmark called InsTrack, which contains over one thousand instruction-video pairs for instruction tuning and evaluation. Experiments show that TrackGPT achieves competitive performance on referring video object segmentation benchmarks, such as a new state-of-the-art performance of 66.5 J&F on Refer-DAVIS. It also demonstrates superior instruction tracking performance under new evaluation protocols. The code and models are available at https://github.com/jiawen-zhu/TrackGPT.
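The J&F score quoted above is the DAVIS-style metric: the mean of region similarity J (mask intersection-over-union) and contour accuracy F (a boundary F-measure). The sketch implements J and the final averaging only. The boundary F computation is assumed to come from elsewhere, and this is not the official evaluation toolkit.

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection-over-union of two binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j_scores, f_scores) -> float:
    """J&F: mean of per-frame J and boundary F-measure F.
    F (contour precision/recall) is assumed to be computed elsewhere."""
    return float((np.mean(j_scores) + np.mean(f_scores)) / 2)

pred = np.zeros((4, 4), dtype=bool)
pred[:2, :2] = True     # 4 predicted pixels
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :3] = True       # 6 ground-truth pixels, 4 of them shared with pred
j = region_similarity(pred, gt)   # 4 / 6
```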

Translated: 2024-01-02 13:39:51 Published: 2023-12-29

# Darwin3: A large-scale neuromorphic chip with a Novel ISA and On-Chip Learning

Darwin3: A large-scale neuromorphic chip with a Novel ISA and On-Chip Learning ( http://arxiv.org/abs/2312.17582v1 )

License: see link

De Ma, Xiaofei Jin, Shichun Sun, Yitao Li, Xundong Wu, Youneng Hu, Fangchao Yang, Huajin Tang, Xiaolei Zhu, Peng Lin and Gang Pan

Spiking Neural Networks (SNNs) are gaining increasing attention for their biological plausibility and potential for improved computational efficiency. To match the high spatial-temporal dynamics of SNNs, neuromorphic chips that execute SNNs directly in hardware-based neuron and synapse circuits are highly desirable. This paper presents a large-scale neuromorphic chip named Darwin3 with a novel instruction set architecture (ISA), which comprises 10 primary instructions and a few extended instructions. It supports flexible neuron model programming and local learning rule designs. The Darwin3 chip architecture is designed as a mesh of computing nodes with an innovative routing algorithm. We used a compression mechanism to represent synaptic connections, significantly reducing memory usage. The Darwin3 chip supports up to 2.35 million neurons, making it the largest of its kind in neuron scale. The experimental results showed that code density was improved up to 28.3x in Darwin3, and that neuron core fan-in and fan-out were improved up to 4096x and 3072x by connection compression compared to the physical memory depth. Our Darwin3 chip also provided memory savings between 6.8x and 200.8x when mapping convolutional spiking neural networks (CSNN) onto the chip, demonstrating state-of-the-art performance in accuracy and latency compared to other neuromorphic chips.
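As an example of the kind of neuron model such an ISA lets users program, below is a discrete-time leaky integrate-and-fire (LIF) update. It is a generic textbook model with hypothetical parameters, written in Python rather than Darwin3 instructions.

```python
import numpy as np

def lif_run(input_current, v_thresh=1.0, v_reset=0.0, decay=0.9):
    """Discrete-time LIF neuron: v <- decay * v + I; spike and reset when v >= v_thresh.

    Returns the membrane trace and a boolean spike train (one entry per time step)."""
    v = v_reset
    trace, spikes = [], []
    for i_t in input_current:
        v = decay * v + i_t          # leak, then integrate the input
        fired = v >= v_thresh
        if fired:
            v = v_reset              # hard reset after a spike
        trace.append(v)
        spikes.append(fired)
    return np.array(trace), np.array(spikes)

_, spikes_strong = lif_run([0.5] * 20)   # steady drive crosses threshold every 3 steps
_, spikes_weak = lif_run([0.05] * 20)    # weak drive: v approaches 0.5 and never fires
```

Swapping the update rule (for example adding adaptive thresholds) is exactly the kind of per-model change a programmable neuron ISA is meant to make cheap.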

Translated: 2024-01-02 12:51:09 Published: 2023-12-29

# Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting

Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting ( http://arxiv.org/abs/2312.17517v1 )

License: see link

Raquel Espinosa, Fernando Jiménez, José Palma

Time series forecasting plays a crucial role in diverse fields, necessitating the development of robust models that can effectively handle complex temporal patterns. In this article, we present a novel feature selection method embedded in Long Short-Term Memory networks, leveraging a multi-objective evolutionary algorithm. Our approach optimizes the weights and biases of the LSTM in a partitioned manner, with each objective function of the evolutionary algorithm targeting the root mean square error in a specific data partition. The set of non-dominated forecast models identified by the algorithm is then utilized to construct a meta-model through stacking-based ensemble learning. Furthermore, our proposed method provides an avenue for attribute importance determination, as the frequency of selection for each attribute in the set of non-dominated forecasting models reflects their significance. This attribute importance insight adds an interpretable dimension to the forecasting process. Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs, effectively reducing overfitting. Comparative analyses against state-of-the-art CancelOut and EAR-FS methods highlight the superior performance of our approach.
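Two steps from the method can be sketched independently of the LSTM training. The first keeps the non-dominated models when each objective is the RMSE on one data partition. The second scores each attribute by how often it is selected among those models. The data are hypothetical, and the stacking meta-model and the evolutionary search itself are omitted.

```python
import numpy as np

def non_dominated(objectives: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto-optimal rows when every objective is minimized."""
    n = objectives.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Row j dominates row i if it is <= on all objectives and < on at least one.
        dominates_i = np.all(objectives <= objectives[i], axis=1) & np.any(
            objectives < objectives[i], axis=1
        )
        mask[i] = not dominates_i.any()
    return mask

def attribute_importance(selected: np.ndarray, objectives: np.ndarray) -> np.ndarray:
    """Fraction of non-dominated models that use each attribute.

    selected: (n_models, n_attributes) boolean feature-selection masks."""
    front = non_dominated(objectives)
    return selected[front].mean(axis=0)

# Hypothetical: 4 models, RMSE on 2 data partitions, 3 candidate attributes.
rmse = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.5, 2.5]])
masks = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1]], dtype=bool)
importance = attribute_importance(masks, rmse)   # model 3 is dominated and ignored
```

The O(n²) dominance check is fine for the small fronts an evolutionary run produces; larger populations would usually use a non-dominated sorting routine instead.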

Translated: 2024-01-02 12:50:47 Published: 2023-12-29

# Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game

Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game ( http://arxiv.org/abs/2312.17515v1 )

License: see link

Zijing Shi, Meng Fang, Shunfeng Zheng, Shilong Deng, Ling Chen, Yali Du

Multi-agent collaboration with Large Language Models (LLMs) demonstrates proficiency in basic tasks, yet its efficiency in more complex scenarios remains unexplored. In gaming environments, these agents often face situations without established coordination protocols, requiring them to make intelligent inferences about teammates from limited data. This problem motivates the area of ad hoc teamwork, in which an agent may potentially cooperate with a variety of teammates to achieve a shared goal. Our study focuses on the ad hoc teamwork problem where the agent operates in an environment driven by natural language. Our findings reveal the potential of LLM agents in team collaboration, highlighting issues related to hallucinations in communication. To address this issue, we develop CodeAct, a general agent that equips LLM with enhanced memory and code-driven reasoning, enabling the repurposing of partial information for rapid adaptation to new teammates.

翻訳日:2024-01-02 12:50:26 公開日:2023-12-29

# クエリ計画ガイダンスによるデータベースエンジンのテスト

Testing Database Engines via Query Plan Guidance ( http://arxiv.org/abs/2312.17510v1 )

ライセンス: Link先を確認

Jinsheng Ba, Manuel Rigger

(参考訳) データベースシステムはデータの保存とクエリに広く使われている。そのようなシステムにおける論理バグ、すなわちデータベースシステムに誤った結果を計算させるバグを見つけるために、テストオラクルが提案されている。完全に自動化されたテスト手法を実現するため、テストオラクルはテストケース生成技術と組み合わせて用いられる。ここでテストケースとは、データベースの状態と、テストオラクルを適用できるクエリの組を指す。本研究では、自動テストを「興味深い」テストケースへ誘導するためのクエリプランガイダンス(QPG)の概念を提案する。SQLやその他のクエリ言語は宣言的である。そのため、クエリを実行する際、データベースシステムはソース言語の各演算子を、潜在的に多数ある、実行可能ないわゆる物理演算子のいずれかに変換する。この物理演算子の木をクエリプランと呼ぶ。我々の直感は、多様なクエリプランを探索するようにテストを誘導することで、より興味深い振る舞い(その一部は誤っている可能性がある)も探索できる、というものである。そこで本研究では、データベースの状態に有望な変異を段階的に適用し、DBMSがその後のクエリに対して多様なクエリプランを生成するように仕向ける変異手法を提案する。我々はこの手法を、SQLite、TiDB、CockroachDBという3つの成熟し、広く使用され、十分にテストされたデータベースシステムに適用し、これまで知られていなかった53件の固有のバグを発見した。提案手法は、単純なランダム生成手法より4.85〜408.48倍、コードカバレッジガイダンス手法より7.46倍多くの固有のクエリプランを実行する。商用のものを含むほとんどのデータベースシステムはクエリプランをユーザーに公開しているため、QPGは一般に適用可能なブラックボックス手法であると考えており、その中核となるアイデアは他の文脈(例えばテストスイートの品質の測定)にも適用できると考えている。

Database systems are widely used to store and query data. Test oracles have been proposed to find logic bugs in such systems, that is, bugs that cause the database system to compute an incorrect result. To realize a fully automated testing approach, such test oracles are paired with a test case generation technique; a test case refers to a database state and a query on which the test oracle can be applied. In this work, we propose the concept of Query Plan Guidance (QPG) for guiding automated testing towards "interesting" test cases. SQL and other query languages are declarative. Thus, to execute a query, the database system translates every operator in the source language to one of potentially many so-called physical operators that can be executed; the tree of physical operators is referred to as the query plan. Our intuition is that by steering testing towards exploring diverse query plans, we also explore more interesting behaviors, some of which are potentially incorrect. To this end, we propose a mutation technique that gradually applies promising mutations to the database state, causing the DBMS to create diverse query plans for subsequent queries. We applied our method to three mature, widely-used, and extensively-tested database systems (SQLite, TiDB, and CockroachDB) and found 53 unique, previously unknown bugs. Our method exercises 4.85 to 408.48 times more unique query plans than a naive random generation method and 7.46 times more than a code coverage guidance method. Since most database systems, including commercial ones, expose query plans to the user, we consider QPG a generally applicable, black-box approach and believe that the core idea could also be applied in other contexts (e.g., to measure the quality of a test suite).

翻訳日:2024-01-02 12:50:07 公開日:2023-12-29
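
The guidance loop described in the QPG abstract can be sketched roughly as follows. The loop records every distinct query plan it observes and changes the database state when progress stalls. Several parts are assumptions rather than the authors' implementation: the trigger rule (mutate after `patience` consecutive queries with no new plan), the callback names, and the toy stand-ins for a real DBMS. A real tester would read plans from something like `EXPLAIN` output and would run a test oracle on each query.

```python
import itertools
from typing import Callable, Set


def plan_guided_testing(generate_query: Callable[[], str],
                        get_plan: Callable[[str], str],
                        mutate_database: Callable[[], None],
                        iterations: int, patience: int) -> Set[str]:
    """Track unique query plans; mutate the database state when stuck."""
    seen_plans: Set[str] = set()
    queries_without_new_plan = 0
    for _ in range(iterations):
        query = generate_query()
        plan = get_plan(query)  # e.g. normalized EXPLAIN output (assumed)
        if plan not in seen_plans:
            seen_plans.add(plan)
            queries_without_new_plan = 0
            # A real tester would apply a test oracle to `query` here.
        else:
            queries_without_new_plan += 1
        if queries_without_new_plan >= patience:
            mutate_database()  # e.g. add an index or insert rows (assumed)
            queries_without_new_plan = 0
    return seen_plans


# Toy stand-ins: the "plan" depends only on whether an index exists.
state = {"indexes": 0}
query_stream = itertools.cycle(["q1", "q2"])


def toy_plan(query: str) -> str:
    return f"{query}-scan" if state["indexes"] == 0 else f"{query}-index"


def add_index() -> None:
    state["indexes"] += 1


plans = plan_guided_testing(lambda: next(query_stream), toy_plan, add_index,
                            iterations=10, patience=3)
print(sorted(plans))  # ['q1-index', 'q1-scan', 'q2-index', 'q2-scan']
```

In the toy run, the two "scan" plans stop producing anything new after a few queries. One database mutation then unlocks the two "index" plans.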

# インスタンスレベルの感情音声変換のための注意機構に基づく対話的分離ネットワーク

Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion ( http://arxiv.org/abs/2312.17508v1 )

ライセンス: Link先を確認

Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie

(参考訳) 感情音声変換は、非感情成分を保ちながら、与えられた感情に従って音声を操作することを目的としている。既存の手法は、きめ細かな感情的特性を十分に表現できない。本稿では、インスタンス単位の感情知識を音声変換に活用する、注意機構に基づく対話的分離ネットワーク(Attention-based Interactive diseNtangling Network, AINN)を提案する。ネットワークを効果的に学習するために2段階のパイプラインを導入する。ステージIでは、発話間コントラスト学習(inter-speech contrastive learning)によりきめ細かな感情をモデル化し、発話内分離学習(intra-speech disentanglement learning)により感情と内容をよりよく分離する。ステージIIでは、マルチビュー一貫性機構によって変換を正則化することを提案する。この手法は、きめ細かな感情を伝達しつつ音声内容を維持するのに役立つ。大規模な実験により、AINNが客観的指標と主観的指標の両方で最先端手法を上回ることが示された。

Emotional Voice Conversion aims to manipulate speech according to a given emotion while preserving non-emotion components. Existing approaches cannot express fine-grained emotional attributes well. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effectively train our network: Stage I utilizes inter-speech contrastive learning to model fine-grained emotion and intra-speech disentanglement learning to better separate emotion and content. In Stage II, we propose to regularize the conversion with a multi-view consistency mechanism. This technique helps us transfer fine-grained emotion and maintain speech content. Extensive experiments show that our AINN outperforms state-of-the-art methods in both objective and subjective metrics.

翻訳日:2024-01-02 12:49:37 公開日:2023-12-29

# HIV-1に対する抗レトロウイルス療法の治療成績予測を強化する、分布外ロバスト性を備えたグラフニューラルネットワークベースのモデル

A graph neural network-based model with Out-of-Distribution Robustness for enhancing Antiretroviral Therapy Outcome Prediction for HIV-1 ( http://arxiv.org/abs/2312.17506v1 )

ライセンス: Link先を確認

Giulia Di Teodoro, Federico Siciliano, Valerio Guarrasi, Anne-Mieke Vandamme, Valeria Ghisetti, Anders Sönnerborg, Maurizio Zazzi, Fabrizio Silvestri, Laura Palagi

(参考訳) HIV-1に対する抗レトロウイルス療法の治療成績を予測することは差し迫った臨床課題であり、有効性データが限られている薬剤を治療レジメンに含む場合には特に難しい。このようなデータ不足は、新薬が市場に導入された場合や、臨床での使用が限られている場合に生じうる。この問題に対処するため、全結合(FC)ニューラルネットワークとグラフニューラルネットワーク(GNN)の特徴を組み合わせた、新しいジョイントフュージョンモデルを導入する。FCネットワークは、最新の遺伝子型薬剤耐性検査で同定されたウイルス変異と、治療に用いられる薬剤からなる特徴ベクトルを持つ表形式データを用いる。一方、GNNは、ウイルスの遺伝子配列に基づいて生体内での治療効果を推定するためのベンチマーク基準となるスタンフォード薬剤耐性変異表から得られた知識を活用して、情報量の多いグラフを構築する。テストセットに含まれる分布外(Out-of-Distribution)薬剤に対するこれらのモデルの頑健性を評価し、特にそのようなシナリオを扱う上でのGNNの役割に着目した。包括的な分析の結果、提案モデルは、特に分布外薬剤を考慮した場合に、FCモデルを一貫して上回ることが示された。これらの結果は、スタンフォードのスコアをモデルに統合することで汎化性と頑健性が高まるだけでなく、データの利用可能性が限られた実世界の応用にも有用性が広がるという利点を強調している。本研究は、我々の手法が抗レトロウイルス療法の治療成績予測に情報を与え、より十分な情報に基づく臨床判断に貢献する可能性を示している。

Predicting the outcome of antiretroviral therapies for HIV-1 is a pressing clinical challenge, especially when the treatment regimen includes drugs for which limited effectiveness data is available. This scarcity of data can arise either due to the introduction of a new drug to the market or due to limited use in clinical settings. To tackle this issue, we introduce a novel joint fusion model, which combines features from a Fully Connected (FC) Neural Network and a Graph Neural Network (GNN). The FC network employs tabular data with a feature vector made up of viral mutations identified in the most recent genotypic resistance test, along with the drugs used in therapy. Conversely, the GNN leverages knowledge derived from Stanford drug-resistance mutation tables, which serve as benchmark references for deducing in-vivo treatment efficacy based on the viral genetic sequence, to build informative graphs. We evaluated these models' robustness against Out-of-Distribution drugs in the test set, with a specific focus on the GNN's role in handling such scenarios. Our comprehensive analysis demonstrates that the proposed model consistently outperforms the FC model, especially when considering Out-of-Distribution drugs. These results underscore the advantage of integrating Stanford scores in the model, which not only enhances its generalizability and robustness but also extends its utility in real-world applications with limited data availability. This research highlights the potential of our approach to inform antiretroviral therapy outcome prediction and contribute to more informed clinical decisions.

翻訳日:2024-01-02 12:49:23 公開日:2023-12-29

# カモフラージュインスタンスセグメンテーションへのオープンボキャブラリー拡散の活用

Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation ( http://arxiv.org/abs/2312.17505v1 )

ライセンス: Link先を確認

Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Binh-Son Hua, Nhat Minh Chung, Ivor W. Tsang, Sai-Kit Yeung

(参考訳) テキストから画像を生成する拡散技術は、テキスト記述から高品質な画像を生成する卓越した能力を示している。これは、視覚領域とテキスト領域の間に強い相関が存在することを示している。さらに、CLIPのようなテキスト・画像識別モデルは、オープンな概念から得られる豊富で多様な情報のおかげで、テキストプロンプトによる画像ラベリングに優れている。本稿では、これらの技術的進歩を活用して、コンピュータビジョンにおける難しい問題であるカモフラージュされたインスタンスのセグメンテーションを解決する。具体的には、最先端の拡散モデルを基盤とし、オープンボキャブラリによってカモフラージュされた物体表現のためのマルチスケールのテキスト・視覚特徴を学習する手法を提案する。このようなクロスドメイン表現は、物体を背景から区別する視覚的手がかりが微妙なカモフラージュ物体のセグメンテーション、特に学習時に見られなかった新規物体のセグメンテーションにおいて望ましい。また、クロスドメイン特徴を効果的に融合し、関連する特徴を各前景物体に対応づけるための補助的なコンポーネントも開発する。提案手法を検証し、カモフラージュされたインスタンスセグメンテーションおよび一般的なオープンボキャブラリのインスタンスセグメンテーションの複数のベンチマークデータセットで既存手法と比較する。実験結果により、既存手法に対する提案手法の優位性が確認された。将来の研究を支援するため、コードと事前学習済みモデルを公開する予定である。

Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. This indicates that there exists a strong correlation between the visual and textual domains. In addition, text-image discriminative models such as CLIP excel in image labelling from text prompts, thanks to the rich and diverse information available from open concepts. In this paper, we leverage these technical advances to solve a challenging problem in computer vision: camouflaged instance segmentation. Specifically, we propose a method built upon a state-of-the-art diffusion model and empowered by open vocabulary to learn multi-scale textual-visual features for camouflaged object representations. Such cross-domain representations are desirable for segmenting camouflaged objects, where the visual cues that distinguish the objects from the background are subtle, and especially for segmenting novel objects that are not seen in training. We also develop technically supportive components to effectively fuse cross-domain features and engage relevant features towards respective foreground objects. We validate our method and compare it with existing ones on several benchmark datasets of camouflaged instance segmentation and generic open-vocabulary instance segmentation. Experimental results confirm the advances of our method over existing ones. We will publish our code and pre-trained models to support future research.

翻訳日:2024-01-02 12:48:53 公開日:2023-12-29

# hibid:階層的オフライン深層強化学習による予算配分を伴うクロスチャネル制約入札システム

HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning ( http://arxiv.org/abs/2312.17503v1 )

ライセンス: Link先を確認

Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang

(参考訳) オンラインディスプレイ広告プラットフォームは、毎日数十億規模の広告リクエストに対してリアルタイム入札(RTB)を提供することで、多数の広告主にサービスを提供している。入札戦略は、複数のチャネルにまたがる広告リクエストを処理し、総予算やクリック単価(CPC)などの設定された財務上の制約の下でクリック数を最大化する。主に単一チャネルの入札に焦点を当てた既存研究とは異なり、本研究では予算配分を伴うクロスチャネルの制約付き入札を明示的に考慮する。具体的には、非競合的な予算配分のための補助損失を備えた高レベルプランナーと、割り当てられた予算に応じた適応的な入札戦略のためのデータ拡張で強化された低レベルエグゼキュータからなる、階層型オフライン深層強化学習(DRL)フレームワーク「HiBid」を提案する。さらに、クロスチャネルのCPC制約を満たすために、CPC誘導型の行動選択機構を導入する。大規模ログデータとオンラインA/Bテストの両方での広範な実験を通じて、HiBidがクリック数、CPC満足率、投資収益率(ROI)の点で6つのベースラインを上回ることを確認した。また、HiBidをMeituanの広告プラットフォームに導入しており、すでに毎日数万の広告主にサービスを提供している。

Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) for billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under set financial constraints, e.g., total budget and cost-per-click (CPC). Unlike existing works that mainly focus on single-channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called "HiBid", consisting of a high-level planner equipped with an auxiliary loss for non-competitive budget allocation, and a data-augmentation-enhanced low-level executor for adaptive bidding strategy in response to allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfactory ratio, and return-on-investment (ROI). We have also deployed HiBid on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.

翻訳日:2024-01-02 12:48:29 公開日:2023-12-29
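
HiBid learns its CPC-guided action selection inside an offline DRL framework, and that learned mechanism is not reproduced here. The sketch below only illustrates the constraint arithmetic behind it: cumulative CPC is total spend divided by total clicks. The greedy rule, the `Candidate` fields, and the numbers are hypothetical. The rule picks the click-maximizing action whose projected CPC stays within the target and otherwise falls back to the cheapest action per click.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    action: float           # hypothetical bid scaling factor
    expected_cost: float    # predicted spend if this action is taken
    expected_clicks: float  # predicted clicks if this action is taken


def select_action(candidates: List[Candidate], spent: float, clicks: float,
                  cpc_target: float) -> Candidate:
    """Greedy CPC-constrained choice: most clicks among actions whose projected
    cumulative CPC stays within cpc_target, else the lowest projected CPC."""
    def projected_cpc(c: Candidate) -> float:
        total_clicks = clicks + c.expected_clicks
        if total_clicks == 0:
            return float("inf")
        return (spent + c.expected_cost) / total_clicks

    feasible = [c for c in candidates if projected_cpc(c) <= cpc_target]
    if feasible:
        return max(feasible, key=lambda c: c.expected_clicks)
    return min(candidates, key=projected_cpc)


candidates = [Candidate(0.8, 10.0, 6.0), Candidate(1.0, 20.0, 10.0),
              Candidate(1.3, 40.0, 15.0)]
# So far: 100.0 spent for 50 clicks (CPC 2.0).
best = select_action(candidates, spent=100.0, clicks=50.0, cpc_target=2.0)
print(best.action)  # 1.0
```

Here the projected CPCs are 110/56, 120/60 = 2.0, and 140/65. The second action is therefore the feasible one with the most clicks.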

# 真の三部エンタングルメントに対する忠実な幾何学的尺度

Faithful geometric measures for genuine tripartite entanglement ( http://arxiv.org/abs/2312.17496v1 )

ライセンス: Link先を確認

Xiaozhen Ge, Yong Wang, Yu Xiang, Guofeng Zhang, Lijun Liu, Li Li, and Shuming Cheng

(参考訳) 離散的、連続的、およびハイブリッドな量子系における真の三部エンタングルメントに対する忠実な幾何学的描像を示す。まず、三角関係 $\mathcal{E}^\alpha_{i|jk}\leq \mathcal{E}^\alpha_{j|ik}+\mathcal{E}^\alpha_{k|ij}$ が、すべての劣加法的な二部エンタングルメント測度 $\mathcal{E}$、パーティ $i, j, k$ のすべての置換、すべての $\alpha \in [0, 1]$、およびすべての純粋な三部状態に対して成り立つことを見出した。これにより、$\mathcal{E}^\alpha$ で測った二分割エンタングルメントが三角形の辺に対応し、$\alpha \in (0, 1)$ におけるその三角形の面積が 0 でないことと、基礎となる状態が真にエンタングルしていることが同値である、という幾何学的解釈が得られる。次に、$0<\alpha\leq 1/2$ の非鈍角三角形の面積が真の三部エンタングルメントの尺度であることを厳密に証明する。これらの尺度に対する有用な下界と上界を得るとともに、結果の一般化も示す。最後に、量子ビットについてはこの結果が大幅に強化される。すなわち、劣加法的かつ非加法的な測度の集合が与えられたとき、任意の $\alpha>1$ に対して三角関係に違反する状態が常に存在し、任意の $\alpha>1/2$ に対して三角形の面積は尺度にならない。したがって、我々の結果は、離散的および連続的な多体エンタングルメントの研究に大きな進展をもたらすと期待される。

We present a faithful geometric picture for genuine tripartite entanglement of discrete, continuous, and hybrid quantum systems. We first find that the triangle relation $\mathcal{E}^\alpha_{i|jk}\leq \mathcal{E}^\alpha_{j|ik}+\mathcal{E}^\alpha_{k|ij}$ holds for all subadditive bipartite entanglement measures $\mathcal{E}$, all permutations of parties $i, j, k$, all $\alpha \in [0, 1]$, and all pure tripartite states. It provides a geometric interpretation in which bipartition entanglement, measured by $\mathcal{E}^\alpha$, corresponds to a side of a triangle whose area with $\alpha \in (0, 1)$ is nonzero if and only if the underlying state is genuinely entangled. Then, we rigorously prove that the non-obtuse triangle area with $0<\alpha\leq 1/2$ is a measure for genuine tripartite entanglement. Useful lower and upper bounds for these measures are obtained, and generalizations of our results are also presented. Finally, the results are significantly strengthened for qubits: given a set of subadditive and non-additive measures, some state is always found to violate the triangle relation for any $\alpha>1$, and the triangle area is not a measure for any $\alpha>1/2$. Hence, our results are expected to aid significant progress in studying both discrete and continuous multipartite entanglement.

翻訳日:2024-01-02 12:48:05 公開日:2023-12-29

# 薬物特性予測のためのマルチモーダル融合深層学習における化学言語と分子グラフの統合

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction ( http://arxiv.org/abs/2312.17495v1 )

ライセンス: Link先を確認

Xiaohua Lu, Liangxu Xie, Lei Xu, Rongzhi Mao, Shan Chang, Xiaojun Xu

(参考訳) 分子特性を正確に予測することは、難しいが創薬に不可欠な課題である。近年、多くのモノモーダル深層学習手法が分子特性予測にうまく適用されている。しかし、モノモーダル学習には、分子表現の一つのモダリティのみに依存するという本質的な限界があり、これが薬物分子の包括的な理解を制限し、データノイズに対する耐性を損なう。この限界を克服するため、異なる分子表現を網羅するマルチモーダル深層学習モデルを構築する。薬物分子を、SMILES符号化ベクトル、ECFPフィンガープリント、分子グラフという3つの分子表現に変換する。各モダリティの情報を処理するために、Transformerエンコーダ、双方向ゲート付き回帰ユニット(BiGRU)、グラフ畳み込みネットワーク(GCN)をそれぞれ特徴学習に用いる。これにより、相補的で自然に生じるバイオインフォマティクス情報を獲得するモデルの能力を高めることができる。6つの分子データセットで3モーダルモデルを評価した。バイモーダル学習モデルとは異なり、特定の特徴を捉え、各モダリティの情報の寄与をよりよく活用するために、5つの融合手法を採用する。モノモーダルモデルと比較して、我々のマルチモーダル融合深層学習(MMFDL)モデルは、精度、信頼性、ノイズ耐性の点で単一モデルを上回る。さらに、PDBbindのrefined setにおけるタンパク質-リガンド複合体の結合定数の予測において、その汎化能力を示す。マルチモーダルモデルの利点は、適切なモデルと適切な融合手法を用いて多様なデータソースを処理できる点にあり、これによりデータの多様性を得つつモデルのノイズ耐性を高めることができる。

Accurately predicting molecular properties is a challenging but essential task in drug discovery. Recently, many mono-modal deep learning methods have been successfully applied to molecular property prediction. However, the inherent limitation of mono-modal learning arises from relying solely on one modality of molecular representation, which restricts a comprehensive understanding of drug molecules and hampers their resilience against data noise. To overcome the limitations, we construct multimodal deep learning models to cover different molecular representations. We convert drug molecules into three molecular representations, SMILES-encoded vectors, ECFP fingerprints, and molecular graphs. To process the modal information, Transformer-Encoder, bi-directional gated recurrent units (BiGRU), and graph convolutional network (GCN) are utilized for feature learning respectively, which can enhance the model capability to acquire complementary and naturally occurring bioinformatics information. We evaluated our triple-modal model on six molecule datasets. Different from bi-modal learning models, we adopt five fusion methods to capture the specific features and leverage the contribution of each modal information better. Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise. Moreover, we demonstrate its generalization ability in the prediction of binding constants for protein-ligand complex molecules in the refined set of PDBbind. The advantage of the multimodal model lies in its ability to process diverse sources of data using proper models and suitable fusion methods, which would enhance the noise resistance of the model while obtaining data diversity.

翻訳日:2024-01-02 12:47:25 公開日:2023-12-29

# QGFace: 混合品質顔認識のための品質誘導型共同学習

QGFace: Quality-Guided Joint Training For Mixed-Quality Face Recognition ( http://arxiv.org/abs/2312.17494v1 )

ライセンス: Link先を確認

Youzhe Song and Feng Wang

(参考訳) 画像中の顔領域の品質は、カメラの解像度、距離、照明条件などの多くの要因によって決まる。このため、品質の異なる顔画像の識別は、実用的な応用において難しい問題となる。しかし、既存の手法の多くは高品質(HQ)画像または低品質(LQ)画像に特化して設計されており、品質が混在する画像では性能が低下する。また、多くの手法は、学習と評価を支えるために事前学習済みの特徴抽出器やその他の補助構造を必要とする。本稿では、HQ画像とLQ画像の両方を同時によりよく理解する鍵は、それぞれの品質に応じて異なる学習手法を適用することであると指摘する。我々は、単一のエンコーダで異なる品質の画像を同時に学習できる、混合品質顔認識のための新しい品質誘導型共同学習手法を提案する。品質による分割に基づき、HQデータの学習には分類ベースの手法を用いる。一方、ID情報を欠くLQ画像については、自己教師ありの画像間コントラスト学習によって学習する。共同学習の設定において、モデルの更新に効果的に追従し、コントラスト学習の識別性を高めるために、さらに実際のエンコーダの特徴でコントラストペアを構成するプロキシ更新型リアルタイムキューを提案する。低品質データセットSCfaceとTinyface、混合品質データセットIJB-B、および5つの高品質データセットでの実験により、異なる品質の顔画像の認識における提案手法の有効性が示された。

The quality of a face crop in an image is decided by many factors such as camera resolution, distance, and illumination condition. This makes the discrimination of face images with different qualities a challenging problem in realistic applications. However, most existing approaches are designed specifically for high-quality (HQ) or low-quality (LQ) images, and their performance degrades on mixed-quality images. Besides, many methods require pre-trained feature extractors or other auxiliary structures to support training and evaluation. In this paper, we point out that the key to better understanding both HQ and LQ images simultaneously is to apply different learning methods according to their qualities. We propose a novel quality-guided joint training approach for mixed-quality face recognition, which can simultaneously learn images of different qualities with a single encoder. Based on quality partition, a classification-based method is employed for HQ data learning. Meanwhile, for the LQ images, which lack identity information, we learn them with self-supervised image-image contrastive learning. To effectively keep up with model updates and improve the discriminability of contrastive learning in our joint training scenario, we further propose a proxy-updated real-time queue to compose the contrastive pairs with features from the genuine encoder. Experiments on the low-quality datasets SCface and Tinyface, the mixed-quality dataset IJB-B, and five high-quality datasets demonstrate the effectiveness of our proposed approach in recognizing face images of different qualities.

翻訳日:2024-01-02 12:46:58 公開日:2023-12-29
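
For readers unfamiliar with queue-based contrastive learning, the sketch below shows a generic version. It pairs a fixed-length FIFO queue of negative features with an InfoNCE loss, in the style of MoCo. This is a swapped-in, standard construction, not QGFace's method. The way QGFace's "proxy update" refreshes the queue with features from the genuine encoder is not reproduced. The temperature `tau = 0.1` and the toy 2-D features are assumptions.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class FeatureQueue:
    """Fixed-size FIFO of negative features; the oldest entries are dropped
    automatically, so the negatives stay close to the current encoder."""

    def __init__(self, max_size: int) -> None:
        self.items: Deque[Vector] = deque(maxlen=max_size)

    def enqueue(self, features: Sequence[Sequence[float]]) -> None:
        for f in features:
            self.items.append(l2_normalize(f))


def info_nce(query: Sequence[float], positive: Sequence[float],
             queue: FeatureQueue, tau: float = 0.1) -> float:
    """-log softmax probability of the positive among positive + queued negatives."""
    q, p = l2_normalize(query), l2_normalize(positive)
    logits = [dot(q, p) / tau] + [dot(q, n) / tau for n in queue.items]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


queue = FeatureQueue(max_size=2)
queue.enqueue([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])  # first entry is evicted
print(len(queue.items), info_nce([1.0, 0.0], [1.0, 0.0], queue))
```

The loss is close to zero when the positive matches the query and the queued negatives are orthogonal to it. It rises to log(3) when the two negatives are indistinguishable from the positive.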

# 連合学習を用いた大規模言語モデルの差分プライバシー付き低ランク適応

Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning ( http://arxiv.org/abs/2312.17493v1 )

ライセンス: Link先を確認

Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Meikang Qiu

(参考訳) 大規模言語モデル(LLM)への関心と応用の急増により、金融や医学などの特定の用途に合わせてこれらのモデルを微調整する動きが生まれている。しかし、特に複数の利害関係者が機密データを用いて協調的にLLMを強化しようとする場合、データプライバシーに関する懸念が生じている。このシナリオでは、生データを中央サーバに公開することなく分散型の微調整を可能にする連合学習が自然な選択となる。これを動機として、本研究では、実践的な連合学習アプローチを通じてLLMの微調整におけるデータプライバシーを確保し、複数の当事者が安全に貢献してLLMを強化できるようにする方法を検討する。しかし、課題もある。1) 生データの露出を避けても、モデル出力から機密情報が推測されるリスクがある。2) LLMの連合学習には顕著な通信オーバーヘッドが伴う。これらの課題に対処するため、本稿ではLLMに適した新しい連合学習アルゴリズムDP-LoRAを紹介する。DP-LoRAは、重みの更新にノイズを加えるガウス機構を用いることでデータプライバシーを保護し、個々のデータのプライバシーを維持しながら協調的なモデル学習を促進する。さらに、DP-LoRAは低ランク適応によって通信効率を最適化し、分散学習中に送信される更新重みを最小限に抑える。様々なLLMを用いた医療、金融、一般データセットでの実験結果から、DP-LoRAが通信オーバーヘッドを最小限に抑えつつ、厳格なプライバシー制約を効果的に保証することが示された。

The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, allowing decentralized fine-tuning without exposing raw data to central servers. Motivated by this, we investigate how data privacy can be ensured in LLM fine-tuning through practical federated learning approaches, enabling secure contributions from multiple parties to enhance LLMs. Yet, challenges arise: 1) despite avoiding raw data exposure, there is a risk of inferring sensitive information from model outputs, and 2) federated learning for LLMs incurs notable communication overhead. To address these challenges, this article introduces DP-LoRA, a novel federated learning algorithm tailored for LLMs. DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise to weight updates, maintaining individual data privacy while facilitating collaborative model training. Moreover, DP-LoRA optimizes communication efficiency via low-rank adaptation, minimizing the transmission of updated weights during distributed training. The experimental results across medical, financial, and general datasets using various LLMs demonstrate that DP-LoRA effectively ensures strict privacy constraints while minimizing communication overhead.

翻訳日:2024-01-02 12:46:33 公開日:2023-12-29
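
The two ingredients named in the DP-LoRA abstract are sketched below under stated assumptions. The first is the parameter count that makes low-rank updates cheap to transmit. A rank-$r$ update $BA$ for a $d \times k$ weight has $r(d+k)$ entries instead of $dk$. The second is a Gaussian mechanism applied to a flattened update. Per-update L2 clipping before adding noise is assumed here because it is the standard way to bound sensitivity; the abstract does not spell it out. The privacy accounting that maps `noise_multiplier` to an $(\varepsilon, \delta)$ guarantee is not shown. The sizes `d = k = 4096`, `r = 8` are hypothetical.

```python
import math
import random
from typing import List


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable (and transmitted) parameters of a rank-r update B @ A for a
    d x k weight matrix, with B of shape d x r and A of shape r x k."""
    return r * (d + k)


def gaussian_mechanism(update: List[float], clip_norm: float,
                       noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip the flattened update to L2 norm <= clip_norm, then add i.i.d.
    N(0, (noise_multiplier * clip_norm)^2) noise to every coordinate."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]


d = k = 4096
r = 8
full, low_rank = d * k, lora_param_count(d, k, r)
print(full, low_rank, full // low_rank)  # 16777216 65536 256

rng = random.Random(0)
noisy = gaussian_mechanism([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(len(noisy))
```

For these sizes, each client would send 256 times fewer numbers per adapted matrix. Noise is added only to those low-rank factors.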

# HEAP: Contrastive Groupingによる教師なしオブジェクト発見とローカライゼーション

HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping ( http://arxiv.org/abs/2312.17492v1 )

ライセンス: Link先を確認

Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan

(参考訳) 教師なしオブジェクト発見とローカライゼーション(unsupervised object discovery and localization)は、いかなる教師信号もなしに画像内の物体を検出またはセグメント化することを目的としている。近年の取り組みでは、自己教師ありTransformerの特徴を利用して顕著な前景物体を識別する大きな可能性が示されている。しかし、それらの手法は画像内のパッチレベルの特徴のみに基づいており、より広いスケールでの領域/画像レベルおよび画像間の関係を無視している。さらに、これらの手法は複数のインスタンスから様々な意味を区別できない。これらの問題に対処するため、対照的グルーピングによる階層的統合フレームワーク(Hierarchical mErging framework via contrAstive grouPing, HEAP)を提案する。具体的には、自己教師あり特徴間の相関に基づいて画像内のパッチを意味的に一貫した領域に適応的にグループ化する、クロスアテンション機構を備えた新しい軽量ヘッドを設計する。さらに、様々な領域間の識別性を確保するため、画像をまたいで類似した領域を引き寄せる領域レベルのコントラストクラスタリング損失を導入する。また、前景と背景の表現を引き離す画像レベルのコントラスト損失を用い、これにより前景物体と背景がそれぞれ発見される。HEAPは効率的な階層的画像分解を可能にし、より正確な物体発見に寄与するとともに、様々なクラスの物体の区別も可能にする。セマンティックセグメンテーション検索、教師なし物体発見、顕著性検出の各タスクにおける大規模な実験結果は、HEAPが最先端の性能を達成することを示している。

Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, they build only upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull similar regions across images closer together. Also, an image-level contrastive loss is introduced to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.

翻訳日:2024-01-02 12:46:04 公開日:2023-12-29

# 双曲偏微分方程式の作用素学習

Operator learning for hyperbolic partial differential equations ( http://arxiv.org/abs/2312.17489v1 )

ライセンス: Link先を確認

Christopher Wang and Alex Townsend

(参考訳) 本研究では、2変数の双曲型偏微分方程式(PDE)の解作用素を入出力の訓練ペアから復元するための、厳密に正当化された初の確率的アルゴリズムを構築する。双曲型PDEの解作用素を復元する際の主な課題は特性曲線の存在であり、それに沿って対応するグリーン関数が不連続となる。したがって、本アルゴリズムの中心的な構成要素は、特性曲線の近似位置を特定するランク検出スキームである。ランダム化特異値分解と領域の適応的な階層的分割を組み合わせることで、$O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ 個の入出力ペアを用いて、$\epsilon\to0$ のとき高い確率で作用素ノルムにおける相対誤差 $O(\Xi_\epsilon^{-1}\epsilon)$ の解作用素の近似を構築する。ここで、$\Psi_\epsilon$ は解作用素の縮退した特異値の存在を表し、$\Xi_\epsilon$ は訓練データの品質を測る量である。双曲型PDEは楕円型や放物型のPDEが持つ「瞬間的な平滑化効果」を持たないことを考えると、双曲型PDEの係数の正則性に関する我々の仮定は比較的弱く、また係数の正則性が高まるにつれて復元率は向上する。

We construct the first rigorously justified probabilistic algorithm for recovering the solution operator of a hyperbolic partial differential equation (PDE) in two variables from input-output training pairs. The primary challenge of recovering the solution operator of hyperbolic PDEs is the presence of characteristics, along which the associated Green's function is discontinuous. Therefore, a central component of our algorithm is a rank detection scheme that identifies the approximate location of the characteristics. By combining the randomized singular value decomposition with an adaptive hierarchical partition of the domain, we construct an approximant to the solution operator using $O(\Psi_\epsilon^{-1}\epsilon^{-7}\log(\Xi_\epsilon^{-1}\epsilon^{-1}))$ input-output pairs with relative error $O(\Xi_\epsilon^{-1}\epsilon)$ in the operator norm as $\epsilon\to0$, with high probability. Here, $\Psi_\epsilon$ represents the existence of degenerate singular values of the solution operator, and $\Xi_\epsilon$ measures the quality of the training data. Our assumptions on the regularity of the coefficients of the hyperbolic PDE are relatively weak given that hyperbolic PDEs do not have the "instantaneous smoothing effect" of elliptic and parabolic PDEs, and our recovery rate improves as the regularity of the coefficients increases.

翻訳日:2024-01-02 12:45:39 公開日:2023-12-29

# 予測プロセスモニタリングのための解釈可能かつ説明可能な機械学習手法:体系的文献レビュー

Interpretable and Explainable Machine Learning Methods for Predictive Process Monitoring: A Systematic Literature Review ( http://arxiv.org/abs/2312.17584v1 )

ライセンス: Link先を確認

Nijat Mehdiyev, Maxim Majlatow and Peter Fettke

(参考訳) 本稿では, PRISMAフレームワークを用いて, 予測プロセスマイニングの文脈における機械学習モデルの説明可能性と解釈可能性について, 体系的文献レビュー(SLR)を提案する。人工知能(AI)とMLシステムの急速な進歩を踏まえ、これらの技術の「ブラックボックス」の性質を理解することがますます重要になっている。プロセスマイニングの領域に特化して、複雑なビジネスプロセスデータでトレーニングされたMLモデルを解釈する際の課題を考察する。我々は本質的に解釈可能なモデルとポストホックな説明技術を必要とするモデルとを区別し、現在の方法論とそれらの様々なアプリケーションドメインにまたがるアプリケーションの概要を提供する。本研究は厳密な書誌分析を通じて,予測プロセスマイニングにおける説明可能性と解釈可能性の状態を詳細に合成し,重要な傾向,課題,今後の方向性を明らかにする。本研究の目的は,より信頼性が高く,透明性が高く,効果的な知的システムを開発・実装する方法について,研究者や実践者により深く理解させることである。

This paper presents a systematic literature review (SLR) on the explainability and interpretability of machine learning (ML) models within the context of predictive process mining, using the PRISMA framework. Given the rapid advancement of artificial intelligence (AI) and ML systems, understanding the "black-box" nature of these technologies has become increasingly critical. Focusing specifically on the domain of process mining, this paper delves into the challenges of interpreting ML models trained with complex business process data. We differentiate between intrinsically interpretable models and those that require post-hoc explanation techniques, providing a comprehensive overview of the current methodologies and their applications across various application domains. Through a rigorous bibliographic analysis, this research offers a detailed synthesis of the state of explainability and interpretability in predictive process mining, identifying key trends, challenges, and future directions. Our findings aim to equip researchers and practitioners with a deeper understanding of how to develop and implement more trustworthy, transparent, and effective intelligent systems for predictive process analytics.

翻訳日:2024-01-02 10:18:55 公開日:2023-12-29

# 長い会議議事録のアクションアイテム駆動型要約

Action-Item-Driven Summarization of Long Meeting Transcripts ( http://arxiv.org/abs/2312.17581v1 )

ライセンス: Link先を確認

Logan Golia, Jugal Kalita

(参考訳) オンライン会議の普及により、与えられた会議の要約を自動生成できるモデルの実用性が大きく高まっている。本稿では、会議要約の生成を自動化する新しく効果的な手法を提案する。この問題に対する現在の手法は、会議を単なる長い対話とみなし、一般的で基本的な要約を生成する。これに対し、我々の新しいアルゴリズムは、会議の議事録に含まれるアクションアイテムに基づいた抽象型の会議要約を生成できる。これは、要約を再帰的に生成するとともに、会議の各セクションに対して我々のアクションアイテム抽出アルゴリズムを並列に適用することで行われる。これらのセクションごとの要約をすべて結合してまとめて要約し、一貫性のあるアクションアイテム駆動型の要約を作成する。さらに本稿では、長い議事録をトピックベースのセクションに分割する3つの新しい手法を導入し、アルゴリズムの時間効率を改善するとともに、大規模言語モデル(LLM)が長期依存関係を忘れてしまう問題を解決する。我々のパイプラインはAMIコーパス全体で64.98のBERTScoreを達成した。これは、ファインチューニングされたBART(Bidirectional and Auto-Regressive Transformers)モデルによる現在の最先端の結果から約4.98%の向上である。

The increased prevalence of online meetings has significantly enhanced the practicality of a model that can automatically generate the summary of a given meeting. This paper introduces a novel and effective approach to automate the generation of meeting summaries. Current approaches to this problem generate general and basic summaries, considering the meeting simply as a long dialogue. However, our novel algorithms can generate abstractive meeting summaries that are driven by the action items contained in the meeting transcript. This is done by recursively generating summaries and employing our action-item extraction algorithm for each section of the meeting in parallel. All of these sectional summaries are then combined and summarized together to create a coherent and action-item-driven summary. In addition, this paper introduces three novel methods for dividing up long transcripts into topic-based sections to improve the time efficiency of our algorithm, as well as to resolve the issue of large language models (LLMs) forgetting long-term dependencies. Our pipeline achieved a BERTScore of 64.98 across the AMI corpus, which is an approximately 4.98% increase from the current state-of-the-art result produced by a fine-tuned BART (Bidirectional and Auto-Regressive Transformers) model.

翻訳日:2024-01-02 10:18:35 公開日:2023-12-29
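
The recursive "summarize each section, then summarize the combined summaries" pattern from the abstract above can be sketched as follows. Several simplifications are assumptions: `summarize` is a caller-supplied function (a stand-in for an LLM or BART call), the paper's topic-based segmentation is replaced by a greedy word-count split, action-item extraction is omitted, and sections are processed sequentially rather than in parallel.

```python
from typing import Callable, List


def split_into_sections(lines: List[str], max_words: int) -> List[List[str]]:
    """Greedy split into consecutive sections of at most max_words words
    (a single over-long line becomes its own section)."""
    sections: List[List[str]] = []
    current: List[str] = []
    count = 0
    for line in lines:
        n = len(line.split())
        if current and count + n > max_words:
            sections.append(current)
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        sections.append(current)
    return sections


def recursive_summary(lines: List[str], summarize: Callable[[str], str],
                      max_words: int, max_rounds: int = 10) -> str:
    """Summarize each section, then re-summarize the joined section summaries
    until everything fits into a single section."""
    for _ in range(max_rounds):
        sections = split_into_sections(lines, max_words)
        summaries = [summarize("\n".join(s)) for s in sections]
        if len(summaries) == 1:
            return summaries[0]
        lines = summaries
    raise RuntimeError("summaries did not shrink enough; raise max_words")


def first_words(text: str) -> str:
    # Toy stand-in for a real summarizer: keep the first word of each line.
    return " ".join(line.split()[0] for line in text.splitlines())


transcript = ["alpha one two", "beta three four", "gamma five six", "delta seven eight"]
print(recursive_summary(transcript, first_words, max_words=6))  # alpha gamma
```

The `max_rounds` guard matters because a summarizer that does not shorten its input would otherwise loop forever.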

# 分布型低ランク埋め込み

Distribution-based Low-rank Embedding ( http://arxiv.org/abs/2312.17579v1 )

ライセンス: Link先を確認

Bardia Yousefi

(参考訳) 乳房の異常を早期に発見することは極めて重要な課題である。特に、赤外線サーモグラフィは乳がん検診および臨床乳房検査(CBE)において有用なツールとして台頭している。不均一な熱パターンを測定することは、行列分解手法によって実現できる計算動的サーモグラフィを取り入れる鍵である。これらの手法は、熱画像系列全体から支配的な熱パターンを抽出することに焦点を当てている。しかし、支配的な時間変化を効果的に表す代表画像を選び出す作業は、計算サーモグラフィの分野において依然として困難な課題である。そこで本稿では、この課題に対する2つの新しい戦略として、固有ベクトルに対するJames-Stein推定(JSE)とWeibull埋め込みの手法を適用することを提案する。主な目的は、熱データストリームの低次元(LD)表現を作成することである。このLD近似は、サーモミクスを抽出し、ハイパーパラメータを最適化した分類モデルを学習して乳がんを早期発見するための基盤となる。さらに、行列分解手法に対する様々な埋め込み補助手法の比較分析を行う。提案手法の結果は、支配的な基底ベクトルの射影が改善されることを示しており、Weibull埋め込みを用いた場合の分類精度は81.7%(±5.2%)で、我々が以前に提案した他の埋め込み手法を上回った。比較分析では、Sparse PCTとDeep SemiNMFがそれぞれ80.9%と78.6%と最も高い精度を示した。これらの結果は、JSEとWeibull埋め込み手法が、重要な熱パターンをバイオマーカーとして保持するのに大きく役立ち、CBEの改善と乳がんの超早期発見を可能にすることを示唆している。

The early detection of breast abnormalities is a matter of critical significance. Notably, infrared thermography has emerged as a valuable tool in breast cancer screening and clinical breast examination (CBE). Measuring heterogeneous thermal patterns is the key to incorporating computational dynamic thermography, which can be achieved by matrix factorization techniques. These approaches focus on extracting the predominant thermal patterns from the entire thermal sequence. Yet, the task of singling out the dominant image that effectively represents the prevailing temporal changes remains a challenging pursuit within the field of computational thermography. In this context, we propose applying James-Stein for eigenvector (JSE) and Weibull embedding approaches, as two novel strategies in response to this challenge. The primary objective is to create a low-dimensional (LD) representation of the thermal data stream. This LD approximation serves as the foundation for extracting thermomics and training a classification model with optimized hyperparameters, for early breast cancer detection. Furthermore, we conduct a comparative analysis of various embedding adjuncts to matrix factorization methods. The results of the proposed method indicate an enhancement in the projection of the predominant basis vector, yielding classification accuracy of 81.7% (+/-5.2%) using Weibull embedding, which outperformed other embedding approaches we proposed previously. In comparison analysis, Sparse PCT and Deep SemiNMF showed the highest accuracies having 80.9% and 78.6%, respectively. These findings suggest that JSE and Weibull embedding techniques substantially help preserve crucial thermal patterns as a biomarker leading to improved CBE and enabling the very early detection of breast cancer.

翻訳日:2024-01-02 10:18:13 公開日:2023-12-29
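
The abstract above builds on matrix-factorization thermography, including the Sparse PCT baseline it compares against. As a minimal sketch, assuming the thermal sequence is a NumPy array of shape (T, H, W), classic principal component thermography standardizes each pixel's time series and takes a thin SVD; the leading right singular vectors are the spatial basis images from which a dominant image would be chosen. The function name and the per-pixel standardization are assumptions here, and the JSE and Weibull embeddings themselves are not shown.

```python
import numpy as np


def principal_component_thermography(frames: np.ndarray, n_components: int = 3):
    """Project a thermal sequence onto its leading spatial basis images.

    frames: array of shape (T, H, W), one infrared frame per time step.
    Returns (basis_images, temporal_weights, singular_values).
    """
    t, h, w = frames.shape
    # Each column is one pixel's temperature time series: shape (T, H*W).
    data = frames.reshape(t, h * w).astype(float)  # astype copies, input untouched
    # Standardize every pixel's time series (zero mean, unit variance over time).
    data -= data.mean(axis=0, keepdims=True)
    std = data.std(axis=0, keepdims=True)
    data /= np.where(std > 0, std, 1.0)
    # Thin SVD: rows of vt are orthonormal spatial basis vectors.
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    k = min(n_components, s.size)
    basis_images = vt[:k].reshape(k, h, w)
    temporal_weights = u[:, :k] * s[:k]
    return basis_images, temporal_weights, s[:k]
```

A sequence dominated by one heating/cooling profile yields one large singular value; the ratio between the leading singular values is a quick check of how "dominant" the first basis image is.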

# 量子という砂上の楼閣

The Quantum House Of Cards ( http://arxiv.org/abs/2312.17570v1 )

ライセンス: Link先を確認

Xavier Waintal

(参考訳) 量子コンピュータは、新薬の発見、肥料生産のための新しい触媒、暗号プロトコルの解読、金融ポートフォリオの最適化、新しい人工知能アプリケーションの実装など、多くの重要な問題を解決するものとして提案されてきた。しかし現在に至るまで、3に5を掛けるような単純な計算でさえ、既存の量子ハードウェアの能力を超えている。本稿では、量子コンピュータがその期待に応えるために解決すべき困難を検討する。最上位層（実際のアルゴリズムと関連するアプリケーション）から最下位層（量子ハードウェア、その制御電子回路、極低温技術など）まで、量子コンピュータを構築するために構想されてきた技術スタック全体を、量子誤り訂正という重要な中間層も含めて論じる。

Quantum computers have been proposed to solve a number of important problems such as discovering new drugs, new catalysts for fertilizer production, breaking encryption protocols, optimizing financial portfolios, or implementing new artificial intelligence applications. Yet, to date, a simple task such as multiplying 3 by 5 is beyond existing quantum hardware. This article examines the difficulties that would need to be solved for quantum computers to live up to their promises. I discuss the whole stack of technologies that has been envisioned to build a quantum computer from the top layers (the actual algorithms and associated applications) down to the very bottom ones (the quantum hardware, its control electronics, cryogeny, etc.) while not forgetting the crucial intermediate layer of quantum error correction.

翻訳日:2024-01-02 10:17:47 公開日:2023-12-29

# Few-Shot Neural Radiance Fieldsのための情報量の多いレイの選択

Informative Rays Selection for Few-Shot Neural Radiance Fields ( http://arxiv.org/abs/2312.17561v1 )

ライセンス: Link先を確認

Marco Orsingher, Anthony Dell'Eva, Paolo Zani, Paolo Medici, Massimo Bertozzi

(参考訳) Neural Radiance Fields（NeRF）は近年、画像ベースの3次元再構成の強力な手法として登場したが、シーンごとの長い最適化が、特にリソース制約のある環境での実用を制限している。既存のアプローチは、入力ビューの数を減らし、複雑な損失関数や他のモダリティからの追加入力によって学習されたボリューム表現を正則化することで、この問題に対処している。本稿では、情報量の多い主要なレイに注目することで、少数ショットの状況でNeRFを訓練する、単純かつ効果的な手法KeyNeRFを提案する。これらのレイは、まずシーンの網羅性を保証しつつベースラインの多様性を促すビュー選択アルゴリズムによってカメラレベルで選択され、次に局所画像エントロピーに基づく確率分布からのサンプリングによって画素レベルで選択される。提案手法は、既存のNeRFコードベースへの変更を最小限に抑えつつ、最先端の手法に対して良好な性能を示す。

Neural Radiance Fields (NeRF) have recently emerged as a powerful method for image-based 3D reconstruction, but the lengthy per-scene optimization limits their practical usage, especially in resource-constrained settings. Existing approaches solve this issue by reducing the number of input views and regularizing the learned volumetric representation with either complex losses or additional inputs from other modalities. In this paper, we present KeyNeRF, a simple yet effective method for training NeRF in few-shot scenarios by focusing on key informative rays. Such rays are first selected at camera level by a view selection algorithm that promotes baseline diversity while guaranteeing scene coverage, then at pixel level by sampling from a probability distribution based on local image entropy. Our approach performs favorably against state-of-the-art methods, while requiring minimal changes to existing NeRF codebases.

翻訳日:2024-01-02 10:17:34 公開日:2023-12-29
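
The pixel-level step described above (sampling rays from a distribution based on local image entropy) can be sketched as follows. This is an illustration under stated assumptions, not KeyNeRF's code: the window size, the 16-bin intensity histogram, and the small `eps` that keeps flat regions samplable are all hypothetical choices, and the image is assumed to be grayscale in [0, 1].

```python
import numpy as np


def local_entropy(gray: np.ndarray, window: int = 7, bins: int = 16) -> np.ndarray:
    """Shannon entropy (bits) of quantized intensities in a sliding window."""
    h, w = gray.shape
    q = np.clip((gray * bins).astype(int), 0, bins - 1)  # quantize to `bins` levels
    pad = window // 2
    qp = np.pad(q, pad, mode="reflect")
    ent = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = qp[i:i + window, j:j + window]
            counts = np.bincount(patch.ravel(), minlength=bins)
            p = counts[counts > 0] / patch.size
            ent[i, j] = -(p * np.log2(p)).sum()
    return ent


def sample_rays(gray: np.ndarray, n_rays: int, rng: np.random.Generator,
                eps: float = 1e-6) -> np.ndarray:
    """Draw distinct pixel coordinates (row, col) with probability ~ local entropy."""
    prob = (local_entropy(gray) + eps).ravel()
    prob /= prob.sum()
    idx = rng.choice(prob.size, size=n_rays, replace=False, p=prob)
    rows, cols = np.unravel_index(idx, gray.shape)
    return np.stack([rows, cols], axis=1)
```

Textured regions receive most of the ray budget, while flat regions (near-zero entropy) are almost never sampled.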

# 動脈瘤性くも膜下出血後の頭部CTにおける深層学習に基づく血液セグメンテーションのためのSwin Transformerを用いた完全自動パイプライン

A Fully Automated Pipeline Using Swin Transformers for Deep Learning-Based Blood Segmentation on Head CT Scans After Aneurysmal Subarachnoid Hemorrhage ( http://arxiv.org/abs/2312.17553v1 )

ライセンス: Link先を確認

Sergio Garcia Garcia, Santiago Cepeda, Ignacio Arrese, Rosario Sarabia

(参考訳) 背景: 自発性くも膜下出血（SAH）の正確な容積評価は、その臨床的・予後的意義に関係しうるが、現在の手動および半自動の手法では労働集約的な作業である。本研究では、TransformerベースのSwin UNETRアーキテクチャを用いて、非造影CT（NCCT）スキャンからSAH患者の血液を完全自動でセグメンテーションする人工知能ツールを開発し、検証することを目指した。方法: 動脈瘤性くも膜下出血（aSAH）が確認された患者のNCCTスキャンを後ろ向きに解析し、Swin UNETRでセグメンテーションを行った。提案手法の性能は、手動セグメンテーションによる正解データに対し、Diceスコア、Intersection over Union（IoU）、体積類似度指数（VSI）、対称平均表面距離（SASD）、感度、特異度などの指標を用いて評価した。モデルの汎化性を検証するため、外部施設からの検証コホートを含めた。結果: モデルは内部・外部の検証コホートの双方で頑健な性能指標とともに高い精度を示した。特に、高いDice係数（0.873）、IoU（0.810）、VSI（0.840）、感度（0.821）、特異度（0.996）と低いSASD（1.866）を達成し、SAH患者における血液セグメンテーションの能力を示唆した。モデルの効率は処理速度に表れており、リアルタイム応用の可能性を示している。結論: Swin UNETRに基づく本モデルは、NCCT画像におけるaSAH後の血液自動セグメンテーションにおいて大きな進歩をもたらす。計算負荷は高いものの、モデルはユーザーフレンドリーなインターフェースを備えた標準的なハードウェア上で効果的に動作し、より広範な臨床導入を促進する。臨床的信頼性を確認するには、多様なデータセットでのさらなる検証が必要である。

Background: Accurate volumetric assessment of spontaneous subarachnoid hemorrhage (SAH) is a labor-intensive task performed with current manual and semiautomatic methods that might be relevant for its clinical and prognostic implications. In the present research, we sought to develop and validate an artificial intelligence-driven, fully automated blood segmentation tool for SAH patients via noncontrast computed tomography (NCCT) scans employing a transformer-based Swin UNETR architecture. Methods: We retrospectively analyzed NCCT scans from patients with confirmed aneurysmal subarachnoid hemorrhage (aSAH) utilizing the Swin UNETR for segmentation. The performance of the proposed method was evaluated against manually segmented ground truth data using metrics such as Dice score, intersection over union (IoU), the volumetric similarity index (VSI), the symmetric average surface distance (SASD), and sensitivity and specificity. A validation cohort from an external institution was included to test the generalizability of the model. Results: The model demonstrated high accuracy with robust performance metrics across the internal and external validation cohorts. Notably, it achieved high Dice coefficient (0.873), IoU (0.810), VSI (0.840), sensitivity (0.821) and specificity (0.996) values and a low SASD (1.866), suggesting proficiency in segmenting blood in SAH patients. The model's efficiency was reflected in its processing speed, indicating potential for real-time applications. Conclusions: Our Swin UNETR-based model offers significant advances in the automated segmentation of blood after aSAH on NCCT images. Despite the computational intensity, the model operates effectively on standard hardware with a user-friendly interface, facilitating broader clinical adoption. Further validation across diverse datasets is warranted to confirm its clinical reliability.

翻訳日:2024-01-02 10:17:16 公開日:2023-12-29
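
The overlap metrics listed in the Methods can be computed from binary masks as below. This is a minimal sketch, assuming boolean NumPy masks of identical shape with at least one positive voxel; the VSI here uses the common volumetric-similarity definition 1 - |V_pred - V_true| / (V_pred + V_true), which may differ in detail from the authors' implementation, and SASD (which needs surface extraction) is omitted.

```python
import numpy as np


def segmentation_metrics(pred, truth) -> dict:
    """Voxel-overlap metrics between a predicted and a reference binary mask."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(pred, truth).sum())
    fp = int(np.logical_and(pred, ~truth).sum())
    fn = int(np.logical_and(~pred, truth).sum())
    tn = int(np.logical_and(~pred, ~truth).sum())
    v_pred, v_true = tp + fp, tp + fn  # predicted and reference volumes (voxels)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        # Volumetric similarity: compares volumes only, ignoring overlap.
        "vsi": 1 - abs(v_pred - v_true) / (v_pred + v_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Note that VSI can be 1.0 even when the masks barely overlap, which is why it is reported alongside Dice and IoU.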

# 自然言語推論を用いた効率的なユニバーサル分類器の構築

Building Efficient Universal Classifiers with Natural Language Inference ( http://arxiv.org/abs/2312.17543v1 )

ライセンス: Link先を確認

Moritz Laurer, Wouter van Atteveldt, Andreu Casas, Kasper Welbers

(参考訳) 生成型大規模言語モデル（LLM）は、テキスト生成の普遍性のおかげで、少数ショット学習やゼロショット学習の主流の選択肢となっている。しかし、多くのユーザーは分類タスクを自動化したいだけであれば、生成型LLMの幅広い能力を必要としない。より小さなBERTライクなモデルも普遍的なタスクを学習でき、ファインチューニングなしで任意のテキスト分類タスクを実行したり（ゼロショット分類）、わずかな例だけで新しいタスクを学習したり（少数ショット）できる一方で、生成型LLMよりもはるかに効率的である。本稿では、(1) 自然言語推論（NLI）を、生成型LLMの命令ファインチューニングと同様の原理に従う普遍的分類タスクとして利用する方法を説明し、(2) 普遍的分類器を構築するための再利用可能なJupyterノートブック付きのステップバイステップガイドを提供し、(3) 389の多様なクラスを持つ33のデータセットで訓練した、結果として得られる普遍的分類器を公開する。我々が公開するコードの一部は、2023年12月時点でHugging Face Hub経由で5500万回以上ダウンロードされた我々の旧ゼロショット分類器の訓練に使用されてきた。新しい分類器はゼロショット性能を9.4%向上させる。

Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.

翻訳日:2024-01-02 10:16:41 公開日:2023-12-29
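
The core idea above, recasting classification as NLI, is that each candidate label becomes a hypothesis ("This text is about {label}.") and the label whose hypothesis is most strongly entailed by the text wins. The sketch below assumes a callable returning an entailment probability; `toy_entail_prob` is a hypothetical keyword stand-in so the example runs without a model. In practice that callable would wrap a trained NLI model, for instance through the Hugging Face `transformers` zero-shot-classification pipeline.

```python
from typing import Callable, Dict, List

EntailFn = Callable[[str, str], float]


def nli_zero_shot(text: str, labels: List[str], entail_prob: EntailFn,
                  template: str = "This text is about {}.") -> Dict[str, float]:
    """Rank candidate labels by how strongly `text` entails each label hypothesis."""
    raw = {label: entail_prob(text, template.format(label)) for label in labels}
    total = sum(raw.values())
    # Single-label mode: normalize entailment scores across the candidate labels.
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)
    return {label: score / total for label, score in ranked}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: 'entailed' if the label word occurs."""
    label_word = hypothesis.rstrip(".").split()[-1].lower()
    return 0.9 if label_word in premise.lower() else 0.1
```

Because labels enter only through the hypothesis text, new classes need no retraining, which is what makes the classifier "universal".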

# 説明可能な二値分類のための距離誘導型敵対的生成ネットワーク

Distance Guided Generative Adversarial Network for Explainable Binary Classifications ( http://arxiv.org/abs/2312.17538v1 )

ライセンス: Link先を確認

Xiangyu Xiong, Yue Sun, Xiaohong Liu, Wei Ke, Chan-Tong Lam, Jiangang Chen, Mingfeng Jiang, Mingwei Wang, Hui Xie, Tong Tong, Qinquan Gao, Hao Chen, Tao Tan

(参考訳) データ拡張はデータ不足の緩和に役立つ可能性があるが、従来の拡張手法は主にドメイン内の事前知識に依存している。一方、先進的な敵対的生成ネットワーク（GAN）が生成するドメイン間サンプルは多様性に乏しい。これらの従来手法は、二値分類における決定境界の記述にはあまり寄与しない。本稿では、超平面空間における生成サンプルの変動の度合いを制御する距離誘導型GAN（DisGAN）を提案する。具体的には、2つの方式を組み合わせてDisGANのアイデアを具体化する。第1の方式は垂直距離GAN（VerDisGAN）であり、ドメイン間生成を垂直距離で条件付ける。第2の方式は水平距離GAN（HorDisGAN）であり、ドメイン内生成を水平距離で条件付ける。さらに、VerDisGANはソース画像を超平面に写像することで、クラス固有の領域を生成できる。実験結果から、DisGANは説明可能な二値分類において、GANに基づく拡張手法を一貫して上回ることが示された。提案手法は様々な分類アーキテクチャに適用でき、多クラス分類へ拡張できる可能性がある。

Despite the potential benefits of data augmentation for mitigating the data insufficiency, traditional augmentation methods primarily rely on the prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two ways. The first way is vertical distance GAN (VerDisGAN) where the inter-domain generation is conditioned on the vertical distances. The second way is horizontal distance GAN (HorDisGAN) where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can apply to different classification architectures and has potential to extend to multi-class classification.

翻訳日:2024-01-02 10:16:16 公開日:2023-12-29
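
The abstract conditions generation on "vertical" and "horizontal" distances in a hyperplane space without defining them here. As a hedged geometric sketch, assume a linear decision hyperplane w·x + b = 0 in some feature space: the vertical distance is then read as the signed distance to that hyperplane, "mapping to the hyperplane" as orthogonal projection, and the horizontal distance as the distance between two samples' projections. These interpretations and all function names are assumptions, not DisGAN's definitions.

```python
import numpy as np


def signed_distance(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Signed distance of each row of x (n, d) to the hyperplane w.x + b = 0."""
    return (x @ w + b) / np.linalg.norm(w)


def project_to_hyperplane(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Orthogonal projection of each row of x onto {z : w.z + b = 0}."""
    return x - np.outer((x @ w + b) / (w @ w), w)


def horizontal_distance(x1: np.ndarray, x2: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Distance between the projections of paired rows of x1 and x2 along the hyperplane."""
    diff = project_to_hyperplane(x1, w, b) - project_to_hyperplane(x2, w, b)
    return np.linalg.norm(diff, axis=-1)
```

Under this reading, varying the signed distance moves a sample across classes (inter-domain), while varying the horizontal distance moves it within a class (intra-domain).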

# Olapa-MCoT:LLMの中国語数学的推論能力の向上

Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs ( http://arxiv.org/abs/2312.17535v1 )

ライセンス: Link先を確認

Shaojie Zhu, Zhaobin Wang, Chengxiang Zhuo, Hui Lu, Bo Hu and Zang Li

(参考訳) CoT（Chain-of-Thought）は、LLMの推論問題を解くための方法である。近年、LLMのCoT能力向上を目指す研究が数多く登場している。本研究では、llama2-13B PLMをベースにファインチューニングとアライメント学習を施したLLMであるOlapa-MCoTを提案する。アライメント訓練では、SimRRHFアルゴリズムと誤答データ再学習（Incorrect Data Relearning）を提案し、主にOlapa-MCoTの中国語の数学的推論能力の最適化に注力した。実験では顕著な結果が得られ、中国語の数学的推論の精度は50%に達し、llama2-13Bと比べて36%向上した。さらに、英語の推論能力の精度も4%近く向上した。

CoT (Chain-of-Thought) is a way to solve reasoning problems for LLMs. Recently, many studies have appeared that aim to improve the CoT capability of LLMs. In this work, we propose Olapa-MCoT, an LLM based on the llama2-13B PLM and trained with finetuning and alignment learning. During the alignment training, we propose the SimRRHF algorithm and Incorrect Data Relearning, focusing mainly on optimizing the Chinese mathematical reasoning ability of Olapa-MCoT. The experiments achieved significant results: the accuracy of Chinese mathematical reasoning reached 50%, a 36% rise compared to llama2-13B. In addition, the accuracy of English reasoning also increased by nearly 4%.

翻訳日:2024-01-02 10:15:59 公開日:2023-12-29
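
SimRRHF is not specified in this abstract, but its name points to RRHF (Rank Responses to align Human Feedback). As a hedged sketch of plain RRHF, not SimRRHF: each candidate response gets a length-normalized log-likelihood p_i; for every pair where r_i < r_j a hinge term max(0, p_i - p_j) penalizes the model for preferring the worse response, and a fine-tuning term adds the negative log-likelihood of the highest-reward response. Inputs are assumed to be per-token log-probabilities already computed by the model.

```python
from typing import List, Sequence

import numpy as np


def rrhf_loss(token_logprobs: List[np.ndarray], rewards: Sequence[float]) -> float:
    """RRHF-style loss = pairwise ranking hinge + NLL of the best-reward response.

    token_logprobs: one 1-D array of per-token log-probabilities per candidate.
    rewards: one scalar reward per candidate (higher is better).
    """
    # Length-normalized log-likelihood of each candidate under the policy.
    p = np.array([lp.sum() / len(lp) for lp in token_logprobs])
    r = np.asarray(rewards, dtype=float)
    rank = 0.0
    for i in range(len(r)):
        for j in range(len(r)):
            if r[i] < r[j]:
                # Penalize scoring a worse response above a better one.
                rank += max(0.0, p[i] - p[j])
    best = int(np.argmax(r))
    sft = -float(token_logprobs[best].sum())  # fine-tuning term on the best response
    return rank + sft
```

The ranking term only needs log-probabilities from the policy itself, so no separate value network or PPO loop is required; this is the main attraction of RRHF-style alignment.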

# 次元知覚による大規模言語モデルの量的推論能力の向上

Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception ( http://arxiv.org/abs/2312.17532v1 )

ライセンス: Link先を確認

Yuncheng Huang, Qianyu He, Jiaqing Liang, Sihang Jiang, Yanghua Xiao and Yunwen Chen

(参考訳) 量は、エンティティの大きさに関する性質を特徴づける、テキストの中で独特かつ重要な構成要素であり、特に推論タスクにおいて自然言語を理解するための正確な視点を提供する。近年、大規模言語モデル（LLM）に基づく推論タスクの研究が盛んに行われているが、その大半は数値のみに注目し、重要であるにもかかわらず単位を伴う量の次元概念を無視している。我々は、次元の概念が量を正確に理解するために不可欠であり、LLMが量的推論を行う上で非常に重要であると主張する。しかし、次元に関する知識と量に関するベンチマークの欠如により、LLMの性能は低いものにとどまっている。そこで我々は、次元知覚に基づいて言語モデルの量的推論能力を高める枠組みを提案する。まず、この領域の知識ギャップに対処するため、次元単位知識ベース（DimUnitKB）を構築する。LLMの次元知覚スキルを探り、向上させるために、3つのカテゴリの7つのタスクからなるベンチマークDimEvalを提案する。手法の有効性を評価するため、量的推論タスクを提案して実験を行う。実験結果は、我々の次元知覚手法が、GPT-4と比較して量的推論タスクの精度を大幅に向上させる（43.55%→50.67%）ことを示している。

Quantities are distinct and critical components of texts that characterize the magnitude properties of entities, providing a precise perspective for the understanding of natural language, especially for reasoning tasks. In recent years, there has been a flurry of research on reasoning tasks based on large language models (LLMs), most of which solely focus on numerical values, neglecting the dimensional concept of quantities with units despite its importance. We argue that the concept of dimension is essential for precisely understanding quantities and of great significance for LLMs to perform quantitative reasoning. However, the lack of dimension knowledge and quantity-related benchmarks has resulted in low performance of LLMs. Hence, we present a framework to enhance the quantitative reasoning ability of language models based on dimension perception. We first construct a dimensional unit knowledge base (DimUnitKB) to address the knowledge gap in this area. We propose a benchmark DimEval consisting of seven tasks of three categories to probe and enhance the dimension perception skills of LLMs. To evaluate the effectiveness of our methods, we propose a quantitative reasoning task and conduct experiments. The experimental results show that our dimension perception method dramatically improves accuracy (43.55%->50.67%) on quantitative reasoning tasks compared to GPT-4.

翻訳日:2024-01-02 10:15:47 公開日:2023-12-29
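
To make "dimension perception" concrete, the sketch below represents each unit by its exponents over base dimensions plus a scale factor to SI, so that multiplication adds exponents and addition is only allowed between identical dimensions. The tiny `UNITS` table is a hypothetical stand-in for a unit knowledge base such as DimUnitKB, restricted to (length, mass, time) for brevity.

```python
from dataclasses import dataclass
from typing import Tuple

Dims = Tuple[int, int, int]  # exponents of (length, mass, time)

# Hypothetical mini unit knowledge base: unit -> (dimension exponents, scale to SI).
UNITS = {
    "m": ((1, 0, 0), 1.0),
    "km": ((1, 0, 0), 1000.0),
    "s": ((0, 0, 1), 1.0),
    "h": ((0, 0, 1), 3600.0),
    "kg": ((0, 1, 0), 1.0),
    "g": ((0, 1, 0), 1e-3),
    "N": ((1, 1, -2), 1.0),
}


@dataclass(frozen=True)
class Quantity:
    si_value: float
    dims: Dims

    @classmethod
    def of(cls, value: float, unit: str) -> "Quantity":
        dims, scale = UNITS[unit]
        return cls(value * scale, dims)

    def __mul__(self, other: "Quantity") -> "Quantity":
        dims = tuple(a + b for a, b in zip(self.dims, other.dims))
        return Quantity(self.si_value * other.si_value, dims)

    def __truediv__(self, other: "Quantity") -> "Quantity":
        dims = tuple(a - b for a, b in zip(self.dims, other.dims))
        return Quantity(self.si_value / other.si_value, dims)

    def __add__(self, other: "Quantity") -> "Quantity":
        # Adding a length to a time is a dimension error, not a number.
        if self.dims != other.dims:
            raise ValueError(f"incompatible dimensions {self.dims} and {other.dims}")
        return Quantity(self.si_value + other.si_value, self.dims)
```

For example, `Quantity.of(90, "km") / Quantity.of(1.5, "h")` has dimensions (1, 0, -1), i.e. a speed of about 16.67 m/s, while adding metres to seconds raises an error.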

# RS-DGC:リモートセンシング画像解釈のための動的勾配圧縮に向けた近傍統計の探索

RS-DGC: Exploring Neighborhood Statistics for Dynamic Gradient Compression on Remote Sensing Image Interpretation ( http://arxiv.org/abs/2312.17530v1 )

ライセンス: Link先を確認

Weiying Xie, Zixuan Wang, Jitao Ma, Daixun Li, Yunsong Li

(参考訳) 分散ディープラーニングは、地球観測プログラムによって日々生成されるオープンデータの量の増加がもたらす課題のため、リモートセンシング（RS）の応用分野で近年注目を集めている。しかし、複数のノード間でモデル更新を送信する高い通信コストは、スケーラブルな分散学習にとって大きなボトルネックである。勾配スパース化は、通信コストを削減し、ひいては訓練速度を加速する効果的な勾配圧縮（GC）手法として検証されている。既存の最先端の勾配スパース化手法は、主に「絶対値が大きいほど重要」という基準に基づいており、性能に影響することが一般に観察されている小さな勾配の重要性を無視している。近傍情報による多様体構造の有益な表現に着想を得て、本稿では近傍統計指標を活用した、RS画像解釈のための単純かつ効果的な動的勾配圧縮手法RS-DGCを提案する。まず、勾配近傍を導入して勾配間の相互依存性を高め、ランダムノイズの影響を低減する。RS-DGCの主要な構成要素は近傍統計指標（NSI）であり、各ノード上で指定された近傍内の勾配の重要性を定量化し、各反復において勾配伝送の前に局所勾配をスパース化する。さらに、各層の重要性の変化をリアルタイムで追跡する層ごとの動的圧縮方式を提案する。広範なダウンストリームタスクにより、RS画像の知的解釈という観点から提案手法の優位性が検証された。例えば、VGG-19ネットワークを用いて、NWPU-RESISC45データセット上で50倍以上の通信圧縮を行いつつ、0.51%の精度向上を達成した。

Distributed deep learning has recently been attracting more attention in remote sensing (RS) applications due to the challenges posed by the increased amount of open data that are produced daily by Earth observation programs. However, the high communication costs of sending model updates among multiple nodes are a significant bottleneck for scalable distributed learning. Gradient sparsification has been validated as an effective gradient compression (GC) technique for reducing communication costs and thus accelerating the training speed. Existing state-of-the-art gradient sparsification methods are mostly based on the "larger-absolute-more-important" criterion, ignoring the importance of small gradients, which is generally observed to affect the performance. Inspired by informative representation of manifold structures from neighborhood information, we propose a simple yet effective dynamic gradient compression scheme leveraging neighborhood statistics indicator for RS image interpretation, termed RS-DGC. We first enhance the interdependence between gradients by introducing the gradient neighborhood to reduce the effect of random noise. The key component of RS-DGC is a Neighborhood Statistical Indicator (NSI), which can quantify the importance of gradients within a specified neighborhood on each node to sparsify the local gradients before gradient transmission in each iteration. Further, a layer-wise dynamic compression scheme is proposed to track the importance changes of each layer in real time. Extensive downstream tasks validate the superiority of our method in terms of intelligent interpretation of RS images. For example, we achieve an accuracy improvement of 0.51% with more than 50 times communication compression on the NWPU-RESISC45 dataset using VGG-19 network.

翻訳日:2024-01-02 10:15:26 公開日:2023-12-29
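
The contrast the abstract draws, between the "larger-absolute-more-important" criterion and a neighborhood-based importance score, can be sketched with a pluggable scoring function. Assumptions: the NSI formula is not given here, so `neighborhood_score` is a hypothetical moving average of |g| over the flattened gradient; the residual (error-feedback) accumulation is a common companion of gradient sparsification and is also an assumption, not a detail taken from RS-DGC.

```python
import numpy as np


def topk_mask(score: np.ndarray, ratio: float) -> np.ndarray:
    """Boolean mask keeping the `ratio` fraction of entries with the largest score."""
    k = max(1, int(round(ratio * score.size)))
    flat = score.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(score.shape)


def neighborhood_score(grad: np.ndarray, radius: int = 1) -> np.ndarray:
    """Hypothetical neighborhood statistic: mean |g| in a 1-D window (zero padded)."""
    flat = np.abs(grad).ravel()
    width = 2 * radius + 1
    smoothed = np.convolve(flat, np.ones(width) / width, mode="same")
    return smoothed.reshape(grad.shape)


def sparsify_with_feedback(grad, residual, ratio, score_fn=np.abs):
    """Send only the selected entries; keep the rest locally for the next iteration."""
    acc = grad + residual
    mask = topk_mask(score_fn(acc), ratio)
    sent = np.where(mask, acc, 0.0)
    return sent, acc - sent
```

With `score_fn=np.abs` this is plain top-k; passing `neighborhood_score` lets small entries that sit next to large ones be transmitted, which is the kind of behavior a neighborhood criterion is meant to enable.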

# 画像超解像の初期訓練におけるノイズフリー最適化

Noise-free Optimization in Early Training Steps for Image Super-Resolution ( http://arxiv.org/abs/2312.17526v1 )

ライセンス: Link先を確認

MinKyu Lee, Jae-Pil Heo

(参考訳) 近年の深層学習に基づく単一画像超解像（SISR）手法は優れた性能を示しており、典型的な手法は、与えられた高解像度（HR）画像に対する画素単位の距離を最小化することでネットワークを訓練する。しかし、この基本的な訓練方式が主流であるにもかかわらず、不良設定逆問題の文脈におけるその利用は十分に検討されていない。本研究では、対象のHR画像を、(1) 複数の候補HR画像に対する期待値である最適セントロイドと、(2) HR画像とセントロイドの残差として定義される固有ノイズ、の2つのサブコンポーネントに分解することで、その根底にある構成要素のより良い理解を目指す。我々の分析結果は、現在の訓練方式がSISRの不良設定性を捉えられず、特に訓練初期のステップにおいて固有ノイズ項の影響を受けやすいことを示している。この問題に対処するため、最適セントロイドを推定してその推定値に向けて直接最適化することで、バニラ訓練の初期ステップにおける固有ノイズ項を効果的に除去できる新しい最適化手法を提案する。実験結果は、提案手法がバニラ訓練の安定性を効果的に高め、全体的な性能向上につながることを示している。コードはgithub.com/2minkyulee/ECOで入手できる。

Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance, and typical methods train their networks by minimizing the pixel-wise distance with respect to a given high-resolution (HR) image. However, despite this basic training scheme being the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better comprehension of the underlying constituents by decomposing target HR images into two subcomponents: (1) the optimal centroid, which is the expectation over multiple potential HR images, and (2) the inherent noise, defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and becomes vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward the estimation. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to overall performance gain. Code is available at github.com/2minkyulee/ECO.

翻訳日:2024-01-02 10:14:59 公開日:2023-12-29
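
The centroid/noise decomposition above has a simple consequence for squared-error training. If y_k are K plausible HR images for one low-resolution input, with centroid μ = mean(y_k) and noise n_k = y_k - μ, then for any prediction f the average loss splits as E‖f - y‖² = ‖f - μ‖² + E‖y - μ‖². The second term is irreducible, and each per-sample gradient 2(f - μ) - 2n_k carries a random noise component. The sketch below checks the identity numerically on hypothetical random candidates; how ECO actually estimates μ is not shown.

```python
import numpy as np


def centroid_and_noise(hr_candidates: np.ndarray):
    """Split K candidate HR images (K, H, W) into their mean and per-sample residuals."""
    centroid = hr_candidates.mean(axis=0)
    return centroid, hr_candidates - centroid


def expected_sq_error(pred: np.ndarray, hr_candidates: np.ndarray) -> float:
    """Mean over candidates of the summed squared pixel error."""
    axes = tuple(range(1, hr_candidates.ndim))
    return float(((hr_candidates - pred) ** 2).sum(axis=axes).mean())
```

Since the noise term does not depend on f, optimizing toward μ directly removes the gradient noise without changing the minimizer of the expected loss.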

# 強化学習アプローチによる近似計算手法の設計空間探索

Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach ( http://arxiv.org/abs/2312.17525v1 )

ライセンス: Link先を確認

Sepide Saeedi, Alessandro Savino, Stefano Di Carlo

(参考訳) 近似コンピューティング（AxC）技術は、様々なアプリケーションにおいて精度と性能向上をトレードオフする手段としてますます普及している。あるアプリケーションに最適なAxC技術を選択するのは難しい。設計空間を探索するために提案されている手法のうち、強化学習（RL）などの機械学習アプローチは有望な結果を示している。本稿では、精度の低下と電力・計算時間の削減とのバランスをとるアプリケーションの近似バージョンを見つけるための、RLに基づく多目的設計空間探索戦略を提案する。実験の結果、いくつかのベンチマークにおいて、精度の低下と電力・計算時間の削減との間で良好なトレードオフが得られた。

Approximate Computing (AxC) techniques have become increasingly popular for trading off accuracy for performance gains in various applications. Selecting the best AxC techniques for a given application is challenging. Among proposed approaches for exploring the design space, Machine Learning approaches such as Reinforcement Learning (RL) show promising results. In this paper, we propose an RL-based multi-objective Design Space Exploration strategy to find approximate versions of the application that balance accuracy degradation against reductions in power and computation time. Our experimental results show a good trade-off between accuracy degradation and decreased power and computation time for some benchmarks.

翻訳日:2024-01-02 10:14:37 公開日:2023-12-29
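
A much-simplified, stateless version of RL-based design space exploration is an epsilon-greedy bandit over candidate approximate configurations, rewarded by a weighted sum of objectives. This is a sketch under stated assumptions, not the paper's agent: the `ApproxConfig` fields, the reward weights, and `noisy_evaluate` (which stands in for actually running and measuring an approximated benchmark) are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ApproxConfig:
    name: str
    error: float         # accuracy degradation, e.g. relative output error
    power_saving: float  # fractional power reduction
    time_saving: float   # fractional computation-time reduction


def scalarized_reward(cfg: ApproxConfig, w_err=1.0, w_power=0.5, w_time=0.5) -> float:
    """Collapse the three objectives into one number (weights are assumptions)."""
    return -w_err * cfg.error + w_power * cfg.power_saving + w_time * cfg.time_saving


def noisy_evaluate(cfg: ApproxConfig, rng: random.Random) -> float:
    """Hypothetical measurement: true reward plus small measurement noise."""
    return scalarized_reward(cfg) + rng.gauss(0.0, 0.01)


def epsilon_greedy_dse(configs: List[ApproxConfig],
                       evaluate: Callable[[ApproxConfig, random.Random], float],
                       steps: int = 300, epsilon: float = 0.1,
                       seed: int = 0) -> Tuple[ApproxConfig, List[float]]:
    rng = random.Random(seed)
    counts = [0] * len(configs)
    means = [0.0] * len(configs)

    def pull(i: int) -> None:
        r = evaluate(configs[i], rng)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # incremental sample mean

    for i in range(len(configs)):  # evaluate every configuration once
        pull(i)
    for _ in range(steps):
        if rng.random() < epsilon:
            i = rng.randrange(len(configs))  # explore
        else:
            i = max(range(len(configs)), key=means.__getitem__)  # exploit
        pull(i)
    best = max(range(len(configs)), key=means.__getitem__)
    return configs[best], means
```

A full multi-objective treatment would track a Pareto front instead of a single scalarized reward; the scalarization keeps the sketch short.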

# CHIP2023におけるPromptCBLUE共有タスクの概要

Overview of the PromptCBLUE Shared Task in CHIP2023 ( http://arxiv.org/abs/2312.17522v1 )

ライセンス: Link先を確認

Wei Zhu, Xiaoling Wang, Mosha Chen, Buzhou Tang

(参考訳) 本稿では、CHIP-2023会議で開催されたPromptCBLUE共有タスク（http://cips-chip.org.cn/2023/eval1）の概要を紹介する。この共有タスクはCBLUEベンチマークを再構成したものであり、一般的な医療自然言語処理において、中国語のオープンドメインまたは医療ドメインの大規模言語モデル（LLM）のための優れたテストベッドを提供する。2つのトラックが設けられた。(a) LLMのマルチタスク・プロンプトチューニングを調査するプロンプトチューニング・トラック、(b) オープンソースLLMの文脈内学習能力を検証するトラックである。産業界と学術界の両方から多くのチームが共有タスクに参加し、上位チームは素晴らしいテスト結果を得た。本稿では、タスク、データセット、評価指標、および両トラックの上位システムについて述べる。最後に、参加チームが検討した様々なアプローチの手法と評価結果をまとめる。

This paper presents an overview of the PromptCBLUE shared task (http://cips-chip.org.cn/2023/eval1) held at the CHIP-2023 Conference. This shared task reformulates the CBLUE benchmark and provides a good testbed for Chinese open-domain or medical-domain large language models (LLMs) in general medical natural language processing. Two different tracks are held: (a) a prompt tuning track, investigating the multitask prompt tuning of LLMs, and (b) a track probing the in-context learning capabilities of open-sourced LLMs. Many teams from both industry and academia participated in the shared tasks, and the top teams achieved amazing test results. This paper describes the tasks, the datasets, the evaluation metrics, and the top systems for both tasks. Finally, the paper summarizes the techniques and results of the evaluation of the various approaches explored by the participating teams.

翻訳日:2024-01-02 10:14:28 公開日:2023-12-29

# 悲観的二段階最適化による決定に焦点を当てた予測:計算的研究

Decision-focused predictions via pessimistic bilevel optimization: a computational study ( http://arxiv.org/abs/2312.17640v1 )

ライセンス: Link先を確認

Víctor Bucarey, Sophia Calderón, Gonzalo Muñoz, Frederic Semet

(参考訳) 最適化パラメータの不確実性への対処は、重要かつ長年の課題である。通常は、まず不確実なパラメータをできるだけ正確に予測し、その後で決定論的な最適化問題を解く。しかし、このいわゆる「予測してから最適化する（predict-then-optimize）」手順による意思決定は、不確実なパラメータに非常に敏感になり得る。本研究は、「意思決定指向（decision-focused）」の予測、すなわちそれを用いて下した意思決定に関する「後悔（regret）」尺度を最小化することを目的として予測モデルを構築する、近年の取り組みに貢献する。我々は、期待後悔の厳密な最小化を悲観的二レベル最適化モデルとして定式化する。次に、双対性の議論を用いてこれを非凸二次最適化問題として再定式化する。最後に、計算上の扱いやすさを実現するための様々な計算手法を示す。コストベクトルが不確実な最短経路問題のインスタンスについて、広範な計算結果を報告する。結果は、提案手法が意思決定指向学習の最先端手法であるElmachtoub and Grigas (2022) のアプローチよりも訓練性能を向上させうることを示している。

Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions produced by this so-called predict-then-optimize procedure can be highly sensitive to uncertain parameters. In this work, we contribute to recent efforts in producing decision-focused predictions, i.e., to build predictive models that are constructed with the goal of minimizing a regret measure on the decisions taken with them. We formulate the exact expected regret minimization as a pessimistic bilevel optimization model. Then, using duality arguments, we reformulate it as a non-convex quadratic optimization problem. Finally, we show various computational techniques to achieve tractability. We report extensive computational results on shortest-path instances with uncertain cost vectors. Our results indicate that our approach can improve training performance over the approach of Elmachtoub and Grigas (2022), a state-of-the-art method for decision-focused learning.

翻訳日:2024-01-02 09:52:50 公開日:2023-12-29
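
The regret measure and the role of pessimism can be illustrated on a tiny shortest-path instance. A path is chosen using predicted edge costs; regret is its true cost minus the true optimum. When several paths tie under the prediction, the pessimistic (bilevel) view assumes the worst of them, under the true costs, is selected. The sketch enumerates simple paths, which is only viable for toy graphs; the graph and costs in the usage below are made up for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Graph = Dict[str, List[str]]
Costs = Dict[Tuple[str, str], float]


def all_paths(graph: Graph, src: str, dst: str,
              path: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Enumerate simple paths from src to dst (exponential; toy graphs only)."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in graph.get(src, []):
        if nxt not in path:
            yield from all_paths(graph, nxt, dst, path + [nxt])


def path_cost(path: List[str], cost: Costs) -> float:
    return sum(cost[(u, v)] for u, v in zip(path, path[1:]))


def pessimistic_regret(graph: Graph, src: str, dst: str,
                       predicted: Costs, true: Costs, tol: float = 1e-9) -> float:
    paths = list(all_paths(graph, src, dst))
    best_pred = min(path_cost(p, predicted) for p in paths)
    # All paths optimal under the prediction; pessimism picks the worst under true costs.
    ties = [p for p in paths if path_cost(p, predicted) <= best_pred + tol]
    chosen_true = max(path_cost(p, true) for p in ties)
    opt_true = min(path_cost(p, true) for p in paths)
    return chosen_true - opt_true
```

A prediction that assigns equal costs to every edge is "accurate on average" yet can incur maximal regret, which is exactly the sensitivity of predict-then-optimize the abstract describes.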

# マルチモーダルICUデータを用いた病院内死亡予測のためのXAI

XAI for In-hospital Mortality Prediction via Multimodal ICU Data ( http://arxiv.org/abs/2312.17624v1 )

ライセンス: Link先を確認

Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, and Fengxiang He

(参考訳) 集中治療室（ICU）患者の院内死亡予測は、最終的な臨床転帰にとって重要である。AIは高い精度を示しているが、説明可能性の欠如という問題を抱えている。この問題に対処するため、本稿ではマルチモーダルICUデータを用いた院内死亡予測のための、効率的かつ説明可能なAIソリューションであるeXplainable Multimodal Mortality Predictor（X-MMP）を提案する。我々の枠組みはマルチモーダル学習を採用しており、臨床データからの異種入力を受け取って意思決定を行うことができる。さらに、LRP法のTransformerへの適切な拡張であるLayer-Wise Propagation to Transformerという説明手法を導入し、マルチモーダル入力に対する説明を生成して、予測に寄与する顕著な特徴を明らかにする。また、臨床転帰に対する各モダリティの寄与を可視化でき、臨床医が意思決定の背後にある理由を理解するのを支援する。我々はMIMIC-IIIおよびMIMIC-III Waveform Database Matched Subsetに基づくマルチモーダルデータセットを構築した。ベンチマークデータセットでの包括的な実験により、提案する枠組みが競争力のある予測精度とともに妥当な解釈を実現できることが示された。特に、我々の枠組みは他の臨床タスクに容易に転用でき、医療研究における重要な要因の発見を促進する。

Predicting in-hospital mortality for intensive care unit (ICU) patients is key to final clinical outcomes. AI has shown high accuracy but suffers from a lack of explainability. To address this issue, this paper proposes an eXplainable Multimodal Mortality Predictor (X-MMP), an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data. We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions. Furthermore, we introduce an explainable method, namely Layer-Wise Propagation to Transformer, as a proper extension of the LRP method to Transformers, producing explanations over multimodal inputs and revealing the salient features attributed to prediction. Moreover, the contribution of each modality to clinical outcomes can be visualized, assisting clinicians in understanding the reasoning behind decision-making. We construct a multimodal dataset based on MIMIC-III and MIMIC-III Waveform Database Matched Subset. Comprehensive experiments on benchmark datasets demonstrate that our proposed framework can achieve reasonable interpretation with competitive prediction accuracy. In particular, our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.

翻訳日:2024-01-02 09:52:33 公開日:2023-12-29
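
The explanation method above extends LRP (Layer-wise Relevance Propagation) to Transformers. The basic building block is the LRP-ε rule for a dense layer: with pre-activations z_j = Σ_i a_i W_ij + b_j, output relevance is redistributed to inputs as R_i = a_i Σ_j W_ij R_j / (z_j + ε·sign(z_j)). The sketch shows only this standard dense-layer rule; the paper's handling of attention and normalization layers is not reproduced.

```python
import numpy as np


def lrp_epsilon_linear(a: np.ndarray, W: np.ndarray, b: np.ndarray,
                       relevance_out: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """LRP-epsilon rule for a dense layer z = a @ W + b.

    a: input activations (d_in,), W: weights (d_in, d_out), b: bias (d_out,),
    relevance_out: relevance of each output unit (d_out,).
    Returns the relevance of each input unit (d_in,).
    """
    z = a @ W + b
    # Stabilizer pushes the denominator away from zero, keeping its sign.
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)
    s = relevance_out / denom
    return a * (W @ s)
```

With zero bias and a small ε, relevance is approximately conserved: the input relevances sum to the output relevances, which is the property that makes LRP heatmaps interpretable as a decomposition of the prediction.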

# エンタングルメント・ウィットネスに基づくエンタングルメント定量化量の下界

Lower Bounds of Entanglement Quantifiers Based On Entanglement Witnesses ( http://arxiv.org/abs/2312.17620v1 )

ライセンス: Link先を確認

Xian Shi

(参考訳) 二部系のエンタングルメントを何らかのエンタングルメント測度で定量化することは一般に難しい問題であり、系に関する情報が少ない場合にはさらに困難になる。本稿では、2つのクラスのエンタングルメント判定基準に基づいて、エンタングルメント測度であるコンカレンス、形成のエンタングルメント、および幾何学的エンタングルメント測度の下界を得る方法を提示する。

Quantifying the entanglement of bipartite systems in terms of some entanglement measure is a challenging problem in general, and it becomes much harder when less information about the system is available. In this manuscript, based on two classes of entanglement criteria, we present a method to obtain lower bounds on the entanglement measures concurrence, entanglement of formation, and geometrical entanglement measure.

翻訳日:2024-01-02 09:52:09 公開日:2023-12-29
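
To illustrate what a witness-based lower bound looks like, the sketch computes the exact two-qubit concurrence with Wootters' formula and compares it with a simple bound from the witness W = I/2 - |Φ+⟩⟨Φ+|. Assumption: the bound used here, C ≥ max(0, -2 Tr(Wρ)) = max(0, 2F - 1) with F the fidelity to |Φ+⟩, is a known two-qubit fidelity bound chosen for illustration, not the manuscript's construction. It only needs the single expectation value Tr(Wρ), which is the point of witness-based estimates when little is known about the state.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
YY = np.kron(SIGMA_Y, SIGMA_Y)
PHI_PLUS = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
W_PHI = 0.5 * np.eye(4) - np.outer(PHI_PLUS, PHI_PLUS.conj())  # entanglement witness


def concurrence(rho: np.ndarray) -> float:
    """Wootters concurrence of a two-qubit density matrix (4x4)."""
    rho_tilde = YY @ rho.conj() @ YY  # spin-flipped state
    eigs = np.linalg.eigvals(rho @ rho_tilde)
    lam = np.sort(np.sqrt(np.clip(eigs.real, 0.0, None)))[::-1]  # decreasing order
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))


def witness_lower_bound(rho: np.ndarray, witness: np.ndarray = W_PHI) -> float:
    """Lower bound on concurrence from one witness expectation value."""
    return float(max(0.0, -2.0 * np.trace(witness @ rho).real))


def werner(p: float) -> np.ndarray:
    """Werner state p|Phi+><Phi+| + (1 - p) I/4."""
    return p * np.outer(PHI_PLUS, PHI_PLUS) + (1 - p) * np.eye(4) / 4
```

For Werner states the bound is tight, both sides equal max(0, (3p - 1)/2); for generic states it is only a lower estimate.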

# 生成情報抽出のための大規模言語モデル:調査

Large Language Models for Generative Information Extraction: A Survey ( http://arxiv.org/abs/2312.17617v1 )

ライセンス: Link先を確認

Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Enhong Chen

(参考訳) 情報抽出（IE）は、プレーンな自然言語テキストから構造的知識（エンティティ、関係、イベントなど）を抽出することを目的としている。近年、生成型大規模言語モデル（LLM）はテキストの理解と生成において顕著な能力を示し、様々なドメインやタスクにわたる汎化を可能にしている。その結果、LLMの能力を活用し、生成パラダイムに基づいてIEタスクに実用的な解決策を提供する多くの研究が提案されている。IEタスクに対するLLMの取り組みを包括的かつ体系的にレビューし探究するため、本研究ではこの分野の最新の進展を調査する。まず、これらの研究を様々なIEサブタスクと学習パラダイムの観点から分類して広範な概観を示し、次に最先端の手法を実証的に分析して、LLMを用いたIEタスクの新たな傾向を明らかにする。徹底したレビューに基づき、今後の研究でさらに探究すべき技術上の知見と有望な研究方向をいくつか特定する。公開リポジトリを維持し、関連リソースを継続的に更新している: https://github.com/quqxui/Awesome-LLM4IE-Papers

Information extraction (IE) aims to extract structural knowledge (such as entities, relations, and events) from plain natural language texts. Recently, generative Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation, allowing for generalization across various domains and tasks. As a result, numerous works have been proposed to harness the abilities of LLMs and offer viable solutions for IE tasks based on a generative paradigm. To conduct a comprehensive systematic review and exploration of LLM efforts for IE tasks, in this study, we survey the most recent advancements in this field. We first present an extensive overview by categorizing these works in terms of various IE subtasks and learning paradigms, then we empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs. Based on the thorough review conducted, we identify several insights in techniques and promising research directions that deserve further exploration in future studies. We maintain a public repository and consistently update related resources at: https://github.com/quqxui/Awesome-LLM4IE-Papers

翻訳日:2024-01-02 09:52:00 公開日:2023-12-29
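
In the generative IE paradigm surveyed above, extraction becomes text generation: the prompt asks the LLM to emit structures in a fixed textual format, and a parser turns that output back into tuples while discarding anything malformed or outside the schema. The sketch below assumes a hypothetical "(head; relation; tail)" line format for relation extraction; the prompt wording and format are illustrative choices, not taken from any surveyed system, and no model call is included.

```python
import re
from typing import Iterable, List, Tuple

# One triple per line: (head; relation; tail)
TRIPLE_LINE = re.compile(r"^\(\s*(.+?)\s*;\s*(.+?)\s*;\s*(.+?)\s*\)$")


def build_re_prompt(text: str, relation_types: Iterable[str]) -> str:
    """Hypothetical instruction prompt for generative relation extraction."""
    relations = ", ".join(sorted(relation_types))
    return (
        "Extract all (head; relation; tail) triples from the text.\n"
        f"Allowed relations: {relations}.\n"
        "Write one triple per line in the form (head; relation; tail).\n"
        f"Text: {text}\n"
        "Triples:"
    )


def parse_triples(output: str, relation_types: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Keep well-formed lines whose relation belongs to the schema."""
    allowed = set(relation_types)
    triples = []
    for line in output.splitlines():
        m = TRIPLE_LINE.match(line.strip())
        if m and m.group(2) in allowed:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples
```

Constraining the parser to the schema guards against hallucinated relation types, a common failure mode of free-form generation.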

# グラフ畳み込みネットワークのワンショットマルチレートプルーニング

One-Shot Multi-Rate Pruning of Graph Convolutional Networks ( http://arxiv.org/abs/2312.17615v1 )

ライセンス: Link先を確認

Hichem Sahbi

(参考訳) 本稿では、ネットワークのトポロジと重みを同時に学習する、Multi-Rate Magnitude Pruning（MRMP）と呼ばれる新しい軽量グラフ畳み込みネットワーク（GCN）の設計を提案する。本手法は変分的であり、学習したネットワークの重み分布を事前分布に整合させることで進められる。これにより、一方では任意の固定プルーニング率を実現でき、設計した軽量GCNの汎化性能も向上させられる。他方で、MRMPは共有重みの上で複数のGCNを同時に訓練し、重みを再訓練することなく任意の目標プルーニング率で高精度なネットワークを外挿できる。骨格に基づく認識という難しいタスクで行った広範な実験により、特に非常に高いプルーニング率において、我々の軽量GCNが大幅な性能向上を示すことが確認された。

In this paper, we devise a novel lightweight Graph Convolutional Network (GCN) design dubbed Multi-Rate Magnitude Pruning (MRMP) that jointly trains network topology and weights. Our method is variational and proceeds by aligning the weight distribution of the learned networks with an a priori distribution. On the one hand, this allows implementing any fixed pruning rate and also enhances the generalization performance of the designed lightweight GCNs. On the other hand, MRMP achieves joint training of multiple GCNs on top of shared weights, in order to extrapolate accurate networks at any targeted pruning rate without retraining their weights. Extensive experiments conducted on the challenging task of skeleton-based recognition show substantial gains for our lightweight GCNs, particularly at very high pruning rates.

翻訳日:2024-01-02 09:51:40 公開日:2023-12-29
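
The "multi-rate" idea, several pruned networks sharing one set of weights, can be illustrated with plain magnitude pruning: rank weights by |w| once, then derive a mask for each target rate from the same ranking. The masks are nested, so every sparser network is a sub-network of the denser ones. This sketch does not include MRMP's variational alignment of the weight distribution with a prior; it only shows the shared-weight, multi-mask mechanics.

```python
from typing import Dict, Iterable

import numpy as np


def magnitude_masks(weights: np.ndarray, rates: Iterable[float]) -> Dict[float, np.ndarray]:
    """One binary keep-mask per pruning rate, all derived from the same shared weights."""
    flat = np.abs(weights).ravel()
    order = np.argsort(flat, kind="stable")  # ascending magnitude
    masks = {}
    for rate in rates:
        n_prune = int(round(rate * flat.size))
        mask = np.ones(flat.size, dtype=bool)
        mask[order[:n_prune]] = False  # drop the smallest-magnitude weights
        masks[rate] = mask.reshape(weights.shape)
    return masks
```

Because all masks come from one ranking, switching pruning rate at deployment is just applying a different mask, with no retraining of the shared weights.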

# 印刷多層パーセプトロンを対象とした積和演算と活性化関数のベスポーク近似

Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons ( http://arxiv.org/abs/2312.17612v1 )

ライセンス: Link先を確認

Florentia Afentaki, Gurol Saglam, Argyris Kokkinis, Kostas Siozios, Georgios Zervakis, Mehdi B Tahoori

(参考訳) プリンテッドエレクトロニクス（PE）は独特で優れた特性を備えており、真のユビキタスコンピューティングを実現するための有力な技術となっている。これは特に、コンフォーマル（曲面追従）かつ超低コストのソリューションを必要とし、これまでコンピューティングの浸透が限られていた応用分野に関係する。シリコンベースの技術とは異なり、PEは非反復エンジニアリング（NRE）コスト、超低製造コスト、コンフォーマルで柔軟、無毒かつ伸縮可能なハードウェアのオンデマンド製造といった比類のない特徴を提供する。しかし、PEはその大きな特徴サイズのためにいくつかの制約に直面しており、機械学習分類器のような複雑な回路の実現が妨げられている。本研究では、近似コンピューティングとベスポーク（完全カスタマイズ）設計の原理を活用してこれらの制約に対処する。我々は超低電力の多層パーセプトロン（MLP）分類器を設計する自動化フレームワークを提案する。このフレームワークは初めて、MLPのニューロンのすべての機能、すなわち乗算、累算、活性化を近似する包括的なアプローチを採用する。様々なサイズのMLPにわたる包括的な評価を通じて、本フレームワークが検討した中で最も複雑なMLPアーキテクチャでさえバッテリー駆動での動作を可能にし、現在の最先端技術を大きく上回ることを示す。

Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes that impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully-customized) design. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. Through comprehensive evaluation across various MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.

翻訳日:2024-01-02 09:51:26 公開日:2023-12-29
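
One classic way to approximate the multiplication part of a bespoke MLP neuron is to round each fixed weight to a signed power of two, so that every product becomes a bit shift and no multiplier circuit is needed. This is a swapped-in, well-known approximation shown for illustration, not necessarily the technique this framework uses; the exponent range and the integer-activation assumption are hypothetical, and accumulation and activation approximations are not shown.

```python
from typing import Sequence, Tuple

import numpy as np


def to_power_of_two(w: Sequence[float], min_exp: int = -8,
                    max_exp: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Approximate each weight as sign * 2**exp with exp in [min_exp, max_exp]."""
    w = np.asarray(w, dtype=float)
    signs = np.sign(w).astype(int)  # zero weights keep sign 0 and are skipped later
    mag = np.where(w != 0, np.abs(w), 2.0 ** min_exp)  # avoid log2(0)
    exps = np.clip(np.round(np.log2(mag)), min_exp, max_exp).astype(int)
    return signs, exps


def shift_dot(x_int: Sequence[int], signs: np.ndarray, exps: np.ndarray) -> int:
    """Dot product of integer activations with power-of-two weights using only shifts."""
    acc = 0
    for xi, s, e in zip(x_int, signs, exps):
        xi, s, e = int(xi), int(s), int(e)
        if s == 0:
            continue
        # Right shifts floor toward negative infinity for negative activations.
        term = xi << e if e >= 0 else xi >> -e
        acc += s * term
    return acc
```

In a bespoke (hard-wired) circuit, a constant shift is just wiring, which is why this kind of approximation saves so much area and power in printed technologies.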

# P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion

P2M2-Net: Part-Aware Prompt-Guided Multimodal Point Cloud Completion ( http://arxiv.org/abs/2312.17611v1 )

ライセンス: Link先を確認

Linlian Jiang, Pan Chen, Ye Wang, Tieru Wu, Rui Ma

(参考訳) 大きく遮蔽された点群から欠損領域を推定することは非常に難しい。特に幾何や構造の詳細が豊富な3次元形状では、未知部分に固有の曖昧さが存在する。既存のアプローチは、教師あり方式で1対1の写像を学習するか、生成モデルを訓練して欠損点を合成し、3次元点群形状を補完する。しかし、これらの手法は補完過程の制御性に欠け、結果は決定論的であるか、制御されない多様性を示す。プロンプト駆動のデータ生成・編集に着想を得て、我々はより制御可能で多様な形状補完を可能にする、P2M2-Netと呼ばれる新しいプロンプト誘導型点群補完フレームワークを提案する。入力された部分点群と、欠損領域の意味や構造などの部品レベルの情報を記述するテキストプロンプトが与えられると、Transformerベースの補完ネットワークはマルチモーダル特徴を効率的に融合し、プロンプトの指示に従って多様な結果を生成できる。我々は新しい大規模PartNet-PromptデータセットでP2M2-Netを訓練し、2つの難しい形状補完ベンチマークで広範な実験を行う。定量的・定性的な結果は、より制御可能な部品認識型の点群補完と生成のためにプロンプトを組み込むことの有効性を示している。コードとデータはhttps://github.com/JLU-ICL/P2M2-Netで公開されている。

Inferring missing regions from severely occluded point clouds is highly challenging. This is especially true for 3D shapes with rich geometric and structural detail, where the unknown parts are inherently ambiguous. Existing approaches either learn a one-to-one mapping in a supervised manner or train a generative model to synthesize the missing points for the completion of 3D point cloud shapes. These methods, however, lack controllability over the completion process, and their results are either deterministic or exhibit uncontrolled diversity. Inspired by prompt-driven data generation and editing, we propose a novel prompt-guided point cloud completion framework, coined P2M2-Net, to enable more controllable and more diverse shape completion. The network takes an input partial point cloud and a text prompt describing part-aware information, such as the semantics and structure of the missing region. From these, our Transformer-based completion network can efficiently fuse the multimodal features and generate diverse results following the prompt guidance. We train P2M2-Net on a new large-scale PartNet-Prompt dataset and conduct extensive experiments on two challenging shape completion benchmarks. Quantitative and qualitative results show the efficacy of incorporating prompts for more controllable part-aware point cloud completion and generation. Code and data are available at https://github.com/JLU-ICL/P2M2-Net.

Translated: 2024-01-02 09:51:03; Published: 2023-12-29

# Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios

Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios ( http://arxiv.org/abs/2312.17606v1 )

License: see link

Xinyuan Wu, Wentao Dong, Hang Lai, Yong Yu and Ying Wen

Quadruped robots adapt well to extreme environments but may also experience faults. Once these faults occur, robots must be repaired before returning to the task, which reduces their practical feasibility. One prevalent concern among these faults is actuator degradation, stemming from factors like device aging or unexpected operational events. Traditionally, addressing this problem has relied heavily on intricate fault-tolerant design, which demands deep domain expertise from developers and lacks generalizability. Learning-based approaches offer effective ways to mitigate these limitations, but a research gap exists in effectively deploying such methods on real-world quadruped robots. This paper introduces a pioneering teacher-student framework rooted in reinforcement learning, named Actuator Degradation Adaptation Transformer (ADAPT), aimed at addressing this research gap. This framework produces a unified control strategy that enables the robot to sustain its locomotion and perform tasks despite sudden joint actuator faults, relying exclusively on its internal sensors. Empirical evaluations on the Unitree A1 platform validate the deployability and effectiveness of ADAPT on real-world quadruped robots and affirm the robustness and practicality of our approach.

Translated: 2024-01-02 09:50:37; Published: 2023-12-29

# Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints

Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints ( http://arxiv.org/abs/2312.17605v1 )

License: see link

Alejandro Agostini, Justus Piater

In task and motion planning (TAMP), abstract descriptions used by task planning methods are ambiguous and underdetermined. This makes it difficult to characterize the physical constraints needed to successfully execute a task. The usual approach is to ignore such constraints at the task planning level. It then relies on expensive sub-symbolic geometric reasoning that repeatedly evaluates infeasible actions, corrects plans, and re-plans until a feasible solution is found. We propose an alternative TAMP approach that unifies task and motion planning into a single heuristic search. Our approach is based on an object-centric abstraction of motion constraints, which lets us leverage the computational efficiency of off-the-shelf AI heuristic search to yield physically feasible plans. These plans can be directly transformed into object and motion parameters for task execution without the need for intensive sub-symbolic geometric reasoning.
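To make the contrast concrete, the sketch below shows a generic A* search in which a feasibility check runs while each action is expanded, rather than in a later geometric-reasoning pass. The `feasible` predicate is a hypothetical stand-in for the paper's object-centric motion-constraint abstraction, which the abstract does not specify. The toy domain is also invented for illustration.

```python
import heapq
from typing import Callable, Hashable, List, Optional, Tuple

Action = Tuple[str, Hashable, float]  # (action name, successor state, cost)


def astar(start: Hashable,
          is_goal: Callable[[Hashable], bool],
          successors: Callable[[Hashable], List[Action]],
          heuristic: Callable[[Hashable], float],
          feasible: Callable[[Hashable, str], bool]) -> Optional[List[str]]:
    """A* search that skips actions flagged infeasible during expansion."""
    counter = 0  # tie-breaker so heapq never has to compare states
    frontier = [(heuristic(start), counter, 0.0, start, [])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        if g > best_g.get(state, float("inf")):
            continue  # a cheaper path to this state was already found
        for name, nxt, cost in successors(state):
            if not feasible(state, name):
                continue  # pruned here instead of failing later in geometric reasoning
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                counter += 1
                heapq.heappush(frontier, (new_g + heuristic(nxt), counter, new_g, nxt, plan + [name]))
    return None


# Toy domain (hypothetical): move an object from A to C; the direct A->C motion is blocked.
moves = {"A": [("move_A_C", "C", 1.0), ("move_A_B", "B", 1.0)],
         "B": [("move_B_C", "C", 1.0)],
         "C": []}
plan = astar("A", lambda s: s == "C", lambda s: moves[s], lambda s: 0.0,
             feasible=lambda s, a: a != "move_A_C")
```

With the constraint in place, the search returns the two-step detour directly. Without it, the search returns the shorter but physically blocked plan, which would then need correction and re-planning.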

Translated: 2024-01-02 09:50:17; Published: 2023-12-29

# Quantum-grade nanodiamonds for ultrabright spin detection in live cells

Quantum-grade nanodiamonds for ultrabright spin detection in live cells ( http://arxiv.org/abs/2312.17603v1 )

License: see link

Keisuke Oshimi, Hiromu Nakashima, Sara Mandić, Hina Kobayashi, Minori Teramoto, Hirokazu Tsuji, Yoshiki Nishibayashi, Yutaka Shikano, Toshu An, and Masazumi Fujiwara

Optically accessible spin-active nanomaterials are promising as quantum nanosensors for probing biological samples. However, it is challenging to give these materials both bioimaging-level brightness and high-quality spin properties, which hinders their application in quantum biosensing. Here, we demonstrate ultrabright fluorescent nanodiamonds (NDs) containing 0.6-1.3 ppm nitrogen-vacancy (NV) centers. They are obtained by spin-environment engineering: enriching spinless ¹²C carbon isotopes and reducing substitutional nitrogen spin impurities. The NDs, readily introduced into cultured cells, exhibited substantially narrower optically detected magnetic resonance (ODMR) spectra. They required 16 times less microwave excitation power to reach an ODMR depth comparable to that of conventional type-Ib NDs. They show average spin-relaxation times of T1 = 0.68 ms and T2 = 1.6 μs (maximum 1.6 ms and 2.7 μs), which are 5- and 10-fold longer, respectively, than those of type-Ib NDs. The bulk-like NV spin properties and bright fluorescence demonstrated in this study significantly improve the sensitivity of ND-based quantum sensors for biological applications.

Translated: 2024-01-02 09:49:59; Published: 2023-12-29

# The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey

The Tyranny of Possibilities in the Design of Task-Oriented LLM Systems: A Scoping Survey ( http://arxiv.org/abs/2312.17601v1 )

License: see link

Dhruv Dhamani and Mary Lou Maher

This scoping survey focuses on our current understanding of the design space for task-oriented LLM systems and elaborates on the definitions of, and relationships among, the available design parameters. The paper begins by defining a minimal task-oriented LLM system. It then explores the design space of such systems through a thought experiment that considers how diverse LLM system configurations would perform on a complex software development task, and hypothesizes the results. These configurations involve single LLMs, single LLM-based agents, and multiple LLM-based agent systems. We discuss a pattern in these hypothesized results and formulate it into three conjectures. While these conjectures may be partly based on faulty assumptions, they provide a starting point for future research. The paper then surveys a select few design parameters, covering and organizing research on LLM augmentation, prompting techniques, and uncertainty estimation, and discusses their significance. It notes the lack of focus on computational and energy efficiency when evaluating research in these areas. Our survey findings provide a basis for developing the concepts of linear and non-linear contexts. We define these concepts and use them to enable an agent-centric projection of prompting techniques, a lens through which prompting techniques can be viewed as multi-agent systems. The paper discusses the implications of this lens for two areas. The first is the cross-pollination of research between LLM prompting and LLM-based multi-agent systems. The second is the generation of synthetic training data based on existing prompting techniques from the research literature. In all, the scoping survey presents seven conjectures that can help guide future research efforts.

Translated: 2024-01-02 09:49:35; Published: 2023-12-29

# Exploring the current applications and potential of extended reality for environmental sustainability in manufacturing

Exploring the current applications and potential of extended reality for environmental sustainability in manufacturing ( http://arxiv.org/abs/2312.17595v1 )

License: see link

Huizhong Cao, Henrik Söderlund, Mélanie Derspeisse and Björn Johansson

In response to the transformation towards Industry 5.0, there is a growing call for manufacturing systems that prioritize environmental sustainability alongside the emerging application of digital tools. Extended Reality (XR), including Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), is one of the technologies identified as an enabler of Industry 5.0. XR could also be a driver of more sustainable manufacturing; however, its potential environmental benefits have received limited attention. This paper explores current manufacturing applications of, and research on, XR technology connected to the principle of environmental sustainability. Its objectives are two-fold. The first is to identify the use cases of XR technology explored so far in literature and research that address environmental sustainability in manufacturing. The second is to provide industry and companies with guidance and references on use cases, toolboxes, methodologies, and workflows for implementing XR in environmentally sustainable manufacturing practices. The authors analyzed and mapped the current literature using criteria for pragmatic XR use cases in manufacturing. This analysis was based on the categorization of sustainability indicators developed by the National Institute of Standards and Technology (NIST). The exploration resulted in a mapping of current applications and use cases of XR technology in manufacturing that have the potential to drive environmental sustainability. The results are presented as use cases with references to the literature. They serve as guidance and inspiration for future research or industrial implementations that use XR as a driver of environmental sustainability. Furthermore, the authors open a discussion of future work and research to increase attention to XR as a driver of environmental sustainability.

Translated: 2024-01-02 09:49:08; Published: 2023-12-29

# VR interaction for efficient virtual manufacturing: mini map for multi-user VR navigation platform

VR interaction for efficient virtual manufacturing: mini map for multi-user VR navigation platform ( http://arxiv.org/abs/2312.17593v1 )

License: see link

Huizhong Cao, Henrik Söderlund, Mélanie Despeisse, Francisco Garcia Rivera, and Björn Johansson

Over the past decade, the value and potential of VR applications in manufacturing have gained significant attention with the rise of Industry 4.0 and beyond. Their efficacy in layout planning, virtual design reviews, and operator training has been well established in previous studies. However, many functional requirements and interaction parameters of VR for manufacturing remain ambiguously defined. One area awaiting exploration is spatial recognition and learning. This area is crucial for understanding navigation within a virtual manufacturing system and for processing spatial data. It is particularly vital in multi-user VR applications, where participants' spatial awareness in the virtual realm significantly influences the efficiency of meetings and design reviews. This paper investigates the interaction parameters of multi-user VR. It focuses on interactive positioning maps for virtual factory layout planning and explores the user interaction design of digital maps as navigation aids. A literature study was conducted to establish frequently used techniques and interactive maps from the VR gaming industry. Multiple demonstrators of different interactive maps were implemented in a VR multi-user platform using the Unity game engine, providing a comprehensive A/B test. Five different prototypes of interactive maps were tested, evaluated, and graded by 20 participants, and 40 validated data streams were collected. The study then analyzes and discusses the most efficient interaction design for interactive maps.

Translated: 2024-01-02 09:48:38; Published: 2023-12-29

# Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training ( http://arxiv.org/abs/2312.17591v1 )

License: see link

Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

Feature attribution methods highlight important input tokens as explanations of model predictions. They have been widely applied to deep neural networks in pursuit of trustworthy AI. However, recent work shows that the explanations these methods provide often struggle to be faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness through input gradient regularization and virtual adversarial training. Second, we use saliency ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution. This can be seen as a self-training procedure that imports no external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate faithfulness in an out-of-domain setting. The results show that REGEX improves the fidelity metrics of explanations in all settings and achieves consistent gains on two randomization tests. Moreover, we use highlight explanations produced by REGEX to train select-then-predict models. These models achieve task performance comparable to that of the end-to-end method.
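For reference, input gradient regularization is commonly written as a penalty on the squared norm of the loss gradient with respect to the input, which for text is usually the token embeddings. The block below shows that generic form only. The symbols $\mathbf{e}$ (embedded input tokens), $y$ (label), $f_\theta$ (classifier), and the weight $\lambda \ge 0$ are assumptions for illustration. The paper's exact objective, including how it combines this term with virtual adversarial training and the attention/attribution similarity term, may differ.

```latex
\mathcal{L}(\theta) =
  \mathcal{L}_{\mathrm{CE}}\big(f_\theta(\mathbf{e}),\, y\big)
  + \lambda \,\Big\lVert \nabla_{\mathbf{e}}\, \mathcal{L}_{\mathrm{CE}}\big(f_\theta(\mathbf{e}),\, y\big) \Big\rVert_2^2
```

Penalizing this gradient makes the loss less sensitive to small perturbations of the input embeddings. That is the sense of robustness that gradient regularization targets.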

Translated: 2024-01-02 09:48:18; Published: 2023-12-29

# A Tool for the Procedural Generation of Shaders using Interactive Evolutionary Algorithms

A Tool for the Procedural Generation of Shaders using Interactive Evolutionary Algorithms ( http://arxiv.org/abs/2312.17587v1 )

License: see link

Elio Sasso, Daniele Loiacono, Pier Luca Lanzi

We present a tool for exploring the design space of shaders using an interactive evolutionary algorithm integrated with the Unity editor, a well-known commercial tool for video game development. Our framework leverages the underlying graph-based representation of recent shader editors and interactive evolution to allow designers to explore several visual options starting from an existing shader. Our framework encodes the graph representation of a current shader as a chromosome used to seed the evolution of a shader population. It applies graph-based recombination and mutation with a set of heuristics to create feasible shaders. The framework is an extension of the Unity editor; thus, designers with little knowledge of evolutionary computation (and shader programming) can interact with the underlying evolutionary engine using the same visual interface used for working on game scenes.

Translated: 2024-01-02 09:47:57; Published: 2023-12-29

# Prompt Fuzzing for Fuzz Driver Generation

Prompt Fuzzing for Fuzz Driver Generation ( http://arxiv.org/abs/2312.17677v1 )

License: see link

Yunlong Lyu, Yuxuan Xie, Peng Chen and Hao Chen

Writing high-quality fuzz drivers is time-consuming and requires a deep understanding of the library. However, the performance of state-of-the-art automatic fuzz driver generation techniques leaves a lot to be desired. Fuzz drivers learned from consumer code can reach deep states but are restricted to their external inputs. Interpretative fuzzing, on the other hand, can explore most APIs but requires numerous attempts in a vast search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program sanitization, coverage-guided prompt mutation, and constrained fuzzer fusion. We implemented PromptFuzz and evaluated its effectiveness on 14 real-world libraries. We compared it against OSS-Fuzz and the state-of-the-art fuzz driver generation solution (i.e., Hopper). The experimental results show that the fuzz drivers generated by PromptFuzz achieve 1.61 times the branch coverage of OSS-Fuzz and 1.67 times that of Hopper. In addition, the fuzz drivers generated by PromptFuzz detected 33 previously unknown true bugs out of a total of 44 crashes, and 27 of these bugs have been confirmed by the respective communities.
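The abstract names coverage-guided prompt mutation but gives no details. The sketch below shows two plausible feedback steps for such a loop, assuming each generated driver's coverage is recorded as a set of branch ids. The first step keeps a driver only if it reaches new branches. The second steers the next prompt toward the APIs that the kept drivers exercised least. All names, including the example API names, are hypothetical and do not come from PromptFuzz.

```python
from typing import Dict, List, Set, Tuple


def select_by_new_coverage(
    candidates: List[Tuple[str, Set[int]]],
) -> Tuple[List[str], Set[int]]:
    """Keep a driver only if it reaches at least one branch no kept driver reached."""
    kept: List[str] = []
    covered: Set[int] = set()
    for driver_id, branches in candidates:
        new_branches = branches - covered
        if new_branches:
            kept.append(driver_id)
            covered |= new_branches
    return kept, covered


def next_prompt_apis(api_use_counts: Dict[str, int], k: int = 2) -> List[str]:
    """Pick the k least-exercised APIs to request in the next prompt (ties broken by name)."""
    return sorted(api_use_counts, key=lambda api: (api_use_counts[api], api))[:k]


kept, covered = select_by_new_coverage(
    [("d1", {1, 2, 3}), ("d2", {2, 3}), ("d3", {3, 4}), ("d4", set())]
)
apis = next_prompt_apis({"parse_header": 5, "decode_chunk": 0, "free_ctx": 2})
```

Here `d2` and `d4` are discarded because they add no new branches. The next prompt would then ask for drivers using `decode_chunk` and `free_ctx`.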

Translated: 2024-01-02 09:24:20; Published: 2023-12-29

# Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Jatmo: Prompt Injection Defense by Task-Specific Finetuning ( http://arxiv.org/abs/2312.17673v1 )

License: see link

Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner

Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, which allow users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks. This class of attacks hijacks the model's instruction-following abilities and turns its responses into undesired, possibly malicious ones. In this work, we introduce Jatmo, a method for generating task-specific models resilient to prompt-injection attacks. Jatmo leverages the fact that LLMs can only follow instructions once they have undergone instruction tuning. It uses an instruction-tuned teacher model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of inputs for the task; it uses the teacher model to generate the outputs. When no dataset exists, Jatmo can produce a fully synthetic dataset from a single example, or in some cases from none at all. Our experiments on six tasks show that Jatmo models match the output quality of standard LLMs on their specific task while being resilient to prompt injections. The best attacks succeeded in fewer than 0.5% of cases against our models, compared with a success rate of over 90% against GPT-3.5-Turbo. We release Jatmo at https://github.com/wagner-group/prompt-injection-defense.
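The data-generation step can be sketched as follows. This assumes the teacher is any callable that maps a prompt string to a completion. `build_task_dataset` and `fake_teacher` are hypothetical names. The actual fine-tuning call is omitted because it depends on the training stack. Leaving the task prompt out of the stored pairs is this sketch's reading of how the base model is fine-tuned.

```python
from typing import Callable, Dict, List


def build_task_dataset(task_prompt: str, inputs: List[str],
                       teacher: Callable[[str], str]) -> List[Dict[str, str]]:
    """Label task inputs with an instruction-tuned teacher's outputs.

    The teacher sees the task prompt plus each input. In this sketch the stored
    training pair keeps only (input, output), so a base model fine-tuned on it
    learns the task without being trained to obey instructions in the input text.
    """
    dataset: List[Dict[str, str]] = []
    for text in inputs:
        output = teacher(f"{task_prompt}\n\n{text}")
        dataset.append({"input": text, "output": output})
    return dataset


# Hypothetical stand-in for a real teacher model call: it just reports the input length.
def fake_teacher(prompt: str) -> str:
    _, text = prompt.split("\n\n", 1)
    return f"len={len(text)}"


data = build_task_dataset("Summarize the review.", ["abc", "hello"], fake_teacher)
```

Because the fine-tuned model never sees instructions during training, an injected instruction is, ideally, just more input text to process for the one task.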

Translated: 2024-01-02 09:23:53; Published: 2023-12-29

# Measurement-induced Clock in a Lattice Ring of Non-interacting Electrons

Measurement-induced Clock in a Lattice Ring of Non-interacting Electrons ( http://arxiv.org/abs/2312.17672v1 )

License: see link

David S. Schlegel, Stefan Kehrein

We examine the emergence of periodicity in a non-interacting, steady-state quantum system without external driving. The work is inspired by the spontaneous time-translation symmetry breaking of quantum time crystals. Specifically, we consider a lattice ring of non-interacting electrons undergoing weak local position measurements. Our analysis uncovers time-periodic structures in steady-state two-time correlation functions, with a periodicity linked to the system's group velocity. This study demonstrates a measurement-induced clock mechanism and highlights periodic behavior in the two-time correlators of a non-equilibrium steady state. It contributes to the understanding of time-periodic phenomena in minimally interacting quantum systems.

Translated: 2024-01-02 09:23:07; Published: 2023-12-29

# Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA ( http://arxiv.org/abs/2312.17670v1 )

License: see link

Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris Vos, Ynte Ruigrok, Birgitta Velthuis, Hugo Kuijf, Julien Hämmerli, Catherine Wurster, Philippe Bijlenga, Laura Westphal, Jeroen Bisschop, Elisa Colombo, Hakim Baazaoui, Andrew Makmur, James Hallinan, Bene Wiestler, Jan S. Kirschke, Roland Wiest, Emmanuel Montagnon, Laurent Letourneau-Guillon, Adrian Galdran, Francesco Galati, Daniele Falcetta, Maria A. Zuluaga, Chaolong Lin, Haoran Zhao, Zehan Zhang, Sinyoung Ra, Jongyun Hwang, Hyunjin Park, Junqiang Chen, Marek Wodzinski, Henning Müller, Pengcheng Shi, Wei Liu, Ting Ma, Cansu Yalçin, Rachika E. Hamadache, Joaquim Salvi, Xavier Llado, Uma Maria Lal-Trehan Estrada, Valeriia Abramova, Luca Giancardo, Arnau Oliver, Jialu Liu, Haibin Huang, Yue Cui, Zehang Lin, Yusheng Liu, Shunzhi Zhu, Tatsat R. Patel, Vincent M. Tutino, Maysam Orouskhani, Huayu Wang, Mahmud Mossa-Basha, Chengcheng Zhu, Maximilian R. Rokuss, Yannick Kirchhoff, Nico Disch, Julius Holzschuh, Fabian Isensee, Klaus Maier-Hein, Yuki Sato, Sven Hirsch, Susanne Wegener, Bjoern Menze

The Circle of Willis (CoW) is an important network of arteries connecting the major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged with two angiographic modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA). However, few public datasets provide annotations of CoW anatomy, especially for CTA. We therefore organized the TopCoW Challenge in 2023, releasing an annotated CoW dataset and inviting submissions worldwide for the CoW segmentation task. The challenge attracted over 140 registered participants from four continents. The TopCoW dataset was the first public dataset with voxel-level annotations for 13 CoW vessel components, made possible by virtual-reality (VR) technology. It was also the first dataset with paired MRA and CTA from the same patients. The TopCoW challenge aimed to tackle the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. The top-performing teams managed to segment many CoW components with Dice scores of around 90%, but scored lower on the communicating arteries and rare variants. There were also topological mistakes in predictions with high Dice scores. Additional topological analysis revealed further room for improvement in detecting certain CoW components and in accurately matching the topology of CoW variants. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.
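The Dice score used to rank the segmentations measures volume overlap per class. The sketch below computes the standard per-class Dice on flattened voxel label arrays. It assumes plain Python sequences rather than any specific imaging library. It also assumes that a class absent from both prediction and reference scores 1.0, a convention that evaluation pipelines do not all share. The function name is hypothetical.

```python
from typing import Dict, Sequence


def dice_per_class(pred: Sequence[int], target: Sequence[int],
                   labels: Sequence[int]) -> Dict[int, float]:
    """Dice = 2*|P & T| / (|P| + |T|) for each label, over flattened voxel labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    scores: Dict[int, float] = {}
    for c in labels:
        p = sum(1 for v in pred if v == c)
        t = sum(1 for v in target if v == c)
        both = sum(1 for a, b in zip(pred, target) if a == c and b == c)
        # Assumed convention: a class absent from both prediction and reference scores 1.0.
        scores[c] = 1.0 if p + t == 0 else 2.0 * both / (p + t)
    return scores


scores = dice_per_class([0, 1, 1, 2, 2, 0], [0, 1, 2, 2, 2, 0], labels=[1, 2, 3])
```

Dice is purely overlap-based, so a prediction can score well while breaking a thin communicating artery into disconnected pieces. This is why the challenge also evaluates topological metrics.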

Translated: 2024-01-02 09:22:38; Published: 2023-12-29

# AIJack: Security and Privacy Risk Simulator for Machine Learning

AIJack: Security and Privacy Risk Simulator for Machine Learning ( http://arxiv.org/abs/2312.17667v1 )

License: see link

Hideaki Takahashi

This paper introduces AIJack, an open-source library designed to assess security and privacy risks associated with the training and deployment of machine learning models. Amid the growing interest in big data and AI, advancements in machine learning research and business are accelerating. However, recent studies reveal potential threats, such as the theft of training data and the manipulation of models by malicious attackers. Therefore, a comprehensive understanding of machine learning's security and privacy vulnerabilities is crucial for the safe integration of machine learning into real-world products. AIJack aims to address this need by providing a library with various attack and defense methods through a unified API. The library is publicly available on GitHub (https://github.com/Koukyosyumei/AIJack).

Translated: 2024-01-02 09:22:00; Published: 2023-12-29

# User Strategization and Trustworthy Algorithms

User Strategization and Trustworthy Algorithms ( http://arxiv.org/abs/2312.17666v1 )

License: see link

Sarah H. Cen, Andrew Ilyas, Aleksander Madry

Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term. We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.

翻訳日:2024-01-02 09:21:40 公開日:2023-12-29

# Shape-IoU: バウンディングボックスの形状とスケールを考慮したより高精度な指標

Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale ( http://arxiv.org/abs/2312.17663v1 )

ライセンス: Link先を確認

Hao Zhang, Shuaijie Zhang

(参考訳) 検出器の位置推定ブランチの重要な構成要素として、バウンディングボックス回帰損失は物体検出タスクにおいて重要な役割を果たす。既存のバウンディングボックス回帰法は、通常、GTボックスと予測ボックスの幾何学的関係を考慮し、バウンディングボックスの相対位置と形状を用いて損失を計算するが、バウンディングボックスの形状やスケールといった固有の特性がバウンディングボックス回帰に与える影響を無視している。本稿では、既存研究の欠点を補うために、バウンディングボックス自体の形状とスケールに着目したバウンディングボックス回帰法を提案する。まず、バウンディングボックスの回帰特性を分析し、バウンディングボックス自体の形状とスケールの要因が回帰結果に影響を及ぼすことを見いだした。この結論に基づき、バウンディングボックス自体の形状とスケールに着目して損失を計算し、バウンディングボックス回帰をより正確にするShape-IoU法を提案する。最後に、多数の比較実験により本手法を検証し、本手法が検出性能を効果的に向上させ、既存手法を上回り、異なる検出タスクで最先端の性能を達成することを示した。

As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks. The existing bounding box regression methods usually consider the geometric relationship between the GT box and the predicted box, and calculate the loss by using the relative position and shape of the bounding boxes, while ignoring the influence of inherent properties such as the shape and scale of the bounding boxes on bounding box regression. In order to make up for the shortcomings of existing research, this article proposes a bounding box regression method that focuses on the shape and scale of the bounding box itself. Firstly, we analyzed the regression characteristics of the bounding boxes and found that the shape and scale of the bounding boxes themselves have an impact on the regression results. Based on the above conclusions, we propose the Shape IoU method, which can calculate the loss by focusing on the shape and scale of the bounding box itself, thereby making the bounding box regression more accurate. Finally, we validated our method through a large number of comparative experiments, which showed that our method can effectively improve detection performance and outperform existing methods, achieving state-of-the-art performance in different detection tasks. Code is available at https://github.com/malagoutou/Shape-IoU
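The plain IoU that Shape-IoU builds on can be sketched as follows. This is a minimal Python sketch that assumes axis-aligned boxes given as (x1, y1, x2, y2). It computes the standard IoU and the baseline loss 1 - IoU. It is not the Shape-IoU formula, which the abstract does not spell out.

```python
def iou(box_a, box_b):
    """Standard IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, gt):
    """Baseline regression loss 1 - IoU (plain IoU, not Shape-IoU)."""
    return 1.0 - iou(pred, gt)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```

The same 1-pixel horizontal offset behaves very differently at different scales:

- A 10x10 box shifted by one pixel keeps an IoU of 90/110 ≈ 0.818.
- A 100x100 box shifted by one pixel keeps an IoU of 9900/10100 ≈ 0.980.

This is one simple illustration of why box scale can matter for regression.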

翻訳日:2024-01-02 09:21:17 公開日:2023-12-29

# Gemini in Reasoning: マルチモーダル大規模言語モデルにおける常識の解明

Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models ( http://arxiv.org/abs/2312.17661v1 )

ライセンス: Link先を確認

Yuqing Wang, Yun Zhao

(参考訳) OpenAIのGPT-4V(ision)のようなマルチモーダル大規模言語モデル(MLLM)への関心の高まりは、学術界と産業界の両方に大きな影響を与えている。これらのモデルは、大規模言語モデル(LLM)に高度な視覚理解能力を付加し、様々なマルチモーダルタスクへの応用を容易にする。最近、Googleはマルチモーダル統合に特化して設計された最先端のMLLMであるGeminiを発表した。その進歩にもかかわらず、予備的なベンチマークは、Geminiが常識推論タスクにおいてGPTモデルに後れを取っていることを示している。しかし、この評価は限られたデータセット(すなわちHellaSWAG)に基づいており、Geminiの真の常識推論能力を十分には捉えていない。このギャップに対処するため、本研究では、モダリティをまたいだ常識知識の統合を必要とする複雑な推論タスクにおけるGeminiの性能を徹底的に評価する。一般的なタスクからドメイン固有のタスクまで、12の常識推論データセットを包括的に分析した。これには言語のみに焦点を当てた11のデータセットと、マルチモーダル要素を含む1つのデータセットが含まれる。4つのLLMと2つのMLLMにわたる実験は、Geminiが競争力のある常識推論能力を持つことを示す。さらに、既存のLLMやMLLMが常識問題に取り組む際に直面する共通の課題を明らかにし、これらのモデルの常識推論能力のさらなる向上の必要性を強調した。

The burgeoning interest in Multimodal Large Language Models (MLLMs), such as OpenAI's GPT-4V(ision), has significantly impacted both academic and industrial realms. These models enhance Large Language Models (LLMs) with advanced visual understanding capabilities, facilitating their application in a variety of multimodal tasks. Recently, Google introduced Gemini, a cutting-edge MLLM designed specifically for multimodal integration. Despite its advancements, preliminary benchmarks indicate that Gemini lags behind GPT models in commonsense reasoning tasks. However, this assessment, based on a limited dataset (i.e., HellaSWAG), does not fully capture Gemini's authentic commonsense reasoning potential. To address this gap, our study undertakes a thorough evaluation of Gemini's performance in complex reasoning tasks that necessitate the integration of commonsense knowledge across modalities. We carry out a comprehensive analysis of 12 commonsense reasoning datasets, ranging from general to domain-specific tasks. This includes 11 datasets focused solely on language, as well as one that incorporates multimodal elements. Our experiments across four LLMs and two MLLMs demonstrate Gemini's competitive commonsense reasoning capabilities. Additionally, we identify common challenges faced by current LLMs and MLLMs in addressing commonsense problems, underscoring the need for further advancements in enhancing the commonsense reasoning abilities of these models.

翻訳日:2024-01-02 09:20:54 公開日:2023-12-29

# 正規表現を用いたリトアニア語テキストの正規化

Normalization of Lithuanian Text Using Regular Expressions ( http://arxiv.org/abs/2312.17660v1 )

ライセンス: Link先を確認

Pijus Kasparaitis

(参考訳) テキスト正規化は、あらゆる音声合成システムに不可欠な部分である。自然言語のテキストには、数、日付、略語など、他の記号論的クラス(semiotic class)に属する要素が含まれる。これらは非標準語(NSW)と呼ばれ、通常の語に展開する必要がある。この目的のためには、各NSWの記号論的クラスを特定する必要がある。本研究では、リトアニア語に適応させた記号論的クラスの分類体系を提示する。正規表現に基づいてNSWを検出・展開するためのルールセットを作成する。3つのまったく異なるデータセットで実験を行い、精度を評価した。誤りの原因を説明し、テキスト正規化ルールの開発に向けた推奨事項を示す。

Text normalization is an integral part of any text-to-speech synthesis system. In a natural language text, there are elements such as numbers, dates, abbreviations, etc. that belong to other semiotic classes. They are called non-standard words (NSW) and need to be expanded into ordinary words. For this purpose, it is necessary to identify the semiotic class of each NSW. The taxonomy of semiotic classes adapted to the Lithuanian language is presented in this work. Sets of rules based on regular expressions are created for detecting and expanding NSWs. Experiments with three completely different data sets were performed and the accuracy was assessed. Causes of errors are explained and recommendations are given for the development of text normalization rules.
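The detection step can be sketched with Python's `re` module. The patterns below are hypothetical and deliberately small: ISO-style dates, percentages with a decimal comma, and plain numbers. They are not the paper's rule set, and the expansion of each NSW into Lithuanian words is omitted. More specific classes are tried first, so that a date is not also split into three numbers.

```python
import re

# Hypothetical, simplified patterns (not the paper's rules).
# Order matters: more specific classes come before the generic NUMBER class.
SEMIOTIC_PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),   # e.g. 2023-12-29
    ("PERCENT", re.compile(r"\b\d+(?:,\d+)?\s?%")),   # decimal comma, e.g. 12,5 %
    ("NUMBER", re.compile(r"\b\d+(?:,\d+)?\b")),      # e.g. 40 or 3,14
]


def find_nsws(text):
    """Return (class, matched_text, start) for each NSW, without overlapping spans."""
    taken = []   # spans already claimed by an earlier, more specific class
    found = []
    for label, pattern in SEMIOTIC_PATTERNS:
        for m in pattern.finditer(text):
            # Skip matches that overlap a span claimed by a more specific class.
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])


for nsw in find_nsws("Susitikimas 2023-12-29: 40 dalyvių, 12,5 % daugiau."):
    print(nsw)
```

In a full normalizer, each detected (class, text) pair would then be passed to a class-specific expansion rule. That rule is where Lithuanian grammar (case, gender, number agreement) comes in.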

翻訳日:2024-01-02 09:20:30 公開日:2023-12-29

# 機械学習モデルに基づくUTEQにおける太陽放射予測

Solar Radiation Prediction in the UTEQ based on Machine Learning Models ( http://arxiv.org/abs/2312.17659v1 )

ライセンス: Link先を確認

Jordy Anchundia Troncoso, Ángel Torres Quijije, Byron Oviedo and Cristian Zambrano-Vega

(参考訳) 本研究は、ケベド国立工科大学(UTEQ)中央キャンパスにおいて、太陽放射の予測に用いられる様々な機械学習(ML)モデルの有効性を検討するものである。データは、キャンパス内の高所に戦略的に設置された日射計(パイラノメーター)から得られた。この装置は2020年以来、日射データを継続的に記録しており、様々な気象条件と時間変動を含む包括的なデータセットを提供している。相関分析の結果、日射量に影響を及ぼす関連する気象変数として、気温と時刻が特定された。平均二乗誤差(MSE)、二乗平均平方根誤差(RMSE)、平均絶対誤差(MAE)、決定係数($R^2$)を評価指標として、線形回帰、k近傍法、決定木、勾配ブースティングなどの異なる機械学習アルゴリズムを比較した。その結果、Gradient Boosting Regressorが最も優れた性能を示し、Random Forest Regressorがそれに僅差で続いた。これらのモデルは、低いMSEと高い$R^2$値が示すように、太陽放射の非線形パターンを効果的に捉えた。MLモデルの性能を評価するため、我々はUTEQにおける太陽放射予測のためのWebベースのツールを開発した。得られた結果は、日射予測におけるMLモデルの有効性を示すとともに、リアルタイムの日射予測における実用的な有用性をもたらし、太陽エネルギーの効率的な管理を支援する。

This research explores the effectiveness of various Machine Learning (ML) models for predicting solar radiation at the Central Campus of the State Technical University of Quevedo (UTEQ). The data was obtained from a pyranometer strategically located in a high area of the campus. This instrument has continuously recorded solar irradiance data since 2020, offering a comprehensive dataset encompassing various weather conditions and temporal variations. After a correlation analysis, temperature and the time of day were identified as the relevant meteorological variables influencing solar irradiance. Different machine learning algorithms such as Linear Regression, K-Nearest Neighbors, Decision Tree, and Gradient Boosting were compared using the evaluation metrics Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination ($R^2$). The study revealed that the Gradient Boosting Regressor exhibited superior performance, closely followed by the Random Forest Regressor. These models effectively captured the non-linear patterns in solar radiation, as evidenced by their low MSE and high $R^2$ values. With the aim of assessing the performance of our ML models, we developed a web-based tool for Solar Radiation Forecasting at UTEQ, available at https://solarradiationforecastinguteq.streamlit.app/. The results demonstrate the effectiveness of our ML models in solar radiation prediction and show their practical utility for real-time solar radiation forecasting, aiding in efficient solar energy management.
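The four evaluation metrics named above have standard definitions. Here is a dependency-free sketch. The inputs in the usage note are made-up values, not the UTEQ irradiance data.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    # R^2 compares the residual sum of squares with the variance around the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
```

For example, `regression_metrics([1, 2, 3], [1, 2, 4])` gives MSE = MAE = 1/3, RMSE ≈ 0.577 and R² = 0.5. Irradiance values in W/m² go through the same formulas. MSE and RMSE are then in (W/m²)² and W/m² respectively, while R² stays unitless.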

翻訳日:2024-01-02 09:20:23 公開日:2023-12-29

# 行列積状態によるフェルミオン回路の高速エミュレーション

Fast emulation of fermionic circuits with matrix product states ( http://arxiv.org/abs/2312.17657v1 )

ライセンス: Link先を確認

Justin Provazza, Klaas Gunst, Huanchen Zhai, Garnet K.-L. Chan, Toru Shiozaki, Nicholas C. Rubin, Alec F. White

(参考訳) 本稿では、Fermionic Quantum Emulator(FQE)ソフトウェアライブラリのための行列積状態(MPS)拡張について述べる。スピン1/2フェルミオンの多体波動関数を近似するための対称性適応行列積状態の理論について論じ、MPSに対応したFQEインタフェースのオープンソース実装(MPS-FQE)を提示する。このソフトウェアは、基本的なテンソル演算の大部分にオープンソースのpyblock3およびblock2ライブラリを使用している。また、大部分においてFQEのドロップイン置換として利用でき、より大きなフェルミオン回路を、近似的ではあるがより効率的にエミュレーションできる。最後に、より大きな系の近似エミュレーションが有用と期待される、短期的およびフォールトトレラントな量子アルゴリズムの両方に関連するいくつかの応用例を示す。すなわち、量子位相推定のための状態準備戦略の特性評価、様々な変分量子固有値ソルバーのアンザッツ(Ansätze)のテスト、トロッター誤差の数値評価、一般的な量子ダイナミクス問題のシミュレーションである。これらすべての例において、MPS-FQEによる近似エミュレーションにより、フル状態ベクトルエミュレータで扱えるものよりもはるかに大きな系を扱うことができる。

We describe a matrix product state (MPS) extension for the Fermionic Quantum Emulator (FQE) software library. We discuss the theory behind symmetry adapted matrix product states for approximating many-body wavefunctions of spin-1/2 fermions, and we present an open-source, MPS-enabled implementation of the FQE interface (MPS-FQE). The software uses the open-source pyblock3 and block2 libraries for most elementary tensor operations, and it can largely be used as a drop-in replacement for FQE that allows for more efficient, but approximate, emulation of larger fermionic circuits. Finally, we show several applications relevant to both near-term and fault-tolerant quantum algorithms where approximate emulation of larger systems is expected to be useful: characterization of state preparation strategies for quantum phase estimation, the testing of different variational quantum eigensolver Ansätze, the numerical evaluation of Trotter errors, and the simulation of general quantum dynamics problems. In all these examples, approximate emulation with MPS-FQE allows us to treat systems that are significantly larger than those accessible with a full statevector emulator.

翻訳日:2024-01-02 09:19:53 公開日:2023-12-29

# スケーラブルな自動運転を実現するvisual point cloud forecasting

Visual Point Cloud Forecasting enables Scalable Autonomous Driving ( http://arxiv.org/abs/2312.17655v1 )

ライセンス: Link先を確認

Zetong Yang, Li Chen, Yanan Sun, Hongyang Li

(参考訳) 一般的なビジョンに関する広範な研究とは対照的に、スケーラブルな視覚ベース自動運転のための事前学習はほとんど検討されていない。視覚ベースの自動運転アプリケーションでは、認識・予測・計画を統合的に行うために、セマンティクス、3次元幾何、時間情報を同時に含む特徴が必要である。これは事前学習に大きな課題をもたらす。これを解決するために、視覚点群予測(visual point cloud forecasting)と呼ばれる新しい事前学習タスク、すなわち過去の視覚入力から将来の点群を予測するタスクを導入する。このタスクの重要な利点は、セマンティクス、3次元構造、時間ダイナミクスの相乗的な学習を捉えられることである。そのため、様々な下流タスクにおいて優位性を示す。この新しい問題に対処するために、下流の視覚エンコーダを事前学習するための汎用モデルViDARを提案する。ViDARはまず、エンコーダによって過去の埋め込みを抽出する。これらの表現は、新しいLatent Rendering演算子によって3次元幾何空間に変換され、将来の点群予測に用いられる。実験では、3D検出におけるNDSの3.1%向上、動き予測における約10%の誤差削減、計画における約15%の衝突率低減など、下流タスクで顕著な向上が得られた。

In contrast to extensive studies on general vision, pre-training for scalable visual autonomous driving remains seldom explored. Visual autonomous driving applications require features encompassing semantics, 3D geometry, and temporal information simultaneously for joint perception, prediction, and planning, posing dramatic challenges for pre-training. To resolve this, we bring up a new pre-training task termed visual point cloud forecasting: predicting future point clouds from historical visual input. The key merit of this task is that it captures the synergistic learning of semantics, 3D structures, and temporal dynamics. Hence it shows superiority in various downstream tasks. To cope with this new problem, we present ViDAR, a general model to pre-train downstream visual encoders. It first extracts historical embeddings by the encoder. These representations are then transformed to 3D geometric space via a novel Latent Rendering operator for future point cloud prediction. Experiments show significant gain in downstream tasks, e.g., 3.1% NDS on 3D detection, ~10% error reduction on motion forecasting, and ~15% less collision rate on planning.

翻訳日:2024-01-02 09:19:30 公開日:2023-12-29

# 効果的なクロスモーダル蒸留による視覚的グラウンディングのためのモダリティギャップの橋渡し

Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation ( http://arxiv.org/abs/2312.17648v1 )

ライセンス: Link先を確認

Jiaxi Wang, Wenhui Hu, Xueyang Liu, Beihu Wu, Yuting Qiu, YingYing Cai

(参考訳) 視覚的グラウンディング(visual grounding)は、画像の特定領域の視覚情報を、対応する自然言語表現と整合させることを目的とする。現在の視覚的グラウンディング手法は、事前学習済みの視覚バックボーンと言語バックボーンを別々に利用して、視覚特徴と言語特徴を得る。これら2種類の特徴は精巧に設計されたネットワークを介して融合されるが、特徴の異質性のために、マルチモーダル推論には適さない。この問題は、現在の視覚的グラウンディング手法で用いられる単一モーダルの事前学習バックボーン間のドメインギャップに起因しており、従来のエンドツーエンドの学習方法ではほとんど克服できない。そこで本研究では、マルチモーダル事前学習モデルを蒸留して視覚的グラウンディングタスクを導く、Empowering pre-trained model for Visual Grounding(EpmVG)フレームワークを提案する。EpmVGは新しいクロスモーダル蒸留機構に基づいている。この機構は事前学習モデルにおける画像とテキストの一貫性情報を効果的に導入して、バックボーンネットワークに存在するドメインギャップを低減し、それによって視覚的グラウンディングタスクにおけるモデルの性能を向上させる。広く用いられている5つのデータセットで大規模な実験を行い、本手法が最先端手法よりも優れた性能を達成することを示す。

Visual grounding aims to align visual information of specific regions of images with corresponding natural language expressions. Current visual grounding methods leverage pre-trained visual and language backbones separately to obtain visual features and linguistic features. Although these two types of features are then fused via delicately designed networks, the heterogeneity of the features makes them inapplicable for multi-modal reasoning. This problem arises from the domain gap between the single-modal pre-training backbones used in current visual grounding methods, which can hardly be overcome by the traditional end-to-end training method. To alleviate this, our work proposes an Empowering pre-trained model for Visual Grounding (EpmVG) framework, which distills a multimodal pre-trained model to guide the visual grounding task. EpmVG is based on a novel cross-modal distillation mechanism, which can effectively introduce the consistency information of images and texts in the pre-trained model, to reduce the domain gap existing in the backbone networks, thereby improving the performance of the model in the visual grounding task. Extensive experiments are carried out on five conventionally used datasets, and results demonstrate that our method achieves better performance than state-of-the-art methods.

翻訳日:2024-01-02 09:19:12 公開日:2023-12-29

# 異文化的観点からのマルチモーダル知覚・認知の法則の研究 -海外の中国庭園を例として-

Research on the Laws of Multimodal Perception and Cognition from a Cross-cultural Perspective -- Taking Overseas Chinese Gardens as an Example ( http://arxiv.org/abs/2312.17642v1 )

ライセンス: Link先を確認

Ran Chen, Xueqi Yao, Jing Zhao, Shuhan Xu, Sirui Zhang, Yijun Mao

(参考訳) 本研究は、マルチモーダルデータ分析における知覚的相互作用と認知的相互作用の複雑な関係を、海外の中国庭園における空間体験デザインに特に焦点を当てて探究することを目的とする。ソーシャルメディア上の評価内容や画像は個人の関心や感情反応を反映しうることが分かった。これらは、感情情報と画像に基づく認知情報の両方を含む、認知研究のための豊富なデータ基盤を提供する。深層学習技術を活用してソーシャルメディアのテキストデータと視覚データを分析し、海外の中国庭園の文脈における人々の知覚と感情認知の関係を明らかにする。さらに本研究では、AIエージェントとともにマルチエージェントシステム(MAS)を導入する。各エージェントは、チャットシーンのシミュレーションとWeb検索を組み合わせることで、美的認知の法則を探究する。本研究は、知覚を感情スコアに変換する従来のアプローチを超え、テキストを直接分析し、意見データをより深く掘り下げるという点で研究手法を拡張する。本研究は、多様な文化的文脈における美的体験と、それが建築・景観デザインに与える影響を理解するための新しい視点を提供するものであり、文化コミュニケーションと美的理解の分野への重要な貢献である。

This study aims to explore the complex relationship between perceptual and cognitive interactions in multimodal data analysis, with a specific emphasis on spatial experience design in overseas Chinese gardens. It is found that evaluation content and images on social media can reflect individuals' concerns and sentiment responses, providing a rich data base for cognitive research that contains both sentimental and image-based cognitive information. Leveraging deep learning techniques, we analyze textual and visual data from social media, thereby unveiling the relationship between people's perceptions and sentiment cognition within the context of overseas Chinese gardens. In addition, our study introduces a multi-agent system (MAS) alongside AI agents. Each agent explores the laws of aesthetic cognition through chat scene simulation combined with web search. This study goes beyond the traditional approach of translating perceptions into sentiment scores, allowing for an extension of the research methodology in terms of directly analyzing texts and digging deeper into opinion data. This study provides new perspectives for understanding aesthetic experience and its impact on architecture and landscape design across diverse cultural contexts, which is an essential contribution to the field of cultural communication and aesthetic understanding.

翻訳日:2024-01-02 09:18:49 公開日:2023-12-29

# MoD2T: モデル・データ駆動型の動的・静的物体追跡手法

MoD2T:Model-Data-Driven Motion-Static Object Tracking Method ( http://arxiv.org/abs/2312.17641v1 )

ライセンス: Link先を確認

Yang Feng, Liao Pan, Wu Di, Liu Bo, Zhang Xingle

(参考訳) マルチオブジェクト追跡(MOT)の分野は、ビデオ解析の領域において極めて重要である。しかし、この分野における従来の手法と深層学習ベースのアプローチには、いずれも固有の限界がある。データのみによって駆動される深層学習手法は物体の運動状態を正確に識別することが難しい。一方、包括的な数理モデルに依存する従来の手法は、最適とは言えない追跡精度にとどまる場合がある。これらの課題に対処するために、Model-Data-Driven Motion-Static Object Tracking Method(MoD2T)を導入する。本稿では、従来の数理モデリングと深層学習ベースのMOTフレームワークを巧みに融合させる新しいアーキテクチャを提案する。これにより、確立された手法や高度な深層学習技術のみに依存することに伴う制約を効果的に緩和する。MoD2Tにおける数理モデリングと深層学習の融合により、物体の運動判定の精度が向上し、その結果として追跡精度が向上する。実証実験により、UAVによる空中監視や街路レベルの追跡など、多様なシナリオにおけるMoD2Tの有効性が確かに裏付けられた。物体の運動状態の判別におけるMoD2Tの能力を評価するため、MVF1指標を導入する。この新しい性能指標は運動状態分類の精度を測定するように設計されており、MoD2Tの性能の包括的な評価を提供する。綿密な実験により、MVF1の定式化の妥当性が裏付けられる。MoD2Tの性能を総合的に評価するために、多様なデータセットに丁寧にアノテーションを施し、厳密なテストを行った。運動状態分類の精度を測るMVF1スコアは、カメラの動きが最小または軽度のシナリオで特に注目に値し、KITTIデータセットで0.774、MOT17で0.521、UAVDTで0.827であった。

The domain of Multi-Object Tracking (MOT) is of paramount significance within the realm of video analysis. However, both traditional methodologies and deep learning-based approaches within this domain exhibit inherent limitations. Deep learning methods driven exclusively by data struggle to accurately discern the motion states of objects, while traditional methods relying on comprehensive mathematical models may suffer from suboptimal tracking precision. To address these challenges, we introduce the Model-Data-Driven Motion-Static Object Tracking Method (MoD2T). We propose a novel architecture that adeptly amalgamates traditional mathematical modeling with deep learning-based MOT frameworks, thereby effectively mitigating the limitations associated with sole reliance on established methodologies or advanced deep learning techniques. MoD2T's fusion of mathematical modeling and deep learning augments the precision of object motion determination, consequently enhancing tracking accuracy. Our empirical experiments robustly substantiate MoD2T's efficacy across a diverse array of scenarios, including UAV aerial surveillance and street-level tracking. To assess MoD2T's proficiency in discerning object motion states, we introduce the MVF1 metric. This novel performance metric is designed to measure the accuracy of motion state classification, providing a comprehensive evaluation of MoD2T's performance. Meticulous experiments substantiate the rationale behind MVF1's formulation. To provide a comprehensive assessment of MoD2T's performance, we meticulously annotate diverse datasets and subject MoD2T to rigorous testing. The achieved MVF1 scores, which measure the accuracy of motion state classification, are particularly noteworthy in scenarios marked by minimal or mild camera motion, with values of 0.774 on the KITTI dataset, 0.521 on MOT17, and 0.827 on UAVDT.
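The abstract describes MVF1 only as a measure of motion-state classification accuracy. Purely for illustration, the sketch below assumes that MVF1 behaves like a standard F1 score over per-object motion labels, with "moving" treated as the positive class. This is an assumption, not the paper's definition. The function name `motion_state_f1` is hypothetical.

```python
def motion_state_f1(true_states, pred_states, positive="moving"):
    """Standard F1 over motion labels (hypothetical stand-in for MVF1)."""
    pairs = list(zip(true_states, pred_states))
    tp = sum(t == positive and p == positive for t, p in pairs)  # moving, predicted moving
    fp = sum(t != positive and p == positive for t, p in pairs)  # static, predicted moving
    fn = sum(t == positive and p != positive for t, p in pairs)  # moving, predicted static
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With true labels `["moving", "static", "moving", "moving"]` and predictions `["moving", "moving", "static", "moving"]` there are 2 true positives, 1 false positive and 1 false negative. That gives precision = recall = F1 = 2/3.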

翻訳日:2024-01-02 09:18:29 公開日:2023-12-29

# 二重ボソニックラダーをたどることによる高次元量子コンピューティングの強化

Empowering high-dimensional quantum computing by traversing the dual bosonic ladder ( http://arxiv.org/abs/2312.17741v1 )

ライセンス: Link先を確認

Long B. Nguyen, Noah Goss, Karthik Siva, Yosep Kim, Ed Younis, Bingcheng Qing, Akel Hashim, David I. Santiago, Irfan Siddiqi

(参考訳) 高次元量子情報処理は、ハードウェアの限界を超え、量子技術のフロンティアを前進させる有望な方法として登場した。いわゆるクディット(qudit)の未開拓の可能性を活用するには、確立された量子ビットの手法を超えた量子プロトコルの開発が必要である。本稿では、ラマン支援の2光子相互作用を用いて多次元固体システムを操作するための、ロバストでハードウェア効率が高く拡張可能な手法を提案する。その有効性を示すために、一連の多量子ビット演算を構築し、原子スクイーズド状態やシュレーディンガーの猫状態を含む高度にエンタングルした多次元状態を実現し、クディット配列に沿ったプログラム可能なエンタングルメント分配を実装した。本研究は、強く駆動された多クディット系の量子電磁力学を明らかにし、高次元量子アプリケーションの今後の開発のための実験的基盤を提供する。

High-dimensional quantum information processing has emerged as a promising avenue to transcend hardware limitations and advance the frontiers of quantum technologies. Harnessing the untapped potential of the so-called qudits necessitates the development of quantum protocols beyond the established qubit methodologies. Here, we present a robust, hardware-efficient, and extensible approach for operating multidimensional solid-state systems using Raman-assisted two-photon interactions. To demonstrate its efficacy, we construct a set of multi-qubit operations, realize highly entangled multidimensional states including atomic squeezed states and Schrödinger cat states, and implement programmable entanglement distribution along a qudit array. Our work illuminates the quantum electrodynamics of strongly driven multi-qudit systems and provides the experimental foundation for the future development of high-dimensional quantum applications.

翻訳日:2024-01-02 08:54:04 公開日:2023-12-29

# 量子ビット正則化による2次元質量ゼロQCDの相

Phases of 2d massless QCD with qubit regularization ( http://arxiv.org/abs/2312.17734v1 )

ライセンス: Link先を確認

Hanqing Liu, Tanmoy Bhattacharya, Shailesh Chandrasekharan and Rajan Gupta

(参考訳) 我々は、1フレーバーの質量ゼロのディラックフェルミオンと結合した2次元SU(N)ゲージ理論の連続体物理を、量子ビット正則化を用いて再現できる可能性を検討する。連続体理論は、紫外(UV)領域ではN個の自由フェルミオンによって、赤外(IR)領域ではコセットWess-Zumino-Witten(WZW)モデルによって記述される。本研究では、有限次元のリンクヒルベルト空間と一般化されたハバード結合を持つKogut-Susskindハミルトニアンを用いて、これらの特徴をどの程度再現できるかを調べる。強結合展開を用いて、我々のモデルがギャップのある二量体相と、スピン鎖で記述される別の相を示すことを明らかにする。さらに、N=2の場合には、テンソルネットワーク法を用いて、これら2つの相の間に2次相転移が存在することを示す。転移点における臨界理論はSU(2)_1 WZWモデルとして理解でき、これを用いて我々のモデルの相図を定量的に決定する。モデルの閉じ込め特性を用いて、自由フェルミオンの紫外領域の物理がいかにして現れうるかを議論する。ただし、そのためには我々のモデルにさらなる修正が必要となる可能性がある。

We investigate the possibility of reproducing the continuum physics of 2d SU(N) gauge theory coupled to a single flavor of massless Dirac fermions using qubit regularization. The continuum theory is described by N free fermions in the ultraviolet (UV) and a coset Wess-Zumino-Witten (WZW) model in the infrared (IR). In this work, we explore how well these features can be reproduced using the Kogut-Susskind Hamiltonian with a finite-dimensional link Hilbert space and a generalized Hubbard coupling. Using strong coupling expansions, we show that our model exhibits a gapped dimer phase and another phase described by a spin-chain. Furthermore, for N=2, using tensor network methods, we show that there is a second-order phase transition between these two phases. The critical theory at the transition can be understood as an SU(2)_1 WZW model, using which we determine the phase diagram of our model quantitatively. Using the confinement properties of the model we argue how the UV physics of free fermions could also emerge, but may require further modifications to our model.

翻訳日:2024-01-02 08:53:46 公開日:2023-12-29

# 時間領域における光子の液化

Photon liquefaction in time ( http://arxiv.org/abs/2312.17732v1 )

ライセンス: Link先を確認

Eduardo Zubizarreta Casalengua and Elena del Valle and Fabrice P. Laussy

(参考訳) 液体における空間相関と同じ性質を持つ局所的な時間相関を、光子流に刻み込むメカニズムを提示する。この描像では、通常の単一光子放射体は(時間的な)気体に対応し、相関のない光は理想気体に対応する。良い単一光子源とは、このような時間的液体の特徴を示すものであると我々は主張する。その特徴とは、短時間相関における(線形依存ではなく)平坦部と、後の時間での振動であり、後者は光子の時間順序の直接的な表れである。任意に相関させうるが完全に結晶化することはない「液体光」の広い族について、2次コヒーレンス関数の一般的な閉形式解析式を得る。

We provide a mechanism to imprint local temporal correlations in photon streams which have the same character as spatial correlations in liquids. Usual single-photon emitters correspond, in this picture, to a (temporal) gas while uncorrelated light is the ideal gas. We argue that good single-photon sources are those that exhibit such temporal liquid features, i.e., with a plateau for their short-time correlations (as opposed to a linear dependence) and oscillations at later times, which is a direct manifestation of photon time-ordering. We obtain general, closed-form analytical expressions for the second-order coherence function of a broad family of "liquid light" which can be arbitrarily correlated, though never completely crystallized.

翻訳日:2024-01-02 08:53:25 公開日:2023-12-29

# 大規模Javaシステムにおける対話型アプリケーションセキュリティテスト(IAST)と実行時アプリケーション自己防御(RASP)ツールの有効性と効率の比較

Comparing Effectiveness and Efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) Tools in a Large Java-based System ( http://arxiv.org/abs/2312.17726v1 )

ライセンス: Link先を確認

Aishwarya Seth, Saikath Bhattacharya, Sarah Elder, Nusrat Zahan, Laurie Williams

(参考訳) セキュリティリソースは乏しく、実務者は、サイバーセキュリティ業界で利用可能な技術やツールを効果的かつ効率的に使用するための指針を必要としている。新たに登場した2種類のツール、すなわち対話型アプリケーションセキュリティテスト(IAST)と実行時アプリケーション自己防御(RASP)は、動的アプリケーションセキュリティテスト(DAST)や静的アプリケーションセキュリティテスト(SAST)といった確立されたツールと比較して、十分に評価されていない。本研究の目的は、IASTおよびRASPツールの有効性と効率を様々な脆弱性検出・防止の技術やツールと比較して分析することである。これを通じて、実務者がこれらのツールの使用について十分な情報に基づいた選択を行えるよう支援する。オープンソースのJavaベースのオンラインアプリケーションであるOpenMRSにIASTとRASPを適用する。IASTとRASPの効率と有効性を、先行研究でOpenMRSに適用された手法と比較する。効率と有効性は、1時間あたりに検出・防止された脆弱性の数と種類によって測定する。本研究の結果、IASTは他の手法と比較して比較的良好な性能を示し、効率と有効性の両方で2番目に優れていた。IASTはOWASP Top 10のセキュリティリスクのうち8つを検出した(SMPTは9つ、EMPT、DAST、SASTはそれぞれ7つ)。IASTはSMPTよりも多くの脆弱性を発見した。IASTの効率(2.14 VpH)は、EMPT(2.22 VpH)に次いで2番目である。これらの結果は、本研究においてブラックボックスのセキュリティテストを行う際にIASTを用いることが有益であったことを示唆している。OpenMRSのような大規模なエンタープライズ規模のWebアプリケーションの文脈では、RASPは脆弱性検出の代わりにはならない。一方、IASTは他の手法を補完する強力なツールである。

Security resources are scarce, and practitioners need guidance in the effective and efficient usage of techniques and tools available in the cybersecurity industry. Two emerging tool types, Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP), have not been thoroughly evaluated against well-established counterparts such as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST). The goal of this research is to aid practitioners in making informed choices about the use of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools through an analysis of their effectiveness and efficiency in comparison with different vulnerability detection and prevention techniques and tools. We apply IAST and RASP on OpenMRS, an open-source Java-based online application. We compare the efficiency and effectiveness of IAST and RASP with techniques applied on OpenMRS in prior work. We measure efficiency and effectiveness in terms of the number and type of vulnerabilities detected and prevented per hour. Our study shows IAST performed relatively well compared to other techniques, performing second-best in both efficiency and effectiveness. IAST detected eight of the OWASP Top 10 security risks, compared to nine by SMPT and seven each by EMPT, DAST, and SAST. IAST found more vulnerabilities than SMPT. The efficiency of IAST (2.14 vulnerabilities per hour, VpH) is second only to EMPT (2.22 VpH). These findings imply that our study benefited from using IAST when conducting black-box security testing. In the context of a large, enterprise-scale web application such as OpenMRS, RASP does not replace vulnerability detection, while IAST is a powerful tool that complements other techniques.

翻訳日:2024-01-02 08:53:11 公開日:2023-12-29

# 真に量子的な2-ユニタリ行列の族

Genuinely quantum families of 2-unitary matrices ( http://arxiv.org/abs/2312.17719v1 )

ライセンス: Link先を確認

Rafał Bistroń, Jakub Czartowski and Karol Życzkowski

(参考訳) 量子コンピューティングの発展に伴い、エンタングリング量子ゲートとディスエンタングリング量子ゲートを制御可能な方法で実装するという問題が、さまざまな文脈で再び浮上している。このようなディスエンタングリングチャネルの最新の応用の一つが量子畳み込みニューラルネットワークである。その中心的なアイデアは、エンタングル状態に符号化された情報を失うことなく、クディット数を系統的に減らすことにある。本研究では、畳み込みネットワークの基本構成要素である畳み込みとプーリングの量子版に焦点を当てる。そして、置換テンソルのコヒーリフィケーション(coherification)として、パラメータ化可能な「量子畳み込み」チャネルを構築し、特徴付ける。この方法で構築された演算は、一般に高い(ディス)エンタングリング能力を持つ。特に、本手法で構築した畳み込みチャネルが最大のエンタングリング能力を持つために必要な条件を特定する。これに基づき、$d = 7$ および $d = 9$ について、$2$ 個および $4$ 個の自由な非局所パラメータを持つ、次元 $d^2$ の2部系2-ユニタリ行列の新しい連続族を確立する。これらは、階数 $4$ の完全テンソル、すなわち $4$ 部系の絶対最大エンタングル状態に対応する。新たに確立された族は、量子畳み込みニューラルネットワークにおける学習可能な畳み込み/プーリング層のプロトタイプとして機能しうる。

As quantum computing develops, the problem of implementing entangling and disentangling quantum gates in a controllable manner reemerges in multiple contexts. One of the newest applications of such disentangling channels are quantum convolutional neural networks, where the core idea lies in the systematic decrease of qudit numbers without loss of information encoded in entangled states. In this work, we focus on quantum analogues of convolution and pooling, the basic building blocks of convolutional networks, and construct and characterize parametrizable ``quantum convolution'' channels as coherifications of permutation tensors. Operations constructed in this manner generically provide high (dis)entangling power. In particular, we identify conditions necessary for the convolution channels constructed using our method to possess maximal entangling power. Based on this, we establish new, continuous classes of bipartite 2-unitary matrices of dimension $d^2$ for $d = 7$ and $d = 9$, with $2$ and $4$ free nonlocal parameters, corresponding to perfect tensors of rank $4$ or $4$-partite absolutely maximally entangled states. The newly established families may serve as the prototype for trainable convolution/pooling layers in quantum convolutional neural networks.
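The 2-unitary property can be checked combinatorially for the permutation tensors that the quantum convolution channels start from. Take the permutation matrix that maps basis state |i, j> to |f(i, j)> = |k, l>. It is 2-unitary exactly when the associated 4-index tensor is perfect: any two of the four indices (i, j, k, l) must determine the other two. The sketch below tests that condition by counting distinct index pairs.

Two caveats apply:

- The map f(i, j) = (i + j, i + 2j) mod d is an illustrative orthogonal-Latin-square choice, not one taken from the paper.
- The continuous families obtained by coherification are not reproduced here.

```python
from itertools import combinations


def is_perfect_permutation_tensor(d, f):
    """True if T[i,j,k,l] = [ (k, l) == f(i, j, d) ] is a perfect tensor.

    Equivalently, the d^2 x d^2 permutation matrix |i,j> -> |f(i,j)> is 2-unitary.
    """
    entries = [(i, j) + tuple(f(i, j, d)) for i in range(d) for j in range(d)]
    # Each of the 6 index pairs must take all d*d values exactly once.
    # The pair (k, l) covers bijectivity of f; the pairs split across
    # (i, j | k, l) cover the reshuffled and partially transposed matrices.
    for a, b in combinations(range(4), 2):
        if len({(e[a], e[b]) for e in entries}) != d * d:
            return False
    return True


def latin_square_map(i, j, d):
    """Illustrative map (i, j) -> (i + j, i + 2j) mod d."""
    return ((i + j) % d, (i + 2 * j) % d)
```

For d = 3, 7 and 9 this map passes the test. For d = 2 and d = 6 it fails, because 2j mod d is then not a bijection and the pair (i, l) cannot determine j. This is consistent with the known absence of 2-unitary matrices of dimension 4. The identity map fails for every d, since i alone then fixes k.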

翻訳日:2024-01-02 08:52:41 公開日:2023-12-29

# テキスト生成のための原理に基づく勾配型マルコフ連鎖モンテカルロ法

Principled Gradient-based Markov Chain Monte Carlo for Text Generation ( http://arxiv.org/abs/2312.17710v1 )

ライセンス: Link先を確認

Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell

(参考訳) 近年の論文は、高速な収束が期待されるMCMCアルゴリズムの一種である勾配ベースのサンプリングアルゴリズムを適用することで、エネルギーベースのテキスト生成が可能であることを示している。しかし、本論文で示すように、テキスト生成に対するこのアプローチのこれまでの試みはいずれも、目標とする言語モデルの分布から正しくサンプリングできていなかった。この限界に対処するため、本論文では、忠実(faithful)なテキストサンプラー、すなわち目標テキスト分布を極限分布として持つサンプラーを設計する問題を考える。目標とするエネルギーベースのテキスト分布から正しくサンプリングするためのいくつかの忠実な勾配ベースのサンプリングアルゴリズムを提案し、その理論的性質を調べる。様々な形式のテキスト生成に関する実験を通じて、忠実なサンプラーが、制御目標をよりよく満たしつつ、より流暢なテキストを生成できることを示す。

Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts at this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the problem of designing text samplers that are faithful, meaning that they have the target text distribution as their limiting distribution. We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly, and study their theoretical properties. Through experiments on various forms of text generation, we demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
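The notion of a faithful sampler can be illustrated on a toy problem. The sketch below is a plain Metropolis-Hastings chain over four discrete states with a uniform symmetric proposal. It is not one of the gradient-based text samplers proposed in the paper, and the energies are made up. The accept/reject step is what makes p(x) ∝ exp(-E(x)) the limiting distribution, so long-run visit frequencies approach the target. In general, omitting such a correction when the proposal is not symmetric can leave the chain with a different stationary distribution.

```python
import math
import random


def metropolis_hastings(energy, n_states, n_steps, seed=0):
    """Visit frequencies of an MH chain targeting p(x) ∝ exp(-energy(x))."""
    rng = random.Random(seed)
    x = 0
    counts = [0] * n_states
    for _ in range(n_steps):
        # Symmetric proposal: a different state chosen uniformly at random.
        proposal = rng.randrange(n_states - 1)
        if proposal >= x:
            proposal += 1
        # Metropolis acceptance: min(1, p(proposal) / p(x)).
        accept_prob = min(1.0, math.exp(energy(x) - energy(proposal)))
        if rng.random() < accept_prob:
            x = proposal
        counts[x] += 1
    return [c / n_steps for c in counts]


energies = [0.0, 1.0, 2.0, 0.5]
freqs = metropolis_hastings(lambda s: energies[s], len(energies), 200_000)
z = sum(math.exp(-e) for e in energies)
print([round(f, 3) for f in freqs])
print([round(math.exp(-e) / z, 3) for e in energies])  # target probabilities
```

The two printed lists should agree to roughly two decimal places. Increasing `n_steps` tightens the agreement.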

翻訳日:2024-01-02 08:52:18 公開日:2023-12-29

# 線形干渉計におけるボソン-フェルミオン相補性

Boson-fermion complementarity in a linear interferometer ( http://arxiv.org/abs/2312.17709v1 )

ライセンス: Link先を確認

Michael G. Jabbour and Nicolas J. Cerf

(参考訳) ボソンとフェルミオンの統計は、相反する振る舞い、とりわけボソンのバンチングとフェルミオンのアンチバンチングを引き起こすことがよく知られている。本稿では、任意の線形干渉計におけるボソンとフェルミオンの多粒子干渉を結びつける基本的な関係を確立する。ボソンとフェルミオンの遷移確率は、それらの値を制約する同一の方程式の中に一緒に現れる。したがってこの関係は、相互作用の詳細に依存しないボソン-フェルミオン相補性を表す。例えば、任意の干渉計中の2粒子の場合、ボソンとフェルミオンの確率の平均は、古典粒子が従う確率と一致しなければならない。付随的に、この基本的な関係は、任意の複素行列のパーマネントと行列式の絶対値の2乗を結びつける、これまで知られていなかった数学的恒等式も与える。

Bosonic and fermionic statistics are well known to give rise to antinomic behaviors, most notably boson bunching vs. fermion antibunching. Here, we establish a fundamental relation that combines bosonic and fermionic multiparticle interferences in an arbitrary linear interferometer. The bosonic and fermionic transition probabilities appear together in the same equation, which constrains their values, hence expressing a boson-fermion complementarity that is independent of the details of the interaction. For two particles in any interferometer, for example, it implies that the average of the bosonic and fermionic probabilities must coincide with the probability obeyed by classical particles. Incidentally, this fundamental relation also provides a heretofore unknown mathematical identity connecting the squared moduli of the permanent and determinant of arbitrary complex matrices.
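Consider two particles injected into distinct input modes and detected in distinct output modes, and let M be the relevant 2x2 submatrix of the interferometer matrix. The transition probabilities are then:

- |perm M|^2 for bosons,
- |det M|^2 for fermions,
- perm of the matrix of |M_ij|^2 for distinguishable (classical) particles.

Write M = [[a, b], [c, d]]. The two-particle statement in the abstract then follows from the parallelogram law: (|ad + bc|^2 + |ad - bc|^2) / 2 = |ad|^2 + |bc|^2. The numerical check below covers only this two-particle case, not the paper's general multiparticle relation.

```python
def permanent_2x2(m):
    (a, b), (c, d) = m
    return a * d + b * c


def determinant_2x2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def two_particle_probabilities(m):
    """Bosonic, fermionic and classical probabilities for a 2x2 submatrix m."""
    p_boson = abs(permanent_2x2(m)) ** 2
    p_fermion = abs(determinant_2x2(m)) ** 2
    # Distinguishable particles: permanent of the matrix of |m_ij|^2.
    p_classical = permanent_2x2([[abs(x) ** 2 for x in row] for row in m])
    return p_boson, p_fermion, p_classical


s = 2 ** -0.5
print(two_particle_probabilities([[s, s], [s, -s]]))  # 50:50 beam splitter
```

For the 50:50 beam splitter this gives (up to rounding) 0 for bosons (Hong-Ou-Mandel bunching), 1 for fermions and 1/2 for classical particles. The average of the first two is indeed 1/2. Any complex 2x2 matrix satisfies the same relation.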

翻訳日:2024-01-02 08:52:04 公開日:2023-12-29

# 中央銀行デジタル通貨(CBDC)における信頼構築とプライバシー問題軽減の6つの方法

The six ways to build trust and reduce privacy concern in a Central Bank Digital Currency (CBDC) ( http://arxiv.org/abs/2312.17708v1 )

ライセンス: Link先を確認

Alex Zarifis and Xusen Cheng

(参考訳) 中央銀行デジタル通貨(CBDC)は少数の国で実施されているが、さらに多くの国で調査されている。 CBDCは中央銀行が発行・支援するデジタル通貨である。消費者信頼は、支払いシステムと技術であるこの通貨の採用を奨励または阻止することができる。本研究は、CBDCにおける消費者信頼の理解を図り、すべての利害関係者に対して開発と採用の段階がより効果的で満足するようにすることを目的とする。

Central Bank Digital Currencies (CBDCs) have been implemented by only a handful of countries, but they are being explored by many more. CBDCs are digital currencies issued and backed by a central bank. Consumer trust can encourage or discourage the adoption of this currency, which is also a payment system and a technology. This research attempts to understand consumer trust in CBDCs so that the development and adoption stages are more effective and satisfying for all the stakeholders.

翻訳日:2024-01-02 08:51:48 公開日:2023-12-29

# TuPy-E:新しいデータセットと包括的なモデル分析によるブラジルポルトガル語ソーシャルメディアにおけるヘイトスピーチ検出

TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models ( http://arxiv.org/abs/2312.17704v1 )

ライセンス: Link先を確認

Felipe Oliveira, Victoria Reis, Nelson Ebecken

(参考訳) ソーシャルメディアは人間の対話に不可欠なものとなり、コミュニケーションと表現のためのプラットフォームを提供している。しかし、これらのプラットフォームでのヘイトスピーチの台頭は個人やコミュニティに重大なリスクをもたらす。ヘイトスピーチの検出と対処は、豊富な語彙、複雑な文法、地域による差異のため、ポルトガル語のような言語では特に困難である。そこで我々は、ヘイトスピーチ検出のためのポルトガル語最大のアノテーション付きコーパスTuPy-Eを紹介する。TuPy-Eはオープンソースのアプローチを採用し、研究コミュニティ内の協力を促進する。BERTモデルなどの高度な手法を用いて詳細な分析を行い、学術的理解と実用的応用の双方に貢献する。

Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques such as BERT models, contributing to both academic understanding and practical applications.

翻訳日:2024-01-02 08:51:38 公開日:2023-12-29

# コンベアモードのスピンコヒーレント電子シャトリングによるバレー分裂のマッピング

Mapping of valley-splitting by conveyor-mode spin-coherent electron shuttling ( http://arxiv.org/abs/2312.17694v1 )

ライセンス: Link先を確認

Mats Volmer, Tom Struck, Arnau Sala, Bingjie Chen, Max Oberländer, Tobias Offermann, Ran Xue, Lino Visser, Jhih-Sian Tu, Stefan Trellenkamp, Łukasz Cywiński, Hendrik Bluhm, Lars R. Schreiber

(参考訳) Si/SiGeヘテロ構造では、低エネルギーにある励起バレー状態が電子スピン量子ビットの操作性とスケーラビリティを著しく制限する。バレー分裂の局所的な変動を特徴づけ理解するための、空間分解能とエネルギー分解能の高い高速な測定手法はこれまで存在しなかった。コンベアモードのスピンコヒーレント電子シャトリングによる空間制御を利用し、もつれた電子スピン対をプローブとして基底バレー状態と励起バレー状態の磁場依存の反交差を検出することで、局所的なバレー分裂を2次元マッピングする手法を提案する。この手法はサブμeVのエネルギー精度とナノメートルの横方向分解能を持つ。210 nm×18 nmの広い領域にわたるバレー分裂のヒストグラムは、確立されているが時間のかかる磁気分光法で得られた統計とよく一致する。対象としたヘテロ構造では、バレー分裂はほぼガウス分布に従い、その相関長は量子ドットのサイズと同程度であることがわかった。このマッピング手法は、スケーラブルな量子コンピューティングに向けたSi/SiGeヘテロ構造の設計において有用なツールとなる可能性がある。

In Si/SiGe heterostructures, a low-lying excited valley state seriously limits the operability and scalability of electron spin qubits. For characterizing and understanding the local variations in valley splitting, fast probing methods with high spatial and energy resolution have been lacking. Leveraging the spatial control granted by conveyor-mode spin-coherent electron shuttling, we introduce a method for two-dimensional mapping of the local valley splitting by detecting magnetic-field-dependent anticrossings of ground and excited valley states using entangled electron spin pairs as a probe. The method has sub-μeV energy accuracy and nanometer lateral resolution. The histogram of valley splittings spanning a large area of 210 nm by 18 nm matches well with statistics obtained by the established but time-consuming magnetospectroscopy method. For the specific heterostructure, we find a nearly Gaussian distribution of valley splittings and a correlation length similar to the quantum dot size. Our mapping method may become a valuable tool for engineering Si/SiGe heterostructures for scalable quantum computing.
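The abstract does not state the resonance condition, but a common picture, assumed here, is that ground and excited valley states of opposite spin anticross where the Zeeman energy equals the valley splitting, E_VS = g μ_B B*, with g ≈ 2 in silicon. Under that assumption, 1 μeV corresponds to roughly 8.6 mT of anticrossing field. The sketch converts a map of anticrossing fields into valley splittings and summarizes the resulting histogram by its mean and standard deviation. The field values are hypothetical placeholders, not data from the paper, and the actual analysis may differ.

```python
import statistics

MU_B_UEV_PER_T = 57.883818060  # Bohr magneton in micro-eV per tesla (CODATA 2018)


def valley_splitting_ueV(b_anticross_T, g=2.0):
    """Valley splitting (micro-eV), assuming Zeeman energy equals it at the anticrossing."""
    return g * MU_B_UEV_PER_T * b_anticross_T


def map_valley_splitting(b_map_T, g=2.0):
    """Convert a 2D grid (list of rows) of anticrossing fields into valley splittings."""
    return [[valley_splitting_ueV(b, g) for b in row] for row in b_map_T]


# Hypothetical anticrossing fields (tesla) along two shuttling lanes.
b_map = [
    [0.52, 0.55, 0.61, 0.58],
    [0.49, 0.53, 0.57, 0.60],
]
ev_map = map_valley_splitting(b_map)
flat = [e for row in ev_map for e in row]
print(f"mean = {statistics.mean(flat):.1f} ueV, std = {statistics.stdev(flat):.1f} ueV")
```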

翻訳日:2024-01-02 08:51:27 公開日:2023-12-29

# マルチスケール・ビジョン・トランスフォーマーと二部マッチングによる効率的な単一ステージ行動局所化

Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization ( http://arxiv.org/abs/2312.17686v1 )

ライセンス: Link先を確認

Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

(参考訳) 行動局所化は、しばしば別々に扱われる検出タスクと認識タスクを組み合わせた難しい問題である。最先端の手法は、高解像度で事前計算された既製のバウンディングボックス検出に依存し、分類タスクのみに焦点を当てたトランスフォーマーモデルを提案している。このような2段階の手法はリアルタイムでの運用には適さない。一方、単一ステージの手法は、ネットワークの一部（一般にはバックボーン）に作業負荷の大部分を共有させることで両方のタスクを扱い、速度と引き換えに性能を犠牲にしている。これらの手法は、学習可能なクエリを持つDETRヘッドを追加することで構築され、クエリはクロスアテンションとセルフアテンションを経た後、対応するMLPに送られて人物のバウンディングボックスと行動を検出する。しかし、DETRのようなアーキテクチャは学習が難しく、大きな計算量を伴う可能性がある。本稿では、単純な二部マッチング損失をビジョントランスフォーマーの出力トークンに適用できることに着目する。これにより、追加のエンコーダ-デコーダヘッドや学習可能なクエリを必要とせずに両タスクを実行できる、バックボーン+MLPのアーキテクチャが得られる。二部マッチングで両タスクを学習した単一のMViT-Sアーキテクチャが、事前計算したバウンディングボックス上でRoI Alignを用いて学習した同じMViT-Sを上回ることを示す。トークンプーリングと提案する学習パイプラインを慎重に設計することで、我々のMViTv2-Sモデルは、AVA2.2において2段階の対応モデルと比べて+3 mAPを達成する。コードとモデルは論文の改訂後に公開される予定である。

Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, single-stage methods target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, compromising performance for speed. These methods add a DETR head with learnable queries that, after cross- and self-attention, are sent to corresponding MLPs to detect a person's bounding box and action. However, DETR-like architectures are challenging to train and can incur high complexity. In this paper, we observe that a straightforward bipartite matching loss can be applied to the output tokens of a vision transformer. This results in a backbone + MLP architecture that can do both tasks without the need for an extra encoder-decoder head and learnable queries. We show that a single MViT-S architecture trained with bipartite matching to perform both tasks surpasses the same MViT-S when trained with RoI Align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our MViTv2-S model achieves +3 mAP on AVA2.2 with respect to the two-stage counterpart. Code and models will be released after paper revision.
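Set-based (DETR-style) training assigns each ground-truth box to exactly one output token by minimum-cost bipartite matching, and the loss is then computed on the matched pairs. The sketch below shows only the matching step. It assumes a cost of L1 box distance minus the predicted probability of the target class, and solves the assignment by exhaustive search over injective assignments; this reaches the same minimum cost as the Hungarian algorithm but is only practical for a handful of tokens. It uses one label per box for simplicity, whereas AVA actions are multi-label; the cost weights and function names are hypothetical, not the authors' code.

```python
from itertools import permutations


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def match_tokens_to_targets(pred_boxes, pred_probs, gt_boxes, gt_labels,
                            box_weight=1.0, cls_weight=1.0):
    """Return (assignment, cost); assignment[j] is the token matched to ground truth j."""
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    best_cost, best_assign = float("inf"), None
    # Every injective map from ground truths to output tokens.
    for assign in permutations(range(n_pred), n_gt):
        cost = 0.0
        for j, i in enumerate(assign):
            cost += box_weight * l1(pred_boxes[i], gt_boxes[j])
            cost -= cls_weight * pred_probs[i][gt_labels[j]]
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost


# Three output tokens (boxes as x1, y1, x2, y2) and two ground-truth people.
pred_boxes = [[0, 0, 1, 1], [5, 5, 6, 6], [2, 2, 3, 3]]
pred_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
gt_boxes = [[5, 5, 6, 6], [0, 0, 1, 1]]
gt_labels = [1, 0]
assign, cost = match_tokens_to_targets(pred_boxes, pred_probs, gt_boxes, gt_labels)
print(assign, cost)
```

Unmatched tokens (token 2 here) would typically be supervised as "no person", which is what lets a plain backbone + MLP head be trained without learnable queries.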

翻訳日:2024-01-02 08:51:10 公開日:2023-12-29

# 機械学習を用いたIOTシステムのマルウェア検出

Malware Detection in IOT Systems Using Machine Learning Techniques ( http://arxiv.org/abs/2312.17683v1 )

ライセンス: Link先を確認

Ali Mehrban, Pegah Ahadian

(参考訳) IoT環境でのマルウェア検出は堅牢な方法論を必要とする。そこで本研究では,IoTマルウェア識別のためのCNN-LSTMハイブリッドモデルを導入し,その性能評価を行った。 k-foldクロスバリデーションを利用して、提案手法は95.5%の精度を達成し、既存の手法を上回った。 CNNアルゴリズムは優れた学習モデル構築を可能にし、LSTM分類器は高い分類精度を示した。一般的な技術との比較分析は、提案されたモデルの有効性を示し、IoTセキュリティを強化する可能性を強調した。この研究は、代替手段としてSVMの将来の探索を提唱し、分散検出戦略の必要性を強調し、より強力なIOTセキュリティのための予測分析の重要性を強調している。この研究は、IoTエコシステムにおけるよりレジリエントなセキュリティ対策を開発するためのプラットフォームとして機能する。

Malware detection in IoT environments necessitates robust methodologies. This study introduces a CNN-LSTM hybrid model for IoT malware identification and evaluates its performance against established methods. Leveraging K-fold cross-validation, the proposed approach achieved 95.5% accuracy, surpassing existing methods. The CNN algorithm enabled superior learning model construction, and the LSTM classifier exhibited heightened accuracy in classification. Comparative analysis against prevalent techniques demonstrated the efficacy of the proposed model, highlighting its potential for enhancing IoT security. The study advocates for future exploration of SVMs as alternatives, emphasizes the need for distributed detection strategies, and underscores the importance of predictive analyses for stronger IoT security. This research serves as a platform for developing more resilient security measures in IoT ecosystems.
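The abstract reports accuracy under K-fold cross-validation without giving K or the fold protocol. As a minimal, model-agnostic sketch (an assumption, not the authors' pipeline), each sample is used for testing exactly once and the reported figure is the mean fold accuracy; `fit` and `predict` are hypothetical stand-ins for training and applying a classifier such as the CNN-LSTM.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(X, y, fit, predict, k=5):
    """Mean accuracy over k folds; fit(X, y) returns a model, predict(model, X) returns labels."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Trivial majority-class baseline on label-sorted data (7 benign, 3 malware).
X = list(range(10))
y = [0] * 7 + [1] * 3
fit_majority = lambda X_tr, y_tr: max(set(y_tr), key=y_tr.count)
predict_const = lambda model, X_te: [model] * len(X_te)
acc = cross_val_accuracy(X, y, fit_majority, predict_const, k=5)
print(acc)
```

Because the folds here are contiguous and the labels sorted, the last fold contains only malware and scores zero; in practice the data would be shuffled (or stratified by class) before splitting.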

翻訳日:2024-01-02 08:50:35 公開日:2023-12-29

# FlowVid: 一貫性のあるビデオ-ビデオ合成のための不完全な光フローのモデリング

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis ( http://arxiv.org/abs/2312.17681v1 )

ライセンス: Link先を確認

Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu

(参考訳) 拡散モデルは画像から画像（I2I）への合成に変革をもたらし、現在ではビデオにも広がりつつある。しかし、ビデオフレーム間の時間的一貫性を維持するという課題により、ビデオからビデオ（V2V）への合成の進歩は妨げられてきた。本稿では、ソースビデオ内の空間的条件と時間的な光フローの手がかりを併用する、一貫性のあるV2V合成フレームワークを提案する。光フローに厳密に従う従来手法とは対照的に、本手法はフロー推定の不完全さに対処しつつ、その利点を活用する。第1フレームからのワーピングによって光フローを符号化し、拡散モデルにおける補助的な参照として用いる。これにより、第1フレームを任意の一般的なI2Iモデルで編集し、その編集を後続のフレームに伝播させることで、ビデオ合成が可能となる。我々のV2VモデルFlowVidは、次のような優れた特性を示す。(1) 柔軟性: FlowVidは既存のI2Iモデルとシームレスに連携し、スタイル変換、オブジェクトの入れ替え、局所的な編集など、さまざまな変更を容易にする。(2) 効率性: 30 FPS、512x512解像度の4秒のビデオの生成にわずか1.5分しかかからず、これはCoDeF、Rerender、TokenFlowよりそれぞれ3.1倍、7.2倍、10.5倍高速である。(3) 高品質: ユーザスタディにおいて、FlowVidは45.7%の割合で選好され、CoDeF（3.5%）、Rerender（10.2%）、TokenFlow（40.4%）を上回った。

Diffusion models have transformed image-to-image (I2I) synthesis and are now permeating into videos. However, the advancement of video-to-video (V2V) synthesis has been hampered by the challenge of maintaining temporal consistency across video frames. This paper proposes a consistent V2V synthesis framework that jointly leverages spatial conditions and temporal optical flow cues within the source video. Contrary to prior methods that strictly adhere to optical flow, our approach harnesses its benefits while handling the imperfection in flow estimation. We encode the optical flow by warping from the first frame and use the result as a supplementary reference in the diffusion model. This enables video synthesis by editing the first frame with any prevalent I2I model and then propagating the edits to successive frames. Our V2V model, FlowVid, demonstrates remarkable properties: (1) Flexibility: FlowVid works seamlessly with existing I2I models, facilitating various modifications, including stylization, object swaps, and local edits. (2) Efficiency: Generation of a 4-second video with 30 FPS and 512x512 resolution takes only 1.5 minutes, which is 3.1x, 7.2x, and 10.5x faster than CoDeF, Rerender, and TokenFlow, respectively. (3) High quality: In user studies, our FlowVid is preferred 45.7% of the time, outperforming CoDeF (3.5%), Rerender (10.2%), and TokenFlow (40.4%).
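Propagating an edit of the first frame along optical flow amounts to warping: each pixel of a later frame looks up its corresponding location in the (edited) first frame. The sketch below is a generic nearest-neighbour backward warp with an out-of-bounds mask, assuming the flow is given from each target frame back to the first frame. FlowVid's actual flow estimator, interpolation, and occlusion handling are not described in the abstract; the mask here only marks pixels whose source falls outside the image. Because estimated flow is imperfect, a warped frame like this is best treated as a soft reference for the diffusion model rather than a hard constraint.

```python
def backward_warp(frame, flow, fill=0):
    """Warp `frame` toward a target frame by nearest-neighbour backward warping.

    frame: H x W list of pixel values (e.g. the edited first frame).
    flow:  H x W list of (dx, dy) giving, for each target pixel, the offset to
           its corresponding location in `frame`.
    Returns (warped, valid); valid[y][x] is False where the source is outside the image.
    """
    h, w = len(frame), len(frame[0])
    warped = [[fill] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = int(round(x + dx)), int(round(y + dy))
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = frame[sy][sx]
                valid[y][x] = True
    return warped, valid


# Content moves one pixel to the left: each target pixel samples from x + 1.
first_frame = [[1, 2, 3], [4, 5, 6]]
flow = [[(1, 0)] * 3 for _ in range(2)]
warped, valid = backward_warp(first_frame, flow)
print(warped)
print(valid)
```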

翻訳日:2024-01-02 08:50:23 公開日:2023-12-29

# 潜在拡散モデルを用いた教師ありグラフ外れ値検出のためのデータ拡張

Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models ( http://arxiv.org/abs/2312.17679v1 )

ライセンス: Link先を確認

Kay Liu, Hengrui Zhang, Ziqing Hu, Fangxin Wang, Philip S. Yu

(参考訳) グラフ外れ値検出は、グラフニューラルネットワークの分野における研究と応用の主要なタスクである。これは、グラフ内の大多数から逸脱した外れ値ノードを識別するものである。教師ありグラフ外れ値検出アルゴリズムが直面する根本的な課題の1つはクラス不均衡の問題であり、正常インスタンスに比べて外れ値インスタンスが少ないため、しばしば最適でない性能となる。従来の手法では、損失関数の推定においてインスタンスに再重み付けを行い、外れ値に高い重みを、正常値（インライア）に低い重みを割り当てることで不均衡を緩和する。しかし、これらの戦略はそれぞれ過学習と学習不足に陥りやすい。近年、生成モデル、特に拡散モデルは、高忠実度の画像合成において有効性を示している。その卓越した生成品質にもかかわらず、教師ありグラフ外れ値検出のためのデータ拡張における可能性は、ほとんど検討されていない。このギャップを埋めるため、潜在拡散モデルを用いて教師ありグラフ外れ値検出におけるクラス不均衡を緩和する新しいデータ拡張手法GODMを導入する。具体的には、提案手法は3つの主要な構成要素からなる。(1) 変分エンコーダは、グラフデータに内在する異種情報を統一された潜在空間に写像する。(2) グラフ生成器は、潜在空間から実際の外れ値と統計的に類似したグラフデータを合成する。(3) 潜在拡散モデルは、反復的なノイズ除去によって実データの潜在空間分布を学習する。複数のデータセットで行った大規模な実験により、GODMの有効性と効率性が裏付けられた。ケーススタディでは、合成データの生成品質がさらに示された。アクセシビリティと再現性を高めるため、GODMをプラグアンドプレイのパッケージとしてまとめ、Python Package Index（PyPI）で公開する。

Graph outlier detection is a prominent task of research and application in the realm of graph neural networks. It identifies the outlier nodes that exhibit deviation from the majority in the graph. One of the fundamental challenges confronting supervised graph outlier detection algorithms is the prevalent issue of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Conventional methods mitigate the imbalance by reweighting instances in the estimation of the loss function, assigning higher weights to outliers and lower weights to inliers. Nonetheless, these strategies are prone to overfitting and underfitting, respectively. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-fidelity images. Despite their extraordinary generation quality, their potential in data augmentation for supervised graph outlier detection remains largely underexplored. To bridge this gap, we introduce GODM, a novel data augmentation method for mitigating class imbalance in supervised Graph Outlier detection with latent Diffusion Models. Specifically, our proposed method consists of three key components: (1) a Variational Encoder maps the heterogeneous information inherent within the graph data into a unified latent space; (2) a Graph Generator synthesizes graph data that are statistically similar to real outliers from the latent space; and (3) a Latent Diffusion Model learns the latent space distribution of real organic data by iterative denoising. Extensive experiments conducted on multiple datasets substantiate the effectiveness and efficiency of GODM. A case study further demonstrates the generation quality of our synthetic data. To foster accessibility and reproducibility, we encapsulate GODM into a plug-and-play package and release it on the Python Package Index (PyPI).
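The conventional reweighting baseline that GODM is contrasted with can be written as a class-weighted binary cross-entropy. The sketch below uses inverse-frequency weighting (outlier weight = number of inliers / number of outliers), one common choice assumed here for illustration; it is not part of GODM, which instead augments the training set with synthetic outliers. Function names are hypothetical.

```python
import math


def inverse_frequency_pos_weight(labels):
    """Outlier (label 1) weight so that both classes carry equal total weight."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no outliers in labels")
    return n_neg / n_pos


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with outlier terms multiplied by pos_weight."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            total -= pos_weight * math.log(p)
        else:
            total -= math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # 1 outlier, 9 inliers
w = inverse_frequency_pos_weight(labels)  # 9.0
loss = weighted_bce([0.2] + [0.1] * 9, labels, w)
print(w, loss)
```

With a large `pos_weight`, the few outlier terms dominate the gradient, which illustrates the overfitting risk on outliers noted above; the down-weighted inliers correspondingly contribute less signal.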

翻訳日:2024-01-02 08:49:57 公開日:2023-12-29

PDF登録状況（公開日: 20231229）