Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20240114となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 医療IoTサイバーセキュリティのためのZero-Trust Machine Learning Green Architecture: レビュー,分析,実装 A Novel Zero-Trust Machine Learning Green Architecture for Healthcare IoT Cybersecurity: Review, Analysis, and Implementation ( http://arxiv.org/abs/2401.07368v1 ) ライセンス: Link先を確認	Zag ElSayed, Nelly Elsayed, Sajjad Bay,	(参考訳) 医療アプリケーションにおけるIoT(Internet of Things)デバイスの統合は、患者のケア、監視、データ管理に革命をもたらした。 Global IoT in Healthcare Marketの評価額は2023年で252億ドルである。しかし、これらのデバイスの急速な関与は、患者のプライバシーと医療データの整合性に重大な脅威をもたらす情報セキュリティ上の懸念をもたらす。本稿では、医療アプリケーション内のIoTデバイスにおけるセキュリティ脆弱性に対処し、軽減するために設計された、機械学習(ML)ベースのアーキテクチャを紹介する。先進的な畳み込みMLアーキテクチャを活用することで、提案アーキテクチャは、潜在的な脅威を積極的に監視し、検出し、機密性の高い医療情報の機密性と整合性を確保するとともに、コストを最小化し、医療や緊急環境に特化したポータビリティを高めることを目的としている。実験結果は、CICIoT2023データセットを用いてシミュレーションしたゼロデイ検出精度を実証し、x10の係数でコストを削減した結果に基づいて、様々な攻撃を予測するための最大93.6%の精度を示している。このアプローチの重要性は、IoTデバイスのセキュリティ姿勢を強化し、信頼できる医療システムの堅牢な実装を維持することです。 The integration of Internet of Things (IoT) devices in healthcare applications has revolutionized patient care, monitoring, and data management. The Global IoT in Healthcare Market value is $252.2 Billion in 2023. However, the rapid involvement of these devices brings information security concerns that pose critical threats to patient privacy and the integrity of healthcare data. This paper introduces a novel machine learning (ML) based architecture explicitly designed to address and mitigate security vulnerabilities in IoT devices within healthcare applications. By leveraging advanced convolution ML architecture, the proposed architecture aims to proactively monitor and detect potential threats, ensuring the confidentiality and integrity of sensitive healthcare information while minimizing the cost and increasing the portability specialized for healthcare and emergency environments. The experimental results underscore the accuracy of up to 93.6% for predicting various attacks based on the results demonstrate a zero-day detection accuracy simulated using the CICIoT2023 dataset and reduces the cost by a factor of x10. The significance of our approach is in fortifying the security posture of IoT devices and maintaining a robust implementation of trustful healthcare systems.	翻訳日:2024-03-25 12:37:32 公開日:2024-01-14
# 都市車両網における多目的最適道路ユニット配置 Multi-objective Optimal Roadside Units Deployment in Urban Vehicular Networks ( http://arxiv.org/abs/2402.18581v1 ) ライセンス: Link先を確認	Weian Guo, Zecheng Kang, Dongyang Li, Lun Zhang, Li Li,	(参考訳) 都市車両網では,交通効率,安全,関連サービスの重要性が増している。このようなネットワーク内では、道路側ユニット(RSU)が通信を容易にする中間体として機能する。したがって、RSUの展開は通信サービスの質を確保する上で最も重要である。しかし、時間遅延やデプロイメントコストといった最適化の目的は、様々な観点から一般的に開発されている。その結果、対立が目的の間に生じる可能性がある。さらに、都市環境においては、建物、庭園、湖沼、その他のインフラなど様々な障害が存在するため、RSUの展開に課題が生じる。したがって、複数の目的が存在すること、障害によって課される制約、そして大規模な最適化空間を探索する必要性により、配置は重大な困難に直面する。本稿では,2種類の多目的最適化アルゴリズムを提案する。マルチポピュレーション戦略と適応探索手法を利用して,大規模決定変数空間を効率的に探索する。 RSUの過密配置の問題を緩和するために、最適化手順中にRSU密度を調整するための校正機構が採用されている。提案手法は, 車両とRSU間のデータオフロードを, 反復的ベストレスポンスシーケンスゲーム(IBRSG)をセットアップすることで処理する。提案したアルゴリズムと最先端のアルゴリズムを比較することで,高密度・低密度の都市シナリオにおいて,我々の戦略が優れていることを示す。また,提案手法は車両ネットワークの効率を大幅に改善することを示した。 The significance of transportation efficiency, safety, and related services is increasing in urban vehicular networks. Within such networks, roadside units (RSUs) serve as intermediates in facilitating communication. Therefore, the deployment of RSUs is of utmost importance in ensuring the quality of communication services. However, the optimization objectives, such as time delay and deployment cost, are commonly developed from diverse perspectives. As a result, it is possible that conflicts may arise among the objectives. Furthermore, in urban environments, the presence of various obstacles, such as buildings, gardens, lakes, and other infrastructure, poses challenges for the deployment of RSUs. Hence, the deployment encounters significant difficulties due to the existence of multiple objectives, constraints imposed by obstacles, and the necessity to explore a large-scale optimization space. To address this issue, two versions of multi-objective optimization algorithms are proposed in this paper. By utilizing a multi-population strategy and an adaptive exploration technique, the methods efficiently explore a large-scale decision-variable space. In order to mitigate the issue of an overcrowded deployment of RSUs, a calibrating mechanism is adopted to adjust RSU density during the optimization procedures. The proposed methods also take care of data offloading between vehicles and RSUs by setting up an iterative best response sequence game (IBRSG). By comparing the proposed algorithms with several state-of-the-art algorithms, the results demonstrate that our strategies perform better in both high-density and low-density urban scenarios. The results also indicate that the proposed solutions substantially improve the efficiency of vehicular networks.	翻訳日:2024-03-25 08:36:53 公開日:2024-01-14
# AI-Enabled GPT-4 Assistant APIを用いた体系的文献レビュー(SLR)の選択フェーズの合理化 Streamlining the Selection Phase of Systematic Literature Reviews (SLRs) Using AI-Enabled GPT-4 Assistant API ( http://arxiv.org/abs/2402.18582v1 ) ライセンス: Link先を確認	Seyed Mohammad Ali Jafari,	(参考訳) 学術文献の増大は、最新の研究動向に追随する上で、重大な課題となっている。そこで本研究では,SLR(Systematic Literature Reviews)における記事選択フェーズの効率を効率化するための,先駆的なAIベースのツールを提案する。 OpenAIのGPT-4アシスタントAPIの堅牢な機能を利用することで、このツールは幅広い学術分野にわたる記事選択プロセスを均質化することに成功した。データ準備、AIによる記事評価、構造化された結果提示からなる三部作のアプローチにより、このツールは文学レビューの時間的消費タスクを著しく加速する。重要なことに、このツールは、SLRプロセスが実質的な人間の判断を伴う管理や経済学などの分野において、非常に有益である可能性がある。標準GPTモデルを採用することで、潜在的なバイアスを大幅に低減し、SLR選択フェーズの速度と精度を高めることができる。これは研究者の生産性と正確さを増幅するだけでなく、学術出版の活発化の中で学術研究が行なわれる過程において、かなりの進歩を示している。 The escalating volume of academic literature presents a formidable challenge in staying updated with the newest research developments. Addressing this, this study introduces a pioneering AI-based tool, configured specifically to streamline the efficiency of the article selection phase in Systematic Literature Reviews (SLRs). Utilizing the robust capabilities of OpenAI's GPT-4 Assistant API, the tool successfully homogenizes the article selection process across a broad array of academic disciplines. Implemented through a tripartite approach consisting of data preparation, AI-mediated article assessment, and structured result presentation, this tool significantly accelerates the time-consuming task of literature reviews. Importantly, this tool could be highly beneficial in fields such as management and economics, where the SLR process involves substantial human judgment. The adoption of a standard GPT model can substantially reduce potential biases and enhance the speed and precision of the SLR selection phase. This not only amplifies researcher productivity and accuracy but also denotes a considerable stride forward in the way academic research is conducted amidst the surging body of scholarly publications.	翻訳日:2024-03-25 08:36:53 公開日:2024-01-14
# クラウドストレージにおけるセキュリティとプライバシの問題 Security and Privacy Issues in Cloud Storage ( http://arxiv.org/abs/2401.04076v2 ) ライセンス: Link先を確認	Norah Asiri,	(参考訳) クラウドコンピューティングが大きな可能性を秘めているにもかかわらず、消費者がそれに価値ある熱意とペースで採用していない。これは、消費者がクラウドコンピューティングを機密データに使用することをためらう理由であり、消費者がクラウドコンピューティングを一般的なクラウドストレージや、特にクラウドストレージに使用することを妨げている脅威である。クラウドコンピューティングは、独自の構造のため、独自の問題以外に、従来型のセキュリティとプライバシの脅威を継承する。クラウドコンピューティングに関連するいくつかの脅威は、時折プロバイダが意識していない従業員からの内部攻撃、消費者とプロバイダ間の合意の透明性の欠如、データ損失、トラフィックハイジャック、共有テクノロジ、および安全でないアプリケーションインターフェースなどである。このような脅威は、消費者がその機能を安全に使えるようにするための対策が必要である。このレビューでは、コンシューマや企業でさえ意識していないギャップとして、最もセキュリティとプライバシの問題に光を当てています。また、クラウドコンピューティングのシナリオに関わるパーティも定義しています。これらの脅威の結果も示しています。 Even with the vast potential that cloud computing has, so far, it has not been adopted by the consumers with the enthusiasm and pace that it be worthy; this is a very reason statement why consumers still hesitated of using cloud computing for their sensitive data and the threats that prevent the consumers from shifting to use cloud computing in general and cloud storage in particular. The cloud computing inherits the traditional potential security and privacy threats besides its own issues due to its unique structures. Some threats related to cloud computing are the insider malicious attacks from the employees that even sometime the provider unconscious about, the lack of transparency of agreement between consumer and provider, data loss, traffic hijacking, shared technology and insecure application interface. Such threats need remedies to make the consumer use its features in secure way. In this review, we spot the light on the most security and privacy issues which can be attributed as gaps that sometimes the consumers or even the enterprises are not aware of. We also define the parties that involve in scenario of cloud computing that also may attack the entire cloud systems. We also show the consequences of these threats.	翻訳日:2024-03-18 08:46:40 公開日:2024-01-14
# Killer Apps: 高速で大規模なAI兵器 Killer Apps: Low-Speed, Large-Scale AI Weapons ( http://arxiv.org/abs/2402.01663v1 ) ライセンス: Link先を確認	Philip Feldman, Aaron Dant, James R. Foulds	(参考訳) 人工知能(ai)と機械学習(ml)の加速は、openai、meta、antropicなどの組織による最先端生成前訓練トランスフォーマー(gpt)モデルの開発によって強調され、戦争とセキュリティにおける新たな挑戦と機会を提示している。現在注目されているのは、武器システムにおけるAIの統合と、速度論的衝突における迅速な意思決定におけるその役割である。しかし、同様に重要だが見落とされがちな側面は、情報領域内のインターネットスケールにおけるAIベースの心理的操作の可能性である。これらの能力は、世界中の個人、組織、社会に重大な脅威をもたらす可能性がある。本稿では,AI兵器の概念,その展開,検出,潜在的な対策について検討する。 The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.	翻訳日:2024-02-11 17:03:31 公開日:2024-01-14
# 生成ゴースト:aiの余生のメリットとリスクを予測 Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives ( http://arxiv.org/abs/2402.01662v1 ) ライセンス: Link先を確認	Meredith Ringel Morris and Jed R. Brubaker	(参考訳) AIシステムは、パフォーマンスの幅と深さの両方を急速に改善するので、特定の人物をモデルにしたエージェントの可能性を含む、ますます強力で現実的なエージェントを作るのに役立ちます。私たちは、生涯のうちに、人々が愛する人や死後のより広い世界と対話するカスタムAIエージェントを作るのが一般的になることを期待しています。なぜなら、そのようなエージェントは、創造者が生み出したコンテンツを単に包み込むのではなく、新しいコンテンツを生成することができるからです。本稿では, 生成ゴーストの潜在的な実装に関する設計空間について論じる。次に, 生成的幽霊の実用的, 倫理的意義について論じ, 個人や社会に対する潜在的に肯定的, 否定的な影響について論じる。これらの考察に基づき、我々はAIとHCI研究コミュニティのための研究アジェンダを策定し、人々が安全で有益な方法でAIのアフターリーブを創造し、相互作用できるようにする。 As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.	翻訳日:2024-02-11 17:03:17 公開日:2024-01-14
# MorpheusNet: 組み込みオンラインシステムのための資源効率の良い睡眠ステージ分類器 MorpheusNet: Resource efficient sleep stage classifier for embedded on-line systems ( http://arxiv.org/abs/2401.10284v1 ) ライセンス: Link先を確認	Ali Kavoosi, Morgan P. Mitchell, Raveen Kariyawasam, John E. Fleming, Penny Lewis, Heidi Johansen-Berg, Hayriye Cagnan, Timothy Denison	(参考訳) 睡眠ステージ分類(ssc)は労働集約的な作業であり、手動分類のための電気生理学的記録の時間を調べる必要がある。これは、治療目的で睡眠ステージを活用する際の制限要因である。ウェアラブルデバイスの普及と拡張により、SSCの自動化により、大規模な睡眠ベースの治療法の展開が可能になる。ディープラーニングはこのプロセスを自動化する潜在的な方法として注目を集めている。これまでの研究では、手動のエキスパートスコアに匹敵する精度を示した。しかし、従来の手法では膨大な量のメモリと計算資源を必要とする。これにより、リアルタイムに分類し、エッジにモデルをデプロイする能力が制限される。このギャップに対処するため、私たちは、外部の計算ソース(例えば携帯電話、クラウド)にアクセスせずに、睡眠ステージをリアルタイムで予測できるモデルを提供することを目指している。このアルゴリズムは、組み込みバッテリー駆動システムで使用可能な電力効率が良い。我々の小型睡眠ステージ分類器は、ハードウェア設定が制約されたほとんどの市販マイクロコントローラ(MCU)に展開できる。これは、我々のアプローチのメモリフットプリントが大幅に少ないためです。モデルは3つの一般公開されたデータベースでテストされ、その性能は最先端に匹敵するが、モデルの複雑さは桁違いに減らした(最先端に比べて最大280倍も小さい)。さらに、パラメータを8ビットに量子化し、平均0.95%の精度でモデルを最適化した。ファームウェアに実装されると、量子化されたモデルはarm cortexm4プロセッサ上で1.6秒のレイテンシを達成し、オンラインのsscベースの治療に使用できる。 Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep Learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real-time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCU) with constrained hardware settings. This is due to the memory footprint of our approach requiring significantly fewer operations. The model was tested on three publicly available data bases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller compared to state of the art). We further optimized the model with quantization of parameters to 8 bits with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm CortexM4 processor, allowing its use for on-line SSC-based therapies.	翻訳日:2024-01-28 16:22:42 公開日:2024-01-14
# 臨床脳波分類のためのウィンドウ積み重ねメタモデル Window Stacking Meta-Models for Clinical EEG Classification ( http://arxiv.org/abs/2401.10283v1 ) ライセンス: Link先を確認	Yixuan Zhu, Rohan Kandasamy, Luke J. W. Canham, David Western	(参考訳) ウィンドウニングは、EEG機械学習の分類やその他の時系列タスクにおいて一般的なテクニックである。しかし,この手法を用いると,計算コストが記録全体や記録セット全体のグローバルな関係の学習を阻害する。さらに、親記録からウィンドウに受け継がれたラベルは、そのウィンドウの内容を正確に反映するものではない。これらの問題を解決するために,時間ウインドウドデータ集約に適したメタラーニングの原則を取り入れた多段階モデルアーキテクチャを導入する。さらに、これらの問題を緩和するための2つの異なる戦略をテストしました。テンプル大学病院異常脳波コーポレーション(TUAB)で試験を行ったところ、ベンチマークの精度は89.8%から99.0パーセントに劇的に向上した。このブレークスルー性能は、このデータセットの事前のパフォーマンス予測を超え、EEG解釈課題に対する機械学習ソリューションの臨床応用の道を開く。テンプル大学病院脳波コーパス(tueg)のより広範で多種多様なデータセットを用いて86.7%の精度を得た。 Windowing is a common technique in EEG machine learning classification and other time series tasks. However, a challenge arises when employing this technique: computational expense inhibits learning global relationships across an entire recording or set of recordings. Furthermore, the labels inherited by windows from their parent recordings may not accurately reflect the content of that window in isolation. To resolve these issues, we introduce a multi-stage model architecture, incorporating meta-learning principles tailored to time-windowed data aggregation. We further tested two distinct strategies to alleviate these issues: lengthening the window and utilizing overlapping to augment data. Our methods, when tested on the Temple University Hospital Abnormal EEG Corpus (TUAB), dramatically boosted the benchmark accuracy from 89.8 percent to 99.0 percent. This breakthrough performance surpasses prior performance projections for this dataset and paves the way for clinical applications of machine learning solutions to EEG interpretation challenges. On a broader and more varied dataset from the Temple University Hospital EEG Corpus (TUEG), we attained an accuracy of 86.7%, nearing the assumed performance ceiling set by variable inter-rater agreement on such datasets.	翻訳日:2024-01-28 16:22:18 公開日:2024-01-14
# truth forest: チューニングなし介入による大規模言語モデルにおける多元的真理性の実現に向けて Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ( http://arxiv.org/abs/2312.17484v2 ) ライセンス: Link先を確認	Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu	(参考訳) 大きな言語モデル(LLM)が様々なタスクで大きな成功を収めたが、幻覚を生じさせることに苦しむ。多次元直交プローブを用いて隠れ真理表現を明らかにすることでllmの真理性を高める方法である真理フォレストを提案する。具体的には、プローブに直交制約を組み込むことで真理をモデリングするための複数の直交基底を生成する。さらに,LLMにおける識別と真理特徴の生成のギャップを減らし,シーケンス内の幅広い位置を考慮に入れた体系的手法であるRandom Peekを導入する。このアプローチを用いることで,Llama-2-7Bの真偽を40.8\%から74.5\%に改善した。同様に、微調整されたモデルでも顕著な改善が見られる。我々はプローブを用いて真理特徴の徹底的な解析を行った。可視化の結果,直交プローブが真理関連特徴を補完し,データセットの固有構造を明らかにするクラスタを形成することがわかった。 Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8\% to 74.5\% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset.	翻訳日:2024-01-19 19:36:36 公開日:2024-01-14
# ブロックチェーンを用いた農産物サプライチェーンの枠組み A Framework for Agricultural Food Supply Chain using Blockchain ( http://arxiv.org/abs/2401.09476v1 ) ライセンス: Link先を確認	Sudarssan N	(参考訳) 本論文の主な目的は、食品サプライチェーンシステムの信頼性と透明性を確立し、ブロックチェーン技術の助けを借りて、すべての人の食品安全性を確保することである。食品サプライチェーン(英: food supply chain)は、農家や生産者から買い手までの作物を追跡するプロセスである。ブロックチェーンの出現により、多数の農業必需品を提供する安全で不正な環境がより簡単になった。貿易のグローバル化により、現在のサプライチェーン市場は、データの統合、複雑な取引、流通に関わる様々な企業を含んでいる。情報改ざん抵抗、需給関係、追跡可能な監視は、この結果として生じる困難である。 Blockchainは、改ざんに抵抗する情報を提供する分散台帳技術である。この戦略は、中央集権的な権威、仲介者、ビジネスヒストリーの必要性を排除し、高いレベルの完全性、責任、安全性を維持しながら、生産とセキュリティを高めることができる。農業分野における食品サプライチェーンの整合性と透明性を確保するため,ブロックチェーンとIoTに基づく枠組みが提案されている。 The main aim of the paper is to create a trust and transparency in the food supply chain system, ensuring food safety for everyone with the help of Blockchain Technology. Food supply chain is the process of tracing a crop from the farmer or producer to the buyer. With the advent of blockchain, providing a safe and fraud-free environment for the provision of numerous agricultural necessities has become much easier. Because of the globalization of trade, the present supply chain market today includes various companies involving integration of data, complex transactions and distribution. Information tamper resistance, supply-demand relationships, and traceable oversight are all difficulties that arise as a result of this. Blockchain is a distributed ledger technology that can provide information that is resistant to tampering. This strategy can eliminate the need for a centralized trusted authority, intermediaries, and business histories, allowing for increased production and security while maintaining the highest levels of integrity, liability, and safety. In order to have an integrity and transparency in food supply chain in the agricultural sector, a framework is proposed here based on block chain and IoT.	翻訳日:2024-01-19 19:10:03 公開日:2024-01-14
# 二重対向アクティベーション異常検出:対向オートエンコーダは異常発生器である Double-Adversarial Activation Anomaly Detection: Adversarial Autoencoders are Anomaly Generators ( http://arxiv.org/abs/2101.04645v5 ) ライセンス: Link先を確認	J.-P. Schulze, P. Sperl, K. B\"ottinger	(参考訳) 異常検出は、固有のクラス不均衡のため、機械学習アルゴリズムにとって難しいタスクである。観測されたデータを手動で分析するのはコストが高く、時間を要するため、通常、使用可能な場合の既知の異常はごくわずかである。生成モデルとニューラルネットワークの隠れ活性化の解析に着想を得て,DA3Dと呼ばれる新しい教師なし異常検出手法を導入する。ここでは,通常のデータのみに基づく異常な反例を生成するために,対向オートエンコーダを用いる。これらの人工的な異常は、実際の、しかし目に見えない異常を検出することができる。新たな生成手法により,異常検出の教師なしタスクを教師付きタスクに変換する。 DA3Dは、ドメイン知識を必要としない純粋にデータ駆動の方法で最先端の異常検出手法の性能を上回る。 Anomaly detection is a challenging task for machine learning algorithms due to the inherent class imbalance. It is costly and time-demanding to manually analyse the observed data, thus usually only few known anomalies if any are available. Inspired by generative models and the analysis of the hidden activations of neural networks, we introduce a novel unsupervised anomaly detection method called DA3D. Here, we use adversarial autoencoders to generate anomalous counterexamples based on the normal data only. These artificial anomalies used during training allow the detection of real, yet unseen anomalies. With our novel generative approach, we transform the unsupervised task of anomaly detection to a supervised one, which is more tractable by machine learning and especially deep learning methods. DA3D surpasses the performance of state-of-the-art anomaly detection methods in a purely data-driven way, where no domain knowledge is required.	翻訳日:2024-01-18 22:38:52 公開日:2024-01-14
# 教師付き学習とVAEの統一 -- 天体-粒子再構成のための正規化フローベースニューラルネットワークモデルにおけるカバレッジ、体系、適合性 Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions ( http://arxiv.org/abs/2008.05825v5 ) ライセンス: Link先を確認	Thorsten Gl\"usenkamp	(参考訳) ニューラルネットワークに基づく天体物理学における事象特性の予測はますます一般的になっている。しかし、多くの場合、結果は単に点予測として利用される。統計的不確実性、カバレッジ、体系的不確実性、あるいは適合度尺度はしばしば計算されない。ここでは、これらすべてのプロパティを単一のネットワークモデルに組み込むことができるトレーニングとネットワークアーキテクチャの特定の選択について説明する。データとラベルの連成分布のKL偏差は、確率的変分推論の1つの傘の下で教師付き学習と変分オートエンコーダ(VAE)を統一することができることを示す。この統一は、ニューラルネットワークモデルに適合するp値を計算することを可能にする拡張教師付き学習スキームを動機付ける。この構成では、ニューラルネットワークで償却された条件付き正規化フローが不可欠である。フローの正規化に特有の特定の「順序付き」輪郭に対して,数値積分を伴わずにカバレッジ確率を計算する方法について論じる。さらに,訓練中の効果的な限界化を通じて,系統的不確実性がどのように組み込まれるかを示す。提案した拡張教師あり訓練は,(1)カバレッジ計算,(2)システマティクス,(3)1つの機械学習モデルにおける適合度尺度を含む。原理上、関連する分布の形状に制約はなく、実際、機械は $\mathbb{r}^n \times \mathbb{s}^m$ のような積空間上で定義される複素多様分布を扱う。しかし、カバレッジ計算では、分布が縮退しすぎる場合、その解釈に注意が必要である。イベント選択や、不確実性の保証を必要とする高速な天文警報において、イベントごとの情報を活用する大きな可能性を見出した。 Neural-network based predictions of event properties in astro-particle physics are getting more and more common. However, in many cases the result is just utilized as a point prediction. Statistical uncertainties, coverage, systematic uncertainties or a goodness-of-fit measure are often not calculated. Here we describe a certain choice of training and network architecture that allows to incorporate all these properties into a single network model. We show that a KL-divergence objective of the joint distribution of data and labels allows to unify supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised learning scheme which allows to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours that are unique to normalizing flows. Furthermore we show how systematic uncertainties can be included via effective marginalization during training. The proposed extended supervised training incorporates (1) coverage calculation, (2) systematics and (3) a goodness-of-fit measure in a single machine-learning model. There are in principle no constraints on the shape of the involved distributions, in fact the machinery works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathbb{S}^m$. The coverage calculation, however, requires care in its interpretation when the distributions are too degenerate. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts which require uncertainty guarantees.	翻訳日:2024-01-18 22:36:40 公開日:2024-01-14
# 高濃度におけるロバストパラ水素誘起偏光 Robust Parahydrogen-Induced Polarization at High Concentrations ( http://arxiv.org/abs/2401.07243v1 ) ライセンス: Link先を確認	Laurynas Dagys, Martin C. Korzeczek, Anna J. Parker, James Eills, John W. Blanchard, Christian Bengs, Malcolm H. Levitt, Stephan Knecht, Ilai Schwartz, M. B. Plenio	(参考訳) パラ水素誘起偏極(PHIP)は、高い核スピン偏極を持つ標的分子を生成する強力な技術である。 PHIPプロセスは、パラ水素と標的分子の間の化学反応を伴い、続いて磁場操作により、指定された核の磁化に核一重項スピン秩序が変換される。単磁化偏極移動過程は中程度の濃度で効果的に作用するが、偏極と濃度の積として定義される高モル分極では効率が低下することが観察された。このモル分極への強い依存は、偏極移動中に試料の磁化によって生じる磁場からの干渉によるもので、複雑なダイナミクスをもたらし、技術のスケーラビリティに大きな影響を与える。遠方二極子場の影響を否定するパルスシーケンスでこの問題に対処し、同時にモル偏極の制限なく、目的のターゲットスピンへのsinglet-to-magnetization偏光移動を実現する。 Parahydrogen-Induced Polarization (PHIP) is a potent technique for generating target molecules with high nuclear spin polarization. The PHIP process involves a chemical reaction between parahydrogen and a target molecule, followed by the transformation of nuclear singlet spin order into magnetization of a designated nucleus through magnetic field manipulations. Although the singlet-to-magnetization polarization transfer process works effectively at moderate concentrations, it is observed to become much less efficient at high molar polarization, defined as the product of polarization and concentration. This strong dependence on the molar polarization is attributed to interference from the field produced by the sample's magnetization during polarization transfer, which leads to complex dynamics and can severely impact the scalability of the technique. We address this challenge with a pulse sequence that negates the influence of the distant dipolar field, while simultaneously achieving singlet-to-magnetization polarization transfer to the desired target spins, free from restrictions on the molar polarization.	翻訳日:2024-01-18 19:12:48 公開日:2024-01-14
# 病理組織学における画像検索について On Image Search in Histopathology ( http://arxiv.org/abs/2401.08699v1 ) ライセンス: Link先を確認	H.R. Tizhoosh, Liron Pantanowitz	(参考訳) 病理組織像は、カメラ付き顕微鏡またはスライドスキャナ全体から得ることができる。これらの画像に基づく類似度計算を利用して患者をマッチングすることは、研究や臨床の文脈において有意な可能性を秘めている。近年の検索技術の進歩により、様々な組織タイプにまたがる細胞構造の微妙な定量化が可能となり、診断、予後、新しい患者の予測を診断および治療された患者のデータベースと比較できる。本稿では,組織病理学における画像検索技術の最近の進歩を総合的に概観し,効率的な画像検索法を求める計算病理学研究者のための簡潔な概要を提供する。 Pathology images of histopathology can be acquired from camera-mounted microscopes or whole slide scanners. Utilizing similarity calculations to match patients based on these images holds significant potential in research and clinical contexts. Recent advancements in search technologies allow for nuanced quantification of cellular structures across diverse tissue types, facilitating comparisons and enabling inferences about diagnosis, prognosis, and predictions for new patients when compared against a curated database of diagnosed and treated cases. In this paper, we comprehensively review the latest developments in image search technologies for histopathology, offering a concise overview tailored for computational pathology researchers seeking effective, fast and efficient image search methods in their work.	翻訳日:2024-01-18 18:42:18 公開日:2024-01-14
# 本当にデータが必要なのか? Do We Really Even Need Data? ( http://arxiv.org/abs/2401.08702v1 ) ライセンス: Link先を確認	Kentaro Hoffman, Stephen Salerno, Awan Afiaz, Jeffrey T. Leek, Tyler H. McCormick	(参考訳) 人工知能と機械学習ツールがよりアクセスしやすくなり、科学者はデータ収集に新たな障害に直面し(例えば、コストの上昇、サーベイ応答率の低下)、事前訓練されたアルゴリズムからの予測を結果変数として使うようになった。財政的な理由や物流的な理由には訴えるが、推論に標準的なツールを使用することで、真の観測できない結果が予測された値に置き換えられる場合、独立した変数と利害関係の関係を誤って表現することができる。本稿では,このいわゆる ‘post-prediction inference'' 問題に固有の統計的課題を特徴付け,3つの潜在的な誤り源を解明する。 (i)予測結果と真に観察できない結果の関係二トレーニングデータの再サンプリング又は不確実性に対する機械学習モデルの堅牢性、及び (iii)バイアスだけでなく、予測から究極の推論手順への不確実性も適切に伝播する。また,推定後推論の枠組みを,調査サンプリング,データ欠落,半教師付き学習など,いくつかの関連分野にまたがる古典的作業と比較した。この対比は、古典的および近代的な推論問題における設計の役割を解明する。 As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g. rising costs, declining survey response rates), researchers increasingly use predictions from pre-trained algorithms as outcome variables. Though appealing for financial and logistical reasons, using standard tools for inference can misrepresent the association between independent variables and the outcome of interest when the true, unobserved outcome is replaced by a predicted value. In this paper, we characterize the statistical challenges inherent to this so-called ``post-prediction inference'' problem and elucidate three potential sources of error: (i) the relationship between predicted outcomes and their true, unobserved counterparts, (ii) robustness of the machine learning model to resampling or uncertainty about the training data, and (iii) appropriately propagating not just bias but also uncertainty from predictions into the ultimate inference procedure. We also contrast the framework for post-prediction inference with classical work spanning several related fields, including survey sampling, missing data, and semi-supervised learning. This contrast elucidates the role of design in both classical and modern inference problems.	翻訳日:2024-01-18 18:26:37 公開日:2024-01-14
# ニューラルネットワークサロゲートを用いた肘型ドラフトチューブの計算効率の最適化 Computationally Efficient Optimisation of Elbow-Type Draft Tube Using Neural Network Surrogates ( http://arxiv.org/abs/2401.08700v1 ) ライセンス: Link先を確認	Ante Sikirica, Ivana Lu\v{c}in, Marta Alvir, Lado Kranj\v{c}evi\'c and Zoran \v{C}arija	(参考訳) 本研究の目的は,肘型ドラフトチューブの設計のための単一目的・多目的最適化アルゴリズムの総合評価と,計算効率のよい最適化ワークフローの導入である。提案したワークフローは、数値シミュレーションから得られたデータに基づいて訓練されたディープニューラルネットワークサロゲートを利用する。サーロゲートの使用により、新しいデザインをより柔軟かつ迅速に評価することができる。線形還元による成功履歴に基づく適応微分進化と分解に基づく多目的進化アルゴリズムは, 最適アルゴリズムとして同定され, 単一目的最適化における異なる目的の影響と, 多目的最適化におけるドラフトチューブ設計への影響を判定するために用いられた。単一目的アルゴリズムの結果は、目的が別々に考慮された場合の多目的アルゴリズムの結果と一致している。しかし、特に計算コストの低いサロゲートに対して、多目的アプローチは一般的に選択されるべきである。最適多目的結果を得るために, 圧力回復係数と抗力係数についてそれぞれ1.5%, 17%の改善を示した。予測値と数値結果との差は, 圧力回復係数が0.5%未満, ドラッグ係数が3%以下である。再生可能エネルギーの需要が増加を続ける中、特に世界的な持続可能性の取り組みにおいて、本研究で議論されているデータ駆動最適化ワークフローの関連性がますます重要になる。 This study aims to provide a comprehensive assessment of single-objective and multi-objective optimisation algorithms for the design of an elbow-type draft tube, as well as to introduce a computationally efficient optimisation workflow. The proposed workflow leverages deep neural network surrogates trained on data obtained from numerical simulations. The use of surrogates allows for a more flexible and faster evaluation of novel designs. The success history-based adaptive differential evolution with linear reduction and the multi-objective evolutionary algorithm based on decomposition were identified as the best-performing algorithms and used to determine the influence of different objectives in the single-objective optimisation and their combined impact on the draft tube design in the multi-objective optimisation. The results for the single-objective algorithm are consistent with those of the multi-objective algorithm when the objectives are considered separately. Multi-objective approach, however, should typically be chosen, especially for computationally inexpensive surrogates. A multi-criteria decision analysis method was used to obtain optimal multi-objective results, showing an improvement of 1.5% and 17% for the pressure recovery factor and drag coefficient, respectively. The difference between the predictions and the numerical results is less than 0.5% for the pressure recovery factor and 3% for the drag coefficient. As the demand for renewable energy continues to increase, the relevance of data-driven optimisation workflows, as discussed in this study, will become increasingly important, especially in the context of global sustainability efforts.	翻訳日:2024-01-18 18:26:18 公開日:2024-01-14
# GNNを用いた高レベル合成における階層的ソース・ツー・ルートQoR予測 Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs ( http://arxiv.org/abs/2401.08696v1 ) ライセンス: Link先を確認	Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo	(参考訳) 高レベル合成(HLS)は、RTLプログラミングを避けてハードウェア設計プロセスを高速化する。しかし,時間経過後の品質(QoR)を考慮した場合,HLSのターンアラウンド時間は有意に増加する。この問題に対処するため,FPGA HLS の階層的後 QoR 予測手法を提案する。(1) C/C++ プログラムから直接遅延と後資源使用量を推定するモデリングフロー,(2) ソースコードの制御とデータフローグラフと HLS プラグマの効果を効果的に表現するグラフ構築手法,(3) ループ階層の影響を捉えることができる階層的 GNN トレーニングと予測手法である。実験結果から,本手法は様々な種類のQoR指標に対して10%未満の予測誤差を示し,最先端のGNN手法と比較して大幅に改善された。提案手法を採用することにより,HLSにおける設計空間探索のランタイムは数十分短縮され,得られたADRSは平均6.91%に短縮される。 High-level synthesis (HLS) notably speeds up the hardware design process by avoiding RTL programming. However, the turnaround time of HLS increases significantly when post-route quality of results (QoR) are considered during optimization. To tackle this issue, we propose a hierarchical post-route QoR prediction approach for FPGA HLS, which features: (1) a modeling flow that directly estimates latency and post-route resource usage from C/C++ programs; (2) a graph construction method that effectively represents the control and data flow graph of source code and effects of HLS pragmas; and (3) a hierarchical GNN training and prediction method capable of capturing the impact of loop hierarchies. Experimental results show that our method presents a prediction error of less than 10% for different types of QoR metrics, which gains tremendous improvement compared with the state-of-the-art GNN methods. By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes and the achieved ADRS is reduced to 6.91% on average.	翻訳日:2024-01-18 18:25:54 公開日:2024-01-14
# 専門知識と解釈可能なデータ駆動インテリジェンスを統合した感染性角膜炎の協調診断 Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence ( http://arxiv.org/abs/2401.08695v1 ) ライセンス: Link先を確認	Zhengqing Fang, Shuowen Zhou, Zhouhang Yuan, Yuxuan Si, Mengze Li, Jinxu Li, Yesheng Xu, Wenjia Xie, Kun Kuang, Yingming Li, Fei Wu, and Yu-Feng Yao	(参考訳) 医用画像診断におけるデータ駆動人工知能(AI)は、シリコで顕著な性能を示したが、解釈可能性の欠如により、臨床医のワークフローに「ブラックボックス」を組み込むことは困難である。臨床医がデータから学んだ診断パターンを理解するために,AIベースのバイオマーカーと同一の診断パターンを持つ検索事例を含む可視化推論プロセスを提供する,解釈可能なモデル,知識誘導診断モデル(KGDM)を開発した。臨床医のプロンプトを人間とaiの相互作用を通じて解釈する推論に取り入れ、安全性の向上とより正確な予測に繋がる可能性がある。本研究は角膜盲症の原因である感染性角膜炎(IK)の診断におけるKGDMの性能,解釈可能性,臨床的有用性について検討した。 KGDMの分類性能は、予測検証データセット、外部テストデータセット、公開テストデータセットで評価される。解釈AIベースのバイオマーカーの診断確率比(DOR)は3.011から35.233の範囲で有効であり、臨床経験と一貫した診断パターンを示す。さらに、人間とAIの協調診断テストを実施し、コラボレーションの参加者は、人間とAIの双方を上回るパフォーマンスを達成した。解釈可能性と相互作用を相乗的に統合することにより、臨床医の専門知識とデータ駆動インテリジェンスの統合を促進する。 aiベースのバイオマーカーによる経験の浅い眼科医の促進と、経験者からの介入によるai予測の増大は、経験豊富な医療従事者が制限され、aiの安全性が懸念される他の疾患への拡大の可能性を秘めているkgdmを用いた伝染性角膜炎に対する有望な診断パラダイムを示している。 Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a visualized reasoning process containing AI-based biomarkers and retrieved cases that with the same diagnostic patterns. It embraces clinicians' prompts into the interpreted reasoning through human-AI interaction, leading to potentially enhanced safety and more accurate predictions. This study investigates the performance, interpretability, and clinical utility of KGDM in the diagnosis of infectious keratitis (IK), which is the leading cause of corneal blindness. The classification performance of KGDM is evaluated on a prospective validation dataset, an external testing dataset, and an publicly available testing dataset. The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233 and exhibit consistent diagnostic patterns with clinic experience. Moreover, a human-AI collaborative diagnosis test is conducted and the participants with collaboration achieved a performance exceeding that of both humans and AI. By synergistically integrating interpretability and interaction, this study facilitates the convergence of clinicians' expertise and data-driven intelligence. The promotion of inexperienced ophthalmologists with the aid of AI-based biomarkers, as well as increased AI prediction by intervention from experienced ones, demonstrate a promising diagnostic paradigm for infectious keratitis using KGDM, which holds the potential for extension to other diseases where experienced medical practitioners are limited and the safety of AI is concerned.	翻訳日:2024-01-18 18:25:32 公開日:2024-01-14
# 弱教師付き関係抽出のための表現学習 Representation Learning for Weakly Supervised Relation Extraction ( http://arxiv.org/abs/2105.00815v2 ) ライセンス: Link先を確認	Zhuang Li	(参考訳) 近年,情報抽出やそのサブタスクであるリレーション抽出が急速に進展している。関係抽出は文中のエンティティ間の意味関係を検出することができる。現在、関係抽出タスクに多くの効率的なアプローチが適用されている。教師付き学習アプローチは特に優れたパフォーマンスを持つ。しかし、まだ多くの難しい課題がある。最も深刻な問題の1つは、手動ラベル付きデータを取得するのが難しいことである。ほとんどの場合、教師付きアプローチの限られたデータは、粗悪なパフォーマンスに等しい。そこで,本研究では,トレーニングデータに制限のある状況下では,教師なし事前学習による教師ありベースラインシステムの性能向上に注目する。機能(feature)は、教師付きアプローチを改善する上で重要なコンポーネントの1つです。伝統的なアプローチは通常手作りの特徴を適用し、専門知識と高価な人的労働を必要とする。しかし、この種の機能はデータのスパーシティに支障をきたす可能性がある。トレーニングセットのサイズが小さい場合、モデルパラメータは低い推定値になる可能性がある。本論文では,関係表現の構文・意味的パターンを多用した分散テキスト表現の特徴を学習するための,教師なし事前学習モデルを提案する。実験により, 従来の手作りの特徴と組み合わせることで, 関係抽出のためのロジスティック分類モデルの性能が向上することが実証された。 Recent years have seen rapid development in Information Extraction, as well as its subtask, Relation Extraction. Relation Extraction is able to detect semantic relations between entities in sentences. Currently, many efficient approaches have been applied to relation extraction tasks. Supervised learning approaches especially have good performance. However, there are still many difficult challenges. One of the most serious problems is that manually labeled data is difficult to acquire. In most cases, limited data for supervised approaches equals lousy performance. Thus here, under the situation with only limited training data, we focus on how to improve the performance of our supervised baseline system with unsupervised pre-training. Feature is one of the key components in improving the supervised approaches. Traditional approaches usually apply hand-crafted features, which require expert knowledge and expensive human labor. However, this type of feature might suffer from data sparsity: when the training set size is small, the model parameters might be poorly estimated. In this thesis, we present several novel unsupervised pre-training models to learn the distributed text representation features, which are encoded with rich syntactic-semantic patterns of relation expressions. The experiments have demonstrated that this type of feature, combine with the traditional hand-crafted features, could improve the performance of the logistic classification model for relation extraction, especially on the classification of relations with only minor training instances.	翻訳日:2024-01-18 04:17:56 公開日:2024-01-14
# 分散学習のためのバイアス圧縮について On Biased Compression for Distributed Learning ( http://arxiv.org/abs/2002.12410v4 ) ライセンス: Link先を確認	Aleksandr Beznosikov and Samuel Horv\'ath and Peter Richt\'arik and Mher Safaryan	(参考訳) 近年,分散学習におけるコミュニケーションのボトルネックを軽減するツールとして,様々なコミュニケーション圧縮技術が登場している。しかし、バイアス圧縮機は、より研究され理解されている非バイアス圧縮機と比較して、実際は優れた性能を示すことが多いが、それらについてはほとんど知られていない。本研究では, 偏差圧縮演算子の3つのクラスについて検討し, その2つのクラスは新しく, その性能は(確率的)勾配降下と分散(確率的)勾配降下に適用した。偏りのある圧縮機が単一ノードと分散設定の両方で線形収束率をもたらすことを初めて示す。 We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). さらに、通信勾配の合成的および経験的分布に関する理論的研究を通じて、なぜ、また、偏りのある圧縮機が偏りのない変種をどれだけ上回るかについて光を当てた。最後に, 理論的な保証と実用性能が期待できる新しいバイアス圧縮機を提案する。 In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.	翻訳日:2024-01-18 04:16:40 公開日:2024-01-14
# 時空間ダイナミクスのための多分解能偏微分方程式保存学習フレームワーク Multi-resolution partial differential equations preserved learning framework for spatiotemporal dynamics ( http://arxiv.org/abs/2205.03990v3 ) ライセンス: Link先を確認	Xin-Yang Liu and Min Zhu and Lu Lu and Hao Sun and Jian-Xun Wang	(参考訳) 従来のデータ駆動ディープラーニングモデルは、複雑な物理プロセスにおける高いトレーニングコスト、エラーの蓄積、そして不十分な一般化に苦しむことが多い。物理インフォームドディープラーニング(PiDL)は、物理原理をモデルに組み込むことによって、これらの課題に対処する。大半のPiDLは、制御方程式を損失関数に埋め込むことで正規化訓練にアプローチするが、これは損失項を測るために広範囲なハイパーパラメータチューニングに大きく依存する。そこで本研究では,偏微分方程式(pde)演算子とネットワーク構造との接続を通じて,離散化制御方程式をニューラルネットワークアーキテクチャに‘baking’することで,物理の事前知識を活用し,pde保存ニューラルネットワーク(ppnn)を実現することを提案する。マルチレゾリューション設定において畳み込み残差ネットワークを介して離散化されたpdesを埋め込み、従来のブラックボックスモデルに匹敵する一般化性と長期予測精度を大幅に向上させる。提案手法の有効性と有効性は, 反応拡散, バーガーズ, ナビエ・ストークス方程式など, 時空間PDEが支配する様々な時空間力学系で実証されている。 Traditional data-driven deep learning models often struggle with high training costs, error accumulation, and poor generalizability in complex physical processes. Physics-informed deep learning (PiDL) addresses these challenges by incorporating physical principles into the model. Most PiDL approaches regularize training by embedding governing equations into the loss function, yet this depends heavily on extensive hyperparameter tuning to weigh each loss term. To this end, we propose to leverage physics prior knowledge by ``baking'' the discretized governing equations into the neural network architecture via the connection between the partial differential equations (PDE) operators and network structures, resulting in a PDE-preserved neural network (PPNN). This method, embedding discretized PDEs through convolutional residual networks in a multi-resolution setting, largely improves the generalizability and long-term prediction accuracy, outperforming conventional black-box models. The effectiveness and merit of the proposed methods have been demonstrated across various spatiotemporal dynamical systems governed by spatiotemporal PDEs, including reaction-diffusion, Burgers', and Navier-Stokes equations.	翻訳日:2024-01-18 04:12:38 公開日:2024-01-14
# 局所的量子状態判別における局所的測定の不適合性 Incompatibility of local measurements provide advantage in local quantum state discrimination ( http://arxiv.org/abs/2204.10948v2 ) ライセンス: Link先を確認	Kornikar Sen, Saronath Halder, Ujjwal Sen	(参考訳) 不確実性原理は、可観測物の非互換性の概念を生じさせると考えられる。同時に測定できない量子測定のパックは、非互換な測定のセットを形成すると言われている。すべての非互換な測定値のセットは、アンサンブルから状態を準備して他の相手に送る量子状態識別タスクにおいて、対応するものよりも有利であり、後者は利用可能な測定値を使用して状態を検出する。大域的および局所的な量子状態の識別の比較は、「非局所的」現象をもたらすことが知られている。本研究では,局所量子状態識別と非互換量子測定の領域間の接続を密閉する。送信者が2部状態を作成し、2つの受信機にサブシステムを送信する局所量子状態識別タスクを考える。受信機は、ローカル不整合測定を用いて送信された状態を検出しようとする。不整合測定を用いて状態を推測する確率と、不整合測定を用いて状態を推測する最大確率の比率を解析した。この比は局所的な測定値の不適合性のロバストネスの単純な関数によって上限される。興味深いことに、すべての非互換な測定セットに対応して、この境界が達成できる少なくとも1つの局所状態判別タスクが存在する。最適局所量子状態判別タスクは、グローバルおよびローカルな状態判別において、不整合性および整合性のある測定による検出を成功させる確率の比の差という意味で、この用語が使われる「非局所性」を含まないことを論じる。結果は、タスクを区別するマルチパーティの局所量子状態の体系に一般化することができる。 The uncertainty principle may be considered as giving rise to the notion of incompatibility of observables. A pack of quantum measurements that cannot be measured simultaneously is said to form a set of incompatible measurements. Every set of incompatible measurements has an advantage over the compatible ones in a quantum state discrimination task where one prepares a state from an ensemble and sends it to another party, and the latter tries to detect the state using available measurements. Comparison between global and local quantum state discriminations is known to lead to a phenomenon of "nonlocality". In this work, we seal a connection between the domains of local quantum state discrimination and incompatible quantum measurements. We consider the local quantum state discrimination task where a sender prepares a bipartite state and sends the subsystems to two receivers. The receivers try to detect the sent state using locally incompatible measurements. We analyze the ratio of the probability of successfully guessing the state using incompatible measurements and the maximum probability of successfully guessing the state using compatible measurements. We find that this ratio is upper bounded by a simple function of robustnesses of incompatibilities of the local measurements. Interestingly, corresponding to every pair of sets of incompatible measurements, there exists at least one local state discrimination task where this bound can be achieved. We argue that the optimal local quantum state discrimination task does not present any "nonlocality", where the term is used in the sense of a difference between the ratios, of probabilities of successful detection via incompatible and compatible measurements, in global and local state discriminations. The results can be generalized to the regime of multipartite local quantum state distinguishing tasks.	翻訳日:2024-01-18 04:12:00 公開日:2024-01-14
# TeleGraph:階層的リンク予測のためのベンチマークデータセット TeleGraph: A Benchmark Dataset for Hierarchical Link Prediction ( http://arxiv.org/abs/2204.07703v2 ) ライセンス: Link先を確認	Min Zhou, Bisheng Li, Menglin Yang, Lujia Pan	(参考訳) リンク予測は、ネットワーク構造データにとって重要な問題であり、その多様な応用のためにかなりの研究努力を惹きつける。現在のリンク予測手法は一般的なネットワークにフォーカスしており、ネットワークの閉じた三角形構造かノード属性のいずれかに依存する。スパースネットワークや高度階層ネットワークでのそれらの性能はよく研究されていない。一方、利用可能なツリーライクなベンチマークデータセットは、シミュレートされるか、ノード情報が少ないか、あるいは小規模である。このギャップを埋めるために、リンク推論技術の評価と育成のために、リッチノード属性に関連付けられた高度にスパースで階層的な通信ネットワークであるTeleGraphを提案する。実験結果から,ほとんどのアルゴリズムは,ほぼ木のようなデータセット上で十分な性能を得られず,リンク予測アルゴリズムの設計やデプロイには特に注意が必要であることが示唆された。 Link prediction is a key problem for network-structured data, attracting considerable research efforts owing to its diverse applications. The current link prediction methods focus on general networks and are overly dependent on either the closed triangular structure of networks or node attributes. Their performance on sparse or highly hierarchical networks has not been well studied. On the other hand, the available tree-like benchmark datasets are either simulated, with limited node information, or small in scale. To bridge this gap, we present a new benchmark dataset TeleGraph, a highly sparse and hierarchical telecommunication network associated with rich node attributes, for assessing and fostering the link inference techniques. Our empirical results suggest that most of the algorithms fail to produce a satisfactory performance on a nearly tree-like dataset, which calls for special attention when designing or deploying the link prediction algorithm in practice.	翻訳日:2024-01-18 04:11:33 公開日:2024-01-14
# ランダムリシャッフルSARAHは完全な勾配計算を必要としない Random-reshuffled SARAH does not need a full gradient computations ( http://arxiv.org/abs/2111.13322v2 ) ライセンス: Link先を確認	Aleksandr Beznosikov and Martin Tak\'a\v{c}	(参考訳) 確率的再帰的勾配アルゴリズム(英: stochastic recursive gradient algorithm, sarah)は、確率的勾配降下(sgd)アルゴリズムの分散還元変種であり、時折目的関数の勾配を必要とする。本稿では,完全な勾配計算の必要性を除去する。これはランダムな再シャッフル戦略を使い、各エポックで得られる確率的勾配を集約することで達成される。集計された確率勾配はサラアルゴリズムの完全な勾配の推定に役立っている。本稿では,提案手法の理論的解析を行い,本手法の効率性を示す数値実験で論文をまとめる。 The StochAstic Recursive grAdient algoritHm (SARAH) algorithm is a variance reduced variant of the Stochastic Gradient Descent (SGD) algorithm that needs a gradient of the objective function from time to time. In this paper, we remove the necessity of a full gradient computation. This is achieved by using a randomized reshuffling strategy and aggregating stochastic gradients obtained in each epoch. The aggregated stochastic gradients serve as an estimate of a full gradient in the SARAH algorithm. We provide a theoretical analysis of the proposed approach and conclude the paper with numerical experiments that demonstrate the efficiency of this approach.	翻訳日:2024-01-18 04:08:27 公開日:2024-01-14
# Googleは、画像ベースのGoogleトレンドで新しいファッション製品の売上をマルチモーダル予測 Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends ( http://arxiv.org/abs/2109.09824v6 ) ライセンス: Link先を確認	Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani	(参考訳) 新しいファッション製品の販売予測は、多くのビジネスダイナミクスを伴う困難な問題であり、古典的な予測アプローチでは解決できない。本稿では,過去のデータがないにもかかわらず,販売を効果的に予測するために,Google Trendsの時系列形式で外因性知識を体系的に探索し,それを新しいファッションアイテムに関連するマルチモーダル情報と組み合わせることの有効性を検討する。特に、エンコーダが外因性時系列の表現を学習し、デコーダがGoogle Trendsエンコーディングと利用可能なビジュアルおよびメタデータ情報に基づいて販売を予測するニューラルネットワークベースのアプローチを提案する。我々のモデルは非自己回帰的に機能し、大きな第1ステップエラーの複合効果を避ける。第2のコントリビューションとして,イタリアのファストファッション企業であるNunalieから2016～2019年の間に販売された,5577のリアルな新製品のマルチモーダル情報を含む,新しいファッション製品販売予測タスク用の公開データセットであるVISUELLEを紹介する。データセットには、製品、メタデータ、関連する販売、関連するGoogle Trendsの画像が備わっている。 visuelleを使って最先端の代替品やいくつかのベースラインと比較し、当社のニューラルネットワークベースのアプローチがパーセンテージと絶対誤差の両方の観点から最も正確であることを示しました。外部知識の追加は、重み付き絶対パーセンテージ誤差(wape)の観点から予測精度を1.5%向上させ、情報的外部情報の利用の重要性を明らかにした。コードとデータセットはhttps://github.com/HumaticsLAB/GTM-Transformer.comで公開されている。 New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach, where an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present VISUELLE, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016-2019 from Nunalie, an Italian fast-fashion company. The dataset is equipped with images of products, metadata, related sales, and associated Google Trends. We use VISUELLE to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. It is worth noting that the addition of exogenous knowledge boosts the forecasting accuracy by 1.5% in terms of Weighted Absolute Percentage Error (WAPE), revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/HumaticsLAB/GTM-Transformer.	翻訳日:2024-01-18 04:07:30 公開日:2024-01-14
# Lirot.ai: クラウドソーシング型網膜画像セグメンテーションのための新しいプラットフォーム Lirot.ai: A Novel Platform for Crowd-Sourcing Retinal Image Segmentations ( http://arxiv.org/abs/2208.10100v2 ) ライセンス: Link先を確認	Jonathan Fhima, Jan Van Eijgen, Moti Freiman, Ingeborg Stalmans and Joachim A. Behar	(参考訳) 導入: 教師付きディープラーニング(DL)タスクには、大きな注釈付きデータセットが必要である。医学データサイエンスにおいて、dlモデルを開発するための大きな制限の1つは、大量の注釈付き例がないことである。これは、アノテートに必要な時間と専門知識によることが多い。 lirot を紹介します。イメージセグメンテーションの促進とクラウドソーシングのための、新しいプラットフォームだ。方法:リロット。 iPadOSクライアントアプリケーションであるLirot.aiは3つのコンポーネントで構成されている。 lirot.ai-app、lirotというバックエンドサーバ。 Lirot.ai-serverとpython APIの名前。 aiAPI。リロット i-appはSwift 5.6とLirotで開発された。 ai-serverはfirebaseバックエンドです。リロット ai-APIはデータベースの管理を可能にする。リロット i-appは必要なだけ多くのiPadOSデバイスにインストールでき、アノテータは同時にリモートでセグメンテーションを実行することができる。私たちはapple pencilの互換性を取り入れ、セグメンテーションを他のコンピュータベースの代替品よりも高速で、より正確で、専門家にとって直感的なものにしています。結果: Lirotの使用例を示す。参照血管分割を用いた網膜底部データセットの作成のためのai。議論と今後の作業:我々は、アノテートされる画像を選択し、アノテータに配布するより効率的なプロセスを含むことによって、網膜底部データセットの拡大を継続するために、アクティブラーニング戦略を使用する。 Introduction: For supervised deep learning (DL) tasks, researchers need a large annotated dataset. In medical data science, one of the major limitations to develop DL models is the lack of annotated examples in large quantity. This is most often due to the time and expertise required to annotate. We introduce Lirot. ai, a novel platform for facilitating and crowd-sourcing image segmentations. Methods: Lirot. ai is composed of three components; an iPadOS client application named Lirot. ai-app, a backend server named Lirot. ai-server and a python API name Lirot. ai-API. Lirot. ai-app was developed in Swift 5.6 and Lirot. ai-server is a firebase backend. Lirot. ai-API allows the management of the database. Lirot. ai-app can be installed on as many iPadOS devices as needed so that annotators may be able to perform their segmentation simultaneously and remotely. We incorporate Apple Pencil compatibility, making the segmentation faster, more accurate, and more intuitive for the expert than any other computer-based alternative. Results: We demonstrate the usage of Lirot. ai for the creation of a retinal fundus dataset with reference vasculature segmentations. Discussion and future work: We will use active learning strategies to continue enlarging our retinal fundus dataset by including a more efficient process to select the images to be annotated and distribute them to annotators.	翻訳日:2024-01-18 04:01:03 公開日:2024-01-14
# 深部NLPモデルにおけるサルエントニューロンの発見 Discovering Salient Neurons in Deep NLP Models ( http://arxiv.org/abs/2206.13288v2 ) ライセンス: Link先を確認	Nadir Durrani and Fahim Dalvi and Hassan Sajjad	(参考訳) 深部NLPモデルで学んだ表現や、どの知識を捉えるかを理解するために多くの研究がなされてきたが、個々のニューロンにはほとんど注意が払われていない。言語相関分析(英語版)と呼ばれる手法により、モデル内の有意な神経細胞を、いかなる外部特性に関しても抽出し、その知識がニューロン内でどのように保存されているかを理解することを目的としている。以下の質問に答えるために、きめ細かい分析を行う。 (i)特定の言語特性を捉えたネットワーク内のニューロンのサブセットを特定できるか? (ii)ネットワークにまたがる局所化ニューロンや分散ニューロンはどの程度存在するか? iii)情報がどれだけ冗長に保存されているか。 iv)学習した言語知識が下流のnlpタスクにどのように影響するか? 四異なる言語特性の学習において、建築はどのように変化するか。我々のデータ駆動量分析は興味深い発見を照らす。 (i)異なる言語課題を予測できるニューロンの小さなサブセットを発見した。二下位の層に局在する基本的な語彙情報(接尾辞等)を捉えたニューロン三複雑な概念(統語的役割など)を学ぶ者は、主に中層及び上層に置かれる。三ネットワークがタスク特定情報のために上位層を保存するため、転送学習中に、高度層から下位層に言語ニューロンを移動させること。 iv)言語情報がどのように保存されているかに関して,事前学習したモデル間で興味深い違いを見出した。 v) 概念は多言語トランスフォーマーモデルにおいて, 異なる言語にまたがる類似のニューロン分布を示すことがわかった。私たちのコードはneurox toolkitの一部として公開されています。 While a lot of work has been done in understanding representations learned within deep NLP models and what knowledge they capture, little attention has been paid towards individual neurons. We present a technique called as Linguistic Correlation Analysis to extract salient neurons in the model, with respect to any extrinsic property - with the goal of understanding how such a knowledge is preserved within neurons. We carry out a fine-grained analysis to answer the following questions: (i) can we identify subsets of neurons in the network that capture specific linguistic properties? (ii) how localized or distributed neurons are across the network? iii) how redundantly is the information preserved? iv) how fine-tuning pre-trained models towards downstream NLP tasks, impacts the learned linguistic knowledge? iv) how do architectures vary in learning different linguistic properties? Our data-driven, quantitative analysis illuminates interesting findings: (i) we found small subsets of neurons that can predict different linguistic tasks, ii) with neurons capturing basic lexical information (such as suffixation) localized in lower most layers, iii) while those learning complex concepts (such as syntactic role) predominantly in middle and higher layers, iii) that salient linguistic neurons are relocated from higher to lower layers during transfer learning, as the network preserve the higher layers for task specific information, iv) we found interesting differences across pre-trained models, with respect to how linguistic information is preserved within, and v) we found that concept exhibit similar neuron distribution across different languages in the multilingual transformer models. Our code is publicly available as part of the NeuroX toolkit.	翻訳日:2024-01-18 03:59:01 公開日:2024-01-14
# 事前条件付き更新による確率勾配法 Stochastic Gradient Methods with Preconditioned Updates ( http://arxiv.org/abs/2206.00285v2 ) ライセンス: Link先を確認	Abdurakhmon Sadiev, Aleksandr Beznosikov, Abdulla Jasem Almansoori, Dmitry Kamzolov, Rachael Tappenden, Martin Tak\'a\v{c}	(参考訳) 本研究は非凸有限和最小化問題を考える。このような問題に対するアルゴリズムはいくつか存在するが、既存の手法は、問題がひどくスケールしたり、不調になったりした場合にうまく動作しないことが多い。したがって、Hutchinsonによるヘッセン対角線近似のアプローチに基づく事前条件を記述し、新しいスケールアルゴリズム(Scaled SARAHとScaled L-SVRG)を提供する勾配法と組み合わせる。理論的複雑性は滑らかさの仮定の下で保証される。滑らかさとPL条件の両方を仮定すると線形収束が証明される。適応的拡大手法では, 近似的な部分的な2次曲率情報を用い, 問題の影響を軽減できる。この改良された実用性能は,本研究で示された数値実験で実証された。 This work considers the non-convex finite sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to give new scaled algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.	翻訳日:2024-01-18 03:56:51 公開日:2024-01-14
# スペクトルクラスタリングのためのLeave-one-out Singular Subspace Perturbation解析 Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering ( http://arxiv.org/abs/2205.14855v2 ) ライセンス: Link先を確認	Anderson Y. Zhang, Harrison H. Zhou	(参考訳) 特異部分空間摂動理論は確率と統計において基本的な重要性を持つ。様々な分野にまたがる様々な応用がある。 2つの任意の行列を考えると、一方は他方の左1カラムアウト部分行列であり、2つの対応する特異部分空間の間の距離に対する新しい摂動上界を確立する。これは混合モデルによく適合しており、ウェディンの定理のような古典摂動境界よりも鋭く細かい統計解析ができる。この残余1次摂動理論により、混合モデル下でのスペクトルクラスタリングの性能に関する決定論的帰納的分析を行う。本解析は,サブガウス混合モデルのスペクトルクラスタリングに対する明示的な指数的誤差率をもたらす。等方性ガウスの混合物の場合、この速度はl{\"o}ffler et al. (2021)よりも弱い信号対雑音条件下で最適である。 The singular subspaces perturbation theory is of fundamental importance in probability and statistics. It has various applications across different fields. We consider two arbitrary matrices where one is a leave-one-column-out submatrix of the other one and establish a novel perturbation upper bound for the distance between the two corresponding singular subspaces. It is well-suited for mixture models and results in a sharper and finer statistical analysis than classical perturbation bounds such as Wedin's Theorem. Empowered by this leave-one-out perturbation theory, we provide a deterministic entrywise analysis for the performance of spectral clustering under mixture models. Our analysis leads to an explicit exponential error rate for spectral clustering of sub-Gaussian mixture models. For the mixture of isotropic Gaussians, the rate is optimal under a weaker signal-to-noise condition than that of L{\"o}ffler et al. (2021).	翻訳日:2024-01-18 03:56:36 公開日:2024-01-14
# Impartial Games:強化学習への挑戦 Impartial Games: A Challenge for Reinforcement Learning ( http://arxiv.org/abs/2205.12787v4 ) ライセンス: Link先を確認	Bei Zhou and S{\o}ren Riis	(参考訳) 本稿では,AlphaZero-style reinforcement learning (RL)アルゴリズムが様々なボードゲームで優れている一方で,プレイヤーが駒を共有する公平なゲームでは課題に直面していることを示す。我々は、alphazero型および類似の自己遊び強化学習アルゴリズムの崩壊ブロックであるように見えるゲーム、すなわちnimの子供向けゲームおよびその他の不公平なゲームの具体例を示す。我々の研究は、ニューラルネットワークがパリティ関数を学習する能力に関するデータ分散の複雑さによって引き起こされる課題に基づいており、ノイズラベルの問題によって悪化している。最近の研究では、alphazeroスタイルのアルゴリズムが敵対的攻撃や敵対的摂動に対して脆弱であることを示しており、すべての合法状態においてゲームを習得する学習の難しさを示している。 Nimは小さなボード上で学習できるが、AlphaZeroスタイルのアルゴリズムの学習の進歩は、ボードのサイズが大きくなると劇的に遅くなる。直感的には、Nim のような公平なゲームと Chess や Go のようなパルチザン的なゲームの違いは、ボードの小さな部分が公平なゲームでカバーされている場合、ある空白位置の可視的な部分とその正しい評価との相関がしばしばゼロであるので、その位置が勝つか失われるかを予測できないという事実によって説明できる。この状況は、部分的に空白されたボード位置が典型的には、完全な未発見位置の値に関する多量または少なくともノントリフト情報を提供するパルチザンゲームとは対照的である。 While AlphaZero-style reinforcement learning (RL) algorithms excel in various board games, in this paper we show that they face challenges on impartial games where players share pieces. We present a concrete example of a game - namely the children's game of Nim - and other impartial games that seem to be a stumbling block for AlphaZero-style and similar self-play reinforcement learning algorithms. Our work is built on the challenges posed by the intricacies of data distribution on the ability of neural networks to learn parity functions, exacerbated by the noisy labels issue. Our findings are consistent with recent studies showing that AlphaZero-style algorithms are vulnerable to adversarial attacks and adversarial perturbations, showing the difficulty of learning to master the games in all legal states. We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms dramatically slows down when the board size increases. Intuitively, the difference between impartial games like Nim and partisan games like Chess and Go can be explained by the fact that if a small part of the board is covered for impartial games it is typically not possible to predict whether the position is won or lost as there is often zero correlation between the visible part of a partly blanked-out position and its correct evaluation. This situation starkly contrasts partisan games where a partly blanked-out board position typically provides abundant or at least non-trifle information about the value of the fully uncovered position.	翻訳日:2024-01-18 03:56:04 公開日:2024-01-14
# RALACs:インタラクションエンコーディングと光フローを用いた自動運転車の行動認識 RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow ( http://arxiv.org/abs/2209.14408v3 ) ライセンス: Link先を確認	Eddy Zhou, Alex Zhuang, Alikasim Budhwani, Owen Leather, Rowan Dempster, Quanquan Li, Mohammad Al-Sharman, Derek Rayside, and William Melek	(参考訳) 自律走行車(AV)設定に適用すると、行動認識は環境モデルの状況認識を高めることができる。これは特に、avsの伝統的な幾何学的記述やヒューリスティックが不十分なシナリオで一般的である。しかしながら、伝統的に人間の行動認識は研究されてきたが、ノイズに富んだ、無修正の生のRGBデータへの適応性には限界がある。行動認識のAVへの進歩と導入を促進するために,新たな2段階の行動認識システムであるRALACを提案する。 RALACは、道路シーンにおける行動認識の問題を定式化し、それと人間の行動認識の確立した分野とのギャップを埋める。本研究は,エージェント間の関係をエンコードするために注目層がいかに有用かを示し,そのようなスキームがクラスに依存しないかを強調した。さらに、道路上のエージェントの動的性質に対処するため、ralACsは、下流行動分類のためのエージェントトラックへの関心領域アライメント(ROI)適応のための新しいアプローチを構築している。最後に,本手法では,アクティブエージェント検出の問題点も考慮し,道路シーンにおける関連エージェントの識別に光フローマップを融合する新たな応用法を提案する。提案手法はICCV2021ロードチャレンジデータセットのベースラインを上回り,実際の車両プラットフォームに展開することにより,意思決定における行動認識の有用性に関する予備的な知見を提供する。 When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially prevalent in scenarios where traditional geometric descriptions and heuristics in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its limited adaptability to noisy, un-clipped, un-pampered, raw RGB data has limited its application in other fields. To push for the advancement and adoption of action recognition into AVs, this work proposes a novel two-stage action recognition system, termed RALACs. RALACs formulates the problem of action recognition for road scenes, and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can be useful for encoding the relations across agents, and stresses how such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs constructs a novel approach to adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also considers the problem of active agent detection, and utilizes a novel application of fusing optical flow maps to discern relevant agents in a road scene. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset and by deploying it on a real vehicle platform, we provide preliminary insight to the usefulness of action recognition in decision making.	翻訳日:2024-01-18 03:47:34 公開日:2024-01-14
# MIXRTs:繰り返しソフト決定木を混合した多エージェント強化学習に向けて MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning via Mixing Recurrent Soft Decision Trees ( http://arxiv.org/abs/2209.07225v3 ) ライセンス: Link先を確認	Zichuan Liu, Yuanyang Zhu, Zhi Wang, Yang Gao, Chunlin Chen	(参考訳) さまざまな分野で大きな成功を収めている一方で、既存のマルチエージェント強化学習(MARL)とブラックボックスニューラルネットワークアーキテクチャは、学習知識の理解や入力観察が意思決定にどのように影響するかを人によって妨げる不透明な方法で決定を行う。代わりに、伝統的な線形モデルや決定木のような既存の解釈可能なアプローチは通常、弱い表現力と低い精度に悩まされる。ミキシング・リカレント・ソフト・決定木(MIXRTs)は,この性能と解釈可能性の明確な二分法に対処するため,各エージェントのチームへの貢献を反映し,ルート・ツー・リーフ・パスを通じて明確な決定プロセスを表現することができる新しい解釈可能なアーキテクチャである。具体的には、リカレントニューラルネットワークの進歩を利用して、部分観測可能性に対処する新しいソフト決定木を構築し、ツリーベースモデルによる意思決定プロセスに影響を与える特徴を実証する。そして,その値分解フレームワークに基づいて,各エージェントに対して,各アクション値を明示的に混合し,局所的な観察のみを用いて共同行動値を推定することにより,各エージェントに対する信頼度を線形に割り当てる。理論的解析により、MIXRTsは結合作用値の分解における付加性と単調性に関する構造的制約を保証していることが示された。課題であるSpreadとStarCraft IIタスクの評価から、MIXRTは広く研究されている手法と比較して競争性能を達成し、意思決定プロセスのより直接的な説明を提供する。我々は,MARLの新しい解釈可能なパラダイムに光を当てる可能性があり,高い性能と解釈可能性を持った学習アルゴリズム開発に向けた有望な道を探る。 While achieving tremendous success in various fields, existing multi-agent reinforcement learning (MARL) with a black-box neural network architecture makes decisions in an opaque manner that hinders humans from understanding the learned knowledge and how input observations influence decisions. Instead, existing interpretable approaches, such as traditional linear models and decision trees, usually suffer from weak expressivity and low accuracy. To address this apparent dichotomy between performance and interpretability, our solution, MIXing Recurrent soft decision Trees (MIXRTs), is a novel interpretable architecture that can represent explicit decision processes via the root-to-leaf path and reflect each agent's contribution to the team. Specifically, we construct a novel soft decision tree to address partial observability by leveraging the advances in recurrent neural networks, and demonstrate which features influence the decision-making process through the tree-based model. Then, based on the value decomposition framework, we linearly assign credit to each agent by explicitly mixing individual action values to estimate the joint action value using only local observations, providing new insights into how agents cooperate to accomplish the task. Theoretical analysis shows that MIXRTs guarantees the structural constraint on additivity and monotonicity in the factorization of joint action values. Evaluations on the challenging Spread and StarCraft II tasks show that MIXRTs achieves competitive performance compared to widely investigated methods and delivers more straightforward explanations of the decision processes. We explore a promising path toward developing learning algorithms with both high performance and interpretability, potentially shedding light on new interpretable paradigms for MARL.	翻訳日:2024-01-18 03:46:13 公開日:2024-01-14
# Mask Focal Loss: 正準物体検出ネットワークによる密集群カウントのための統一フレームワーク Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks ( http://arxiv.org/abs/2212.11542v3 ) ライセンス: Link先を確認	Xiaopin Zhong, Guankun Wang, Weixiang Liu, Zongze Wu, Yuanlong Deng	(参考訳) 基本的なコンピュータビジョンタスクとして、群衆のカウントは公共の安全において重要な役割を果たす。現在、深層学習に基づく頭部検出は、群集カウントの有望な方法である。しかし,(1)既存の損失関数が高濃度で複雑な場面でサンプルの不均衡に対処できないこと,(2)標準物体検出器が損失計算における空間的一貫性を欠くこと,(2)物体の位置と背景領域の関係を無視すること,(3)頭部検出データセットのほとんどは,境界ボックスのない中心点にのみ注釈付けされていること,の3つの理由から,この問題によく適用できない。これらの問題を克服するために,ガウス核を用いたヒートマップに基づく新しいマスク焦点損失(mfl)を提案する。 MFLは、ヒートマップとバイナリフィーチャーマップの両方の真実に基づく損失関数の統一フレームワークを提供する。さらに、総合アノテーションを用いた合成データセットであるGTA_Headを導入し、評価と比較を行った。広範な実験結果から,様々な検出器とデータセットにおけるmflの性能が向上し,maeとrmseはそれぞれ47.03%,61.99%減少した。そこで本研究は,密度推定に基づく群集数法を推し進めるための強力な基盤を提供する。 As a fundamental computer vision task, crowd counting plays an important role in public safety. Currently, deep learning based head detection is a promising method for crowd counting. However, the highly concerned object detection networks cannot be well applied to this problem for three reasons: (1) Existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) Canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object location and background region; (3) Most of the head detection datasets are only annotated with the center points, i.e. without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on heatmap via the Gaussian kernel. MFL provides a unifying framework for the loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of our MFL across various detectors and datasets, and it can reduce MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work presents a strong foundation for advancing crowd counting methods based on density estimation.	翻訳日:2024-01-18 03:35:07 公開日:2024-01-14
# deepspeed data efficiency: 効率的なデータサンプリングとルーティングによるディープラーニングモデルの品質とトレーニング効率の向上 DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing ( http://arxiv.org/abs/2212.03597v3 ) ライセンス: Link先を確認	Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He	(参考訳) ディープラーニングモデルの最近の進歩は、厳しいトレーニングコストを犠牲にしている。モデルサイズの増加は根本原因の1つだが、もう1つの強調されていない事実は、実際にデータスケールはモデルスケールと同じ速度で増加しており、トレーニングコストは両者に比例していることだ。急速に進化するモデルアーキテクチャと比較して、トレーニングデータ(特に高価なファンデーションモデル事前トレーニングのために)を効率的に利用する方法は、データ効率機能に重点を置く便利なフレームワークが欠如しているため、調査も困難である。この目的のために、データをよりよく活用し、トレーニング効率を高め、モデル品質を向上させるフレームワークであるDeepSpeed Data efficiencyを紹介します。具体的には,一般的なカリキュラム学習ライブラリを用いた効率的なデータサンプリング手法と,新しいランダム・レイヤワイズ・トークン・ドロップ手法による効率的なデータルーティング手法を提案する。 GPT-3 1.3B言語モデルの事前トレーニングでは、当社の作業は12.5倍少ないデータ/時間/コスト(Azureでレンタルすれば3.7K)を実現しています。 GPT-3 1.3B と BERT-large の事前トレーニングでは、データ/時間/コストの最大2倍のコストで同じモデル品質を達成できます。 DeepSpeed Data efficiency は使いやすく、チューニングも容易で、GPT-3 MoE モデル事前トレーニングや小型 GPT-2/ViT ファインタニングなどのタスクに簡単に適用でき、そのメリットを検証できる。 Recent advances on deep learning models come at the price of formidable training cost. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as model scale, and the training cost is proportional to both of them. Compared to the rapidly evolving model architecture, how to efficiently use the training data (especially for the expensive foundation model pretraining) is both less explored and difficult to realize due to the lack of a convenient framework that focuses on data efficiency capabilities. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost (\$3.7K if rent on Azure), while still maintaining 95% of model quality compared to baseline with full data and cost (\$46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or achieve better model quality under same data/time/cost. DeepSpeed Data Efficiency is easy to use and tune, enabling us to easily apply it and verify its benefit on additional tasks including GPT-3 MoE model pretraining and small-scale GPT-2/ViT finetuning.	翻訳日:2024-01-18 03:34:26 公開日:2024-01-14
# NLPにおける望ましくないバイアス:測定の課題 Undesirable Biases in NLP: Addressing Challenges of Measurement ( http://arxiv.org/abs/2211.13709v4 ) ライセンス: Link先を確認	Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz	(参考訳) 大規模言語モデルと自然言語処理(NLP)技術が急速に発展し、日々の生活に広まっていくにつれ、それらの利用が人々に与える影響を予想することが重要となる。近年、多くの注目を集めている問題の一つは、この技術が有害なバイアスを示しており、デロギ的ステレオタイプの生成から、異なる社会集団で異なる結果を生み出すまでである。これらのバイアスの評価と緩和に多くの労力が費やされてきたが、nlpモデルのバイアスを測定する方法には深刻な問題がある。本稿では,NLPモデルバイアスの問題を,直接観測できないバイアスのような概念の測定に特化している心理測定のレンズを用いて議論するための学際的アプローチを提案する。特に,心理計測から測定ツールの構成妥当性と信頼性の2つの中心的な概念を考察し,モデルバイアス測定の文脈でどのように適用できるかについて議論する。我々のゴールは、NLP実践者により良いバイアス測定を設計するための方法論的なツールを提供することであり、バイアス測定ツールの開発において、より一般的にサイコメトリックからツールを探索することである。 As Large Language Models and Natural Language Processing (NLP) technology rapidly develop and spread into daily life, it becomes crucial to anticipate how their use could harm people. One problem that has received a lot of attention in recent years is that this technology has displayed harmful biases, from generating derogatory stereotypes to producing disparate outcomes for different social groups. Although a lot of effort has been invested in assessing and mitigating these biases, our methods of measuring the biases of NLP models have serious problems and it is often unclear what they actually measure. In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics -- a field specialized in the measurement of concepts like bias that are not directly observable. In particular, we will explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools.	翻訳日:2024-01-18 03:33:14 公開日:2024-01-14
# 部分空間間の量子コヒーレンス:状態変換、コヒーレンスパワー、$k$コヒーレンスおよびその他の性質 Quantum coherence between subspaces: State transformation, Cohering Power, $k$-coherence and other properties ( http://arxiv.org/abs/2302.13148v4 ) ライセンス: Link先を確認	Azam Mani, Fatemeh Rezazadeh, Vahid Karimipour	(参考訳) 最初に[1]で導入され[2,3]で開発されたボックコヒーレンスの概念は、個々の原子上で任意の精密な測定を行うために実験能力がそれほど繊細でない場合を含む。我々は,この資源理論のさらなる研究を促進する枠組みを,いくつかの点で開発する。この枠組みを用いて,非一貫性操作による状態変換の問題を調査し,ブロック非一貫性操作による状態変換に必要な十分条件がメジャー化条件であることを示す。我々はまた、他の全ての状態およびすべてのユニタリゲートが非コヒーレント操作によって構築できる最大コヒーレント状態の形式を決定する。その後、量子チャネルのブロックコヒーレンスおよびブロックデコヒーレンスパワーの概念を定義し、これらのパワーを複数の種類のチャネルで決定する。最後に、ブロックコヒーレンスと、$k$-コヒーレンスと呼ばれる以前のコヒーレンスの拡張との関係について検討する。 The concept of bock-coherence, first introduced in [1] and developed in [2,3] encompasses the case where experimental capabilities are not so delicate to perform arbitrary refined measurements on individual atoms. We develop a framework which facilitates further investigation of this resource theory in several respects. Using this framework, we investigate the problem of state conversion by incoherent operations and show that a majorization condition is the necessary and sufficient condition for state transformation by block-incoherent operations. We also determine the form of the maximally coherent state from which all other states and all unitary gates can be constructed by incoherent operations. Thereafter, we define the concept of block-cohering and block-decohering powers of quantum channels and determine these powers for several types of channels. Finally, we explore the relation between block coherence and a previous extension of coherence, known as $k$-coherence.	翻訳日:2024-01-18 03:24:17 公開日:2024-01-14
# 情報理論上界に対する情報理論下界 Information Theoretic Lower Bounds for Information Theoretic Upper Bounds ( http://arxiv.org/abs/2302.04925v2 ) ライセンス: Link先を確認	Roi Livni	(参考訳) 確率的凸最適化の文脈において,出力モデルと経験的サンプル間の相互情報とアルゴリズムの一般化の関係について検討する。情報理論の一般化バウンダリへの関心が高まっているにもかかわらず、これらのバウンダリが様々な学習アルゴリズムの異常な性能に関する洞察を与えることができるかどうかは不明である。確率凸最適化の研究により,真のリスク最小化には次元依存的相互情報が必要であることが明らかになった。このことは、既存の情報理論の一般化境界は、次元に依存しないサンプル複雑性を持つSGDや正規化ERMのようなアルゴリズムの一般化能力の獲得に不足していることを示している。 We examine the relationship between the mutual information between the output model and the empirical sample and the generalization of the algorithm in the context of stochastic convex optimization. Despite increasing interest in information-theoretic generalization bounds, it is uncertain if these bounds can provide insight into the exceptional performance of various learning algorithms. Our study of stochastic convex optimization reveals that, for true risk minimization, dimension-dependent mutual information is necessary. This indicates that existing information-theoretic generalization bounds fall short in capturing the generalization capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.	翻訳日:2024-01-18 03:23:28 公開日:2024-01-14
# データ中心機械学習のための再ラベル法 The Re-Label Method For Data-Centric Machine Learning ( http://arxiv.org/abs/2302.04391v7 ) ライセンス: Link先を確認	Tong Guo	(参考訳) 業界深層学習アプリケーションでは、手作業でラベル付けしたデータは、一定の数のノイズデータを持っています。この問題を解決し、開発データセットで90以上のスコアを達成するために、人間のラベル付けにおける参照としてモデル予測を考慮し、ノイズデータを見つけ、ノイズデータを再ラベルする簡単な方法を提案する。本稿では,分類,シーケンスタグ付け,オブジェクト検出,シーケンス生成,クリックスルー率予測など,幅広いディープラーニングタスクのセットについて述べる。開発データセットの評価結果と人格評価結果は、このアイデアを検証する。 In industry deep learning application, our manually labeled data has a certain number of noisy data. To solve this problem and achieve more than 90 score in dev dataset, we present a simple method to find the noisy data and re-label the noisy data by human, given the model predictions as references in human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, includes classification, sequence tagging, object detection, sequence generation, click-through rate prediction. The dev dataset evaluation results and human evaluation results verify our idea.	翻訳日:2024-01-18 03:23:02 公開日:2024-01-14
# 不確実性を考慮した構造知識伝達による360$^\circ高分解能深さ推定 360$^\circ$ High-Resolution Depth Estimation via Uncertainty-aware Structural Knowledge Transfer ( http://arxiv.org/abs/2304.07967v2 ) ライセンス: Link先を確認	Zidong Cao, Hao Ai, Athanasios V. Vasilakos, Lin Wang	(参考訳) 高分解能(HR)全方位深度マップを予測するために、既存の手法では、完全に教師付き学習を通じて入力としてHR全方位画像(ODI)を利用するのが一般的である。しかし、実際にHR ODIを入力として使うのは、リソース制約されたデバイスのため望ましくない。さらに、深度マップはカラー画像よりも解像度が低いことが多い。そこで本研究では,HR深度GTマップが存在しない場合に,低分解能(LR) ODIから直接HR全方位深度を推定する。我々のキーとなる考え方は、HR画像のモダリティと対応するLR深度マップからシーン構造知識を移譲し、余分な推論コストを伴わずにHR深度推定の目標を達成することである。具体的には,ODIスーパーレゾリューション(SR)を補助タスクとして導入し,HR深度推定の性能を高めるために,両タスクを弱教師付きで協調的に訓練する。 ODI SRタスクは不確実性推定によってシーン構造的知識を抽出する。これにより,シーン構造知識伝達(SSKT)モジュールを2つのキーコンポーネントで提案する。まず,円筒型暗黙的補間関数(ciif)を用いて,円筒型神経補間重みを学習し,二つのタスク間でciifのパラメータを共有する。次に,hr深度推定タスクがシーン構造知識をより多く学ぶのに役立つ追加構造正規化を提供する特徴蒸留(fd)損失を提案する。広範な実験により,本手法はベースライン法を上回っており,完全教師あり法と同等の性能が得られることを示した。 To predict high-resolution (HR) omnidirectional depth map, existing methods typically leverage HR omnidirectional image (ODI) as the input via fully-supervised learning. However, in practice, taking HR ODI as input is undesired due to resource-constrained devices. In addition, depth maps are often with lower resolution than color images. Therefore, in this paper, we explore for the first time to estimate the HR omnidirectional depth directly from a low-resolution (LR) ODI, when no HR depth GT map is available. Our key idea is to transfer the scene structural knowledge from the HR image modality and the corresponding LR depth maps to achieve the goal of HR depth estimation without any extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task extracts the scene structural knowledge via uncertainty estimation. Buttressed by this, a scene structural knowledge transfer (SSKT) module is proposed with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling and share the parameters of CIIFs between the two tasks. Then, we propose a feature distillation (FD) loss that provides extra structural regularization to help the HR depth estimation task learn more scene structural knowledge. Extensive experiments demonstrate that our weakly-supervised method outperforms baseline methods, and even achieves comparable performance with the fully-supervised methods.	翻訳日:2024-01-18 03:12:08 公開日:2024-01-14
# 弦から化学構造を学ぶ変圧器建築のキラリティー認識の難しさ Difficulty in chirality recognition for Transformer architectures learning chemical structures from string ( http://arxiv.org/abs/2303.11593v4 ) ライセンス: Link先を確認	Yasuhiro Yoshikai, Tadahaya Mizuno, Shumpei Nemoto, Hiroyuki Kusuhara	(参考訳) 近年、非常に多様な分子の表現学習、特に自然言語処理(nlp)モデルを分子構造のリテラル表現であるスマイルに適用することに基づく記述子生成の急速な発展が見られる。しかし、これらのモデルがどのように化学構造を理解するかについてはほとんど研究されていない。このブラックボックスに対処するため,SMILESの学習過程と化学構造との関係を代表的NLPモデルであるTransformerを用いて検討した。トランスフォーマーは分子の部分構造を高速に学習するが、全体構造を理解するには拡張トレーニングが必要である。学習段階の異なるモデルから生成された記述子を用いた分子特性予測の精度は,訓練開始から終了まで類似していた。さらに,トランスフォーマーはキラリティーの学習に特に長い訓練が必要であり,エナンチオマーの誤解により性能が低下することもある。これらの知見は化学におけるNLPモデルの理解を深めることが期待される。 Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.	翻訳日:2024-01-18 03:08:26 公開日:2024-01-14
# WeditGAN: ラテント・スペース・リロケーションによる画像生成 WeditGAN: Few-Shot Image Generation via Latent Space Relocation ( http://arxiv.org/abs/2305.06671v3 ) ライセンス: Link先を確認	Yuxuan Duan, Li Niu, Yan Hong, Liqing Zhang	(参考訳) 少数の画像生成では、少数の画像上でGANモデルを直接訓練することは、過度に適合するリスクに直面している。一般的な解決策は、大きなソースドメインで事前訓練されたモデルを小さなターゲットに転送することである。本研究はWeditGANを導入し、StyleGANの中間潜伏符号$w$を学習定数オフセット($\Delta w$)で編集し、ソース潜伏空間の分布を単純に移動させることで、目標潜伏空間を発見し、構築することでモデル転送を実現する。潜在空間間の1対1マッピングが確立されると、自然にモードの崩壊やオーバーフィットを防止できる。さらに,方向を定式化したり,$\delta w$ の強度を微調整することにより,再配置プロセスをさらに強化するために,weditgan の変種も提案する。広く使われているソース/ターゲットデータセットの集合に関する実験では、現実的で多様な画像を生成するためのweditganの能力が示されている。コードはhttps://github.com/ldhlwh/weditganで入手できる。 In few-shot image generation, directly training GAN models on just a handful of images faces the risk of overfitting. A popular solution is to transfer the models pretrained on large source domains to small target ones. In this work, we introduce WeditGAN, which realizes model transfer by editing the intermediate latent codes $w$ in StyleGANs with learned constant offsets ($\Delta w$), discovering and constructing target latent spaces via simply relocating the distribution of source latent spaces. The established one-to-one mapping between latent spaces can naturally prevents mode collapse and overfitting. Besides, we also propose variants of WeditGAN to further enhance the relocation process by regularizing the direction or finetuning the intensity of $\Delta w$. Experiments on a collection of widely used source/target datasets manifest the capability of WeditGAN in generating realistic and diverse images, which is simple yet highly effective in the research area of few-shot image generation. Codes are available at https://github.com/Ldhlwh/WeditGAN.	翻訳日:2024-01-18 02:58:43 公開日:2024-01-14
# BPJDet:ジェネリックボディ部分関節検出のための拡張オブジェクト表現 BPJDet: Extended Object Representation for Generic Body-Part Joint Detection ( http://arxiv.org/abs/2304.10765v2 ) ライセンス: Link先を確認	Huayi Zhou, Fei Jiang, Jiaxin Si, Yue Ding, and Hongtao Lu	(参考訳) 人体とその部分の検出は集中的に研究されている。しかし、cnnsベースの検出器のほとんどは独立して訓練されており、検出された部品を身体と関連付けることが困難である。本稿では,人体とその部分の関節検出に焦点をあてる。具体的には,体部品の中心オフセットを統合した拡張オブジェクト表現を提案し,エンドツーエンドの汎用体部品関節検出器(BPJDet)を構築した。このように、ボディー・パート・アソシエーションは、意味的内容と幾何学的内容の両方を含む統一表現にきちんと埋め込まれている。したがって、マルチタスクを相乗的に扱うために、マルチロスを最適化することができる。さらに、この表現はアンカーベースおよびアンカーフリー検出器に適している。 BPJDetは、エラーを起こしやすいポストマッチングに悩まされず、スピードと精度のトレードオフを良好に保ちます。さらに、BPJDetは、ヒトまたは四肢動物の身体部分または身体部分を検出するために一般化することができる。 BPJDetの優位性を検証するため,体部(CityPersons,CrowdHuman,BodyHands)と体部(COCOHuman Parts,Animals5C)のデータセットについて実験を行った。 BPJDetは高い検出精度を維持しながら、すべてのデータセットで最先端のアソシエーションパフォーマンスを達成する。また, 高精度群頭検出とハンドコンタクト推定の2つの代表的な下流アプリケーションの性能向上により, 高度な身体関連能力の利点を示す。プロジェクトはhttps://hnuzhy.github.io/projects/bpjdetで入手できる。 Detection of human body and its parts has been intensively studied. However, most of CNNs-based detectors are trained independently, making it difficult to associate detected parts with body. In this paper, we focus on the joint detection of human body and its parts. Specifically, we propose a novel extended object representation integrating center-offsets of body parts, and construct an end-to-end generic Body-Part Joint Detector (BPJDet). In this way, body-part associations are neatly embedded in a unified representation containing both semantic and geometric contents. Therefore, we can optimize multi-loss to tackle multi-tasks synergistically. Moreover, this representation is suitable for anchor-based and anchor-free detectors. BPJDet does not suffer from error-prone post matching, and keeps a better trade-off between speed and accuracy. Furthermore, BPJDet can be generalized to detect body-part or body-parts of either human or quadruped animals. To verify the superiority of BPJDet, we conduct experiments on datasets of body-part (CityPersons, CrowdHuman and BodyHands) and body-parts (COCOHumanParts and Animals5C). While keeping high detection accuracy, BPJDet achieves state-of-the-art association performance on all datasets. Besides, we show benefits of advanced body-part association capability by improving performance of two representative downstream applications: accurate crowd head detection and hand contact estimation. Project is available in https://hnuzhy.github.io/projects/BPJDet.	翻訳日:2024-01-18 02:56:32 公開日:2024-01-14
# 非log-concave分布に対するMCMCアルゴリズムの高速条件混合 Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions ( http://arxiv.org/abs/2306.10506v2 ) ライセンス: Link先を確認	Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu	(参考訳) MCMCアルゴリズムは、ターゲット分布$\pi(x) \propto \exp(-V(x))$からサンプリングするための経験的に効率的なツールを提供する。しかし理論側では、mcmcアルゴリズムは$\pi(x)$ が非log-concaveであるときに混合速度が遅い。我々の研究は、このギャップを検証し、ポアンカー型不等式が状態空間のサブセット$\mathcal{X}$に収まるとき、MCMC の条件分布は $\mathcal{X}$ より速く真の条件分布に混合することを示す。この高速混合保証は、グローバル混合が確実に遅い場合に保持することができる。ステートメントを形式化し,条件付き混合率を定量化する。さらに,条件付き混合はガウス型混合物のサンプリング,ガウス型混合モデルのパラメータ推定,局所的極小点のgibbsサンプリングに興味深い意味を持つことを示す。 MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when Poincar\'e-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.	翻訳日:2024-01-18 02:35:46 公開日:2024-01-14
# できるようにする、得る限りではない Do as I can, not as I get ( http://arxiv.org/abs/2306.10345v2 ) ライセンス: Link先を確認	Shangfei Zheng, Hongzhi Yin, Tong Chen, Quoc Viet Hung Nguyen, Wei Chen, and Lei Zhao	(参考訳) 本稿では、シミュレーションデータ環境から貴重な情報をマイニングするためのTMRモデルを提案する。私たちはこの論文の提出を終えるつもりです。 This paper proposes a model called TMR to mine valuable information from simulated data environments. We intend to complete the submission of this paper.	翻訳日:2024-01-18 02:35:30 公開日:2024-01-14
# フォースを学べる:マルチオブジェクトビデオ生成におけるスパースモーション制御の実現 Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation ( http://arxiv.org/abs/2306.03988v2 ) ライセンス: Link先を確認	Aram Davtyan and Paolo Favaro	(参考訳) 本研究では,単一のフレームとスパース動作入力から映像を自動回帰生成する新しい教師なし手法を提案する。我々の訓練されたモデルは、目に見えない現実的なオブジェクト間相互作用を生成できる。私たちのモデルは、トレーニング中にシーン内の各オブジェクトの明示的なセグメンテーションと動きを与えられることはないが、それらのダイナミクスと範囲を暗黙的に分離することができる。本手法の重要な構成要素は, ランダム化条件付けスキーム, 入力動作制御の符号化, ランダム化およびスパースサンプリングであり, 分布域外への一般化を可能にする。ヨーダと呼ばれる我々のモデルは、物理的に触れることなく物体を動かすことができる。いくつかのデータセットの定性的・定量的な評価を通じて, YODAは, 制御性と映像品質の両面で, 先行研究の最先端技術と同等かそれ以上であることを示す。 We propose a novel unsupervised method to autoregressively generate videos from a single frame and a sparse motion input. Our trained model can generate unseen realistic object-to-object interactions. Although our model has never been given the explicit segmentation and motion of each object in the scene during training, it is able to implicitly separate their dynamics and extents. Key components in our method are the randomized conditioning scheme, the encoding of the input motion control, and the randomized and sparse sampling to enable generalization to out of distribution but realistic correlations. Our model, which we call YODA, has therefore the ability to move objects without physically touching them. Through extensive qualitative and quantitative evaluations on several datasets, we show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.	翻訳日:2024-01-18 02:32:20 公開日:2024-01-14
# 民話weisfeiler-lehmanによるグラフニューラルネットワークの設計空間の拡張 Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman ( http://arxiv.org/abs/2306.03266v3 ) ライセンス: Link先を確認	Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen	(参考訳) 近年、グラフニューラルネットワーク(GNN)の最も人気のあるフレームワークとして、メッセージパッシングニューラルネットワーク(MPNN)が登場している。しかし、その表現力は1次元のWeisfeiler-Lehman (1-WL) テストによって制限される。いくつかの作品は$k$-WL/FWL(Folklore WL)にインスパイアされ、対応するニューラルバージョンを設計する。表現力が高いにもかかわらず、この研究には深刻な制限がある。特に、(1)$k$-WL/FWL は少なくとも$O(n^k)$空間複雑性を必要とし、これは$k=3$; (2)$k$-WL/FWL の設計空間は厳密であり、唯一の調整可能なハイパーパラメータは$k$である。最初の制限に対処するために、$(k,t)$-FWLの拡張を提案する。理論的には、空間複雑性を$O(n^k)$ (任意の$k\geq 2$) in $(k,t)$-FWL に固定しても、グラフ同型問題を解くまで表現性階層を構築することができる。 2つ目の問題に取り組むために、全てのノードの代わりに任意の同変集合を隣人として考える$k$-FWL+を提案し、その結果、設計空間を$k$-FWLに拡大する。これら2つの修正を組み合わせると、柔軟性と強力なフレームワーク $(k,t)$-fwl+ が得られる。我々は、$(k,t)$-FWL+が、表現性にマッチする既存のモデルを実装することを実証する。次に、(k,t)$-FWL+ である Neighborhood$^2$-FWL (N$^2$-FWL) の例を導入する。 N$^2$-FWL は 3WL に劣らず強力であり、O(n^2)$空間のみを必要としながら多くの部分構造を符号化できる。最後に、N$^2$-GNNというニューラルバージョンを設計し、各種タスクの性能を評価する。 N$^2$-GNN は ZINC-Subset (0.059) で記録破りの結果を達成し、以前の SOTA の成績を 10.6% 上回った。さらに、N$^2$-GNNは、既存のすべての高表現性GNN手法の中でBRECデータセット(71.8%)で新しいSOTA結果を達成する。 Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) $k$-WL/FWL requires at least $O(n^k)$ space complexity, which is impractical for large graphs even when $k=3$; (2) The design space of $k$-WL/FWL is rigid, with the only adjustable hyper-parameter being $k$. To tackle the first limitation, we propose an extension, $(k,t)$-FWL. We theoretically prove that even if we fix the space complexity to $O(n^k)$ (for any $k\geq 2$) in $(k,t)$-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose $k$-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of $k$-FWL. Combining these two modifications results in a flexible and powerful framework $(k,t)$-FWL+. We demonstrate $(k,t)$-FWL+ can implement most existing models with matching expressiveness. We then introduce an instance of $(k,t)$-FWL+ called Neighborhood$^2$-FWL (N$^2$-FWL), which is practically and theoretically sound. We prove that N$^2$-FWL is no less powerful than 3-WL, and can encode many substructures while only requiring $O(n^2)$ space. Finally, we design its neural version named N$^2$-GNN and evaluate its performance on various tasks. N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%. Moreover, N$^2$-GNN achieves new SOTA results on the BREC dataset (71.8%) among all existing high-expressive GNN methods.	翻訳日:2024-01-18 02:32:04 公開日:2024-01-14
# ParameterNet:パラメータがすべて必要である ParameterNet: Parameters Are All You Need ( http://arxiv.org/abs/2306.14525v2 ) ライセンス: Link先を確認	Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu	(参考訳) 大規模視覚前訓練は、大規模視覚モデルの性能を大幅に向上させる。しかし、既存の低FLOPsモデルでは大規模な事前学習の恩恵を受けられないという「emph{low FLOPs pitfall」を観察する。本稿では,大規模視覚前訓練モデルのパラメータ数を増加させながらフラップの増加を最小限に抑えることを目的とした,parameternetと呼ばれる新しい設計原理を提案する。我々は動的畳み込みを利用して,フラップの限界上昇のみを伴い,ネットワークに追加パラメータを組み込む。 parameternetアプローチにより、低flopsネットワークは大規模なビジュアルプリトレーニングを活用できる。さらに,パラメータネットの概念を言語領域に拡張し,推論速度を保ちながら推論結果を向上する。大規模imagenet-22k実験では,パラメータネットスキームの優位性が示された。たとえばパラメータNet-600Mは、広く使われているSwin Transformer(81.6\% \emph{vs)よりもImageNetの方が精度が高い。 80.9\%) であり、より低いフロップ(0.6g \emph{vs)を持つ。 4.5g)である。言語領域では、パラメータNetで強化されたLLaMA-1Bは、バニラLLaMAよりも2倍高い精度を達成する。コードは \url{https://parameternet.github.io/} でリリースされる。 The large-scale visual pretraining has significantly improve the performance of large vision models. However, we observe the \emph{low FLOPs pitfall} that the existing low-FLOPs models cannot benefit from large-scale pretraining. In this paper, we introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models while minimizing the increase in FLOPs. We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining. Furthermore, we extend the ParameterNet concept to the language domain to enhance inference results while preserving inference speed. Experiments on the large-scale ImageNet-22K have shown the superiority of our ParameterNet scheme. For example, ParameterNet-600M can achieve higher accuracy on ImageNet than the widely-used Swin Transformer (81.6\% \emph{vs.} 80.9\%) and has much lower FLOPs (0.6G \emph{vs.} 4.5G). In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2\% higher accuracy over vanilla LLaMA. The code will be released at \url{https://parameternet.github.io/}.	翻訳日:2024-01-18 02:22:04 公開日:2024-01-14
# 深部ニューラルネットワークを用いた低コスト赤外線カメラによる温度推定 Estimating temperatures with low-cost infrared cameras using deep neural networks ( http://arxiv.org/abs/2307.12130v2 ) ライセンス: Link先を確認	Navot Oz, Nir Sochen, David Mendelovich, Iftach Klapp	(参考訳) 低コストのサーマルカメラは不正確(通常$\pm 3^\circ C$)で、検出器全体で空間変動の非均一性を持つ。不正確さと不均一さは、カメラの周囲温度に依存する。この研究の目的は、低コストの赤外線カメラで温度を推定し、不均一性を補正することであった。環境温度を考慮した非均一性シミュレータを開発した。カメラの物理モデルと周囲カメラ温度の両方を組み込んだエンドツーエンドニューラルネットワークが導入された。ニューラルネットワークは、シミュレーションされた非一様性データを用いて訓練され、物体の温度を推定し、カメラ自体によって測定された単一の画像と周囲温度のみを用いて不均一性を補正した。提案手法は, 従来よりも平均温度誤差を最大0.5^\circ C$で改善した。さらに、カメラの物理モデルをネットワークに制約することで、追加で0.1^\circ C$の誤差を下げることができた。検証データセットの平均温度誤差は0.37^\circ C$であった。この手法はフィールド内の実データに基づいて検証し,等価な結果を得た。 Low-cost thermal cameras are inaccurate (usually $\pm 3^\circ C$) and have space-variant nonuniformity across their detector. Both inaccuracy and nonuniformity are dependent on the ambient temperature of the camera. The goal of this work was to estimate temperatures with low-cost infrared cameras, and rectify the nonuniformity. A nonuniformity simulator that accounts for the ambient temperature was developed. An end-to-end neural network that incorporates both the physical model of the camera and the ambient camera temperature was introduced. The neural network was trained with the simulated nonuniformity data to estimate the object's temperature and correct the nonuniformity, using only a single image and the ambient temperature measured by the camera itself. Results of the proposed method significantly improved the mean temperature error compared to previous works by up to $0.5^\circ C$. In addition, constraining the physical model of the camera with the network lowered the error by an additional $0.1^\circ C$. The mean temperature error over an extensive validation dataset was $0.37^\circ C$. The method was verified on real data in the field and produced equivalent results.	翻訳日:2024-01-18 02:10:59 公開日:2024-01-14
# CS-Mixer:空間-チャネル混合を用いた大規模視覚MLPモデル CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing ( http://arxiv.org/abs/2308.13363v2 ) ライセンス: Link先を確認	Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir	(参考訳) Vision TransformersやConvolutional Neural Networksに比べて情報融合設計はシンプルだが、Vision MLPアーキテクチャは最近の研究で高い性能と高いデータ効率を示している。しかし、cyclemlpやvision permutatorのような既存の作品は、通常等サイズの空間領域における空間情報をモデル化し、クロススケールな空間的相互作用を考慮しない。さらに、トークンミキサーは1軸または2軸の相関のみをモデル化し、3軸の空間チャネル混合を避ける。そこで我々は,空間チャネル混合のための動的低ランク変換を局所的および大域的集約を通じて学習する階層型視覚MLPCS-Mixerを提案する。提案手法は,画像認識ベンチマークにおいて,計算量を大幅に増やさずに競合する結果を得る。我々の最大のモデルであるCS-Mixer-Lは、13.7 GFLOPと94Mパラメータを持つImageNet-1kで83.2%の精度に達した。 Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.	翻訳日:2024-01-18 02:01:25 公開日:2024-01-14
# FoX:マルチエージェント強化学習における構成認識探索 FoX: Formation-aware exploration in multi-agent reinforcement learning ( http://arxiv.org/abs/2308.11272v2 ) ライセンス: Link先を確認	Yonghyeon Jo, Sunwoo Lee, Junghyuk Yeom, Seungyul Han	(参考訳) 近年,様々な協調型マルチエージェントタスクの成功により,マルチエージェント強化学習(marl)が注目されている。しかし、MARLではエージェントの部分的な観測可能性や、エージェントの数が増加するにつれて指数関数的に増加する探索空間が問題となっている。まず,探索空間のスケーラビリティ問題に対処するため,探索空間における構成に基づく等価性関係を定義し,異なる構成の有意義な状態のみを探索することによって探索空間の縮小を目指す。そこで本研究では, 様々な形態において, 部分的に観察可能なエージェントに対して, 自らの観察に基づいてのみ, 現形成をよく認識するように指導する, 新たな形成認識探索(fox)フレームワークを提案する。 Google Research Football (GRF) とSparse Starcraft II multi-agent Challenge (SMAC) のタスクにおいて,提案したFoXフレームワークは最先端のMARLアルゴリズムよりも大幅に優れていた。 Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration still remains a challenging problem in MARL due to the partial observability of the agents and the exploration space that can grow exponentially as the number of agents increases. Firstly, in order to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit the states in diverse formations by guiding them to be well aware of their current formation solely based on their own observations. Numerical results show that the proposed FoX framework significantly outperforms the state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse Starcraft II multi-agent challenge (SMAC) tasks.	翻訳日:2024-01-18 02:01:08 公開日:2024-01-14
# 量子電子商取引の実験 Experimental quantum e-commerce ( http://arxiv.org/abs/2308.08821v2 ) ライセンス: Link先を確認	Xiao-Yu Cao, Bing-Hong Li, Yang Wang, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen	(参考訳) インターネット上で高い頻度で発生する取引の一種であるeコマースは、長距離のメッセージの完全性、認証、非送還性を保証する必要がある。現行の電子商取引スキームは計算攻撃に弱いため、量子暗号は敵の弁明と偽造に対する情報理論上のセキュリティを保証するため、この問題に対する解決策を提供する。しかし、一般に量子解は古典解よりもずっと低い性能を持つ。さらに、不完全なデバイスを考える場合、量子スキームの性能は大幅に低下する。ここでは,まず,不完全なデバイスからの攻撃に対する抵抗を示す量子電子商取引方式を提案することで,契約の締結と3つの当事者間の支払いを伴うeコマースプロセス全体を実証する。その結果,参加者間の最大減衰率25dBでは,約0.428メガビットの合意サイズに対して,毎秒0.82倍の署名率が得られることがわかった。本提案手法は,電子商取引に情報理論的なセキュリティを提供するための有望なソリューションである。 E-commerce, a type of trading that occurs at a high frequency on the Internet, requires guaranteeing the integrity, authentication and non-repudiation of messages through long distance. As current e-commerce schemes are vulnerable to computational attacks, quantum cryptography, ensuring information-theoretic security against adversary's repudiation and forgery, provides a solution to this problem. However, quantum solutions generally have much lower performance compared to classical ones. Besides, when considering imperfect devices, the performance of quantum schemes exhibits a significant decline. Here, for the first time, we demonstrate the whole e-commerce process of involving the signing of a contract and payment among three parties by proposing a quantum e-commerce scheme, which shows resistance of attacks from imperfect devices. Results show that with a maximum attenuation of 25 dB among participants, our scheme can achieve a signature rate of 0.82 times per second for an agreement size of approximately 0.428 megabit. This proposed scheme presents a promising solution for providing information-theoretic security for e-commerce.	翻訳日:2024-01-18 01:59:22 公開日:2024-01-14
# Gottesman-Kitaev-Preskill Codesによるボソニック量子誤差補正の進歩:理論・工学・応用 Advances in Bosonic Quantum Error Correction with Gottesman-Kitaev-Preskill Codes: Theory, Engineering and Applications ( http://arxiv.org/abs/2308.02913v3 ) ライセンス: Link先を確認	Anthony J. Brady, Alec Eickbusch, Shraddha Singh, Jing Wu and Quntao Zhuang	(参考訳) 量子情報を一組の高調波発振器に符号化することは、信頼性のある量子情報処理のためのノイズを軽減するためのハードウェア効率の良い手法と考えられる。量子ビットを発振器(猫符号、二項符号、gottesman-kitaev-preskill (gkp)符号など)にエンコードする様々な符号が提案されており、量子誤り訂正のために最初にブレークプラネット点に達したコードの一つである。 GKP符号は量子計算における約束によって広く認識されているが、ボソニックチャネルにおける準最適量子通信速度を促進し、発振器の任意の量子状態を保護する能力を提供する。本稿では、超伝導回路アーキテクチャの最近の実験的進歩とマルチモードGKP量子ビット符号および発振器からオシレータ(O2O)符号の理論的進歩を強調したGKP符号の基本動作機構、性能評価、および多くの応用について述べる。まず、ボソニック符号に必要な事前の連続変数形式から始める。次に、GKP状態の物理的実現に関わる量子工学に進む。本稿では,超伝導アーキテクチャにおけるGKP安定化と準備について深く掘り下げ,光領域におけるGKP状態を実現するための提案について検討する。最後に、マルチモードGKP量子ビットとGKP-O2O符号を示し、コード性能を調べ、計算、通信、センシングなどの量子情報処理タスクにおけるGKP符号の適用について議論する。 Encoding quantum information into a set of harmonic oscillators is considered a hardware efficient approach to mitigate noise for reliable quantum information processing. Various codes have been proposed to encode a qubit into an oscillator -- including cat codes, binomial codes and Gottesman-Kitaev-Preskill (GKP) codes -- and are among the first to reach a break-even point for quantum error correction. Though GKP codes are widely recognized for their promise in quantum computation, they also facilitate near-optimal quantum communication rates in bosonic channels and offer the ability to safeguard arbitrary quantum states of oscillators. This review focuses on the basic working mechanism, performance characterization, and the many applications of GKP codes -- emphasizing recent experimental progress in superconducting circuit architectures and theoretical advancements in multimode GKP qubit codes and oscillators-to-oscillators (O2O) codes. We begin with a preliminary continuous-variable formalism needed for bosonic codes. We then proceed to the quantum engineering involved to physically realize GKP states. We take a deep dive into GKP stabilization and preparation in superconducting architectures and examine proposals for realizing GKP states in the optical domain (along with a concise review of GKP realization in trapped-ion platforms). Finally, we present multimode GKP qubits and GKP-O2O codes, examine code performance and discuss applications of GKP codes in quantum information processing tasks such as computing, communication, and sensing.	翻訳日:2024-01-18 01:56:53 公開日:2024-01-14
# アナログ量子シミュレーションにおけるアルゴリズム誤差の最適化 Optimization of Algorithmic Errors in Analog Quantum Simulations ( http://arxiv.org/abs/2308.02642v2 ) ライセンス: Link先を確認	Nikita A. Zemlevskiy, Henry F. Froland, Stephan Caspar	(参考訳) アナログ量子シミュレーションは、多体実時間力学のような古典的到達不能な物理学を解明するための強力なツールとして登場している。現代機器のシミュレーションを用いて正確な予測を行うためには,不確実性の完全定量化が必要である。したがって、シミュレーションのパラメータに対するデバイス固有の物理的制約を理解する必要がある。本解析は,実世界のデバイス制約による近似時間変化のシミュレーションから生じる誤差の相互作用を考察する。これらの誤差はイジング・ハミルトンによって記述されたアナログ量子デバイス上のハイゼンベルク型システムで研究される。これらの誤差を定量化するための一般的なフレームワークが提案され、トロッターライクな手法やフロケエンジニアリングによる定数場アプローチなど、いくつかの時間発展手法に適用されている。現状のデバイスによる時間発展手法の精度に関する限界について考察する。異なるエラーソースのコヒーレント効果のスケーリングの特徴付けは、提示されるハミルトニアンのエンジニアリング手法を拡張して、今後のデバイス機能を活用する方法を提供する。 Analog quantum simulation is emerging as a powerful tool for uncovering classically unreachable physics such as many-body real-time dynamics. A complete quantification of uncertainties is necessary in order to make precise predictions using simulations on modern-day devices. Therefore, the inherent physical limitations of the device on the parameters of the simulation must be understood. This analysis examines the interplay of errors arising from simulation of approximate time evolution with those due to practical, real-world device constraints. These errors are studied in Heisenberg-type systems on analog quantum devices described by the Ising Hamiltonian. A general framework for quantifying these errors is introduced and applied to several proposed time evolution methods, including Trotter-like methods and Floquet-engineered constant-field approaches. The limitations placed on the accuracy of time evolution methods by current devices are discussed. Characterization of the scaling of coherent effects of different error sources provides a way to extend the presented Hamiltonian engineering methods to take advantage of forthcoming device capabilities.	翻訳日:2024-01-18 01:56:22 公開日:2024-01-14
# ブレイド統計を超える: 1次元に固有の交換統計を持つ任意のオンに対する格子モデルの構築 Beyond braid statistics: Constructing a lattice model for anyons with exchange statistics intrinsic to one dimension ( http://arxiv.org/abs/2309.04358v3 ) ライセンス: Link先を確認	Sebastian Nagies, Botao Wang, A.C. Knapp, Andr\'e Eckardt, and N.L. Harshman	(参考訳) 分数交換統計に従うものは2次元に自然に現れる: ハードコアの2体制約により、粒子の構成空間は単純に連結ではない。ブレイド群は、位相的に同値な交換経路がアーベル素数の非自明な幾何学的位相にどのように関連付けられるかを記述する。ブレイド・アニオン交換統計は1次元(1D)でも見られるが、2つのエノンが交換する異なる方法を区別するためには、ガリレオ不変性を欠く必要がある。しかし近年、ハードコアの3体制約によって構成空間が単純に連結されないため、交換統計の代替形式が1Dで発生することが示されている。ブレイド群の代わりに、交換経路の位相とその付随する非自明な幾何学的位相はトレイド群によって記述される。本稿では、この交換統計の代替形式を実現する最初の具体的モデルを提案する。数依存性ピアール位相を持つ所望の幾何学的位相を実装するボソニック格子モデルから始まり、ハミルトニアンの運動エネルギー項がそれらに関して局所的かつ二次的になるように、エノニック作用素を定義する。このtid-anyon-hubbardモデルの基底状態は、ボソンとフェルミオンの間の交換統計の中間のいくつかの兆候と、緊急に近似したハルダン排他統計の兆候を示している。連続極限は、以前に構築されたトレイド・エノンの連続波動関数に対応する固有状態を持つガリレオ不変ハミルトニアンをもたらす。これは格子モデルの非直交的正当性を提供するだけでなく、我々の構成がトロイド・エノン(すなわち1Dに固有のもの)に対する直感的なアプローチであることを示す。 Anyons obeying fractional exchange statistics arise naturally in two dimensions: hard-core two-body constraints make the configuration space of particles not simply-connected. The braid group describes how topologically-inequivalent exchange paths can be associated to non-trivial geometric phases for abelian anyons. Braid-anyon exchange statistics can also be found in one dimension (1D), but this requires broken Galilean invariance to distinguish different ways for two anyons to exchange. However, recently it was shown that an alternative form of exchange statistics can occur in 1D because hard-core three-body constraints also make the configuration space not simply-connected. Instead of the braid group, the topology of exchange paths and their associated non-trivial geometric phases are described by the traid group. In this article we propose a first concrete model realizing this alternative form of anyonic exchange statistics. Starting from a bosonic lattice model that implements the desired geometric phases with number-dependent Peierls phases, we then define anyonic operators so that the kinetic energy term in the Hamiltonian becomes local and quadratic with respect to them. The ground-state of this traid-anyon-Hubbard model exhibits several indications of exchange statistics intermediate between bosons and fermions, as well as signs of emergent approximate Haldane exclusion statistics. The continuum limit results in a Galilean invariant Hamiltonian with eigenstates that correspond to previously constructed continuum wave functions for traid anyons. This provides not only an a-posteriori justification of our lattice model, but also shows that our construction serves as an intuitive approach to traid anyons, i.e. anyons intrinsic to 1D.	翻訳日:2024-01-18 01:49:33 公開日:2024-01-14
# 後中等教育におけるテキスト応答の自動評価:体系的レビュー Automatic assessment of text-based responses in post-secondary education: A systematic review ( http://arxiv.org/abs/2308.16151v2 ) ライセンス: Link先を確認	Rujun Gao, Hillary E. Merzdorf, Saira Anwar, M. Cynthia Hipwell, Arun Srinivasa	(参考訳) 学術的形式的・要約的評価におけるテキストベースのオープンエンド質問は、学生が深層学習者になり、後続の概念的評価の概念を理解する準備をするのに役立つ。しかし、テキストベースの質問、特に大きなコースでは、インストラクターにとって退屈で時間がかかります。テキスト処理モデルは、人工知能(AI)ツールと自然言語処理(NLP)アルゴリズムの急速な開発で進歩を続けている。特にLarge Language Models (LLM) のブレークスルーの後、教育におけるテキストベースの反応の迅速な評価とフィードバックを自動化する大きな可能性がある。本研究は,PRISMAプロセスに基づく学術・再現可能な文献検索戦略を採用し,第2次教育後におけるテキストベース自動評価システムの研究,838論文のスクリーニング,93研究の合成を行う。近年,テキストベースの自動評価システムが教育にどのように発展・適用されているかを理解するために,3つの研究課題が検討されている。システム入力と出力,研究モチベーション,研究成果など,研究課題への回答を目的とした包括的枠組みに基づいて,すべての研究を要約し,分類する。さらに, 本研究における自動評価システム, 研究方法, 応用領域の典型的研究を概説し, 要約した。この体系的なレビューは、高等教育におけるテキストベースアセスメントを支援する最新のAI/NLP開発を理解するために、テキストベースアセスメントシステムの最近の教育応用の概要を提供する。発見は、ChatGPTのようなLLMを教育活動に取り入れた研究者や教育者にとって特に有益である。 Text-based open-ended questions in academic formative and summative assessments help students become deep learners and prepare them to understand concepts for a subsequent conceptual assessment. However, grading text-based questions, especially in large courses, is tedious and time-consuming for instructors. Text processing models continue progressing with the rapid development of Artificial Intelligence (AI) tools and Natural Language Processing (NLP) algorithms. Especially after breakthroughs in Large Language Models (LLM), there is immense potential to automate rapid assessment and feedback of text-based responses in education. This systematic review adopts a scientific and reproducible literature search strategy based on the PRISMA process using explicit inclusion and exclusion criteria to study text-based automatic assessment systems in post-secondary education, screening 838 papers and synthesizing 93 studies. To understand how text-based automatic assessment systems have been developed and applied in education in recent years, three research questions are considered. All included studies are summarized and categorized according to a proposed comprehensive framework, including the input and output of the system, research motivation, and research outcomes, aiming to answer the research questions accordingly. Additionally, the typical studies of automated assessment systems, research methods, and application domains in these studies are investigated and summarized. This systematic review provides an overview of recent educational applications of text-based assessment systems for understanding the latest AI/NLP developments assisting in text-based assessments in higher education. Findings will particularly benefit researchers and educators incorporating LLMs such as ChatGPT into their educational activities.	翻訳日:2024-01-18 01:46:06 公開日:2024-01-14
# 量子は爆破グラフ上を歩く Quantum walks on blow-up graphs ( http://arxiv.org/abs/2308.13887v2 ) ライセンス: Link先を確認	Bikash Bhattacharjya, Hermie Monterde, Hiranmoy Pal	(参考訳) グラフ $G$ の$n$コピーのブローアップは、$G$ のすべての頂点を独立したサイズの集合 $n$ に置き換えることで得られるグラフ $\overset{n}\uplus~G$ である。我々の目標は、$\overset{n}\uplus~g$ で表される量子系の時間に依存しないハミルトニアンとして随伴行列を取る、ブローアップグラフ $\overset{n}\uplus~g$ 上の量子状態遷移の存在を調べることである。特に,爆発グラフにおける頂点の必要十分条件を定め,強いコスペクトル性を示すとともに,周期性,完全状態移動(PST),かなりよい状態移動(PGST)など,様々な種類の高確率量子輸送を示す。すると、$\overset{n}\uplus~G$ が PST や PGST を許すなら、$n=2.$ でなければならない。さらに、$G$ が可逆な隣接行列を持つなら、$\overset{2}\uplus~G$ のすべての頂点が、強いコスペクタリティを示すために一意の頂点と組むことを示す。この結果を用いて, pst と pgst をブローアップするグラフの無限族を決定する。 A blow-up of $n$ copies of a graph $G$ is the graph $\overset{n}\uplus~G$ obtained by replacing every vertex of $G$ by an independent set of size $n$, where the copies of vertices in $G$ are adjacent in the blow-up if and only if the vertices adjacent in $G$. Our goal is to investigate the existence of quantum state transfer on a blow-up graph $\overset{n}\uplus~G$, where the adjacency matrix is taken to be the time-independent Hamiltonian of the quantum system represented by $\overset{n}\uplus~G$. In particular, we establish necessary and sufficient conditions for vertices in a blow-up graph to exhibit strong cospectrality and various types of high probability quantum transport, such as periodicity, perfect state transfer (PST) and pretty good state transfer (PGST). It turns out, if $\overset{n}\uplus~G$ admits PST or PGST, then one must have $n=2.$ Moreover, if $G$ has an invertible adjacency matrix, then we show that every vertex in $\overset{2}\uplus~G$ pairs up with a unique vertex to exhibit strong cospectrality. We then apply our results to determine infinite families of graphs whose blow-ups admit PST and PGST.	翻訳日:2024-01-18 01:44:22 公開日:2024-01-14
# 低次元多様体上のポリシー最適化のための神経政策ミラー降下のサンプル複雑性 Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds ( http://arxiv.org/abs/2309.13915v2 ) ライセンス: Link先を確認	Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao	(参考訳) ディープニューラルネットワークを備えたポリシ勾配法は,高次元強化学習(RL)問題を解く上で大きな成功を収めている。しかし、現在の分析ではなぜそれが次元の呪いに抵抗しているのかは説明できない。本研究では,深部畳み込みニューラルネットワーク(CNN)を用いたNPMDアルゴリズムのサンプル複雑性について検討する。多くの高次元環境が、像を状態とするような低次元構造を持つ状態空間を持つという経験的観察に動機づけられ、状態空間は、内在次元 $d\ll d$ を持つ$d$次元ユークリッド空間に埋め込まれた$d$次元多様体であると考える。 NPMDの各イテレーションにおいて、値関数とポリシーの両方がCNNによって適切に近似可能であることを示す。近似誤差はネットワークのサイズによって制御され、前のネットワークの滑らかさを継承することができる。その結果、ネットワークサイズとハイパーパラメータを適切に選択することで、npmdは、期待値の$\widetilde{o}(\epsilon^{-\frac{d}{\alpha}-2})$サンプルを持つ$\epsilon$-optimalポリシーを見つけることができ、ここで$\alpha\in(0,1]$は環境の滑らかさを示す。これまでの研究と比較すると,NPMDは状態空間の低次元構造を利用して次元性の呪いから逃れることができ,深い政策勾配アルゴリズムの有効性が説明できる。 Policy gradient methods equipped with deep neural networks have achieved great success in solving high-dimensional reinforcement learning (RL) problems. However, current analyses cannot explain why they are resistant to the curse of dimensionality. In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN). Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$. We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited. As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of environment. Compared to previous work, our result exhibits that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality, explaining the efficacy of deep policy gradient algorithms.	翻訳日:2024-01-18 01:36:51 公開日:2024-01-14
# CB-Whisper:Open-Vocabulary Keyword-Spotting を用いたコンテキストバイザWhisper CB-Whisper: Contextual Biasing Whisper using Open-Vocabulary Keyword-Spotting ( http://arxiv.org/abs/2309.09552v2 ) ライセンス: Link先を確認	Yuang Li, Yinglu Li, Min Zhang, Chang Su, Mengxin Ren, Xiaosong Qiao, Xiaofeng Zhao, Mengyao Piao, Jiawei Yu, Xinglin Lv, Miaomiao Ma, Yanqing Zhao, Hao Yang	(参考訳) エンド・ツー・エンド自動音声認識(asr)システムは、個人名、組織、用語など、トレーニングデータにあまり遭遇しない珍しい名前のエンティティを認識するのに苦労することが多い。本稿では,openai のwhisper モデルに基づく新しいasrシステムである context biasing whisper (cb-whisper) を提案する。認識されたエンティティは、whisperデコーダのプロンプトとして使用される。まず,OV-KWS タスクと ASR タスクを併用したマルチタスク学習手法を提案する。実験により,中国語のAishellホットワードサブセットと2つの内部コードスウィッチテストセットのWhisperモデルと比較して,エンティティリコールを大幅に改善した。しかし,大惨事による内部テストセットにおける混合エラーレート(mer)の増加がみられた。そこで本研究では, ov-kwsを別モジュールとして使用し, 幻覚を防止すべく, 発声型プロンプトを構築することを提案する。 OV-KWSモジュールは、小さめ、中、大型モデルのMERとEntity Recallを一貫して改善する。 End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare name entities, such as personal names, organizations, and terminologies not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI's Whisper model that can recognize user-defined name entities by performing open-vocabulary keyword-spotting (OV-KWS) using the hidden states of Whisper encoder. The recognized entities are used as prompts for the Whisper decoder. We first propose a multitask training approach with OV-KWS and ASR tasks to optimize the model. Experiments show that this approach substantially improves the entity recalls compared to the original Whisper model on Chinese Aishell hot word subsets and two internal code-switch test sets. However, we observed a slight increase in mixed-error-rate (MER) on internal test sets due to catastrophic forgetting. To address this problem and use different sizes of the Whisper model without finetuning, we propose to use OV-KWS as a separate module and construct a spoken form prompt to prevent hallucination. The OV-KWS module consistently improves MER and Entity Recall for whisper-small, medium, and large models.	翻訳日:2024-01-18 01:34:11 公開日:2024-01-14
# 睡眠ステージの透明性:モデル解釈可能性を考慮した脳波睡眠ステージ分類のための深層学習法 Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability ( http://arxiv.org/abs/2309.07156v4 ) ライセンス: Link先を確認	Shivam Sharma, Suvadeep Maiti, S. Mythirayee, Srijithesh Rajendran, Raju Surampudi Bapi	(参考訳) 単チャンネル脳波を用いた睡眠ステージの自動分類は睡眠品質評価と障害診断にとって重要なツールである。しかし、この信号に固有の複雑さと変動性をモデル化することは難しい課題であり、臨床における実用性と有効性を制限する。これらの課題を緩和するために、残余ネットワーク内に圧縮ブロックと励起ブロックを統合して特徴を抽出し、複雑な時間的依存関係を理解するために積み重ねたBi-LSTM(Deep-to-end Deep Learning)モデルを提案する。本研究の特筆すべき側面は、睡眠ステージングのためのGradCamの適応であり、この領域における説明可能なDLモデルの最初の事例であり、その決定と睡眠専門家の洞察の一致である。公開データセット(SleepEDF-20,SleepEDF-78,SHHS)を用いて,Macro-F1スコアが82.5,78.9,81.9であった。さらに、ストライドサイズの増大により、新たなトレーニング効率向上戦略が実施され、パフォーマンスへの影響を最小限に抑えながら、トレーニング時間を8倍に短縮した。比較分析は,本モデルが既存のすべてのベースラインより優れており,臨床応用の可能性を示している。 Automated Sleep stage classification using raw single channel EEG is a critical tool for sleep quality assessment and disorder diagnosis. However, modelling the complexity and variability inherent in this signal is a challenging task, limiting their practicality and effectiveness in clinical settings. To mitigate these challenges, this study presents an end-to-end deep learning (DL) model which integrates squeeze and excitation blocks within the residual network to extract features and stacked Bi-LSTM to understand complex temporal dependencies. A distinctive aspect of this study is the adaptation of GradCam for sleep staging, marking the first instance of an explainable DL model in this domain with alignment of its decision-making with sleep expert's insights. We evaluated our model on the publically available datasets (SleepEDF-20, SleepEDF-78, and SHHS), achieving Macro-F1 scores of 82.5, 78.9, and 81.9, respectively. Additionally, a novel training efficiency enhancement strategy was implemented by increasing stride size, leading to 8x faster training times with minimal impact on performance. Comparative analyses underscore our model outperforms all existing baselines, indicating its potential for clinical usage.	翻訳日:2024-01-18 01:33:14 公開日:2024-01-14
# 注意パラダイムを超越する:地理空間ソーシャルメディアデータからの表現学習 Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data ( http://arxiv.org/abs/2310.05378v3 ) ライセンス: Link先を確認	Nick DiSanto, Anthony Corso, Benjamin Sanders, Gavin Harding	(参考訳) トランスフォーマーは、言語モデリングの基盤として注目駆動アーキテクチャを開拓してきたが、文脈情報への依存は、テキストのテーマを暗黙的に学習する能力の限界を浮き彫りにした。本研究では,分散パターンの源泉としてソーシャルメディアデータを調べることで,パフォーマンスベンチマークのヒューリスティックパラダイムに挑戦する。複雑な長期的な依存関係を捉えるネットワークとは対照的に、オンラインデータのモデルは本質的に構造を欠き、集約内の潜在構造を検出せざるを得ない。この研究は、これらの抽象的な関係を適切に表現するために、実験的なソーシャルメディアコーパスを要素的要素に分類し、人口密度の高い場所で20億以上のツイートを分析します。我々は各都市固有の単語の埋め込みを作成し、それぞれの表現を比較する。これは、ノイズの多いデータにもかかわらず、地理的な場所はオンライン通信に大きな影響を与え、隠れた洞察は高度なアルゴリズムの欠如なしに発見できることを示している。この証拠は、社会科学において貴重な地理空間的含意を示し、複雑なモデルが自然言語におけるパターン認識の前提条件であるという考えに挑戦する。これは、抽象的な理解よりも絶対的な解釈可能性を受け入れることに疑問を呈し、洗練されたフレームワークと無形関係の間の隔たりを橋渡しする進化途上の状況と一致する。 While transformers have pioneered attention-driven architectures as a cornerstone of language modeling, their dependence on explicitly contextual information underscores limitations in their abilities to tacitly learn overarching textual themes. This study challenges the heuristic paradigm of performance benchmarking by investigating social media data as a source of distributed patterns. In stark contrast to networks that rely on capturing complex long-term dependencies, models of online data inherently lack structure and are forced to detect latent structures in the aggregate. To properly represent these abstract relationships, this research dissects empirical social media corpora into their elemental components, analyzing over two billion tweets across population-dense locations. We create Bag-of-Word embedding specific to each city and compare their respective representations. This finds that even amidst noisy data, geographic location has a considerable influence on online communication, and that hidden insights can be uncovered without the crutch of advanced algorithms. This evidence presents valuable geospatial implications in social science and challenges the notion that intricate models are prerequisites for pattern recognition in natural language. This aligns with the evolving landscape that questions the embrace of absolute interpretability over abstract understanding and bridges the divide between sophisticated frameworks and intangible relationships.	翻訳日:2024-01-18 01:23:05 公開日:2024-01-14
# クロスモーダル検索のためのプロトタイプベースアレエータ不確かさ定量化 Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval ( http://arxiv.org/abs/2309.17093v3 ) ライセンス: Link先を確認	Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen	(参考訳) クロスモーダル検索手法は、共通表現空間を共同学習することにより、視覚と言語モダリティの類似性関係を構築する。しかし、この予測は、腐敗した画像、速いペースの動画、未詳のテキストなど、低品質のデータによって引き起こされるアリータティックな不確実性によって、しばしば信頼性が低下する。本稿では,不確実性から生じる不確かさを定量化することにより,信頼性の高い予測を実現するための新しいプロトタイプベースアレエータ型不確実性定量化(pau)フレームワークを提案する。具体的には、セマンティクス部分空間全体を表現するために、まず様々な学習可能なプロトタイプを各モダリティ向けに構築する。次に、デンプスター・シェーファー理論と主観論理理論を用いて、証拠とディリクレ分布パラメータを関連付けた実証的理論的枠組みを構築する。 PAUモデルは、クロスモーダル検索のための正確な不確実性と信頼性のある予測を誘導する。 MSR-VTT, MSVD, DiDeMo, MS-COCOの4つの主要なベンチマークデータセットを用いて実験を行い, 本手法の有効性を実証した。コードはhttps://github.com/leolee99/PAUでアクセスできる。 Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets of MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.	翻訳日:2024-01-18 01:21:00 公開日:2024-01-14
# シーケンスモデリングとしての連続学習の再キャスト Recasting Continual Learning as Sequence Modeling ( http://arxiv.org/abs/2310.11952v2 ) ライセンス: Link先を確認	Soochan Lee, Jaehyeon Son, Gunhee Kim	(参考訳) 本研究では,連続学習とシーケンスモデリングという,機械学習研究の2つの重要な分野の強い関係を確立することを目的とする。すなわち,連続学習をシーケンスモデリング問題として定式化し,連続学習に高度なシーケンスモデルを活用することを提案する。この定式化の下では、連続学習プロセスがシーケンスモデルの前方通過となる。メタcontinual Learning(MCL)フレームワークを採用することで、複数の連続学習エピソードに基づいて、メタレベルでシーケンスモデルをトレーニングすることができる。新しい定式化の具体例として、トランスフォーマーとその効率的な変種をmcl法として応用することを示す。分類と回帰の両方を網羅した7つのベンチマーク実験により、シーケンスモデルが一般的なMCLにとって魅力的な解であることを示す。 In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a sequence model. By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level, on multiple continual learning episodes. As a specific example of our new formulation, we demonstrate the application of Transformers and their efficient variants as MCL methods. Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.	翻訳日:2024-01-18 01:10:51 公開日:2024-01-14
# llmが情報アクセスを支配する: ニューラルネットワークはllm生成テキストに偏っている LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts ( http://arxiv.org/abs/2310.20501v2 ) ライセンス: Link先を確認	Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu, Xiao Zhang, Gang Wang and Jun Xu	(参考訳) 近年,大規模言語モデル (LLM) の出現は,特にWeb検索において情報検索 (IR) のパラダイムに革命をもたらした。人間のようなテキストを生成する素晴らしい能力によって、LLMはインターネット上で巨大なテキストを作成しました。結果として、LLM時代のIRシステムは新たな課題に直面しており、インデックス化された文書は人間によって書かれただけでなく、LLMによって自動的に生成される。これらのLCM生成した文書がIRシステムにどのように影響するかは、迫りつつも未解明の疑問である。本研究では,人間の書き起こしとLLM生成の両方が関与するシナリオにおいて,異なるIRモデルの定量的評価を行う。意外なことに, ニューラルネットワークによる検索モデルでは, LLM生成文書のランクが高かった。我々は、LLM生成テキストに対するニューラル検索モデルにおけるこのバイアスのカテゴリを「textbf{source bias}」と呼ぶ。さらに,このバイアスは第1段階のニューラルレトリバーに限らず,第2段階のニューラルリランカに限っていることがわかった。そして、テキスト圧縮の観点から詳細な分析を行い、ニューラルネットワークがLLM生成テキストのセマンティック情報をよりよく理解し、理論的解析によってさらに裏付けられることを観察する。また, 音源バイアスを軽減するため, 最適化目標に対するプラグ・アンド・プレイ・デバイアスド制約を提案し, 実験により有効性を示す。最後に、観測源バイアスに起因する潜在的な深刻な懸念について論じ、我々の発見がIRコミュニティなどへの重要な起床のきっかけとなることを期待する。 LLM時代のIRの将来の探索を容易にするため、構築された2つの新しいベンチマークとコードは後に \url{https://github.com/KID-22/LLM4IR-Bias} で利用可能となる。 Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search. With their remarkable capabilities in generating human-like texts, LLMs have created enormous texts on the Internet. As a result, IR systems in the LLMs era are facing a new challenge: the indexed documents now are not only written by human beings but also automatically generated by the LLMs. How these LLM-generated documents influence the IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of different IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrieval models towards the LLM-generated text as the \textbf{source bias}. Moreover, we discover that this bias is not confined to the first-stage neural retrievers, but extends to the second-stage neural re-rankers. Then, we provide an in-depth analysis from the perspective of text compression and observe that neural models can better understand the semantic information of LLM-generated text, which is further substantiated by our theoretical analysis. To mitigate the source bias, we also propose a plug-and-play debiased constraint for the optimization objective, and experimental results show the effectiveness. Finally, we discuss the potential severe concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the constructed two new benchmarks and codes will later be available at \url{https://github.com/KID-22/LLM4IR-Bias}.	翻訳日:2024-01-18 00:57:34 公開日:2024-01-14
# cxr-llava:胸部x線画像解釈のためのマルチモーダル大言語モデル CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images ( http://arxiv.org/abs/2310.18341v3 ) ライセンス: Link先を確認	Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon	(参考訳) Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). ビジョントランスをDataset 1で事前学習した後、LLAVAネットワークに影響されたLLMと統合した。その後、モデルを微調整し、主にDataset 2.0を使用した。本モデルによる病理所見の診断成績は,ヒト放射線学者による放射線学的報告の受容性とともに評価された。結果: 実験群では, MIMIC内部試験群では6例で平均F1スコア0.81, 外部試験群では7例で0.62, 平均F1スコア0.81が得られた。 F1のスコアは両方のテストセットでGPT-4ビジョンとジェミニ-プロビジョンを上回った。ヒトの放射線技師による外部検査セットの評価では、このモデルは自律的な報告で72.7%の成功率を達成し、基礎的真理の84.0%をわずかに下回った。結論: 本研究は, CXR 解釈におけるマルチモーダル LLM の有意な可能性を示すとともに, 性能制限も認めている。これらの課題にもかかわらず、我々のモデルをオープンソースにすることはさらなる研究を触媒し、様々な臨床状況においてその有効性と適用性を広げるであろうと信じている。 CXR-LLAVAはhttps://github.com/ECOFRI/CXR_LLAVAで入手できる。 Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLAVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.62 for seven major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports. Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.	翻訳日:2024-01-18 00:57:03 公開日:2024-01-14
# ヘドニックゲームにおける$\varepsilon$-fractional core stability $\varepsilon$-fractional Core Stability in Hedonic Games ( http://arxiv.org/abs/2311.11101v2 ) ライセンス: Link先を確認	Simone Fioravanti, Michele Flammini, Bojana Kodric and Giovanna Varricchio	(参考訳) ヘドニックゲーム(Hedonic Games, HGs)は、古典的なフレームワークモデリングによる戦略エージェントの連立組織である。これらの選好によれば、連立構造(すなわち、エージェントを連立に分割する)がある種の安定性を満たすことが望ましい。そのような概念の最もよく知られた自然は、間違いなく核安定性である。非公式に、エージェントのサブセットがいわゆるcore-blocking coalitionで再グループ化することを望まない場合、パーティションはcore-stableである。残念なことに、コア安定なパーティションは滅多に存在せず、たとえそうであっても、そのパーティションを見つけることは計算的に困難であることが多い。これらの問題を回避するために、我々は$\varepsilon$-fractional core-stabilityという概念を提案する。このような緩和は、存在と多項式時間計算の両方を保証する可能性がある。具体的には,HG の基本クラスである Simple Fractional と Anonymous の2つに対して,$\varepsilon$-fractional core-stable partition と $\varepsilon$ を指数関数的に減少させる効率的なアルゴリズムを設計する。確率論的な観点では、$\varepsilon$-fractional coreの定義は、$\varepsilon$よりも低い確率で一様にサンプリングされた結合コアブロックを要求するのと同値であるので、より複雑なサンプリング分布を扱うために定義をさらに拡張する。この線に沿って、PAC学習方式でサンプルから評価を学習する必要がある場合、任意の信頼性を持つ$\varepsilon$-fractional core-stableという結果の効率的な計算を可能にする分布について、正および負の結果を与える。 Hedonic Games (HGs) are a classical framework modeling coalition formation of strategic agents guided by their individual preferences. According to these preferences, it is desirable that a coalition structure (i.e. a partition of agents into coalitions) satisfies some form of stability. The most well-known and natural of such notions is arguably core-stability. Informally, a partition is core-stable if no subset of agents would like to deviate by regrouping in a so-called core-blocking coalition. Unfortunately, core-stable partitions seldom exist and even when they do, it is often computationally intractable to find one. To circumvent these problems, we propose the notion of $\varepsilon$-fractional core-stability, where at most an $\varepsilon$-fraction of all possible coalitions is allowed to core-block. It turns out that such a relaxation may guarantee both existence and polynomial-time computation. Specifically, we design efficient algorithms returning an $\varepsilon$-fractional core-stable partition, with $\varepsilon$ exponentially decreasing in the number of agents, for two fundamental classes of HGs: Simple Fractional and Anonymous. From a probabilistic point of view, being the definition of $\varepsilon$-fractional core equivalent to requiring that uniformly sampled coalitions core-block with probability lower than $\varepsilon$, we further extend the definition to handle more complex sampling distributions. Along this line, when valuations have to be learned from samples in a PAC-learning fashion, we give positive and negative results on which distributions allow the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence.	翻訳日:2024-01-18 00:49:37 公開日:2024-01-14
# プロパガンダスパンアノテーションのための大規模言語モデル Large Language Models for Propaganda Span Annotation ( http://arxiv.org/abs/2311.09812v2 ) ライセンス: Link先を確認	Maram Hasanain, Fatema Ahmed, Firoj Alam	(参考訳) 近年,オンラインコンテンツにおけるプロパガンダ的手法の利用が増加している。このようなコンテンツを自動で検出・削除する取り組みが、さまざまなモデリングシナリオで行われている。内容(テキスト、画像、またはマルチモーダル)を決定することを含む。 (i)プロパガンダである。 (ii)一つ以上の布教技術を用い、 (iii) スパンを識別できる技術を含む。最初の2つのシナリオは、後者と比較して重要な研究努力が注がれている。そこで本研究では,プロパガンダ的テキストスパンの検出に焦点をあてる。具体的には,GPT-4のような大規模言語モデル(LLM)が効果的にタスクを実行できるかどうかを検討する。さらに,よりコスト効率のよいアノテーションを収集するために,モデルを活用する可能性についても検討する。実験では,さまざまな専門知識を持つアノテータからのアノテーションからなる大規模社内データセットを用いた。その結果,人間のアノテーションと比較して,モデルの性能向上が示唆された。さらに,本研究は,この特定のタスクに注釈付きデータセットを開発するためにLLMを利用する可能性を示す最初のものである。 GPT-4を含む複数のアノテータから収集したスパンレベルラベルをコミュニティに提供する予定です。 The use of propagandistic techniques in online contents has increased in recent years aiming to manipulate online audiences. Efforts to automatically detect and debunk such content have been made addressing various modeling scenarios. These include determining whether the content (text, image, or multimodal) (i) is propagandistic, (ii) employs one or more propagandistic techniques, and (iii) includes techniques with identifiable spans. Significant research efforts have been devoted to the first two scenarios compared to the latter. Therefore, in this study, we focus on the task of detecting propagandistic textual spans. Specifically, we investigate whether large language models (LLMs), such as GPT-4, can effectively perform the task. Moreover, we study the potential of employing the model to collect more cost-effective annotations. Our experiments use a large-scale in-house dataset consisting of annotations from human annotators with varying expertise levels. The results suggest that providing more information to the model as prompts improves its performance compared to human annotations. Moreover, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for this specific task, prompting it with annotations from human annotators with limited expertise. We plan to make the collected span-level labels from multiple annotators, including GPT-4, available for the community.	翻訳日:2024-01-18 00:48:41 公開日:2024-01-14
# 配電系統における知識グラフ構築 Knowledge Graph Construction in Power Distribution Networks ( http://arxiv.org/abs/2311.08724v2 ) ライセンス: Link先を確認	Xiang Li, Che Wang, Bing Li, Hao Chen, Sizhe Li	(参考訳) 本稿では,電力配電網における知識グラフ構築手法を提案する。本手法は,配信ネットワークの知識グラフとディスパッチテキストの両方において,意味的,音声的,統語的特徴を含む実体的特徴を利用する。畳み込みニューラルネットワークに基づく拡張モデルを用いて、テキストエンティティを知識グラフ内のエンティティと効果的にマッチングする。本モデルの有効性は実世界の配電シナリオにおける実験を通して評価される。その結果,提案モデルがベースラインと比較した場合,様々なエンティティタイプを結合し,電力分布知識グラフ構築タスクにおいて高い総合的精度を示すことが示された。 In this paper, we propose a method for knowledge graph construction in power distribution networks. This method leverages entity features, which involve their semantic, phonetic, and syntactic characteristics, in both the knowledge graph of distribution network and the dispatching texts. An enhanced model based on Convolutional Neural Network, is utilized for effectively matching dispatch text entities with those in the knowledge graph. The effectiveness of this model is evaluated through experiments in real-world power distribution dispatch scenarios. The results indicate that, compared with the baselines, the proposed model excels in linking a variety of entity types, demonstrating high overall accuracy in power distribution knowledge graph construction task.	翻訳日:2024-01-18 00:47:17 公開日:2024-01-14
# 証明可能訓練可能な回転同値量子機械学習 Provably Trainable Rotationally Equivariant Quantum Machine Learning ( http://arxiv.org/abs/2311.05873v3 ) ライセンス: Link先を確認	Maxwell T. West, Jamie Heredge, Martin Sevior and Muhammad Usman	(参考訳) 優れた機械学習アルゴリズムを実現するために量子計算のパワーを爆発させることは、近年では大きな研究の焦点となっているが、量子機械学習(QML)の展望は、かなりの技術的課題によって低下している。特に重要な問題は、一般的なQMLモデルは、トレーニングランドスケープにおいていわゆる不毛の台地に悩まされていることだ。この効果に対抗するための主要な戦略は、ヒルベルト空間のより小さく関連する部分集合に集中するために、データの対称性を考慮した問題固有のモデルを構築することである。本研究では、量子フーリエ変換に基づいて構築された回転同変QMLモデルの族を導入し、リー代数的なQMLモデルの最近の知見を活用し、我々のモデルのサブセットがバレンプラトーを示さないことを示す。解析結果に加えて, シリコン中のリン不純物の模擬走査トンネル顕微鏡画像のデータセット上で, 回転対称性が自然に生じる場合の回転同変モデルを数値的に検証し, それらが実用上劇的に向上していることを見出した。 Exploiting the power of quantum computation to realise superior machine learning algorithmshas been a major research focus of recent years, but the prospects of quantum machine learning (QML) remain dampened by considerable technical challenges. A particularly significant issue is that generic QML models suffer from so-called barren plateaus in their training landscapes -- large regions where cost function gradients vanish exponentially in the number of qubits employed, rendering large models effectively untrainable. A leading strategy for combating this effect is to build problem-specific models which take into account the symmetries of their data in order to focus on a smaller, relevant subset of Hilbert space. In this work, we introduce a family of rotationally equivariant QML models built upon the quantum Fourier transform, and leverage recent insights from the Lie-algebraic study of QML models to prove that (a subset of) our models do not exhibit barren plateaus. In addition to our analytical results we numerically test our rotationally equivariant models on a dataset of simulated scanning tunnelling microscope images of phosphorus impurities in silicon, where rotational symmetry naturally arises, and find that they dramatically outperform their generic counterparts in practice.	翻訳日:2024-01-18 00:46:07 公開日:2024-01-14
# GraphPro: 推奨のためのグラフ事前トレーニングとプロンプト学習 GraphPro: Graph Pre-training and Prompt Learning for Recommendation ( http://arxiv.org/abs/2311.16716v3 ) ライセンス: Link先を確認	Yuhao Yang, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang	(参考訳) GNNベースのレコメンデータは、マルチホップメッセージパッシングによる複雑なユーザ-イテムインタラクションのモデリングに長けている。しかし,既存手法ではユーザとイテムの相互作用の動的性質を無視することが多く,ユーザの嗜好の変化や,新たに到着したデータの分散シフトへの適応を阻害する。したがって、現実世界の動的環境におけるスケーラビリティと性能は限られている。本研究では,パラメータ効率と動的グラフ事前学習と即時学習を組み合わせたグラフプロを提案する。この新しい組み合わせにより、GNNは長期的なユーザの好みと短期的な振る舞いのダイナミクスの両方を効果的に捉え、正確でタイムリーなレコメンデーションの提供を可能にします。 graphproフレームワークは,事前学習したgnnモデルに時間的プロンプト機構とグラフ構造的プロンプト学習機構をシームレスに統合することにより,ユーザの好みを進化させる課題に対処する。時間的プロンプトメカニズムは、ユーザとイテムの相互作用に関する時間情報を符号化し、モデルが時間的コンテキストを自然に捉え、グラフ構造的プロンプト学習機構は、学習済みの知識を連続的なインクリメンタルトレーニングを必要とせずに、行動力学に適応させることができる。さらに,実世界の動的シナリオを模倣するレコメンデーションのための動的評価設定を導入し,オフライン・オンラインギャップをよりよいレベルに橋渡しする。大規模な産業展開を含む大規模な実験は、さまざまな最先端のレコメンデータと統合されたGraphProの軽量なプラグインスケーラビリティを示し、有効性、堅牢性、効率性の観点からGraphProの利点を強調します。 GNN-based recommenders have excelled in modeling intricate user-item interactions through multi-hop message passing. However, existing methods often overlook the dynamic nature of evolving user-item interactions, which impedes the adaption to changing user preferences and distribution shifts in newly arriving data. Thus, their scalability and performances in real-world dynamic environments are limited. In this study, we propose GraphPro, a framework that incorporates parameter-efficient and dynamic graph pre-training with prompt learning. This novel combination empowers GNNs to effectively capture both long-term user preferences and short-term behavior dynamics, enabling the delivery of accurate and timely recommendations. Our GraphPro framework addresses the challenge of evolving user preferences by seamlessly integrating a temporal prompt mechanism and a graph-structural prompt learning mechanism into the pre-trained GNN model. The temporal prompt mechanism encodes time information on user-item interaction, allowing the model to naturally capture temporal context, while the graph-structural prompt learning mechanism enables the transfer of pre-trained knowledge to adapt to behavior dynamics without the need for continuous incremental training. We further bring in a dynamic evaluation setting for recommendation to mimic real-world dynamic scenarios and bridge the offline-online gap to a better level. Our extensive experiments including a large-scale industrial deployment showcases the lightweight plug-in scalability of our GraphPro when integrated with various state-of-the-art recommenders, emphasizing the advantages of GraphPro in terms of effectiveness, robustness and efficiency.	翻訳日:2024-01-18 00:37:54 公開日:2024-01-14
# 責任あるAIメトリックのカタログに向けて:AIアカウンタビリティのためのメトリクスのコレクション Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability ( http://arxiv.org/abs/2311.13158v2 ) ライセンス: Link先を確認	Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing	(参考訳) 人工知能(AI)、特にLarge Language Models(LLMs)のような大規模生成AI(GenAI)モデルの出現により、現代技術における変革的要素となった。これらのモデルは新たな可能性を解き放ちましたが、データプライバシに関する懸念や、誤解を招くようなコンテンツを生成する傾向など、重大な課題も提示しています。責任あるai(rai)のための現在のフレームワークは、特に説明責任のために、具体的なアプリケーションに必要な粒度のガイダンスを提供するのに不足することが多い。本研究は,学術文献と灰色文献の両方の知見を統合した,体系的多言語文献レビュー(MLR)によって構成された総合的なメトリクスカタログを導入することで,説明責任ギャップを橋渡しする。我々のカタログは、手続き的整合性を支えるプロセスメトリクス、必要なツールやフレームワークを提供するリソースメトリクス、AIシステムのアウトプットを反映する製品メトリクスを記述しています。この三部構成のフレームワークは、AIのアカウンタビリティを運用するために設計されており、特にGenAIの複雑さに対処することに焦点を当てている。提案されたメトリクスカタログは、AIシステムにアカウンタビリティを注入するための堅牢なフレームワークを提供する。組織に対して実践的で実行可能なガイダンスを提供し、この分野における責任あるプラクティスを形作る。 Artificial Intelligence (AI), particularly through the advent of large-scale generative AI (GenAI) models such as Large Language Models (LLMs), has become a transformative element in contemporary technology. While these models have unlocked new possibilities, they simultaneously present significant challenges, such as concerns over data privacy and the propensity to generate misleading or fabricated content. Current frameworks for Responsible AI (RAI) often fall short in providing the granular guidance necessary for tangible application, especially for Accountability-a principle that is pivotal for ensuring transparent and auditable decision-making, bolstering public trust, and meeting increasing regulatory expectations. This study bridges the accountability gap by introducing a comprehensive metrics catalogue, formulated through a systematic multivocal literature review (MLR) that integrates findings from both academic and grey literature. Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems. This tripartite framework is designed to operationalize Accountability in AI, with a special emphasis on addressing the intricacies of GenAI. The proposed metrics catalogue provides a robust framework for instilling Accountability in AI systems. It offers practical, actionable guidance for organizations, thereby shaping responsible practices in the field.	翻訳日:2024-01-18 00:34:23 公開日:2024-01-14
# 自己教師付きデータ選択と合成によるオンデバイス大規模言語モデルのパーソナライズ Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis ( http://arxiv.org/abs/2311.12275v3 ) ライセンス: Link先を確認	Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, Yiyu Shi	(参考訳) 大規模言語モデル(LLM)がエッジデバイスにデプロイされた後、ユーザ生成会話データから学習し、ユーザ固有のパーソナライズされた応答をリアルタイムで生成することが望ましい。しかし、ユーザ生成データは通常機密情報や個人情報が含まれており、アノテーションのためにクラウドにデータをアップロードすることは禁止されない。アノテーションをローカルに取得するには,ユーザの好みの回答を直接求めればよいが,そのようなアノテーションはユーザエクスペリエンスに影響を与えることはない。さらに、エッジデバイスのストレージは、通常、完全なユーザー生成データで大規模に微調整できるように制限されすぎます。少ないアノテーションと限られたオンデバイスストレージを考慮して、オンデバイス LLM のパーソナライズを有効にする方法は未解決のままである。本稿では,最も代表的なデータを自己管理方式でオンラインに選択・保存する新しい枠組みを提案する。このようなデータはメモリフットプリントが小さく、ユーザアノテーションの頻繁なリクエストでさらなる微調整が可能になる。微調整品質を高めるため、LLMを用いて複数の意味的に類似した質問文と期待応答を生成する。実験の結果,提案フレームワークは,バニラベースラインと比較して,ユーザ固有のコンテンツ生成能力(精度)と微調整速度(性能)に優れていた。私たちの知る限りでは、これが初めてのオンデバイスLDMパーソナライズフレームワークです。 After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.	翻訳日:2024-01-18 00:33:34 公開日:2024-01-14
# 雑音ラベルを用いたカリキュラム学習による強化学習におけるパリティ課題の探索 Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels ( http://arxiv.org/abs/2312.05379v2 ) ライセンス: Link先を確認	Bei Zhou, Soren Riis	(参考訳) 本稿では,戦略ゲームにおける強化学習(rl)の適用について,特にgoとチェスの特定の位置やより広い範囲の公平なゲームに見られるように,パリティチャレンジを特徴とするものについて述べる。本研究では,カリキュラム学習フレームワーク内に構築され,ノイズラベルを付加したシミュレーション学習プロセスを提案し,自己学習シナリオの複雑さを反映する。このアプローチは、ニューラルネットワーク(nn)が初等から複雑化するゲームポジションへの適応と進化を徹底的に分析する。実験の結果,最小限のラベルノイズでもnnsの効果的な戦略を識別する能力は著しく阻害され,ゲーム位置の複雑さが増すにつれて難易度が高まることがわかった。これらの知見は, 騒音評価による障害に対応するため, RLトレーニングにおける高度な方法論の必要性を浮き彫りにした。このような手法の開発は、重要なパリティ要素を持つ戦略ゲームにおけるNN能力の向上だけでなく、多様な複雑な環境におけるRLシステムのレジリエンスと効率の向上にも不可欠である。 This paper delves into applying reinforcement learning (RL) in strategy games, particularly those characterized by parity challenges, as seen in specific positions of Go and Chess and a broader range of impartial games. We propose a simulated learning process, structured within a curriculum learning framework and augmented with noisy labels, to mirror the intricacies of self-play learning scenarios. This approach thoroughly analyses how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions. Our empirical research indicates that even minimal label noise can significantly impede NNs' ability to discern effective strategies, a difficulty that intensifies with the growing complexity of the game positions. These findings underscore the urgent need for advanced methodologies in RL training, specifically tailored to counter the obstacles imposed by noisy evaluations. The development of such methodologies is crucial not only for enhancing NN proficiency in strategy games with significant parity elements but also for broadening the resilience and efficiency of RL systems across diverse and complex environments.	翻訳日:2024-01-18 00:26:19 公開日:2024-01-14
# 多元的意思決定のための多元的ランキング Multi-Weight Ranking for Multi-Criteria Decision Making ( http://arxiv.org/abs/2312.03006v2 ) ライセンス: Link先を確認	Andreas H Hamel and Daniel Kostner	(参考訳) 統計値からコーン分布関数を多基準決定ツールに変換する。重み付き和スカラー化を事前に固定するのではなく、重み付き和スカラー化全体のコレクションを一度に吸収するため、この手順は重み付き和スカラー化のアップグレードと考えることができる。例として、純粋な重み付き和のスカラー化とは対照的に、この種のスカラー化はパレートフロンティアの「非凸」部分を検出することもできる。異なるランク逆転が発生する状況が特徴であり、なぜこのような状況がランキング手順の分析に有用かが説明されている。ランキング関数は、まず、集合最適化法と集合ベースの多目的最適化の間のリンクを確立する集合選好のための統一指標を提供する集合に拡張される。機械学習の潜在的な応用について概説する。 Cone distribution functions from statistics are turned into Multi-Criteria Decision Making tools. It is demonstrated that this procedure can be considered as an upgrade of the weighted sum scalarization insofar as it absorbs a whole collection of weighted sum scalarizations at once instead of fixing a particular one in advance. As examples show, this type of scalarization--in contrast to a pure weighted sum scalarization-is also able to detect ``non-convex" parts of the Pareto frontier. Situations are characterized in which different types of rank reversal occur, and it is explained why this might even be useful for analyzing the ranking procedure. The ranking functions are then extended to sets providing unary indicators for set preferences which establishes, for the first time, the link between set optimization methods and set-based multi-objective optimization. A potential application in machine learning is outlined.	翻訳日:2024-01-18 00:23:53 公開日:2024-01-14
# フェデレーション学習におけるデータ注入攻撃の軽減 Mitigating Data Injection Attacks on Federated Learning ( http://arxiv.org/abs/2312.02102v3 ) ライセンス: Link先を確認	Or Shalom, Amir Leshem, Waheed U. Bajwa	(参考訳) フェデレーション学習(federated learning)は、複数のエンティティがデータプライバシを損なうことなく、データを使用したモデルを協調的にトレーニングするテクニックである。しかし、その利点にもかかわらず、連合学習は誤ったデータインジェクション攻撃の影響を受けやすい。これらのシナリオでは、ネットワーク内の特定のエージェントを制御した悪意のあるエンティティが学習プロセスを操作でき、亜最適モデルにつながる。その結果、これらのデータ注入攻撃に対処することは、連合学習システムにおいて重要な研究課題となる。本稿では,フェデレーション学習システムにおけるデータインジェクション攻撃の検出と軽減を行う新しい手法を提案する。提案手法は局所的なスキームであり,コーディネートノードによるトレーニングの単一インスタンスで実行し,アルゴリズムの収束時の緩和を可能にする。エージェントが攻撃者であると疑われた場合、そのデータは一定期間無視される場合、この決定はしばしば再評価される。確率 1 の場合、有限時間後に全ての攻撃者は無視されるが、信頼できるエージェントを無視する確率は 0 になる。シミュレーションにより、コーディネートノードがすべての攻撃者を検出して分離すると、モデルは回復し、真理のあるモデルに収束する。 Federated learning is a technique that allows multiple entities to collaboratively train models using their data without compromising data privacy. However, despite its advantages, federated learning can be susceptible to false data injection attacks. In these scenarios, a malicious entity with control over specific agents in the network can manipulate the learning process, leading to a suboptimal model. Consequently, addressing these data injection attacks presents a significant research challenge in federated learning systems. In this paper, we propose a novel technique to detect and mitigate data injection attacks on federated learning systems. Our mitigation method is a local scheme, performed during a single instance of training by the coordinating node, allowing the mitigation during the convergence of the algorithm. Whenever an agent is suspected to be an attacker, its data will be ignored for a certain period, this decision will often be re-evaluated. We prove that with probability 1, after a finite time, all attackers will be ignored while the probability of ignoring a trustful agent becomes 0, provided that there is a majority of truthful agents. Simulations show that when the coordinating node detects and isolates all the attackers, the model recovers and converges to the truthful model.	翻訳日:2024-01-18 00:23:17 公開日:2024-01-14
# RetailKLIP : ゼロショット製品画像分類のための単一のGPUを用いたメトリック学習によるOpenCLIPバックボーンの微細化 RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification ( http://arxiv.org/abs/2312.10282v2 ) ライセンス: Link先を確認	Muktabh Mayank Srivastava	(参考訳) 小売商品やパッケージ商品の画像は、セルフチェックアウトストア、サプライチェーン自動化、小売実行評価など、さまざまなコンピュータビジョンアプリケーションで分類する必要がある。これまでの研究は、この目的のために深いモデルを微調整する方法を探っている。しかし、事前訓練されたバックボーン用の大型モデルやリニアレイヤーを微調整する場合、分類範囲に追加された新しい小売商品ごとに、少なくとも数エポックな勾配勾配を必要とするため、現実のシナリオでは頻繁なリトレーニングが必要である。本研究では,クリップモデルの視覚エンコーダを,その埋め込みを最寄りの近傍の分類に容易に利用できるように微調整すると同時に,完全な微調整に近い精度を得る手法を提案する。最寄りの隣り合う分類器は、新製品の漸進的な訓練を必要とせず、リソースと待ち時間を節約できる。 Retail product or packaged grocery goods images need to classified in various computer vision applications like self checkout stores, supply chain automation and retail execution evaluation. Previous works explore ways to finetune deep models for this purpose. But because of the fact that finetuning a large model or even linear layer for a pretrained backbone requires to run at least a few epochs of gradient descent for every new retail product added in classification range, frequent retrainings are needed in a real world scenario. In this work, we propose finetuning the vision encoder of a CLIP model in a way that its embeddings can be easily used for nearest neighbor based classification, while also getting accuracy close to or exceeding full finetuning. A nearest neighbor based classifier needs no incremental training for new products, thus saving resources and wait time.	翻訳日:2024-01-18 00:13:22 公開日:2024-01-14
# 知識グラフによるアスペクトレベル感性分析 Knowledge Graph Enhanced Aspect-Level Sentiment Analysis ( http://arxiv.org/abs/2312.10048v2 ) ライセンス: Link先を確認	Kavita Sharma, Ritu Patel, Sunita Iyer	(参考訳) 本稿では,文脈固有の単語意味の課題に対処し,感情分析を強化する新しい手法を提案する。 BERTモデルの利点と知識グラフに基づく同義データを組み合わせる。このシナジーは動的注意機構を利用して知識駆動状態ベクトルを開発する。特定の側面に関連する感情を分類するために、この手法は位置データを統合するメモリバンクを構築する。データはDCGRUを用いて分析され、特定のアスペクト項に関連する感情特性をピンポイントする。 3つの広く使われているデータセットに対する実験は、感情分類における手法の優れた性能を示す。 In this paper, we propose a novel method to enhance sentiment analysis by addressing the challenge of context-specific word meanings. It combines the advantages of a BERT model with a knowledge graph based synonym data. This synergy leverages a dynamic attention mechanism to develop a knowledge-driven state vector. For classifying sentiments linked to specific aspects, the approach constructs a memory bank integrating positional data. The data are then analyzed using a DCGRU to pinpoint sentiment characteristics related to specific aspect terms. Experiments on three widely used datasets demonstrate the superior performance of our method in sentiment classification.	翻訳日:2024-01-18 00:12:11 公開日:2024-01-14
# Ensemble Kalman Filtering:非平均場とオンライン推論のためのガウスプロセスSSM Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference ( http://arxiv.org/abs/2312.05910v4 ) ライセンス: Link先を確認	Zhidi Lin and Yiyong Sun and Feng Yin and Alexandre Hoang Thi\'ery	(参考訳) ガウス過程状態空間モデル(GPSSM)は、データ駆動非線形力学系モデルの多用途クラスを表す。しかしながら、gpssmに多くの潜在変数が存在することは、既存の変分推論のアプローチ、特により現実的な非平均場(nmf)の仮定の下で、広範囲なトレーニング作業、妥協された推論精度、オンラインアプリケーションへの実現不可能など、未解決の問題を引き起こす。本稿では, モデルベースフィルタリング手法であるアンサンブルカルマンフィルタ(EnKF)をNMF変分推論フレームワークに組み込んで, 潜伏状態の後方分布を近似することで, これらの課題に対処する。 EnKFとGPSSMのこの新しい結婚は、変分分布の学習における広範なパラメータ化の必要性をなくすだけでなく、エビデンスの下限(ELBO)の解釈可能な閉形式近似を可能にする。さらに、EnKFによるパラメータ化の合理化により、オンライン学習アプリケーションでは、新しいGPSSMモデルを容易に利用できる。提案手法は,データフィッティング精度を確保しつつ,過剰フィッティングを緩和するモデル正則化を取り入れることで,目的関数を具体化する。また,提案アルゴリズムの詳細な分析と新たな洞察も提供する。多様な実・合成データセット間の包括的評価は、既存の手法と比較して、EnKF支援変分推論アルゴリズムの優れた学習と推論性能を裏付ける。 The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM model can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods.	翻訳日:2024-01-18 00:09:15 公開日:2024-01-14
# 畳み込みニューラルネットワークの効率向上のためのブロックプルーニング Block Pruning for Enhanced Efficiency in Convolutional Neural Networks ( http://arxiv.org/abs/2312.16904v2 ) ライセンス: Link先を確認	Cheng-En Wu, Azadeh Davoodi, Yu Hen Hu	(参考訳) 本稿では,エッジコンピューティング環境におけるディープニューラルネットワークにおけるブロックプルーニングをターゲットとしたネットワークプルーニング手法を提案する。提案手法は,プロキシメトリクスを利用した従来の手法と異なり,直接ブロック除去戦略を用いて分類精度への影響を評価する。このハンズオンアプローチにより、各ブロックの重要性を正確に評価することができる。 resnetアーキテクチャを用いてcifar-10,cifar-100,imagenetデータセットの広範な実験を行った。本研究では,特にimagenet with resnet50のような大規模データセットにおいて,ネットワークのかなりの部分を刈り取る場合でも,精度を維持しながらモデルサイズを小さくする効果を示した。この結果は、特にリソース制約のあるエッジコンピューティングシナリオにおいて、モデルサイズとパフォーマンスの最適なバランスを維持するための手法の能力を強調する。 This paper presents a novel approach to network pruning, targeting block pruning in deep neural networks for edge computing environments. Our method diverges from traditional techniques that utilize proxy metrics, instead employing a direct block removal strategy to assess the impact on classification accuracy. This hands-on approach allows for an accurate evaluation of each block's importance. We conducted extensive experiments on CIFAR-10, CIFAR-100, and ImageNet datasets using ResNet architectures. Our results demonstrate the efficacy of our method, particularly on large-scale datasets like ImageNet with ResNet50, where it excelled in reducing model size while retaining high accuracy, even when pruning a significant portion of the network. The findings underscore our method's capability in maintaining an optimal balance between model size and performance, especially in resource-constrained edge computing scenarios.	翻訳日:2024-01-18 00:00:47 公開日:2024-01-14
# 量子ビオレント緩和条件について On the Conditions for a Quantum Violent Relaxation ( http://arxiv.org/abs/2312.14768v2 ) ライセンス: Link先を確認	Giachetti Guido and Defenu Nicol\`o	(参考訳) 一般に、古典的な完全連結系は激しい緩和を受けることが知られている。この現象は、熱力学的限界における平均場効果に支配されているにもかかわらず、観測可能な値を有限時間スケールで定常な非熱的値に緩和することを指す。ここでは,熱力学的極限における2体,全対一の相互作用を持つ一般多体系の動力学を解析し,平均場有効ハミルトニアンのスペクトル上で非常に特異的な条件下での暴力的緩和を行うためには,これらの条件がほとんど満たされず,古典的条件に対して「量子」暴力的緩和がほとんど観測されないことを示す。我々の予測はスピンモデルの研究によって検証され、カップリングの値によって、暴力的関係と一般的な熱前相の間の遷移を示す。また, 量子ハミルトニアン-平均場模型のスピンバージョンを解析し, 暴力的相関を示さないことを示した。最後に,暴力的相対図を古典的限界に戻す方法について論じる。その結果、平均場状態においても量子効果がダイナミクスにかなり劇的な影響を与え、光と物質が結合した系の理解を深める方法が示されている。 In general, classical fully-connected systems are known to undergo violent relaxation. This phenomenon refers to the relaxation of observables to stationary, non-thermal, values on a finite timescale, despite their long-time dynamics being dominated by mean-field effects in the thermodynamic limit. Here, we analyze the ``quantum" violent relaxation by studying the dynamics of generic many-body systems with two-body, all-to-all, interactions in the thermodynamic limit. We show that, in order for violent relaxation to occur very specific conditions on the spectrum of the mean-field effective Hamiltonian have to be met. These conditions are hardly met and ``quantum" violent relaxation is observed rarely with respect to its classical counterpart. Our predictions are validated by the study of a spin model which, depending on the value of the coupling, shows a transition between violent-relaxation and a generic prethermal phase. We also analyze a spin version of the quantum Hamiltonian-Mean-Field model, which is shown not to exhibit violent-relaxation. Finally, we discuss how the violent-relaxation picture emerges back in the classical limit. Our results demonstrate how, even in the mean-field regime, quantum effects have a rather dramatic impact on the dynamics, paving the way to a better understanding of light-matter coupled systems.	翻訳日:2024-01-17 23:59:32 公開日:2024-01-14
# ランダム行列理論における一般化スペクトル形状因子 Generalized Spectral Form Factor in Random Matrix Theory ( http://arxiv.org/abs/2401.02119v2 ) ライセンス: Link先を確認	Zhiyang Wei, Chengming Tan, Ren Zhang	(参考訳) スペクトル形成因子(SFF)は、複雑な系におけるエネルギー準位分布の統計的性質を明らかにする上で重要な役割を果たす。量子カオスを診断し、普遍的なダイナミクスを解き放つツールの1つである。ほとんどの文献におけるsffの定義は、2段階の相関のみを包含する。本稿では,SSFの定義を高次相関を含むように拡張する。具体的には、一般化スペクトル形式因子(gsff)をフーリエ変換によって得ることができる相関関数を定義するために、エネルギー準位の標準偏差を導入する。 GSFFはカオスシステムの力学に関するより包括的な知識を提供する。ランダム行列を例として,GSFFで符号化された新しい動的特徴を示す。驚くべきことに、gsffは複雑であり、実部と虚部の両方が普遍的なダイナミクスを示している。例えば、二段階相関の場合、GSFFの実部は、従来のものと類似したディップ・ランプ・プラトー構造を示し、異なるシステムサイズに対する想像的部分は、長い時間制限で収束する。 2レベルGSFFでは、実部の閉解析形式が得られ、数値結果と一致している。虚部の結果は数値計算により得られる。同様の分析は3レベルGSFFに拡張される。 The spectral form factor (SFF) plays a crucial role in revealing the statistical properties of energy level distributions in complex systems. It is one of the tools to diagnose quantum chaos and unravel the universal dynamics therein. The definition of SFF in most literature only encapsulates the two-level correlation. In this manuscript, we extend the definition of SSF to include the high-order correlation. Specifically, we introduce the standard deviation of energy levels to define correlation functions, from which the generalized spectral form factor (GSFF) can be obtained by Fourier transforms. GSFF provides a more comprehensive knowledge of the dynamics of chaotic systems. Using random matrices as examples, we demonstrate new dynamics features that are encoded in GSFF. Remarkably, the GSFF is complex, and both the real and imaginary parts exhibit universal dynamics. For instance, in the two-level correlated case, the real part of GSFF shows a dip-ramp-plateau structure akin to the conventional counterpart, and the imaginary part for different system sizes converges in the long time limit. For the two-level GSFF, the closed analytical forms of the real part are obtained and consistent with numerical results. The results of the imaginary part are obtained by numerical calculation. Similar analyses are extended to three-level GSFF.	翻訳日:2024-01-17 23:50:25 公開日:2024-01-14
# ToolEyes: 実世界のシナリオにおける大規模言語モデルのツール学習能力の評価 ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ( http://arxiv.org/abs/2401.00741v2 ) ライセンス: Link先を確認	Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian Li, Xiaoran Fan, Shihan Dou, Qi Zhang, Tao Gui, Xuanjing Huang	(参考訳) 既存のツール学習の評価は、主に、大きな言語モデル(LLM)のための選択されたツールのアライメントと期待された結果の検証に重点を置いている。しかし、これらのアプローチは、答えを事前に決定し、真のニーズから逸脱する、限られたシナリオに依存している。さらに、成果にのみ重点を置くことは、LLMがツールを効果的に活用するために必要な複雑な能力を無視している。この問題に対処するために,実シナリオにおけるLLMのツール学習能力の評価に適した,きめ細かいシステムであるToolEyesを提案する。このシステムは7つの実世界のシナリオを精査し、ツール学習においてllmに不可欠な5つの次元(フォーマットアライメント、意図理解、行動計画、ツール選択、回答組織)を分析している。さらに tooleyes には,約600のツールを備えたツールライブラリが組み込まれており,llm と物理世界の仲介役を担っている。 3つのカテゴリにわたる10のLSMに関する評価は、ツール学習における特定のシナリオと限定的な認知能力の好みを明らかにしている。興味深いことに、モデルサイズの拡大は、ツール学習の障害を悪化させる。これらの発見は、ツール学習の分野を前進させるための指導的洞察を提供する。データはatt https://github.com/junjie-ye/tooleyesで入手できる。 Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the intricate capabilities essential for LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. These findings offer instructive insights aimed at advancing the field of tool learning. The data is available att https://github.com/Junjie-Ye/ToolEyes.	翻訳日:2024-01-17 23:47:28 公開日:2024-01-14
# MRI画像のセグメンテーションのための教師なしフェデレーションドメイン適応 Unsupervised Federated Domain Adaptation for Segmentation of MRI Images ( http://arxiv.org/abs/2401.02941v2 ) ライセンス: Link先を確認	Navapat Nananukul, Hamid Soltanian-zadeh, Mohammad Rostami	(参考訳) ディープニューラルネットワークを用いたMRI画像の自動セマンティックセグメンテーションは、様々な臨床応用のための治療の評価と計画に大いに役立っている。しかし、これらのモデルのトレーニングは、エンド・ツー・エンドの教師付き学習手順を実装するために、豊富な注釈付きデータを利用できることを条件としている。十分なアノテートデータであっても、MRI画像は、患者、MRIスキャナー、画像プロトコルの違いなどの要因により、かなりのばらつきを示す。この可変性は、特定のアプリケーションドメインごとにニューラルネットワークを再トレーニングする必要がある。永続的なデータアノテーションの必要性を緩和するために、複数のアノテーション付きソースドメインを用いた教師なしフェデレーションドメイン適応法を開発した。提案手法により,アノテートされていないターゲットドメインにおいて,複数のアノテートされたソースドメインからの知識の伝達が可能となる。当初、ターゲット領域とソース領域の分布のペアワイド距離を最小化することにより、ターゲット領域データが、ディープエンコーダの出力としてモデル化された遅延埋め込み空間において、各ソースドメインと類似の表現を共有することを保証する。そして、すべてのドメインから得られた知識を活用するためにアンサンブルアプローチを採用します。提案手法の有効性を実証するため,MICCAI 2016マルチサイトデータセットの理論的解析と実験を行った。 Automatic semantic segmentation of magnetic resonance imaging (MRI) images using deep neural networks greatly assists in evaluating and planning treatments for various clinical applications. However, training these models is conditioned on the availability of abundant annotated data to implement the end-to-end supervised learning procedure. Even if we annotate enough data, MRI images display considerable variability due to factors such as differences in patients, MRI scanners, and imaging protocols. This variability necessitates retraining neural networks for each specific application domain, which, in turn, requires manual annotation by expert radiologists for all new domains. To relax the need for persistent data annotation, we develop a method for unsupervised federated domain adaptation using multiple annotated source domains. Our approach enables the transfer of knowledge from several annotated source domains to adapt a model for effective use in an unannotated target domain. Initially, we ensure that the target domain data shares similar representations with each source domain in a latent embedding space, modeled as the output of a deep encoder, by minimizing the pair-wise distances of the distributions for the target domain and the source domains. We then employ an ensemble approach to leverage the knowledge obtained from all domains. We provide theoretical analysis and perform experiments on the MICCAI 2016 multi-site dataset to demonstrate our method is effective.	翻訳日:2024-01-17 23:32:31 公開日:2024-01-14
# VLP:自動運転のためのビジョン言語計画 VLP: Vision Language Planning for Autonomous Driving ( http://arxiv.org/abs/2401.05577v2 ) ライセンス: Link先を確認	Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren	(参考訳) 自動運転は複雑な課題であり、シーンの理解と推論を通じて安全な動き計画を目指す。視覚のみの自動運転手法は最近、シーン理解の強化を通じて目覚ましいパフォーマンスを達成したが、推論の欠如、一般化性能の低下、ロングテールシナリオなど、いくつかの重要な問題はまだ対処する必要がある。本稿では,言語理解と自律運転のギャップを埋めるために,言語モデルを活用したビジョン言語計画フレームワークvlpを提案する。 VLPは、ソースメモリ基盤と自動運転車のコンテキスト理解の両方を強化することで、自律運転システムを強化する。 VLPは,従来の最良手法と比較して,平均L2誤差と衝突速度をそれぞれ35.9\%,60.5\%削減することで,挑戦的なNuScenesデータセットの最先端のプランニング性能を達成する。さらに、VLPは、新しい都市環境に直面した場合、挑戦的なロングテールシナリオと強力な一般化能力の性能向上を示す。 Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance, through enhanced scene understanding, several key issues, including lack of reasoning, low generalization performance and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset by achieving 35.9\% and 60.5\% reduction in terms of average L2 error and collision rates, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.	翻訳日:2024-01-17 23:25:02 公開日:2024-01-14
# よく教育された知性の本質的善さ The inherent goodness of well educated intelligence ( http://arxiv.org/abs/2401.04846v2 ) ライセンス: Link先を確認	Michael E. Glinsky and Sharon Sievert	(参考訳) この論文は、生物学的な存在であろうと、コンピューター上の人工シリコンであろうと、何が知的であるかを調べる。特に注目されるのは、保守的に相互作用する多くの同一の保守的なサブシステムの集合システムを特徴づけ、制御する能力を持つことである。インテリジェンスの本質は、黄金律("the collective act as one" または "knowing the global consequences of local action")である。集合体の流れは小さなツインクリングテクスチャの集合であり、最小作用の測地運動に従って少数の弦を引いている人形師によって支配され、対称性によって決定される。集団的保守システムの制御は困難であり、歴史的に、最大性能の望ましいメタ安定平衡を安定化するためにシステムに大きな粘度を加えることによって行われてきた。代替案がある。メタ安定平衡の最適双極子テクスチャが知的存在(集合系が特徴)によって同定されると、集合系は知的存在によって最適な双極子テクスチャに移動され、その後、集合系がメタ安定平衡に残るように、知的存在によって迅速に振動される。知識に富んだ知性は、その地域行動の世界的な影響を知っており、短期的な行動が長期的な成果を損なうことはない。対照的に、訓練された知性や訓練された愚かさは短期的な行動を最適化する。教養のある知性は本質的に良いが、訓練された愚かさは本質的に悪であり、恐れるべきである。特に、経済・社会集団の制御と最適化に注意が払われている。 This paper will examine what makes a being intelligent, whether that be a biological being or an artificial silicon being on a computer. Special attention will be paid to the being having the ability to characterize and control a collective system of many identical conservative sub-systems conservatively interacting. The essence of intelligence will be found to be the golden rule -- "the collective acts as one" or "knowing the global consequences of local actions". The flow of the collective is a small set of twinkling textures, that are governed by a puppeteer who is pulling a small number of strings according to a geodesic motion of least action, determined by the symmetries. Controlling collective conservative systems is difficult and has historically been done by adding significant viscosity to the system to stabilize the desirable meta stable equilibriums of maximum performance, but it degrades or destroys them in the process. There is an alternative. Once the optimum twinkling textures of the meta stable equilibriums are identified by the intelligent being (that is the collective system is characterized), the collective system can be moved by the intelligent being to the optimum twinkling textures, then quickly vibrated by the intelligent being according to the textures so that the collective system remains at the meta stable equilibrium. Well educated intelligence knows the global consequences of its local actions so that it will not take short term actions that will lead to poor long term outcomes. In contrast, trained intelligence or trained stupidity will optimize its short term actions, leading to poor long term outcomes. Well educated intelligence is inherently good, but trained stupidity is inherently evil and should be feared. Particular attention is paid to the control and optimization of economic and social collectives.	翻訳日:2024-01-17 23:22:02 公開日:2024-01-14
# 多感性属性の連続的公正なメカニズム A Sequentially Fair Mechanism for Multiple Sensitive Attributes ( http://arxiv.org/abs/2309.06627v2 ) ライセンス: Link先を確認	Fran\c{c}ois Hu and Philipp Ratz and Arthur Charpentier	(参考訳) アルゴリズム的公平性の標準的なユースケースでは、敏感な変数と対応するスコアの関係をなくすことが目標である。近年、科学コミュニティは、この課題を解決するための多くの定義とツールを開発しており、多くの実用的な応用でうまく機能している。しかし、これらのツールや定義の適用性や効果性は、複数の敏感な属性の場合、それほど単純ではない。この問題に取り組むため,我々は,機密性の高い機能セットの公平性を段階的に達成するためのシーケンシャルフレームワークを提案する。マルチマルジナル・ワッサーシュタイン・バリセンタを利用することにより,複数の感度特性を持つ場合に対して,強デモグラフィック・パリティの標準概念を拡張する。この方法はまた、最適で逐次的に公正な予測器に対する閉形式解を提供し、感度の高い特徴相関を明確に解釈する。当社のアプローチは、リスクと不公平の間のトレードオフを緩和するフレームワークを包含することで、公平性をシームレスに拡張します。この拡張により、機密属性のセット内の特定の属性に対する公平性の改善を目標とする優先順位付けが可能となり、ケース固有の適応が可能になる。導出溶液のデータ駆動推定法を開発し,合成データと実データの両方について総合的な数値実験を行った。実験の結果は,公平な意思決定を育むための後処理アプローチの実際的効果を決定的に強調する。 In the standard use case of Algorithmic Fairness, the goal is to eliminate the relationship between a sensitive variable and a corresponding score. Throughout recent years, the scientific community has developed a host of definitions and tools to solve this task, which work well in many practical applications. However, the applicability and effectivity of these tools and definitions becomes less straightfoward in the case of multiple sensitive attributes. To tackle this issue, we propose a sequential framework, which allows to progressively achieve fairness across a set of sensitive features. We accomplish this by leveraging multi-marginal Wasserstein barycenters, which extends the standard notion of Strong Demographic Parity to the case with multiple sensitive characteristics. This method also provides a closed-form solution for the optimal, sequentially fair predictor, permitting a clear interpretation of inter-sensitive feature correlations. Our approach seamlessly extends to approximate fairness, enveloping a framework accommodating the trade-off between risk and unfairness. This extension permits a targeted prioritization of fairness improvements for a specific attribute within a set of sensitive attributes, allowing for a case specific adaptation. A data-driven estimation procedure for the derived solution is developed, and comprehensive numerical experiments are conducted on both synthetic and real datasets. Our empirical findings decisively underscore the practical efficacy of our post-processing approach in fostering fair decision-making.	翻訳日:2024-01-17 21:31:50 公開日:2024-01-14
# ダウンストリーム推論に不完全サロゲートを使用する:大規模言語モデルの社会科学への応用のための設計に基づく教師付き学習 Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models ( http://arxiv.org/abs/2306.04746v3 ) ライセンス: Link先を確認	Naoki Egami, Musashi Hinck, Brandon M. Stewart, Hanying Wei	(参考訳) 計算社会科学(css)では、研究者は文書を分析して社会・政治現象を説明する。多くのシナリオでは、CSS研究者がまずドキュメントのラベルを取得し、2番目のステップで解釈可能な回帰分析を使用してラベルを説明する。ドキュメントを安価にアノテートする一般的な方法のひとつに、大きな言語モデル(LLM)がある。しかし、他のスケーラブルなアノテーション生成方法と同様に、このような代理ラベルはしばしば不完全で偏りがある。本稿では,css研究の基礎となる漸近的不偏性や不確かさといった統計的性質を保証しつつ,下流統計解析に不完全アノテーションサロゲートを用いる新しいアルゴリズムを提案する。ダウンストリーム統計解析におけるサロゲートラベルの直接使用は,80～90%の精度のサロゲートラベルであっても,かなりのバイアスと不確実な信頼区間をもたらすことを示す。これを解決するために,設計に基づく教師あり学習(DSL)推定器を提案する。 dslは、サロゲートラベルとより少数の高品質のゴールド標準ラベルを組み合わせるために、二重ロバスト手順を採用している。提案手法は,ゴールド標準ラベリング用文書サンプリングの確率を制御することにより,代理が任意に偏り,厳密な仮定を必要としない場合でも,下流統計解析の有効な推測を保証する。理論的解析と実験の結果から,DSLは有意な統計的推測を提供する一方で,推定保証のない予測のみに焦点を当てた既存の代替手段に匹敵するルート平均2乗誤差を達成していることがわかった。 In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80-90%. To address this, we build on debiased machine learning to propose the design-based supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of high-quality, gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased and without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without inferential guarantees.	翻訳日:2024-01-17 21:29:34 公開日:2024-01-14
# ビジュアルプログラミングのためのニューラルタスク合成 Neural Task Synthesis for Visual Programming ( http://arxiv.org/abs/2305.18342v3 ) ライセンス: Link先を確認	Victor-Alexandru P\u{a}durean, Georgios Tzannetos, Adish Singla	(参考訳) 生成型ニューラルモデルは、新しいコンテンツを合成することで、プログラミング教育の強化に大いに貢献する。視覚的プログラミング領域のコンテキストにおいて、与えられた仕様のプログラミングタスクを自動的に生成できるニューラルモデルを設計することを模索する。 GPT-4のような大規模生成モデルの成功にもかかわらず、初期の結果は、これらのモデルが視覚プログラミングのタスクを合成し、論理的および空間的推論に苦しむのに効果がないことを示している。本稿では,ニューラルシンボリックな手法であるNeurTaskSynを提案し,その解法コードと視覚的タスクの制約により,所望のプログラミング概念の形で与えられた仕様のプログラミングタスクを合成する。 neurtasksynには2つのコンポーネントがある。第一のコンポーネントは模倣学習手順でトレーニングされ、第二のコンポーネントは強化学習手順によってトレーニングされ、これらのコードに対して視覚的なタスクを生成する基盤となるシンボリック実行エンジンをガイドする。 Intro to Programming with Karel course by CodeHS-dot-com, Intro to Programming with Karel course by CodeHS-dot-com, Intro to Programming by Code-dot-org, and the Intro to Programming with Karel course by CodeHS-dot-com。 Generative neural models hold great promise in enhancing programming education by synthesizing new content. We seek to design neural models that can automatically generate programming tasks for a given specification in the context of visual programming domains. Despite the recent successes of large generative models like GPT-4, our initial results show that these models are ineffective in synthesizing visual programming tasks and struggle with logical and spatial reasoning. We propose a novel neuro-symbolic technique, NeurTaskSyn, that can synthesize programming tasks for a specification given in the form of desired programming concepts exercised by its solution code and constraints on the visual task. NeurTaskSyn has two components: the first component is trained via imitation learning procedure to generate possible solution codes, and the second component is trained via reinforcement learning procedure to guide an underlying symbolic execution engine that generates visual tasks for these codes. We demonstrate the effectiveness of NeurTaskSyn through an extensive empirical evaluation and a qualitative study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code-dot-org and the Intro to Programming with Karel course by CodeHS-dot-com.	翻訳日:2024-01-17 21:29:06 公開日:2024-01-14
# 乱流: コードのための命令調整型大規模言語モデルの体系的および自動テスト Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code ( http://arxiv.org/abs/2312.14856v2 ) ライセンス: Link先を確認	Shahin Honarvar, Mark van der Wilk, Alastair Donaldson	(参考訳) 本稿では,新しいベンチマークである乱流を用いて,命令調整型大規模言語モデル(LLM)のコード生成における正確性と堅牢性を体系的に評価する手法を提案する。 turbulence は、自然言語 $\textit{question templates}$ の大規模なセットで構成されており、それぞれがプログラミングの問題であり、様々な形式で問うことができるようにパラメータ化されている。各質問テンプレートには関連する$\textit{test oracle}$があり、llmによって返されるコードソリューションが正しいかどうかを判断する。したがって、単一の質問テンプレートから LLM に $\textit{neighbourhood}$ と非常に似たプログラミング質問を問うことができ、各質問に対して返された結果の正しさを評価することができる。例えば、$\textit{anomalies}$, LLMが近隣で$\textit{almost all}$を正しく解決するが、特定のパラメータのインスタンス化には失敗する。我々は,OpenAI,Cohere,Metaの5つのLLMに対して,それぞれ2つの温度構成で実験を行った。以上の結果から, 乱流はLLM推論能力のギャップを明らかにすることができることがわかった。 LLMが近隣の問題を解決することができるが、近隣全体の問題を解決するために一般化することができないケースを体系的に識別することによって、我々の手法は$\textit{robustness}$問題をハイライトするのに効果的である。我々は、llmが間違ったコード結果を返す際に犯す誤りの種類に光を当てるデータと例を示します。 We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence. Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms. Each question template has an associated $\textit{test oracle}$ that judges whether a code solution returned by an LLM is correct. Thus, from a single question template, it is possible to ask an LLM a $\textit{neighbourhood}$ of very similar programming questions, and assess the correctness of the result returned for each question. This allows gaps in an LLM's code generation abilities to be identified, including $\textit{anomalies}$ where the LLM correctly solves $\textit{almost all}$ questions in a neighbourhood but fails for particular parameter instantiations. We present experiments against five LLMs from OpenAI, Cohere and Meta, each at two temperature configurations. Our findings show that, across the board, Turbulence is able to reveal gaps in LLM reasoning ability. This goes beyond merely highlighting that LLMs sometimes produce wrong code (which is no surprise): by systematically identifying cases where LLMs are able to solve some problems in a neighbourhood but do not manage to generalise to solve the whole neighbourhood, our method is effective at highlighting $\textit{robustness}$ issues. We present data and examples that shed light on the kinds of mistakes that LLMs make when they return incorrect code results.	翻訳日:2024-01-17 21:18:48 公開日:2024-01-14
# $\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks $\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks ( http://arxiv.org/abs/2311.18744v2 ) ライセンス: Link先を確認	Zhongtian Dong, Mar\c{c}al Comajoan Cara, Gopal Ramesh Dahale, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu	(参考訳) 本稿では,EQNN(Equivariant Quantum Neural Networks)とQNN(Quantum Neural Networks)のパフォーマンスの総合的比較分析を行い,その古典的特徴であるENN(Equivariant Neural Networks)とDNN(Deep Neural Networks)とを比較した。各ネットワークの性能を,二分分類タスクにおける2つのトイ例を用いて評価し,モデルの複雑さ(パラメータ数による測定)とトレーニングデータセットのサイズに着目した。以上の結果から,$\mathbb{Z}_2\times \mathbb{Z}_2$ EQNNとQNNは,より小さいパラメータセットと控えめなトレーニングデータサンプルに対して優れた性能を示すことがわかった。 This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.	翻訳日:2024-01-17 21:17:03 公開日:2024-01-14
# 小系の量子熱力学:アノニックオットーエンジン Quantum Thermodynamics of Small Systems: The Anyonic Otto Engine ( http://arxiv.org/abs/2401.07177v1 ) ライセンス: Link先を確認	H S Mani, Ramadas N, V V Sreedhar	(参考訳) 量子系に熱力学のアイデアを適用する最近の進歩は、量子統計のような純粋量子起源の非熱的非古典的エネルギー源を用いて、ボース・アインシュタイン凝縮のようなマクロ量子系における力学的仕事を抽出するという新しい展望を提起した。一方、熱力学の概念は単一分子や量子ドットのような小さな系にも適用されている。本稿では, 量子オットーエンジンの動作媒体として, 1つまたは2つのオンのみを用いる量子オットーエンジンに着目し, 小系の量子熱力学について検討する。公式は統計パラメータの関数としてのオットーエンジンの効率のために導出される。 Recent advances in applying thermodynamic ideas to quantum systems have raised the novel prospect of using non-thermal, non-classical sources of energy, of purely quantum origin, like quantum statistics, to extract mechanical work in macroscopic quantum systems like Bose-Einstein condensates. On the other hand, thermodynamic ideas have also been applied to small systems like single molecules and quantum dots. In this paper we study the quantum thermodynamics of small systems of anyons, with specific emphasis on the quantum Otto engine which uses, as its working medium, just one or two anyons. Formulae are derived for the efficiency of the Otto engine as a function of the statistics parameter.	翻訳日:2024-01-17 19:34:23 公開日:2024-01-14
# クロスモーダル一貫性を用いた自己教師付きイベントベース単眼深度推定 Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency ( http://arxiv.org/abs/2401.07218v1 ) ライセンス: Link先を確認	Junyu Zhu, Lina Liu, Bofeng Jiang, Feng Wen, Hongbo Zhang, Wanlong Li, Yong Liu	(参考訳) イベントカメラは、ピクセルごとの明るさ変化をキャプチャし、非同期の ``events'' ストリームを出力できる、新しい視覚センサである。時間分解能が高く、ダイナミックレンジが高く、帯域幅が低く、消費電力が低く、動きがぼやけないため、高速モーションや照明条件に挑戦するシーンでは従来のカメラより優れている。そこで,従来のカメラでは難しいシーンに対処するために,イベントからの教師付き単眼深度推定がいくつか提案されている。しかし、深さアノテーションはコストと時間を要する。本稿では,アノテーションのコストを下げるために,EMoDepthという自己教師型イベントベース単分子深度推定フレームワークを提案する。 EMoDepthは、ピクセル座標内のイベントに整合した強度フレームからのクロスモーダル一貫性を使用して、トレーニングプロセスを制約する。さらに、推論では、単眼深度予測にはイベントのみを使用する。さらに,高い推論速度を維持しつつ,深度推定のための機能を効果的に融合するマルチスケールなスキップ接続アーキテクチャを設計した。 MVSECとDSECデータセットの実験では、私たちのコントリビューションが効果的であり、既存の教師付きイベントベースおよび教師なしフレームベースメソッドよりも精度が高いことが示されている。 An event camera is a novel vision sensor that can capture per-pixel brightness changes and output a stream of asynchronous ``events''. It has advantages over conventional cameras in those scenes with high-speed motions and challenging lighting conditions because of the high temporal resolution, high dynamic range, low bandwidth, low power consumption, and no motion blur. Therefore, several supervised monocular depth estimation from events is proposed to address scenes difficult for conventional cameras. However, depth annotation is costly and time-consuming. In this paper, to lower the annotation cost, we propose a self-supervised event-based monocular depth estimation framework named EMoDepth. EMoDepth constrains the training process using the cross-modal consistency from intensity frames that are aligned with events in the pixel coordinate. Moreover, in inference, only events are used for monocular depth prediction. Additionally, we design a multi-scale skip-connection architecture to effectively fuse features for depth estimation while maintaining high inference speed. Experiments on MVSEC and DSEC datasets demonstrate that our contributions are effective and that the accuracy can outperform existing supervised event-based and unsupervised frame-based methods.	翻訳日:2024-01-17 19:23:38 公開日:2024-01-14
# PT対称量子系における量子カオス Quantum chaos in PT symmetric quantum systems ( http://arxiv.org/abs/2401.07215v1 ) ライセンス: Link先を確認	Kshitij Sharma, Himanshu Sahu and Subroto Mukerjee	(参考訳) 本研究では,非エルミート力学系における$\mathcal{pt}$-symmetryと量子カオスの相互作用を考察する。量子カオスの標準診断、すなわち複素レベル間隔比と時間外順序相関子(otocs)の拡張を考察し、$\mathcal{pt}$-symmetric quantum kick rotorモデルについて検討する。蹴られたローターは、古典的および量子的カオスを研究するためのパラダイム的動的システムと見なされてきた。量子キックローターに非ハーミティシティを導入することで、エルミート系に存在しない新しい位相と遷移を明らかにする。複素レベルの間隔比の研究から、3つのレジームを見つけ出した: 1つは積分可能で$\mathcal{pt}$-symmetry、もう1つは$\mathcal{pt}$-symmetryでカオス的、もう1つはカオス的だが破れた$\mathcal{pt}$-symmetryである。複素レベル間隔比は3つの相を区別できることがわかった。 OTOCの計算は、半古典的極限における古典的リャプノフ指数の計算と関係があるので、これらの状態と位相境界におけるその性質について検討する。 $\mathcal{PT}$-対称性の位相において、OTOCは積分可能およびカオス的状態の両方においてエルミート系で観察されるような振る舞いを示す。さらに、$\mathcal{PT}$-対称性の破れ相において、OTOCは後代の固有値スペクトルの複素性質から生じる追加の指数的成長を示す。我々はオトクの後期行動の分析形態を導出する。正規化OTOCを定義して、$\mathcal{PT}$-対称性の破れによる影響を軽減することにより、OTOCは$\mathcal{PT}$-対称性のカオス相から$\mathcal{PT}$-対称性の破れ、カオス相への遷移において特異な挙動を示すことを示す。 In this study, we explore the interplay between $\mathcal{PT}$-symmetry and quantum chaos in a non-Hermitian dynamical system. We consider an extension of the standard diagnostics of quantum chaos, namely the complex level spacing ratio and out-of-time-ordered correlators (OTOCs), to study the $\mathcal{PT}$-symmetric quantum kicked rotor model. The kicked rotor has long been regarded as a paradigmatic dynamic system to study classical and quantum chaos. By introducing non-Hermiticity in the quantum kicked rotor, we uncover new phases and transitions that are absent in the Hermitian system. From the study of the complex level spacing ratio, we locate three regimes -- one which is integrable and $\mathcal{PT}$-symmetry, another which is chaotic with $\mathcal{PT}$-symmetry and a third which is chaotic but with broken $\mathcal{PT}$-symmetry. We find that the complex level spacing ratio can distinguish between all three phases. Since calculations of the OTOC can be related to those of the classical Lyapunov exponent in the semi-classical limit, we investigate its nature in these regimes and at the phase boundaries. In the phases with $\mathcal{PT}$-symmetry, the OTOC exhibits behaviour akin to what is observed in the Hermitian system in both the integrable and chaotic regimes. Moreover, in the $\mathcal{PT}$-symmetry broken phase, the OTOC demonstrates additional exponential growth stemming from the complex nature of the eigenvalue spectrum at later times. We derive the analytical form of the late-time behaviour of the OTOC. By defining a normalized OTOC to mitigate the effects caused by $\mathcal{PT}$-symmetry breaking, we show that the OTOC exhibits singular behaviour at the transition from the $\mathcal{PT}$-symmetric chaotic phase to the $\mathcal{PT}$-symmetry broken, chaotic phase.	翻訳日:2024-01-17 19:23:18 公開日:2024-01-14
# 深度非依存単一画像デハジング Depth-agnostic Single Image Dehazing ( http://arxiv.org/abs/2401.07213v1 ) ライセンス: Link先を確認	Honglei Xu and Yan Shu and Shaohui Liu	(参考訳) 単一画像デハジングは困難な不適切な問題である。ディープラーニングベースのメソッドをトレーニングするための既存のデータセットは、手作りまたは合成スキームによって生成される。しかし、前者は小さなスケールに悩まされることが多く、後者はヘイズ分布ではなくシーン深度を学習させ、デハジング能力を低下させる。そこで本研究では,深度に依存しないデータセット(DA-HAZE)を生成することで,ヘイズ密度とシーン深度の関係を分離する合成手法を提案する。一方、異なるスケールのデータセットを生成するため、Global Shuffle Strategy(GSS)が提案され、モデルの一般化能力が向上する。 DA-HAZEでトレーニングされたモデルは、SOTSとDA-SOTS(DA-HAZEのテストセット)の差が少なく、現実世界のベンチマークで大幅に改善されている。さらに、深さに依存しないデハジングは、より複雑なタスクである。したがって、より強力な特徴モデリング能力と計算コストの少ない効率的なアーキテクチャが必要である。我々は、専用に設計されたブロックを組み込んだデハージングのために、U-Netベースのアーキテクチャを再考する。しかし,ブロックの性能は限定的な特徴融合法によって制限される。この目的のために我々は,バニラ特徴融合法により最小限のコストで有望な結果が得られるConvolutional Skip Connection (CSC) モジュールを提案する。広範な実験結果から,最先端の手法が証明された。 CSCを備えることで、シーンの深さに関係のあるヘイズ分布であっても、より優れたパフォーマンスと合理的な計算コストを達成することができる。 Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. To overcome the problem, we propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated. Meanwhile, a Global Shuffle Strategy (GSS) is proposed for generating differently scaled datasets, thereby enhancing the generalization ability of the model. Extensive experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS (the test set of DA-HAZE). Additionally, Depth-agnostic dehazing is a more complicated task because of the lack of depth prior. Therefore, an efficient architecture with stronger feature modeling ability and fewer computational costs is necessary. We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated. However, the performances of blocks are constrained by limited feature fusion methods. To this end, we propose a Convolutional Skip Connection (CSC) module, allowing vanilla feature fusion methods to achieve promising results with minimal costs. Extensive experimental results demonstrate that current state-of-the-art methods. equipped with CSC can achieve better performance and reasonable computational expense, whether the haze distribution is relevant to the scene depth.	翻訳日:2024-01-17 19:22:40 公開日:2024-01-14
# アンサンブルモデルによるクラスインクリメンタル学習の強化 Enhanced Few-Shot Class-Incremental Learning via Ensemble Models ( http://arxiv.org/abs/2401.07208v1 ) ライセンス: Link先を確認	Mingli Zhu, Zihao Zhu, Sihong Chen, Chen Chen, Baoyuan Wu	(参考訳) few-shot class-incremental learning (fscil) は、新しいクラスを限られたトレーニングデータに継続的に適合させることを目的としている。主な課題は、珍しい新しいトレーニングサンプルを過度に適合させ、古いクラスを忘れることである。破滅的な忘れ物の研究が盛んに行われているが、過度に適合する問題はFSCILではあまり注目されていない。課題を克服するために,データ拡張と協調して一般化を促進する新しいアンサンブルモデルフレームワークを設計した。このように拡張モデルは、下流タスクへの迅速な適応を保証するために、豊富な機能を格納するライブラリとして機能する。具体的には、多入力多出力アンサンブル構造に空間認識データ拡張戦略を適用し、特徴抽出器の多様化と増分セッションにおける過度な適合の緩和を図る。さらに、モデル一般化をさらに改善するために、自己教師付き学習も統合されている。包括的実験により,提案手法はfscilのオーバーフィッティング問題を実際に軽減し,最先端手法を上回った。 Few-shot class-incremental learning (FSCIL) aims to continually fit new classes with limited training data, while maintaining the performance of previously learned classes. The main challenges are overfitting the rare new training samples and forgetting old classes. While catastrophic forgetting has been extensively studied, the overfitting problem has attracted less attention in FSCIL. To tackle overfitting challenge, we design a new ensemble model framework cooperated with data augmentation to boost generalization. In this way, the enhanced model works as a library storing abundant features to guarantee fast adaptation to downstream tasks. Specifically, the multi-input multi-output ensemble structure is applied with a spatial-aware data augmentation strategy, aiming at diversifying the feature extractor and alleviating overfitting in incremental sessions. Moreover, self-supervised learning is also integrated to further improve the model generalization. Comprehensive experimental results show that the proposed method can indeed mitigate the overfitting problem in FSCIL, and outperform the state-of-the-art methods.	翻訳日:2024-01-17 19:22:15 公開日:2024-01-14
# コンパクト内部表現を用いた教師なし領域適応 Unsupervised Domain Adaptation Using Compact Internal Representations ( http://arxiv.org/abs/2401.07207v1 ) ライセンス: Link先を確認	Mohammad Rostami	(参考訳) 教師なしドメイン適応に取り組むための主要なテクニックは、ソースとターゲットの両方のドメインからデータポイントを共有埋め込み空間にマッピングすることである。埋め込み空間へのマッピングエンコーダは、埋め込み空間がドメイン非依存になるように訓練され、ソースドメインで訓練された分類器が対象領域でうまく一般化できる。非教師なしドメイン適応(unsupervised domain adaptation, uda)の性能をさらに高めるために, ソース領域の内部分布をよりコンパクトにし, 対象領域に一般化するモデルの能力を向上させる付加的手法を開発し, 埋め込み空間における異なるクラスに対するデータ表現間のマージンを増大させることにより, udaのモデル性能を向上させることを実証する。内部表現をよりコンパクトにするために、内部学習されたソースドメインのマルチモーダル分布をガウス混合モデル(gmm)として推定する。推定したGMMを用いて、ソースドメイン内の異なるクラス間の分離を強化し、ドメインシフトの影響を軽減する。我々は,提案手法を覆すために理論的分析を行う。提案手法の有効性を評価するため,広く使用されているUDAベンチマークUDAデータセットを用いて実験を行った。その結果,本手法はモデルの一般化性を向上し,既存の手法よりも優れていた。 A major technique for tackling unsupervised domain adaptation involves mapping data points from both the source and target domains into a shared embedding space. The mapping encoder to the embedding space is trained such that the embedding space becomes domain agnostic, allowing a classifier trained on the source domain to generalize well on the target domain. To further enhance the performance of unsupervised domain adaptation (UDA), we develop an additional technique which makes the internal distribution of the source domain more compact, thereby improving the model's ability to generalize in the target domain.We demonstrate that by increasing the margins between data representations for different classes in the embedding space, we can improve the model performance for UDA. To make the internal representation more compact, we estimate the internally learned multi-modal distribution of the source domain as Gaussian mixture model (GMM). Utilizing the estimated GMM, we enhance the separation between different classes in the source domain, thereby mitigating the effects of domain shift. We offer theoretical analysis to support outperofrmance of our method. To evaluate the effectiveness of our approach, we conduct experiments on widely used UDA benchmark UDA datasets. The results indicate that our method enhances model generalizability and outperforms existing techniques.	翻訳日:2024-01-17 19:21:57 公開日:2024-01-14
# 斜め射影を用いた確率的低次元ベクトル自己回帰モデリング Probabilistic Reduced-Dimensional Vector Autoregressive Modeling with Oblique Projections ( http://arxiv.org/abs/2401.07206v1 ) ライセンス: Link先を確認	Yanfang Mo and S. Joe Qin	(参考訳) 本稿では,高次元雑音データから低次元ダイナミクスを抽出する確率的還元次元ベクトル自己回帰モデルを提案する。このモデルは斜射影を用いて、測定空間を縮小次元ダイナミクスと相補的な静的部分空間に対応する部分空間に分割する。予測誤差共分散に関する最良の予測可能性のために最適な斜め分解を求める。そこで我々は,最大可能性と予測最大化(EM)フレームワークを用いた反復PredVARアルゴリズムを開発した。このアルゴリズムは、潜在ダイナミクスと最適斜め射影の見積もりを交互に更新し、ランク順の予測可能性を持つ動的潜在変数と、外部射影モデルと一致する明示的潜在varモデルを生成する。合成ロレンツ系とイーストマン化学の工業プロセスから得られたデータセットを用いて,提案手法の優れた性能と効率を実証した。 In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model to extract low-dimensional dynamics from high-dimensional noisy data. The model utilizes an oblique projection to partition the measurement space into a subspace that accommodates the reduced-dimensional dynamics and a complementary static subspace. An optimal oblique decomposition is derived for the best predictability regarding prediction error covariance. Building on this, we develop an iterative PredVAR algorithm using maximum likelihood and the expectation-maximization (EM) framework. This algorithm alternately updates the estimates of the latent dynamics and optimal oblique projection, yielding dynamic latent variables with rank-ordered predictability and an explicit latent VAR model that is consistent with the outer projection model. The superior performance and efficiency of the proposed approach are demonstrated using data sets from a synthesized Lorenz system and an industrial process from Eastman Chemical.	翻訳日:2024-01-17 19:21:35 公開日:2024-01-14
# Crafter: ディープモデルにおけるインバージョンベースのアイデンティティ盗難に対する顔認識 Crafter: Facial Feature Crafting against Inversion-based Identity Theft on Deep Models ( http://arxiv.org/abs/2401.07205v1 ) ライセンス: Link先を確認	Shiming Wang, Zhe Ji, Liyao Xiang, Hao Zhang, Xinbing Wang, Chenghu Zhou, Bo Li	(参考訳) エッジにおける機能向上(モバイルデバイスなど)と、より厳しいプライバシー要件により、ディープラーニング対応アプリケーションがエッジで機密性の高い生データを前処理し、さらに処理するために機能をバックエンドクラウドに送信する、という最近のトレンドになっている。典型的なアプリケーションは、異なる個人から収集された顔画像に対して機械学習(ML)サービスを実行することである。アイデンティティの盗難を防止するため、従来の手法では、その特徴からアイデンティティ情報を隠蔽するための対戦ゲームベースのアプローチが一般的である。しかし、そのような手法は攻撃者が既知の防御戦略に対して反撃を行う適応攻撃に対して防御することはできない。本稿では,機械学習タスクがクラウド上で適切に実行されることを保証しつつ,適応型モデル反転攻撃から識別情報を保護するために,エッジに展開する特徴工法であるCrafterを提案する。重要な防御戦略は、攻撃者がプライベートアイデンティティについてほとんど得ることができない非プライベートに攻撃者を誤解させることである。この場合、製作された機能は、適応型モデル更新を伴う攻撃者のための毒の訓練サンプルのように振る舞う。実験の結果,crafterは,最先端のゲームベース手法では達成できない基本攻撃と可能な適応攻撃の両方を効果的に防御できることが示されている。 With the increased capabilities at the edge (e.g., mobile device) and more stringent privacy requirement, it becomes a recent trend for deep learning-enabled applications to pre-process sensitive raw data at the edge and transmit the features to the backend cloud for further processing. A typical application is to run machine learning (ML) services on facial images collected from different individuals. To prevent identity theft, conventional methods commonly rely on an adversarial game-based approach to shed the identity information from the feature. However, such methods can not defend against adaptive attacks, in which an attacker takes a countermove against a known defence strategy. We propose Crafter, a feature crafting mechanism deployed at the edge, to protect the identity information from adaptive model inversion attacks while ensuring the ML tasks are properly carried out in the cloud. The key defence strategy is to mislead the attacker to a non-private prior from which the attacker gains little about the private identity. In this case, the crafted features act like poison training samples for attackers with adaptive model updates. Experimental results indicate that Crafter successfully defends both basic and possible adaptive attacks, which can not be achieved by state-of-the-art adversarial game-based methods.	翻訳日:2024-01-17 19:21:18 公開日:2024-01-14
# 知覚的プロキシとしての圧縮画像表現の探索 Exploring Compressed Image Representation as a Perceptual Proxy: A Study ( http://arxiv.org/abs/2401.07200v1 ) ライセンス: Link先を確認	Chen-Hsiu Huang and Ja-Ling Wu	(参考訳) 本稿では,対象分類タスクと解析変換を併用したエンドツーエンド学習画像圧縮コーデックを提案する。本研究は、圧縮された潜在表現が、カスタマイズされたDNNベースの品質指標に匹敵する精度で人間の知覚距離判定を予測できることを確認した。さらに,様々なニューラルエンコーダを調査し,画像課題に対する知覚損失ネットワークとしての解析変換の有効性を,品質判断を超えて実証する。実験の結果,市販のニューラルエンコーダは,付加的なVGGネットワークを必要とせず,知覚モデリングに熟練していることがわかった。この研究は、セマンティック認識とコーディング効率のよいニューラルエンコーダの貴重な参照開発として役立つことを期待している。 We propose an end-to-end learned image compression codec wherein the analysis transform is jointly trained with an object classification task. This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric. We further investigate various neural encoders and demonstrate the effectiveness of employing the analysis transform as a perceptual loss network for image tasks beyond quality judgments. Our experiments show that the off-the-shelf neural encoder proves proficient in perceptual modeling without needing an additional VGG network. We expect this research to serve as a valuable reference developing of a semantic-aware and coding-efficient neural encoder.	翻訳日:2024-01-17 19:20:53 公開日:2024-01-14
# 単層WSe$_2$:幾何量子速度限界における励起子谷ダイナミクスのレーザーフィールドデチューニングによる最適化 Laser-field detuning assisted optimization of exciton valley dynamics in monolayer WSe$_2$: Geometric quantum speed limit ( http://arxiv.org/abs/2401.07191v1 ) ライセンス: Link先を確認	Kang Lan, Shijie Xie, and Jiyong Fu	(参考訳) バレーダイナミクスの最適化は、2次元半導体の文脈でキュービットを正確に操作するための有効な手段である。本研究では,単層膜WSe$_2$における励起子の内部チャネルと間隔チャネルの両方を包含する包括的モデルを構築し,同時に光-物質相互作用を考慮し,初期コヒーレント励起子状態によるバレーダイナミクスの最適制御について検討する。量子速度限界(QSL)理論に基づき、目標状態に達する谷のダイナミクスの進化時間を削減するための2つの最適制御スキームを提案し、時間とともに進化速度を向上する。さらに, 動的最適化の実施は, 光励起モードと磁気誘起谷分割により決定される, K-K'谷間における励起子-レーザー磁場の変形差と密接に関連していることを強調した。特に、小さな調律差が実際の力学経路を初期状態と最終状態の間の測地線の長さに向かって収束させ、最小の時間でシステムが進化することを明らかにする。特に谷のコヒーレンスの存在下では、実際の進化時間と計算されたQSL時間がほぼ一致し、谷の量子ビットに基づく情報伝達の忠実度が高い。顕著なことに,初期分極を伴わずに谷の偏極を生じさせる大きな微調整差を採用することにより,谷の力学の進化速度の興味深い向上を示す。我々の研究は、バレートロニクス応用における励起物理学の光学的チューニングのための新しいパラダイムを開き、また、量子ビットにおける情報伝送の速度制限のような緊急問題に対する解決策を提供するかもしれない。 Optimizing valley dynamics is an effective instrument towards precisely manipulating qubit in the context of two-dimensional semiconductor. In this work, we construct a comprehensive model, involving both intra- and intervalley channels of excitons in monolayer WSe$_2$, and simultaneously takes the light-matter interaction into account, to investigate the optimal control of valley dynamics with an initial coherent excitonic state. Based on the quantum speed limit (QSL) theory, we propose two optimal control schemes aiming to reduce the evolution time of valley dynamics reaching the target state, along with to boost the evolution speed over a period of time. Further, we emphasize that the implementation of dynamical optimization is closely related to the detuning difference -- the difference of exciton-laser field detunings between the K and K' valleys -- which is determined by the optical excitation mode and magnetically-induced valley splitting. In particular, we reveal that a small detuning difference drives the actual dynamical path to converge towards the geodesic length between the initial and final states, allowing the system to evolve with the least time. Especially, in the presence of valley coherence, the actual evolution time and the calculated QSL time almost coincide, facilitating high fidelity in information transmission based on the valley qubit. Remarkably, we demonstrate an intriguing enhancement in evolution speed of valley dynamics, by adopting a large detuning difference, which induces an emerging valley polarization even without initial polarization. Our work opens a new paradigm for optically tuning excitonic physics in valleytronic applications, and may also offer solutions to some urgent problems such as speed limit of information transmission in qubit.	翻訳日:2024-01-17 19:20:41 公開日:2024-01-14
# 構造化データ自然言語ビジェクションへの道のりとLLMアノテーションの役割 Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation ( http://arxiv.org/abs/2401.07190v1 ) ライセンス: Link先を確認	Blake Vente	(参考訳) この研究は、シーケンシャル・ツー・シーケンシャルなトランスフォーマー言語モデルで複数のタスクを使用すると、いくつかのメトリクスのパフォーマンスが向上する、という理論を裏付ける限られた証拠を見出している。特に、マルチタスクのジェネラリスト t5-小は、F_1$$0.771$から0.692$まで、専門家 t5-小よりも優れている。これはさらに、同じネットワークであっても、異なる方法で同じデータを"再使用"することは、いくつかのメトリクスでより高いパフォーマンスにつながる可能性があることを示唆している。しかし、逆タスクだけでは最適化戦略に過ぎず、この研究で探索されたモデルサイズにおいて、大幅な全体的な改善は得られない。また、$\approx 4500$ LLMアノテートレコード($12800$ WebNLGトレーニングレコードに組み込まれている)を追加すると、合成データのない同じt5小モデルと比較して、自動メートル法のパフォーマンスは大幅に変化しない。これはモデルサイズによる学習能力のボトルネックによるものかもしれないし、観察された減少はコーパスの分布的差異によるものかもしれない。より大きなモデルや人的評価を用いた将来の研究は、これらのタスクのパフォーマンスに寄与するメカニズムをより完全に説明する必要がある。 This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms the specialist t5-small with a $F_1$ of $0.771$ up from $0.692$, which may point to underlying cross-task knowledge generalization. This further suggests that even with the same network, "re-using" the same data in a different way may lead to higher performance in some metrics. However, the inverse task alone is likely only an optimization strategy, since it does not yield a significant general improvement at the model sizes explored in this work. Also, adding $\approx 4500$ LLM annotated records (interlaced with the $12800$ WebNLG training records) does not substantially change automatic metric performance compared to the same t5-small model without the synthetic data. This may be due to a learning capacity bottleneck on account of model size, and decreases observed may be due to distributional differences in the corpora. Future research using larger models or human evaluation is required to more fully explain the mechanisms contributing to performance on these tasks.	翻訳日:2024-01-17 19:20:09 公開日:2024-01-14
# ステレオネットワークにおける敵攻撃の左右差 Left-right Discrepancy for Adversarial Attack on Stereo Networks ( http://arxiv.org/abs/2401.07188v1 ) ライセンス: Link先を確認	Pengfei Wang, Xiaofei Hui, Beijia Lu, Nimrod Lilith, Jun Liu, Sameer Alam	(参考訳) ステレオマッチングニューラルネットワークは、左右の画像から中間的特徴を抽出するシームズ構造を含むことが多い。これらの中間的な左右の特徴の類似性は、差分推定の精度に大きな影響を及ぼす。本稿では,左右画像の特徴の相違を最大化するために特別に設計された摂動雑音を生成する新しい攻撃手法を提案する。例えば、KITTIデータセットでは219%のMAE、Scene Flowデータセットでは85%のMAEで既存の最先端攻撃手法より優れている。さらに,このアプローチを拡張して,ステレオニューラルネットワークへのアクセスを不要とした,プロキシネットワークブラックボックス攻撃手法も導入した。この方法は、異なるビジョンタスクから任意のネットワークをプロキシとして活用し、逆ノイズを生成し、ステレオネットワークが誤った予測を効果的に生み出す。本研究は,立体視システムの強靭性向上に寄与する貴重な知見を提供するため,浅層構造における不一致に対するステレオネットワークの顕著な感度を強調した。 Stereo matching neural networks often involve a Siamese structure to extract intermediate features from left and right images. The similarity between these intermediate left-right features significantly impacts the accuracy of disparity estimation. In this paper, we introduce a novel adversarial attack approach that generates perturbation noise specifically designed to maximize the discrepancy between left and right image features. Extensive experiments demonstrate the superior capability of our method to induce larger prediction errors in stereo neural networks, e.g. outperforming existing state-of-the-art attack methods by 219% MAE on the KITTI dataset and 85% MAE on the Scene Flow dataset. Additionally, we extend our approach to include a proxy network black-box attack method, eliminating the need for access to stereo neural network. This method leverages an arbitrary network from a different vision task as a proxy to generate adversarial noise, effectively causing the stereo network to produce erroneous predictions. Our findings highlight a notable sensitivity of stereo networks to discrepancies in shallow layer features, offering valuable insights that could guide future research in enhancing the robustness of stereo vision systems.	翻訳日:2024-01-17 19:19:41 公開日:2024-01-14
# 深層学習の統計理論に関する調査研究:近似, トレーニングダイナミクス, 生成モデル A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models ( http://arxiv.org/abs/2401.07187v1 ) ライセンス: Link先を確認	Namjoon Suh and Guang Cheng	(参考訳) 本稿では,3つの観点から,ニューラルネットワークの統計理論に関する文献をレビューする。第一部では、回帰または分類の非パラメトリックフレームワークにおいて、ニューラルネットワークの過剰リスクに関する結果についてレビューする。これらの結果はニューラルネットワークの明示的な構築に依存しており、近似理論からのツールが採用されているため、過剰リスクの高速収束率につながる。これらの構成を通して、ネットワークの幅と深さは、サンプルサイズ、データ次元、関数の滑らかさという観点から表現できる。それでも、その基盤となる分析は、ディープニューラルネットワークの非凸な状況におけるグローバルな最小化にのみ適用される。これは、第2部のニューラルネットワークのトレーニングダイナミクスをレビューする動機となります。具体的には、勾配に基づく手法でトレーニングされたニューラルネットワークが、目に見えないデータに対してうまく一般化できるソリューションを見つける方法」に答えようとする論文をレビューする。特に、ニューラルネットワークカーネル(NTK)パラダイムと平均フィールド(MF)パラダイムの2つのよく知られたパラダイムがレビューされている。最後に,GAN(Generative Adversarial Networks)や拡散モデル,Large Language Models(LLMs)におけるICL(In-context Learning)などの生成モデルに関する最近の理論的進歩について概説する。以前の2つのモデルは、現代の生成AI時代の主要な柱として知られており、ICLは、文脈におけるいくつかの例から学ぶLLMの強力な能力である。最後に,深層学習理論に期待できるいくつかの方向性を提案する。 In this article, we review the literature on statistical theories of neural networks from three perspectives. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression or classification. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks, in that tools from the approximation theory are adopted. Through these constructions, the width and depth of the networks can be expressed in terms of sample size, data dimension, and function smoothness. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer ``how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data.'' In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm, and Mean-Field (MF) paradigm. In the last part, we review the most recent theoretical advancements in generative models including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in the Large Language Models (LLMs). The former two models are known to be the main pillars of the modern generative AI era, while ICL is a strong capability of LLMs in learning from a few examples in the context. Finally, we conclude the paper by suggesting several promising directions for deep learning theory.	翻訳日:2024-01-17 19:19:24 公開日:2024-01-14
# 量子アニーリングによる近似最適化におけるスケーリングアドバンテージ Scaling Advantage in Approximate Optimization with Quantum Annealing ( http://arxiv.org/abs/2401.07184v1 ) ライセンス: Link先を確認	Humberto Munoz Bauza and Daniel A. Lidar	(参考訳) 量子アニーリング(quantum annealing)は、量子進化を利用して最低エネルギー状態を見つけるヒューリスティック最適化アルゴリズムである。量子アニーラは近年、より大きく、より高度に連結された離散最適化と量子シミュレーションの問題に取り組むために拡大している。しかし、多くの試みにもかかわらず、量子アニールハードウェアを用いた正確な最適化における計算量子の優位性はいまだ解明されていない。ここでは、近似最適化における量子アニールスケーリングの利点を示す。利点は古典的ヒューリスティックアルゴリズム(PT-ICM)に比較して、アイソエネルゲティッククラスタ移動(PT-ICM)による並列テンパリングである。このセッティングは、高精度スピンスピン相互作用を持つ2次元スピングラス問題の族である。この利点を得るために、我々は量子アニール補正(QAC)を実装し、D波アドバンテージ量子アニールの性質を利用したビットフリップ誤り訂正符号をエネルギーペナルティで埋め込み、次数5の相互作用グラフ上で1300以上の誤り抑制論理量子ビットを生成する。このグラフ上でランダムなスピングラスのインスタンスを生成し、低エネルギー状態に対する時間-解法の一般化である時間-エプシロンをベンチマークする。その結果,QACではPT-ICMよりも少なくとも1.0%の最適性ギャップを有する低エネルギー状態のサンプリングにおいて,量子アニールがスケーリング上の優位性を示すことがわかった。これは近似最適化におけるアルゴリズム量子スピードアップの最初の実演である。 Quantum annealing is a heuristic optimization algorithm that exploits quantum evolution to approximately find lowest energy states. Quantum annealers have scaled up in recent years to tackle increasingly larger and more highly connected discrete optimization and quantum simulation problems. Nevertheless, despite numerous attempts, a computational quantum advantage in exact optimization using quantum annealing hardware has so far remained elusive. Here, we present evidence for a quantum annealing scaling advantage in approximate optimization. The advantage is relative to the top classical heuristic algorithm: parallel tempering with isoenergetic cluster moves (PT-ICM). The setting is a family of 2D spin-glass problems with high-precision spin-spin interactions. To achieve this advantage, we implement quantum annealing correction (QAC): an embedding of a bit-flip error-correcting code with energy penalties that leverages the properties of the D-Wave Advantage quantum annealer to yield over 1,300 error-suppressed logical qubits on a degree-5 interaction graph. We generate random spin-glass instances on this graph and benchmark their time-to-epsilon, a generalization of the time-to-solution metric for low-energy states. We demonstrate that with QAC, quantum annealing exhibits a scaling advantage over PT-ICM at sampling low energy states with an optimality gap of at least 1.0%. This amounts to the first demonstration of an algorithmic quantum speedup in approximate optimization.	翻訳日:2024-01-17 19:18:50 公開日:2024-01-14
# LLMフィードバックからの強化学習と対向ゴールミスジェネリゼーション Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization ( http://arxiv.org/abs/2401.07181v1 ) ライセンス: Link先を確認	Houda Nait El Barj, Theophile Sautory	(参考訳) 本稿では,大規模言語モデル(LLM)フィードバックを活用した強化学習(RL)における目標誤一般化に対処する手法を提案する。目標の一般化(goal misgeneralization) rlにおける堅牢性障害の一種は、エージェントが分散能力を保持しながら、意図したものではなくプロキシを追求した場合に発生する。本手法はLLMを用いて,トレーニング中のRLエージェントのポリシーを分析し,潜在的な障害シナリオを特定する。 RLエージェントはこれらのシナリオにデプロイされ、LLMの好みとフィードバックを通じて報酬モデルが学習される。このLLMインフォームド報酬モデルを使用して、元のデータセット上でRLエージェントをさらに訓練する。本手法を迷路ナビゲーションタスクに適用し,特に真とプロキシの目標がある程度区別可能であり,行動バイアスが顕著な場合に,目標一般化の顕著な改善を示す。本研究は、LLMがタスク能力の不足にもかかわらず、効率的にRLエージェントを監督し、LLMを用いてRLにおける目標指向学習を強化するためのスケーラブルな監視と価値ある洞察を提供する方法を示す。 We introduce a method to address goal misgeneralization in reinforcement learning (RL), leveraging Large Language Model (LLM) feedback during training. Goal misgeneralization, a type of robustness failure in RL occurs when an agent retains its capabilities out-of-distribution yet pursues a proxy rather than the intended one. Our approach utilizes LLMs to analyze an RL agent's policies during training and identify potential failure scenarios. The RL agent is then deployed in these scenarios, and a reward model is learnt through the LLM preferences and feedback. This LLM-informed reward model is used to further train the RL agent on the original dataset. We apply our method to a maze navigation task, and show marked improvements in goal generalization, especially in cases where true and proxy goals are somewhat distinguishable and behavioral biases are pronounced. This study demonstrates how the LLM, despite its lack of task proficiency, can efficiently supervise RL agents, providing scalable oversight and valuable insights for enhancing goal-directed learning in RL through the use of LLMs.	翻訳日:2024-01-17 19:18:11 公開日:2024-01-14
# 欧州のGDP予測とテキストデータ Forecasting GDP in Europe with Textual Data ( http://arxiv.org/abs/2401.07179v1 ) ライセンス: Link先を確認	Luca Barbaglia, Sergio Consoli, Sebastiano Manzan	(参考訳) 我々は、欧州5大経済圏の国内総生産(gdp)およびその他のマクロ経済変数を予測するためのニュースベースの感情指標の情報内容を評価する。われわれのデータセットには、5つの言語で26の新聞の2700万記事が含まれている。これらの指標はマクロ経済変数を予測するための重要な予測因子であり、予測内容はリアルタイムに予測者が利用できる他の指標の制御に堅牢であることを示す。 We evaluate the informational content of news-based sentiment indicators for forecasting Gross Domestic Product (GDP) and other macroeconomic variables of the five major European economies. Our data set includes over 27 million articles for 26 major newspapers in 5 different languages. The evidence indicates that these sentiment indicators are significant predictors to forecast macroeconomic variables and their predictive content is robust to controlling for other indicators available to forecasters in real-time.	翻訳日:2024-01-17 19:17:34 公開日:2024-01-14
# 幾何誤差最小化による都市景観の超解像 City Scene Super-Resolution via Geometric Error Minimization ( http://arxiv.org/abs/2401.07272v1 ) ライセンス: Link先を確認	Zhengyang Lu and Feng Wang	(参考訳) 超解像技術は画像の粒度向上に不可欠であり、特に複雑な都市部では、幾何学的構造を保存することが、データインフォームドな文化遺産の応用に不可欠である。本稿では,幾何学的誤差最小化による都市景観超解法を提案する。幾何一貫性機構は、ハフ変換を利用して都市景観の規則的な幾何学的特徴を抽出し、低解像度画像と高解像度画像の間の幾何学的誤差の計算を可能にする。超解像過程における混合平均二乗誤差と幾何整合誤差を最小化することにより、提案手法は詳細および幾何正則性を効率的に復元する。 SET14,BSD300,Cityscapes,GSV-Citiesのデータセットに対する広範囲な検証は,提案手法が既存の最先端手法,特に都市シーンにおいて優れていることを示す。 Super-resolution techniques are crucial in improving image granularity, particularly in complex urban scenes, where preserving geometric structures is vital for data-informed cultural heritage applications. In this paper, we propose a city scene super-resolution method via geometric error minimization. The geometric-consistent mechanism leverages the Hough Transform to extract regular geometric features in city scenes, enabling the computation of geometric errors between low-resolution and high-resolution images. By minimizing mixed mean square error and geometric align error during the super-resolution process, the proposed method efficiently restores details and geometric regularities. Extensive validations on the SET14, BSD300, Cityscapes and GSV-Cities datasets demonstrate that the proposed method outperforms existing state-of-the-art methods, especially in urban scenes.	翻訳日:2024-01-17 19:10:31 公開日:2024-01-14
# SpineCLUE:コントラスト学習と不確実性推定を用いた自動動詞識別 SpineCLUE: Automatic Vertebrae Identification Using Contrastive Learning and Uncertainty Estimation ( http://arxiv.org/abs/2401.07271v1 ) ライセンス: Link先を確認	Sheng Zhang, Minheng Chen, Junxian Wu, Ziyue Zhang, Tonglong Li, Cheng Xue, Youyong Kong	(参考訳) 任意の分野における椎体同定は脊椎疾患の診断において重要な役割を担っている。ほとんどの脊椎ctは頸部、胸部、腹部などの局所領域のみを含んでいる。したがって、識別は特定の椎骨や特定の数の脊椎に依存すべきではない。既存の脊椎レベルの方法は、この課題を満たせない。本稿では,脊椎レベルでの3次元CT椎骨識別の課題に対処する3段階の手法を提案する。脊椎のローカライゼーション、セグメンテーション、識別のタスクを順次実行することにより、脊椎の解剖学的事前情報をその過程を通して効果的に活用する。具体的には,個々の椎骨の局在情報を取得する2要素密度クラスタリングアルゴリズムを導入し,その後のセグメンテーションと識別処理を容易にする。さらに,クラス間類似性とクラス内変動性の問題に取り組むため,教師付きコントラスト学習法を用いて識別ネットワークを事前学習する。識別結果をさらに最適化するために,分類ネットワークの不確実性を推定し,メッセージ融合モジュールを用いて不確実性スコアを合成し,スピンに関する情報を集約した。本手法は, verse19 および verse20 challenge ベンチマークで最新の結果を得た。さらに,本手法は,広範囲の異常例を含む収集データセット上での卓越した一般化性能を示す。 Vertebrae identification in arbitrary fields-of-view plays a crucial role in diagnosing spine disease. Most spine CT contain only local regions, such as the neck, chest, and abdomen. Therefore, identification should not depend on specific vertebrae or a particular number of vertebrae being visible. Existing methods at the spine-level are unable to meet this challenge. In this paper, we propose a three-stage method to address the challenges in 3D CT vertebrae identification at vertebrae-level. By sequentially performing the tasks of vertebrae localization, segmentation, and identification, the anatomical prior information of the vertebrae is effectively utilized throughout the process. Specifically, we introduce a dual-factor density clustering algorithm to acquire localization information for individual vertebra, thereby facilitating subsequent segmentation and identification processes. In addition, to tackle the issue of interclass similarity and intra-class variability, we pre-train our identification network by using a supervised contrastive learning method. To further optimize the identification results, we estimated the uncertainty of the classification network and utilized the message fusion module to combine the uncertainty scores, while aggregating global information about the spine. Our method achieves state-of-the-art results on the VerSe19 and VerSe20 challenge benchmarks. Additionally, our approach demonstrates outstanding generalization performance on an collected dataset containing a wide range of abnormal cases.	翻訳日:2024-01-17 19:10:16 公開日:2024-01-14
# 850-950nm波長の高速光子数分解検出器 High-Performance Photon Number Resolving Detectors for 850-950 nm wavelengths ( http://arxiv.org/abs/2401.07265v1 ) ライセンス: Link先を確認	J. W. N. Los, Mariia Sidorova, B. L. Rodriguez, Patrick Qualm, J. Chang, S. Steinhauer, V. Zwiller, I. Esmaeil Zadeh	(参考訳) 2001年の最初のデモンストレーション以来、超伝導-ナノワイヤ単光子検出器は20年間にわたって大きな発展を遂げてきた。 SNSPDは現代のほとんどの量子光学実験において選択の検知器であり、徐々に他の光子飢えの光学分野への道を見つけつつある。しかし、ほとんど全ての実験で、snspdは2進検出器として使われており、0光子と1光子以上しか区別できず、光子番号情報は失われる。近年の研究では、2から5個の光子を数える原理光子数解法(PNR) SNSPDが実証されている。光子数分解能力は、HOM干渉、フォトニック量子コンピューティング、量子通信、非ガウス量子状態準備など、様々な量子光学実験で要求されている。特に、850nmから950nmの波長域のpnr検出器は、高品質の半導体量子ドットと高性能セシウムベースの量子メモリが利用可能であるため、非常に興味深い。本稿では,NbTiNをベースとしたSNSPDのシステム検出効率が94%以上,1光子の11ps以下のタイミングジッタ,2光子の7ps以下のタイミングジッタを実証する。さらに重要なのは、従来の極低温電気読み出し回路で最大7光子を検出できることです。理論的解析により,検出器の現在のPNR性能は,読み出し回路の信号と雑音比,帯域幅を改善することでさらに向上できることを示す。私たちの結果は、光量子コンピューティングと量子通信の将来に有望です。 Since their first demonstration in 2001, superconducting-nanowire single-photon detectors have witnessed two decades of great developments. SNSPDs are the detector of choice in most modern quantum optics experiments and are slowly finding their way into other photon starved fields of optics. Until now, however, in nearly all experiments SNSPDs were used as binary detectors, meaning they can only distinguish between 0 and more than 1 photons and photon number information is lost. Recent research works have demonstrated proof of principle photon number resolving (PNR) SNSPDs counting 2 to 5 photons. The photon-number-resolving capability is highly demanded in various quantum-optics experiments, including HOM interference, photonic quantum computing, quantum communication, and non Gaussian quantum state preparation. In particular, PNR detectors at the wavelength range of 850 to 950 nm are of great interest due to the availability of high quality semiconductor quantum dots and high-performance Cesium-based quantum memories. In this paper, we demonstrate NbTiN based SNSPDs with over 94 percent system detection efficiency, sub 11 ps timing jitter for one photon, and sub 7 ps for two photon. More importantly, our detectors resolve up to 7 photons using conventional cryogenic electric readout circuitry. Through theoretical analysis, we show that the current PNR performance of our detectors can still be further improved by improving the signal to noise ratio and bandwidth of our readout circuitry. Our results are promising for the future of optical quantum computing and quantum communication.	翻訳日:2024-01-17 19:09:57 公開日:2024-01-14
# BET: エラー確率決定による深層強化学習の解説 BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions ( http://arxiv.org/abs/2401.07263v1 ) ライセンス: Link先を確認	Xiao Liu, Jie Zhao, Wubing Chen, Mao Tan, Yongxing Su	(参考訳) 多くの困難なシナリオにおいて、Deep Reinforcement Learning (DRL)エージェントの印象的な機能にもかかわらず、彼らのブラックボックス決定プロセスは、安全に敏感なドメインへのデプロイメントを著しく制限している。以前のいくつかの自己解釈可能な研究は、エージェントの決定の重大な状態を明らかにすることに焦点を当てている。しかし、エラーを起こしやすい状態は特定できない。この問題に対処するために,backbone extract tree (bet) と呼ばれる新しい自己解釈可能な構造を提案する。高いレベルでは、BETはエージェントが一貫して一様決定を行う状態はエラーの確率を減少させるという仮説を立てている。この現象を効果的にモデル化するために、ベットはこれらの状態を近隣で表現し、それぞれが代表的状態のキュレーションによって定義される。したがって、これらの代表的なベンチマークからより離れた位置にある状態はエラーを起こしやすい。我々は,様々なRL環境におけるBETの評価を行い,既存の自己解釈モデルよりも説明の忠実度が優れていることを示す。さらに,高度なマルチエージェント協調ゲームであるStarCraft IIにおいて,エージェントの説明を行うためのユースケースを示す。私たちの知る限りでは,このような複雑なシナリオを,完全に透過的な構造を使って最初に説明します。 Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identify the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such a complex scenarios using a fully transparent structure.	翻訳日:2024-01-17 19:09:31 公開日:2024-01-14
# 人点雲上の3次元ランドマーク検出:ベンチマークと二重カスケード変換器フレームワーク 3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework ( http://arxiv.org/abs/2401.07251v1 ) ライセンス: Link先を確認	Fan Zhang, Shuyi Mao, Qing Li, Xiaojiang Peng	(参考訳) 3Dランドマーク検出は、3D登録、ポーズ推定、仮想トライオンなど、さまざまなアプリケーションにおいて重要な役割を果たす。 2次元のランドマーク検出やポーズ推定でかなりの成功を収めてきたが、未秩序な3次元点雲におけるランドマーク検出に関する報告がほとんどない。本稿では,人間点雲における3次元ランドマーク検出という新たな課題について紹介する。まず,3Dランドマーク検出コミュニティを支援するために,HPoint103という総合的な人点クラウドデータセットを構築した。このデータセットは、商用ソフトウェアとアクターで作成された103のヒューマンポイントクラウドで構成され、それぞれが手動で11の安定したランドマークで注釈付けされている。次に, 2次元カスケード点変換器(D-CPT)モデルを提案する。 D-CPTは、ポイントクラウドストリーム全体にわたってカスケードトランスフォーマーデコーダ層を通じてランドマークを徐々に洗練し、同時にローカルリージョン上のRefineNetとのランドマーク座標を拡張している。 hpoint103とdhp19の一般的なポイントベース手法による比較評価は,d-cptの劇的な性能低下を示している。さらに、既存のメソッドへのRefineNetの統合は、パフォーマンスを継続的に改善します。 3D landmark detection plays a pivotal role in various applications such as 3D registration, pose estimation, and virtual try-on. While considerable success has been achieved in 2D human landmark detection or pose estimation, there is a notable scarcity of reported works on landmark detection in unordered 3D point clouds. This paper introduces a novel challenge, namely 3D landmark detection on human point clouds, presenting two primary contributions. Firstly, we establish a comprehensive human point cloud dataset, named HPoint103, designed to support the 3D landmark detection community. This dataset comprises 103 human point clouds created with commercial software and actors, each manually annotated with 11 stable landmarks. Secondly, we propose a Dual Cascade Point Transformer (D-CPT) model for precise point-based landmark detection. D-CPT gradually refines the landmarks through cascade Transformer decoder layers across the entire point cloud stream, simultaneously enhancing landmark coordinates with a RefineNet over local regions. Comparative evaluations with popular point-based methods on HPoint103 and the public dataset DHP19 demonstrate the dramatic outperformance of our D-CPT. Additionally, the integration of our RefineNet into existing methods consistently improves performance.	翻訳日:2024-01-17 19:09:10 公開日:2024-01-14
# 単純再正規化戦略によるシャープネス認識最小化の安定化 Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy ( http://arxiv.org/abs/2401.07250v1 ) ライセンス: Link先を確認	Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao	(参考訳) 近年、一般化性能の向上に驚くべき効果があるため、シャープネス認識最小化(SAM)が注目されているが、現在の点における正確な勾配の方向に沿って損失が減少せず、近くの別の点で評価された代理勾配の方向に従っているため、SAMを用いたニューラルネットワークのトレーニングは非常に不安定である。この問題に対処するため,我々は,サロゲート勾配のノルムが正確な勾配のノルムと同じ状態を維持するように,stablesamと呼ばれる単純な再正規化戦略を提案する。我々の戦略は実装が簡単で、samとその派生製品と統合できるほど柔軟で、ほとんど計算コストがかからない。また,凸最適化と学習理論の基本的なツールを用いてシャープネス認識訓練の理論解析を行い,確率的勾配降下(sgd)と比較して,samの有効性は限られた学習率でのみ保証されることを明らかにした。対照的に、StableSAMは学習率のこの仕組みを拡張し、小さな修正でSAMよりも一貫して性能を向上できるかを示す。最後に,いくつかの代表的なデータセットとタスクにおけるstablesamの性能向上を示す。 Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance.However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient maintains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.	翻訳日:2024-01-17 19:08:48 公開日:2024-01-14
# 不規則サンプリング時系列のプロトタイプからの系列間情報による命令 Imputation with Inter-Series Information from Prototypes for Irregular Sampled Time Series ( http://arxiv.org/abs/2401.07249v1 ) ライセンス: Link先を確認	Zhihao Yu, Xu Chu, Liantao Ma, Yasha Wang, Wenwu Zhu	(参考訳) 不規則にサンプリングされた時系列はユビキタスであり、値の欠如による分析に重大な課題がある。既存の方法がインプテーションに対処するにもかかわらず、彼らは主にシリーズ内情報を活用することに集中し、不確実性や記憶効果を減らすなど、シリーズ間情報が提供する潜在的な利点を無視している。このギャップを埋めるため,本論文では,不規則にサンプリングされた時系列の欠落値に対して,直列情報と直列情報の両方を統合した再帰的インプテーションモデル prime を提案する。本フレームワークは、シリーズ間情報を学習するプロトタイプメモリモジュールと、インプテーションのためのプロトタイプ情報を利用する双方向ゲートリカレントユニットと、インプテーションを調整するための注意的プロトタイプリファインメントモジュールとを備える。我々は3つのデータセットについて広範な実験を行い、PRIMEの最先端モデルに対する優位性を平均二乗誤差に対して最大26%改善した。 Irregularly sampled time series are ubiquitous, presenting significant challenges for analysis due to missing values. Despite existing methods address imputation, they predominantly focus on leveraging intra-series information, neglecting the potential benefits that inter-series information could provide, such as reducing uncertainty and memorization effect. To bridge this gap, we propose PRIME, a Prototype Recurrent Imputation ModEl, which integrates both intra-series and inter-series information for imputing missing values in irregularly sampled time series. Our framework comprises a prototype memory module for learning inter-series information, a bidirectional gated recurrent unit utilizing prototype information for imputation, and an attentive prototypical refinement module for adjusting imputations. We conducted extensive experiments on three datasets, and the results underscore PRIME's superiority over the state-of-the-art models by up to 26% relative improvement on mean square error.	翻訳日:2024-01-17 19:08:25 公開日:2024-01-14
# 表情認識のためのミックスコントラスト微調整によるマスク画像事前学習 MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition ( http://arxiv.org/abs/2401.07245v1 ) ライセンス: Link先を確認	Fan Zhang, Xiaobao Guo, Xiaojiang Peng, Alex Kot	(参考訳) 現在、顔認識(fer)における最先端の研究は、特徴抽出のために顔認識データセット上で教師ありに事前学習される畳み込みニューラルネットワーク(cnns)バックボーンの利用を好んでいる。しかし、膨大な顔認識データセットと、顔ラベルの収集に関連する高いコストのため、この事前学習パラダイムにはかなりの費用がかかる。この目的に向けて,中規模汎用画像データセット上での自己教師付きアプローチによる視覚トランスフォーマー(vits)の事前学習を提案する。さらに、顔データセットとFERデータセットの間に存在するドメイン格差と比較すると、一般的なデータセットとFERデータセットとのばらつきはより顕著である。そこで本研究では,この領域の差異を効果的に緩和するための対比的微調整手法を提案する。具体的には,Mix Contrastive Fine-tuning (MIMIC) を用いた Mask Image pre-training という新しいFERトレーニングパラダイムを提案する。初期段階では、一般画像のマスク画像再構成により、ViTを事前訓練する。その後, 微調整段階において, 混合教師付きコントラスト学習プロセスを導入し, 混合戦略によりより広範囲の正のサンプルでモデルを強化した。 3つのベンチマークデータセットで実施された広範な実験を通じて、MIMICは以前のトレーニングパラダイムよりも優れており、より良い表現を学ぶ能力を示している。注目すべきは、バニラ ViT が複雑な補助設計モジュールを必要とせずに素晴らしい性能を達成できることである。さらに、モデルサイズをスケールアップする場合、MIMICは性能飽和がなく、現在の最先端手法よりも優れている。 Cutting-edge research in facial expression recognition (FER) currently favors the utilization of convolutional neural networks (CNNs) backbone which is supervisedly pre-trained on face recognition datasets for feature extraction. However, due to the vast scale of face recognition datasets and the high cost associated with collecting facial labels, this pre-training paradigm incurs significant expenses. Towards this end, we propose to pre-train vision Transformers (ViTs) through a self-supervised approach on a mid-scale general image dataset. In addition, when compared with the domain disparity existing between face datasets and FER datasets, the divergence between general datasets and FER datasets is more pronounced. Therefore, we propose a contrastive fine-tuning approach to effectively mitigate this domain disparity. Specifically, we introduce a novel FER training paradigm named Mask Image pre-training with MIx Contrastive fine-tuning (MIMIC). In the initial phase, we pre-train the ViT via masked image reconstruction on general images. Subsequently, in the fine-tuning stage, we introduce a mix-supervised contrastive learning process, which enhances the model with a more extensive range of positive samples by the mixing strategy. Through extensive experiments conducted on three benchmark datasets, we demonstrate that our MIMIC outperforms the previous training paradigm, showing its capability to learn better representations. Remarkably, the results indicate that the vanilla ViT can achieve impressive performance without the need for intricate, auxiliary-designed modules. Moreover, when scaling up the model size, MIMIC exhibits no performance saturation and is superior to the current state-of-the-art methods.	翻訳日:2024-01-17 19:08:06 公開日:2024-01-14
# DCDet:動的クロスベース3Dオブジェクト検出器 DCDet: Dynamic Cross-based 3D Object Detector ( http://arxiv.org/abs/2401.07240v1 ) ライセンス: Link先を確認	Shuai Liu, Boyang Li, Zhiyu Fang and Kai Huang	(参考訳) 近年, 3次元物体検出の研究において有意な進歩がみられた。しかし、ほとんどの先行研究は、センターベースまたはアンカーベースラベル割り当てスキームの利用に焦点を当てている。代替ラベル割り当て戦略は、3Dオブジェクト検出において未探索のままである。センターベースのラベル割り当てはトレーニングのために十分な正のサンプルを生成しないことが多いが、アンカーベースのラベル割り当ては、様々なスケールのオブジェクトを扱う際に不均衡な問題に遭遇する傾向がある。これらの課題を解決するために, 動的クロスラベル割当(DCLA)方式を導入し, 対象物に対して動的に正のサンプルを交叉領域から割り当てることで, 十分な正のサンプルとバランスの取れた正のサンプルをトレーニング用に提供する。さらに,様々なスケールの物体を正確に後退させる課題に対処するために,回転重み付き交叉係数(rwiou)を用いて回帰損失のl1メトリックを置き換えた。広汎な実験により,DCLAとRWIoUに基づく回帰損失の一般化と有効性を示した。コードはhttps://github.com/Say2L/DCDet.gitで入手できる。 Recently, significant progress has been made in the research of 3D object detection. However, most prior studies have focused on the utilization of center-based or anchor-based label assignment schemes. Alternative label assignment strategies remain unexplored in 3D object detection. We find that the center-based label assignment often fails to generate sufficient positive samples for training, while the anchor-based label assignment tends to encounter an imbalanced issue when handling objects of varying scales. To solve these issues, we introduce a dynamic cross label assignment (DCLA) scheme, which dynamically assigns positive samples for each object from a cross-shaped region, thus providing sufficient and balanced positive samples for training. Furthermore, to address the challenge of accurately regressing objects with varying scales, we put forth a rotation-weighted Intersection over Union (RWIoU) metric to replace the widely used L1 metric in regression loss. Extensive experiments demonstrate the generality and effectiveness of our DCLA and RWIoU-based regression loss. The Code will be available at https://github.com/Say2L/DCDet.git.	翻訳日:2024-01-17 19:07:42 公開日:2024-01-14
# コヒーレント駆動量子調和振動子電池 Coherently Driven Quantum Harmonic Oscillator Battery ( http://arxiv.org/abs/2401.07238v1 ) ライセンス: Link先を確認	Kuldeep Gangwar and Anirban Pathak	(参考訳) 量子調和振動子(QHO)バッテリモデルは、実験的に実現可能であり、複数のエネルギーを蓄積する高いエルゴトロピーと容量を有するため、近年重要視されている。 QHOのバッテリモデルは、いくつかの基本的な質問に答えるために再検討されている。無制限充電は可能か? 触媒システムの使用は、量子電池へのエネルギー移動を促進するか? これらの質問は、QHO電池と相互作用するQHO充電器にレーザーを光らせるモデルを考えることにより、数値的および解析的に答えられる。既存の作品とは対照的に、得られた答えは概ね否定的である。特に,本研究では,QHO間の相互作用の影響を受け,大域的な充電器電池システムの周波数にレーザ周波数を調整した。固定レーザー場振幅$\textit{F}$の場合、この電池は、レーザ周波数をチャージャーとバッテリの局所周波数に調整することで蓄えられたエネルギーと比較して、グローバルチャージャーバッテリシステムの周波数に合わせるとより多くのエネルギーを蓄えることができると報告されている。また, 簡易モデルであるオープンQHOの帯電過程と, レーザフィールドの切換え後の自己放電(散逸)過程についても検討し, 簡易モデルにおけるQHOの帯電過程が触媒(非触媒)電池の帯電過程よりも高速であることを明らかにした。さらに, 自己放出過程は, 環境との相互作用に対して不安定となる帯電過程の約2倍高速であることが観察された。 Quantum harmonic oscillator (QHO) battery models have been studied with significant importance in the recent past because these batteries are experimentally realizable and have high ergotropy and capacity to store more than one quanta of energy. QHO battery models are reinvestigated here to answer a set of fundamental questions: Do such models have any benefit? Is unbounded charging possible? Does the use of a catalyst system enhance the energy transfer to quantum batteries? These questions are answered both numerically and analytically by considering a model that allows a laser to shine on a QHO charger that interacts with a QHO battery. In contrast to some of the existing works, the obtained answers are mostly negative. Specifically, in the present work, the laser frequency is tuned with the frequency of the global charger-battery system, which is affected by the interaction between QHOs. It is reported that for a fixed laser field amplitude $\textit{F}$, the battery can store more energy when tuned with the frequency of the global charger-battery system compared to energy stored by tuning the laser frequency with local frequencies of the charger and battery. The charging process of the open QHO, which is a simplified model, and the self-discharging (dissipation) process after switching off the laser field are also investigated to reveal that the charging process of QHO in the simplified model is faster than the charging process of the catalytic (non-catalytic) battery. Further, it's observed that the self-discharging process is almost two times faster than the charging process which makes such models unstable against interaction with the environment.	翻訳日:2024-01-17 19:07:25 公開日:2024-01-14
# 大規模言語モデルからのイベントシーケンス知識の蒸留 Distilling Event Sequence Knowledge From Large Language Models ( http://arxiv.org/abs/2401.07237v1 ) ライセンス: Link先を確認	Somin Wadhwa, Oktie Hassanzadeh, Debarun Bhattacharjya, Ken Barker, Jian Ni	(参考訳) イベントシーケンスモデルは、イベントの分析と予測に非常に有効であることが判明している。このようなモデルの構築には、豊富な高品質なイベントシーケンスデータが必要になる。しかし、特定のアプリケーションでは、クリーンな構造化されたイベントシーケンスは利用できず、自動シーケンス抽出はノイズが多く不完全なデータをもたらす。本研究では,確率的イベントモデル構築に効果的に使用できるイベントシーケンスを生成するための大規模言語モデル(llm)の利用を検討する。これは、LLMからイベントシーケンス知識を蒸留するメカニズムと見なすことができる。本手法は、因果関係を持つ事象概念の知識グラフ(KG)を用いて、因果関係生成のための生成言語モデルを導出する。提案手法は,入力KGの知識ギャップを埋めて,高品質なイベントシーケンスを生成することができることを示す。さらに,パターンマイニングや確率的イベントモデルから有用で複雑な構造化知識を発見するために,生成されたシーケンスをどのように活用するかを検討する。我々は、シーケンス生成コードと評価フレームワーク、およびイベントシーケンスデータのコーパスをリリースする。 Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as corpus of event sequence data.	翻訳日:2024-01-17 19:06:55 公開日:2024-01-14
# 信用リスク予測のためのフェデレーション学習アプローチにおけるデータ不均衡の効果 The Effects of Data Imbalance Under a Federated Learning Approach for Credit Risk Forecasting ( http://arxiv.org/abs/2401.07234v1 ) ライセンス: Link先を確認	Shuyao Zhang, Jordan Tay, Pedro Baiz	(参考訳) 信用リスク予測は、顧客へのローンの付与と損失の最小化において、商業銀行や他の金融機関にとって重要な役割を担っている。しかしながら、従来の機械学習手法では、セキュリティ上の脅威やプライバシリークのリスクを生じさせる可能性のあるグローバルモデルを構築するために、センシティブなクライアント情報を外部サーバと共有する必要がある。新たに開発されたプライバシー保護型分散機械学習技術であるfederated learning(fl)は、プライベートなローカルデータに直接アクセスすることなく、グローバルモデルのトレーニングを可能にする。本研究は,信用リスク評価におけるフェデレーション学習の有効性を検証し,データ不均衡がモデル性能に及ぼす影響を示した。多層型パーセプトロン (mlp) とlong short-term memory (lstm) の2つのニューラルネットワークアーキテクチャと、1つのツリーアンサンブルアーキテクチャであるextreme gradient boosting (xgboost) を、3つの異なるデータセットにまたがって、異なる数のクライアントとデータ分散構成を含む様々なシナリオで検討した。フェデレーションモデルが、より小さなデータセットを持つ非支配的なクライアントのローカルモデルを上回ることを実証する。この傾向は特に高度に不均衡なデータシナリオで顕著であり、モデルの性能が17.92%向上した。しかし、支配的なクライアント(より多くのデータを持つクライアント)にとって、フェデレーションされたモデルは優れたパフォーマンスを示しておらず、この種のクライアントが参加を促進するための特別なインセンティブの必要性が示唆される。 Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers and minimise the potential loss. However, traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model, potentially posing a risk of security threats and privacy leakage. A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly. This investigation examined the feasibility of federated learning in credit risk assessment and showed the effects of data imbalance on model performance. Two neural network architectures, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), and one tree ensemble architecture, Extreme Gradient Boosting (XGBoost), were explored across three different datasets under various scenarios involving different numbers of clients and data distribution configurations. We demonstrate that federated models consistently outperform local models on non-dominant clients with smaller datasets. This trend is especially pronounced in highly imbalanced data scenarios, yielding a remarkable average improvement of 17.92% in model performance. However, for dominant clients (clients with more data), federated models may not exhibit superior performance, suggesting the need for special incentives for this type of clients to encourage their participation.	翻訳日:2024-01-17 19:06:39 公開日:2024-01-14
# 二項化ニューロモルフィックネットワークとしてのポラリトン格子 Polariton lattices as binarized neuromorphic networks ( http://arxiv.org/abs/2401.07232v1 ) ライセンス: Link先を確認	Evgeny Sedov and Alexey Kavokin	(参考訳) 本研究では, 励起子-偏光子縮合格子に基づく新規なニューロモルフィックネットワークアーキテクチャを導入し, 非共鳴光ポンピングにより複雑に相互接続し, エネルギー化する。ネットワークはバイナリフレームワークを採用しており、各ニューロンはペア結合凝縮の空間的コヒーレンスによって促進され、バイナリ操作を行う。このコヒーレンスはポラリトンの弾道伝播から生まれ、効率的でネットワーク全体の通信を保証する。双対ニューロンスイッチング機構は、偏光子の励起成分を介して非線形反発によって駆動され、連続重み付けニューラルネットワークよりも計算効率とスケーラビリティの利点を提供する。本ネットワークは並列処理が可能であり,シーケンシャルおよびパルス符号化バイナリシステムと比較して計算速度が向上する。システムの性能は手書き文字認識のためのMNISTデータセットを用いて評価され、97.5%の予測精度で示されるように、既存の偏極性ニューロモルフィックシステムを上回る可能性を示した。 We introduce a novel neuromorphic network architecture based on a lattice of exciton-polariton condensates, intricately interconnected and energized through non-resonant optical pumping. The network employs a binary framework, where each neuron, facilitated by the spatial coherence of pairwise coupled condensates, performs binary operations. This coherence, emerging from the ballistic propagation of polaritons, ensures efficient, network-wide communication. The binary neuron switching mechanism, driven by the nonlinear repulsion through the excitonic component of polaritons, offers computational efficiency and scalability advantages over continuous weight neural networks. Our network enables parallel processing, enhancing computational speed compared to sequential or pulse-coded binary systems. The system's performance was evaluated using the MNIST dataset for handwritten digit recognition, showcasing the potential to outperform existing polaritonic neuromorphic systems, as demonstrated by its impressive predicted classification accuracy of up to 97.5%.	翻訳日:2024-01-17 19:06:13 公開日:2024-01-14
# 先行知識を用いた非観測変数付き因果加法モデルの発見とその時系列データへの応用 Use of Prior Knowledge to Discover Causal Additive Models with Unobserved Variables and its Application to Time Series Data ( http://arxiv.org/abs/2401.07231v1 ) ライセンス: Link先を確認	Takashi Nicholas Maeda, Shimizu Shohei	(参考訳) 本稿では,無観測変数 (CAM-UV) を持つ因果加法モデルの2つの手法を提案する。 CAM-UV は、因果関数が一般化加法モデルの形式をとり、潜在的共同設立者が存在すると仮定する。まず,先行知識を活用した効率的な因果発見手法を提案する。次に,時系列データの因果関係を推定する手法の拡張を提案する。元のCAM-UVアルゴリズムは、観測変数間の因果順序を求めるのではなく、観測変数ごとに原因を特定することを目的としているという点で、既存の因果関数モデルとは異なる。したがって,本論文で最初に提案する手法は,特定の変数が他の変数の原因になり得ないことを理解するなど,事前の知識を活用できる。さらに,時間的影響に先行する先行知識を組み込むことで,時系列データにおける因果発見のための第1のアルゴリズムを第2の手法に拡張する。提案手法をシミュレーションデータを用いて検証し,先行知識の蓄積に伴って因果発見の精度が向上することを示す。さらに, シミュレーションデータと実世界データの両方を用いて, 既存の時系列因果発見法と比較し, 第二の手法を検証した。 This paper proposes two methods for causal additive models with unobserved variables (CAM-UV). CAM-UV assumes that the causal functions take the form of generalized additive models and that latent confounders are present. First, we propose a method that leverages prior knowledge for efficient causal discovery. Then, we propose an extension of this method for inferring causality in time series data. The original CAM-UV algorithm differs from other existing causal function models in that it does not seek the causal order between observed variables, but rather aims to identify the causes for each observed variable. Therefore, the first proposed method in this paper utilizes prior knowledge, such as understanding that certain variables cannot be causes of specific others. Moreover, by incorporating the prior knowledge that causes precedes their effects in time, we extend the first algorithm to the second method for causal discovery in time series data. We validate the first proposed method by using simulated data to demonstrate that the accuracy of causal discovery increases as more prior knowledge is accumulated. Additionally, we test the second proposed method by comparing it with existing time series causal discovery methods, using both simulated data and real-world data.	翻訳日:2024-01-17 19:05:56 公開日:2024-01-14
# CCTVカメラを用いた高分解能交通データ収集への2次元ホログラフィーの適用 Application of 2D Homography for High Resolution Traffic Data Collection using CCTV Cameras ( http://arxiv.org/abs/2401.07220v1 ) ライセンス: Link先を確認	Linlin Zhang, Xiang Yu, Abdulateef Daud, Abdul Rashid Mussah, Yaw Adu-Gyamfi	(参考訳) 交通カメラは、渋滞やインシデント監視などの監視活動の主要な情報源である。これまで、国家機関は、複雑なカメラのキャリブレーションの要件や高解像度データを生成することができないなど、現在の自動視覚システムの制限のために、ネットワークカメラからデータを抽出するための手作業に頼り続けている。本研究では,インフラ搭載CCTVカメラから車両数,速度,加速度などの高精細トラフィックデータを抽出するための3段階のビデオ分析フレームワークを実装した。このフレームワークの重要なコンポーネントは、オブジェクト認識、パースペクティブ変換、およびトラフィックデータ収集のための車両軌道再構成である。まず,最先端の車両認識モデルを用いて車両の検出と分類を行う。次に、カメラの歪みを補正し、部分閉塞を低減するために、2点線形視点にインスパイアされたアルゴリズムを用いて、関心領域(ROI)を自動的に抽出し、2Dホモグラフィー技術によりCCTVビューを鳥眼ビュー(BEV)に変換する。カメラは2層マトリクスシステムでキャリブレーションされ、画像座標を実世界計測に変換することで速度と加速度の抽出を可能にする。個々の車両軌跡は、BEVにおいてMotpyとBYTETrackという2つの時間空間ベースのオブジェクトトラッカーを用いて構築・比較される。その結果,指向性トラヒック数に対する誤差率は+/-4.5%であり,プローブデータからの推定値と比較して,カメラ推定速度バイアスが10%mse以下であった。交通カメラから高解像度データを抽出することは、交通管理の改善や危険な運転行動の特定、事故のリスクの高い地域、その他の安全上の問題など、いくつかの意味を持つ。 Traffic cameras remain the primary source data for surveillance activities such as congestion and incident monitoring. To date, State agencies continue to rely on manual effort to extract data from networked cameras due to limitations of the current automatic vision systems including requirements for complex camera calibration and inability to generate high resolution data. This study implements a three-stage video analytics framework for extracting high-resolution traffic data such vehicle counts, speed, and acceleration from infrastructure-mounted CCTV cameras. The key components of the framework include object recognition, perspective transformation, and vehicle trajectory reconstruction for traffic data collection. First, a state-of-the-art vehicle recognition model is implemented to detect and classify vehicles. Next, to correct for camera distortion and reduce partial occlusion, an algorithm inspired by two-point linear perspective is utilized to extracts the region of interest (ROI) automatically, while a 2D homography technique transforms the CCTV view to bird's-eye view (BEV). Cameras are calibrated with a two-layer matrix system to enable the extraction of speed and acceleration by converting image coordinates to real-world measurements. Individual vehicle trajectories are constructed and compared in BEV using two time-space-feature-based object trackers, namely Motpy and BYTETrack. The results of the current study showed about +/- 4.5% error rate for directional traffic counts, less than 10% MSE for speed bias between camera estimates in comparison to estimates from probe data sources. Extracting high-resolution data from traffic cameras has several implications, ranging from improvements in traffic management and identify dangerous driving behavior, high-risk areas for accidents, and other safety concerns, enabling proactive measures to reduce accidents and fatalities.	翻訳日:2024-01-17 19:05:35 公開日:2024-01-14
# MapNeXt: オンラインベクトル化HDマップ構築のためのトレーニングとスケーリングの再開 MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction ( http://arxiv.org/abs/2401.07323v1 ) ライセンス: Link先を確認	Toyota Li	(参考訳) ハイディフィニション(HD)マップは自動操縦のナビゲーションに欠かせない。実行時に軽量なHDマップ構築機能を自動運転システムに統合することは、最近、有望な方向として現れている。カメラがステレオ情報を認識できるので、可搬性と経済性という魅力的なサインはさておき、視覚のみの知覚が際立っている。最新のMapTRアーキテクチャは、オンラインHDマップ構築タスクをエンドツーエンドで解決するが、その可能性はまだ検討されていない。本研究では,MapTRのフルスケールアップグレードを提案し,次世代のHDマップ学習アーキテクチャであるMapNeXtを提案する。 MapTRのトレーニングダイナミクスに光を当て、MapNeXt-TinyはMapTR-TinyのmAPを49.0%から54.8%に引き上げる。マップセグメンテーションの成果を楽しみ、mapnext-baseは、以前の技術であるマルチモダリティmaptrを上回り、$\sim1.8\times$を高速にしながら、63.9%までマップを持ち上げる。クエリの増加は適切な消化のためにデコーダネットワークを広く好んでおり、大きなバックボーンはベルやホイッスルを使わずに最終的な精度を着実に向上させる。親指の2つのルールに基づいて、MapNeXt-Hugeは、挑戦的なnuScenesベンチマークで最先端のパフォーマンスを達成する。具体的には、マップレスビジョンのみのシングルモデルパフォーマンスを初めて78%以上にプッシュし、既存のメソッドから最高のモデルを16%上回らせました。 High-Definition (HD) maps are pivotal to autopilot navigation. Integrating the capability of lightweight HD map construction at runtime into a self-driving system recently emerges as a promising direction. In this surge, vision-only perception stands out, as a camera rig can still perceive the stereo information, let alone its appealing signature of portability and economy. The latest MapTR architecture solves the online HD map construction task in an end-to-end fashion but its potential is yet to be explored. In this work, we present a full-scale upgrade of MapTR and propose MapNeXt, the next generation of HD map learning architecture, delivering major contributions from the model training and scaling perspectives. After shedding light on the training dynamics of MapTR and exploiting the supervision from map elements thoroughly, MapNeXt-Tiny raises the mAP of MapTR-Tiny from 49.0% to 54.8%, without any architectural modifications. Enjoying the fruit of map segmentation pre-training, MapNeXt-Base further lifts the mAP up to 63.9% that has already outperformed the prior art, a multi-modality MapTR, by 1.4% while being $\sim1.8\times$ faster. Towards pushing the performance frontier to the next level, we draw two conclusions on practical model scaling: increased query favors a larger decoder network for adequate digestion; a large backbone steadily promotes the final accuracy without bells and whistles. Building upon these two rules of thumb, MapNeXt-Huge achieves state-of-the-art performance on the challenging nuScenes benchmark. Specifically, we push the mapless vision-only single-model performance to be over 78% for the first time, exceeding the best model from existing methods by 16%.	翻訳日:2024-01-17 19:00:38 公開日:2024-01-14
# RSUD20K:自動運転における道路シーン理解のためのデータセット RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving ( http://arxiv.org/abs/2401.07322v1 ) ライセンス: Link先を確認	Hasib Zunair, Shakib Khan, and A. Ben Hamza	(参考訳) 道路シーンの理解は、機械が視覚環境を知覚できるように、自動運転において不可欠である。しかし、最近のオブジェクト検出器は、特定の地理的な場所から収集されたデータセットを学習するために調整されている。本稿では,バングラデシュ道路の運転視点から20K以上の高解像度画像で構成され,13のオブジェクトに対する130K境界ボックスアノテーションを含む道路シーン理解のための新しいデータセットであるRSUD20Kを提案する。この挑戦的なデータセットは、様々な道路のシーン、狭い通りとハイウェイを含み、さまざまな視点からのオブジェクトと、密集した乱雑な物体と様々な気象条件のある混雑した環境からのシーンを含んでいる。我々の作業は以前の取り組みを大幅に改善し、詳細なアノテーションを提供し、オブジェクトの複雑さを増大させます。我々はデータセットを徹底的に検証し、最先端の物体検出器をベンチマークし、画像アノテーションとして大規模ビジョンモデルを探索する。 Road scene understanding is crucial in autonomous driving, enabling machines to perceive the visual environment. However, recent object detectors tailored for learning on datasets collected from certain geographical locations struggle to generalize across different locations. In this paper, we present RSUD20K, a new dataset for road scene understanding, comprised of over 20K high-resolution images from the driving perspective on Bangladesh roads, and includes 130K bounding box annotations for 13 objects. This challenging dataset encompasses diverse road scenes, narrow streets and highways, featuring objects from different viewpoints and scenes from crowded environments with densely cluttered objects and various weather conditions. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity. We thoroughly examine the dataset, benchmarking various state-of-the-art object detectors and exploring large vision models as image annotators.	翻訳日:2024-01-17 19:00:02 公開日:2024-01-14
# プライバシ関連ソースコードの検索 Finding Privacy-relevant Source Code ( http://arxiv.org/abs/2401.07316v1 ) ライセンス: Link先を確認	Feiyang Tang and Bjarte M. {\O}stvold	(参考訳) プライバシコードレビューは、開発者と法律の専門家がデータ保護規則の遵守を保証するための重要なプロセスである。しかし、リソースの制約のためタスクは困難である。この問題に対処するために、個人データの処理に直接関与するコードの特定の方法であるプライバシ関連メソッドの概念を紹介します。次に、ソースコード内のこれらのプライバシ関連メソッドを特定し分類することで、コードレビューを支援する自動アプローチを提案する。静的解析を用いて,50の一般的なライブラリにおけるそれらの発生に基づいて,一連のメソッドを識別する。次に、これらのメソッドを、githubアプリケーションのトップ30の実際の個人データと呼び出し頻度に従ってランク付けします。最高ランクのメソッドは、実際にはプライバシに関連するものとして指定するメソッドです。評価のために,100のオープンソースアプリケーションを調査した結果,プライバシ関連の個人データ処理手法の5%に満たないことがわかった。これにより、コードレビューに要する時間を削減できる。 Signal Desktop と Cal.com のケーススタディでは,プライバシ規制の遵守を容易にする拡張レポートの作成を支援するコードレビュアーのアプローチの有効性をさらに検証している。 Privacy code review is a critical process that enables developers and legal experts to ensure compliance with data protection regulations. However, the task is challenging due to resource constraints. To address this, we introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data. We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code. Using static analysis, we identify a set of methods based on their occurrences in 50 commonly used libraries. We then rank these methods according to their frequency of invocation with actual personal data in the top 30 GitHub applications. The highest-ranked methods are the ones we designate as privacy-relevant in practice. For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing. This reduces the time required for code reviews. Case studies on Signal Desktop and Cal.com further validate the effectiveness of our approach in aiding code reviewers to produce enhanced reports that facilitate compliance with privacy regulations.	翻訳日:2024-01-17 18:59:45 公開日:2024-01-14
# mapgpt:統一視覚言語ナビゲーションのための地図案内プロンプト MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation ( http://arxiv.org/abs/2401.07314v1 ) ライセンス: Link先を確認	Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong	(参考訳) 脳にGPTを装着した身体エージェントは、様々なタスクにおいて異常な思考と意思決定能力を示した。しかしながら、視覚・言語ナビゲーション(VLN)のための既存のゼロショットエージェントは、エージェントが全体の環境を理解するために効果的な「グローバルビュー」を構築することなく、GPTに過剰な環境情報を処理し、局所的な環境内の潜在的な場所を選択することを促すだけである。本稿では,ゼロショットvlnタスクのための新しいmap-guided gptベースの経路計画エージェントmapgptを提案する。具体的には、オンラインで構築されたトポロジカルマップを、地図誘導のグローバルな探索を促進するプロンプトに変換し、エージェントが局所的な探索に支障を来すのを避けるために、明示的に複数ステップの経路計画を出力し、更新する必要がある。大規模な実験により、我々のMapGPTは有効であり、R2RデータセットとREVERIEデータセット(それぞれ38.8%と28.4%の成功率)において印象的な性能を達成し、新たに登場したGPTモデルのグローバル思考とパス計画能力を示す。異なるデータセットにまたがる様々な命令スタイルに対応するために、パラメータの微調整や特定のプロンプト設計を必要とする以前のvlnエージェントとは異なり、mapgptは異なる命令スタイルにシームレスに適応できるため、より統一されている。 Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking and decision-making abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt the GPT to handle excessive environmental information and select potential locations within localized environments, without constructing an effective ''global-view'' (e.g., a commonly-used map) for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based path-planning agent, dubbed MapGPT, for the zero-shot VLN task. Specifically, we convert a topological map constructed online into prompts to encourage map-guided global exploration, and require the agent to explicitly output and update multi-step path planning to avoid getting stuck in local exploration. Extensive experiments demonstrate that our MapGPT is effective, achieving impressive performance on both the R2R and REVERIE datasets (38.8% and 28.4% success rate, respectively) and showcasing the newly emerged global thinking and path planning capabilities of the GPT model. Unlike previous VLN agents, which require separate parameters fine-tuning or specific prompt design to accommodate various instruction styles across different datasets, our MapGPT is more unified as it can adapt to different instruction styles seamlessly, which is the first of its kind in this field.	翻訳日:2024-01-17 18:59:27 公開日:2024-01-14
# ベンガル語抑うつ的ソーシャルメディアテキスト検出のためのトランスフォーマーモデルによる大規模言語モデルの調和 : 総合的研究 Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study ( http://arxiv.org/abs/2401.07310v1 ) ライセンス: Link先を確認	Ahmadul Karim Chowdhury, Md. Saidur Rahman Sujon, Md. Shirajus Salekin Shafi, Tasin Ahmmad, Sifat Ahmed, Khan Md Hasib, Faisal Muhammad Shah	(参考訳) うつ病を診断する静かな闘争が世界中で広まる中で、私たちの研究はメンタルヘルスとソーシャルメディアの重大なつながりに発展しました。 GPT 3.5, GPT 4 や提案した GPT 3.5 の微調整モデル DepGPT や高度な深層学習モデル (LSTM, Bi-LSTM, GRU, BiGRU) や Transformer モデル (BERT, BanglaBERT, SahajBERT, BanglaBERT-Base) を用いて, 抑うつの早期発見に焦点を当てた。この研究はRedditとXのデータセットを「抑うつ」セグメントと「非抑うつ」セグメントに分類し、メンタルヘルスの専門知識を持つネイティブスピーカーによってベンガル語に翻訳し、ベンガル社会メディア抑うつデータセット(BSMDD)を作成した。我々の研究は、各モデルに対する完全なアーキテクチャの詳細と、ゼロショットおよび少数ショット学習技術を用いて、ベンガルの抑うつ的テキスト分類におけるそれらの性能を評価する方法を提供する。我々の研究は、各ドメインにFastTextを組み込んだSahajBERTとBi-LSTMの優位性を示すとともに、トランスフォーマーモデルによる説明可能性の問題にも取り組み、LLM(特にDepGPT)の有効性を強調し、様々な学習文脈における柔軟性と能力を示す。実験結果によると,提案モデルであるdepgptは,ゼロショットと少数ショットのシナリオではalpaca lora 7bよりも優れており,0.9796に近い精度と0.9804のf1-score,高リコール,異常な精度を実現している。 gpt-3.5ターボとalpaca lora 7bは競争力は高いが、ゼロショットと少数ショットの状況では効果が比較的低い。この研究は、様々な言語状況におけるLLMの有効性と柔軟性を強調し、うつ病検出モデルの複雑な分野に関する洞察力のある情報を提供する。 In an era where the silent struggle of underdiagnosed depression pervades globally, our research delves into the crucial link between mental health and social media. This work focuses on early detection of depression, particularly in extroverted social media users, using LLMs such as GPT 3.5, GPT 4 and our proposed GPT 3.5 fine-tuned model DepGPT, as well as advanced Deep learning models(LSTM, Bi-LSTM, GRU, BiGRU) and Transformer models(BERT, BanglaBERT, SahajBERT, BanglaBERT-Base). The study categorized Reddit and X datasets into "Depressive" and "Non-Depressive" segments, translated into Bengali by native speakers with expertise in mental health, resulting in the creation of the Bengali Social Media Depressive Dataset (BSMDD). Our work provides full architecture details for each model and a methodical way to assess their performance in Bengali depressive text categorization using zero-shot and few-shot learning techniques. Our work demonstrates the superiority of SahajBERT and Bi-LSTM with FastText embeddings in their respective domains also tackles explainability issues with transformer models and emphasizes the effectiveness of LLMs, especially DepGPT, demonstrating flexibility and competence in a range of learning contexts. According to the experiment results, the proposed model, DepGPT, outperformed not only Alpaca Lora 7B in zero-shot and few-shot scenarios but also every other model, achieving a near-perfect accuracy of 0.9796 and an F1-score of 0.9804, high recall, and exceptional precision. Although competitive, GPT-3.5 Turbo and Alpaca Lora 7B show relatively poorer effectiveness in zero-shot and few-shot situations. The work emphasizes the effectiveness and flexibility of LLMs in a variety of linguistic circumstances, providing insightful information about the complex field of depression detection models.	翻訳日:2024-01-17 18:58:57 公開日:2024-01-14
# 超伝導回路を用いた量子情報処理-オープン量子系における量子ゲートとアルゴリズムの実現と特徴化 Quantum information processing with superconducting circuits: realizing and characterizing quantum gates and algorithms in open quantum systems ( http://arxiv.org/abs/2401.07302v1 ) ライセンス: Link先を確認	Hamid Sakhouf	(参考訳) この論文は超伝導デバイスを用いた量子情報処理、特にオープン量子システムにおける量子ゲートとアルゴリズムの実現に焦点を当てている。このような装置は、超伝導共振器に結合したトランスモン型超伝導量子ビットによって構成される。量子ゲートとアルゴリズムの実現には、一段階のアプローチが用いられる。 x$回転と2および3キュービットのゲート絡み込みを実現するための高速で効率的なスキームを提案する。これらの動作中、強いマイクロ波フィールドの追加により共振器光子番号がキャンセルされる。当初は真空状態での共振器の調製は必要とせず、共振器の減衰には敏感である。さらに、これらの操作のロバスト性は、マスター方程式におけるトランスモン系のデコヒーレンスと共振器崩壊の影響を含めることで示され、その結果、量子シミュレーションにおいて高い忠実性が得られる。さらに,実装したx-回転ゲートと位相ゲートを用いて,Groverのアルゴリズムを2と3の量子ビットに対して実装する方法を提案する。また、量子プロセストモグラフィーを用いて、2および3キュービットのシングルショットエンタングゲートの性能をフルに評価し、0.93以上のプロセス忠実度が得られることを数値シミュレーションで示す。これらのゲートは、ベルとグリーンベルガー=ホルン=ザイリンガー(GHZ)の絡み合った状態を作り出すために使用される。 This thesis focuses on quantum information processing using the superconducting device, especially, on realizing quantum gates and algorithms in open quantum systems. Such a device is constructed by transmon-type superconducting qubits coupled to a superconducting resonator. For the realization of quantum gates and algorithms, a one-step approach is used. We suggest faster and more efficient schemes for realizing $X$-rotation and entangling gates for two and three qubits. During these operations, the resonator photon number is canceled owing to the strong microwave field added. They do not require the resonator to be initially prepared in the vacuum state and the scheme is insensitive to resonator decay. Furthermore, the robustness of these operations is demonstrated by including the effect of the decoherence of transmon systems and the resonator decay in a master equation, and as a result, high fidelity will be achieved in quantum simulation. In addition, using the implemented x-rotation gates as well as the phase gates, we present an alternative way for implementing Grover's algorithm for two and three qubits, which does not require a series of single gates. As well, we also demonstrate by a numerical simulation the use of quantum process tomography to fully characterize the performance of a single-shot entangling gate for two and three qubits and obtain process fidelities greater than 0.93. These gates are used to create Bell and Greenberger-Horne-Zeilinger (GHZ) entangled states.	翻訳日:2024-01-17 18:58:13 公開日:2024-01-14
# 小さい言語モデルは自己修正できる Small Language Model Can Self-correct ( http://arxiv.org/abs/2401.07301v1 ) ライセンス: Link先を確認	Haixia Han, Jiaqing Liang, Jie Shi, Qianyu He, Yanghua Xiao	(参考訳) ChatGPTのようなジェネレーティブ言語モデル(LM)は、様々な下流タスクで顕著なパフォーマンスを示している。それでも、最も顕著な欠点の1つは、自信のあるトーンで不正確または偽の情報を生成することである。従来の研究では、高度なパイプラインを考案し、大規模なLMを誘導して自己補正能力を示すよう促している。しかし、大きなLMは、自然に人間のように全てのステップを完了させるのではなく、その答えを個別に検証し、修正するよう明示的に促される。さらに、これらの複雑なプロンプトは小さなlmsでは極めて困難である。本稿では,60億個のパラメータを持つ小さなLMであっても,自己トリガー方式でLMの初期出力を補正することを目的として,生成言語モデルに \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) を導入する。具体的には,自己修正データ構築のためのパイプラインを考案し,微調整による内在的自己修正能力を有するモデルへの支援を目的とした部分的回答マスク(pam)を提案する。我々は,60億から13億のパラメータサイズを持つLMを用いて,常識推論と事実知識推論を含む2つのタスクで実験を行う。 ISCを用いて生成した出力は自己補正なしで生成した出力よりも優れていた。内在的な自己修正能力を持たせることで、小さなlmsでも出力品質がさらに向上できると考えています。 Generative Language Models (LMs) such as ChatGPT have exhibited remarkable performance across various downstream tasks. Nevertheless, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. Previous studies have devised sophisticated pipelines and prompts to induce large LMs to exhibit the capability for self-correction. However, large LMs are explicitly prompted to verify and modify its answers separately rather than completing all steps spontaneously like humans. Moreover, these complex prompts are extremely challenging for small LMs to follow. In this paper, we introduce the \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. Specifically, we devise a pipeline for constructing self-correction data and propose Partial Answer Masking (PAM), aiming to endow the model with the capability for intrinsic self-correction through fine-tuning. We conduct experiments using LMs with parameters sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning. Our experiments demonstrate that the outputs generated using ISC outperform those generated without self-correction. We believe that the output quality of even small LMs can be further improved by empowering them with the ability to intrinsic self-correct.	翻訳日:2024-01-17 18:57:50 公開日:2024-01-14
# 絡み合い、量子場の埋め込みとフォン・ノイマン代数の分類 Embezzlement of entanglement, quantum fields, and the classification of von Neumann algebras ( http://arxiv.org/abs/2401.07299v1 ) ライセンス: Link先を確認	Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner, Henrik Wilming	(参考訳) 我々はフォン・ノイマン代数の設定における絡み合いの包括的処理を提供し、フォン・ノイマン代数の分類との関係と相対論的量子場理論への応用について論じる。絡み合いのエンベゼルメント(英: embezzlement of entanglement)とは、共有絡み合いリソース状態から任意の精度で絡み合い状態を生成するタスクであり、通信を使わずに、任意に資源を摂動させる。非相対論的量子論とは対照的に、量子場の記述はタイプi(有限または無限次元行列代数)を超えるフォン・ノイマン代数を必要とし、特にタイプiiiの代数は自然に現れる。したがって、量子場理論は、潜在的により大きな種類の横領資源を許容する。コンヌのIII型ノイマン代数の分類は、エンタングルメントの埋め込みのタスクを用いて定量的な操作的解釈を与えることができることを示す。具体的には、すべてのタイプ iii$_\lambda$ factor と $\lambda>0$ host embezzling state と、タイプ iii$_1$ factor 上のすべての正規状態がembezzlingであることを示す。さらに、半有限因子(I型またはII型)はエンベジング状態をホストすることができず、正確なエンベジング状態は非分離ヒルベルト空間を必要とすることを証明している。これらの結果は、重みのフローにおけるエンベジング状態と不変状態の間の1対1の対応から導かれる。本研究は、iii$_1$因子を「普遍的横領者」として特徴づけ、相対論的量子場理論がベルの不等式を最大に破る理由について、簡単な説明を与える。結果の多くはモジュラー理論と重みのフローを幅広く用いているが、ITPFI因子の普遍的なエンベジングは基本的議論によってIII$_1$であることを示す。 We provide a comprehensive treatment of embezzlement of entanglement in the setting of von Neumann algebras and discuss its relation to the classification of von Neumann algebras as well as its application to relativistic quantum field theory. Embezzlement of entanglement is the task of producing any entangled state to arbitrary precision from a shared entangled resource state using local operations without communication while perturbing the resource arbitrarily little. In contrast to non-relativistic quantum theory, the description of quantum fields requires von Neumann algebras beyond type I (finite or infinite dimensional matrix algebras) -- in particular, algebras of type III appear naturally. Thereby, quantum field theory allows for a potentially larger class of embezzlement resources. We show that Connes' classification of type III von Neumann algebras can be given a quantitative operational interpretation using the task of embezzlement of entanglement. Specifically, we show that all type III$_\lambda$ factors with $\lambda>0$ host embezzling states and that every normal state on a type III$_1$ factor is embezzling. Furthermore, semifinite factors (type I or II) cannot host embezzling states, and we prove that exact embezzling states require non-separable Hilbert spaces. These results follow from a one-to-one correspondence between embezzling states and invariant states on the flow of weights. Our findings characterize type III$_1$ factors as "universal embezzlers" and provide a simple explanation as to why relativistic quantum field theories maximally violate Bell inequalities. While most of our results make extensive use of modular theory and the flow of weights, we establish that universally embezzling ITPFI factors are of type III$_1$ by elementary arguments.	翻訳日:2024-01-17 18:57:12 公開日:2024-01-14
# 一般化低ランク行列帯域問題のための効率的なフレームワーク Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems ( http://arxiv.org/abs/2401.07298v1 ) ライセンス: Link先を確認	Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee	(参考訳) 確率的文脈的低ランク行列バンドイット問題において、アクションの期待された報酬は、アクションの特徴行列と固定されたいくつかの内部積によって与えられるが、最初は未知の$d_1$ by $d_2$ matrix $\Theta^$ with rank $r \ll \{d_1, d_2\}$で与えられる。本稿では,一般化線形モデル(GLM)の枠組みの下で,最近 \cite{lu2021low} で提案されている一般化低ランク行列バンドイット問題について検討する。この問題に対する既存のアルゴリズムの計算不可能性と理論的制約を克服するために,まず,部分空間推定におけるsteinの手法を用いて \cite{jun2019bilinear} のアイデアを修飾した g-estt フレームワークを提案し,その推定部分空間を正規化アイデアで活用する。さらに,推定部分空間上の新しい排他的概念を用いてG-ESTTの効率を著しく向上させ,G-ESTSフレームワークを提案する。また、G-ESTT が $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ bound of regret を達成できるのに対し、G-ESTS は $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ bound of regret を対数的な仮定で達成できる。 M = O(((d_1+d_2)^2)$(d_1+d_2)^2)$ という合理的な仮定の下では、G-ESTT の後悔は $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low}$D_{rr}$ の現在の最善後悔と一致する。完全性のために,提案アルゴリズム,特にG-ESTSは,計算可能であり,一連のシミュレーションに基づいて,他の最先端(一般化)線形行列バンドイット法より一貫して優れていることを示す実験を行う。 In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $\Theta^$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which has been recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrain of existing algorithms on this problem, we first propose the G-ESTT framework that modifies the idea from \cite{jun2019bilinear} by using Stein's method on the subspace estimation and then leverage the estimated subspaces via a regularization idea. Furthermore, we remarkably improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspace instead, and propose the G-ESTS framework. We also show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ bound of regret while G-ESTS can achineve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ bound of regret under mild assumption up to logarithm terms, where $M$ is some problem dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods based on a suite of simulations.	翻訳日:2024-01-17 18:56:36 公開日:2024-01-14
# 量子場からのエンベジング絡み Embezzling entanglement from quantum fields ( http://arxiv.org/abs/2401.07292v1 ) ライセンス: Link先を確認	Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner, Henrik Wilming	(参考訳) エンベッズメント(英: Embezzlement)とは、補助系(「エンベッズラー」)の参照状態から、局所的な量子演算を通じて絡み合った量子状態を抽出する反直感的な可能性を指す。我々は、フォン・ノイマン代数の数学的分類とエンベジングエンタングルメントの操作課題との深い関係を報告する。この結果は相対論的量子場が普遍的エンベズラーであることを意味する:任意の次元の絡み合った状態は任意の精度でそれらからエンベズできる。これは相対論的量子場理論の真空状態に存在する無限個の絡み合いの操作的特徴を与える。 Embezzlement refers to the counterintuitive possibility of extracting entangled quantum states from a reference state of an auxiliary system (the "embezzler") via local quantum operations while hardly perturbing the latter. We report a deep connection between the mathematical classification of von Neumann algebras and the operational task of embezzling entanglement. This result implies that relativistic quantum fields are universal embezzlers: Any entangled state of any dimension can be embezzled from them with arbitrary precision. This provides an operational characterization of the infinite amount of entanglement present in the vacuum state of relativistic quantum field theories.	翻訳日:2024-01-17 18:55:33 公開日:2024-01-14
# 適応ガウス演算を用いた一般化光子減算を用いたフライング論理量子ビットの生成 Generation of Flying Logical Qubits using Generalized Photon Subtraction with Adaptive Gaussian Operations ( http://arxiv.org/abs/2401.07287v1 ) ライセンス: Link先を確認	Kan Takase, Fumiya Hanamura, Hironari Nagayoshi, J. Eli Bourassa, Rafael N. Alexander, Akito Kawasaki, Warit Asavanant, Mamoru Endo, and Akira Furusawa	(参考訳) 光移動波におけるGottesman-Kitaev-Preskill qubitと呼ばれる論理量子ビットの生成は、大規模で普遍的なフォールトトレラントな光量子コンピュータを実現する上で大きな課題である。近年,光子数測定とホモダイン測定を用いて基本GKP量子ビットの確率的生成が実証されている。しかし、生成速度はわずか数Hzであり、成功確率が著しく改善されない限り、実用速度で耐故障性GKP量子ビットを生成することは困難である。本稿では,複数の量子状態からgkp量子ビットを適応ガウス演算により効率的に合成する方法を提案する。光子数測定を利用する初期状態準備において、適応演算により一定の閾値を超える任意の測定結果を成功とみなすことができる。この閾値を一般化光子減算法を用いて低下させる。初期状態はホモダイン測定とその後の適応操作によりGKP量子ビットに合成される。その結果、現実的なスケールで耐故障性GKP量子ビットを生成する単一ショット成功確率は、従来の手法の100万倍である10$\%$を超えている。この提案は、原理実証段階から実用段階へ光量子コンピュータを進化させるための強力なツールとなる。 The generation of a logical qubit called the Gottesman-Kitaev-Preskill qubit in an optical traveling wave is a major challenge for realizing large-scale universal fault-tolerant optical quantum computers. Recently, probabilistic generation of elementary GKP qubits has been demonstrated using photon number measurements and homodyne measurements. However, the generation rate is only a few Hz, and it will be difficult to generate fault-tolerant GKP qubits at a practical rate unless success probability is significantly improved. Here, we propose a method to efficiently synthesize GKP qubits from several quantum states by adaptive Gaussian operations. In the initial state preparation that utilizes photon number measurements, an adaptive operation allows any measurement outcome above a certain threshold to be considered as a success. This threshold is lowered by utilizing the generalized photon subtraction method. The initial states are synthesized into a GKP qubit by homodyne measurements and a subsequent adaptive operation. As a result, the single-shot success probability of generating fault-tolerant GKP qubits in a realistic scale system exceeds 10$\%$, which is one million times better than previous methods. This proposal will become a powerful tool for advancing optical quantum computers from the proof-of-principle stage to practical application.	翻訳日:2024-01-17 18:55:22 公開日:2024-01-14
# CANDLE:Commonsense Reasoningのための大規模言語モデルからの反復的概念化とインスティファイション蒸留 CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning ( http://arxiv.org/abs/2401.07286v1 ) ライセンス: Link先を確認	Weiqi Wang, Tianqing Fang, Chunyang Li, Haochen Shi, Wenxuan Ding, Baixuan Xu, Zhaowei Wang, Jiaxin Bai, Xin Liu, Jiayang Cheng, Chunkit Chan, Yangqiu Song	(参考訳) 概念化とインスタンス化のシーケンシャルなプロセスは、既存の知識を未知のシナリオに適用できるため、一般化可能なコモンセンス推論に不可欠である。しかし、既存の研究はインスタンス化のステップを過小評価する傾向にあり、両方の種類の知識を収集するために事前に構築された概念分類やヒューマンアノテーションに強く依存しているため、完全な推論のためのインスタンス化された知識が不足し、コストが高く、スケーラビリティが制限される。これらの課題に取り組むため,我々は,大言語モデルに批判的フィルタリングを伴う2種類の知識を生成するよう指示することにより,コンテクスト化された概念化と,コモンセンス知識ベース上でのインスタンス化を反復的に行う蒸留フレームワークであるroウソクを紹介する。 CANDLEをATOMICに適用することにより、600万の概念化と三重項のインスタンス化を含む総合的な知識基盤を構築する。どちらの種類の知識も元のATOMICデータセットにしっかりと根付いており、本質的な評価はその例外的な品質と多様性を示している。実験結果から,4つの下流タスクにまたがる留学生モデルにおける蒸留ろうそくの効果が示唆された。私たちのコード、データ、モデルはhttps://github.com/HKUST-KnowComp/CANDLE.comで公開されています。 The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE, a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing large language models to generate both types of knowledge with critic filtering. By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples. Both types of knowledge are firmly rooted in the original ATOMIC dataset, and intrinsic evaluations demonstrate their exceptional quality and diversity. Empirical results indicate that distilling CANDLE on student models provides benefits across four downstream tasks. Our code, data, and models are publicly available at https://github.com/HKUST-KnowComp/CANDLE.	翻訳日:2024-01-17 18:54:55 公開日:2024-01-14
# 拡張テキスト読解によるドメイン適応の改善 Improving Domain Adaptation through Extended-Text Reading Comprehension ( http://arxiv.org/abs/2401.07284v1 ) ライセンス: Link先を確認	Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang	(参考訳) 大規模言語モデルのドメイン特化能力を高めるため、ドメイン特化コーパスでの事前学習が一般的である。最近の研究は、Regexベースのパターンでフォーマットされた読解データを用いてモデルを適用することで、ドメイン固有のタスクのパフォーマンスが大幅に向上することを示した。しかし、regexベースのパターンはドメイン固有の知識を使って生のコーパスを解析できない。さらに、質問と回答のペアは、事前に定義された形式でコーパスから直接抽出され、コンテキストが限定される。この制限に対処するため,LLMとクラスタリングによる読解理解を改善した。 LLMは、理解段階を洗練させるためにコーパス内のドメイン知識を活用することに焦点を当て、クラスタリングは、コンテキストを読書段階に拡張することで関連する知識を提供する。さらに,パラメータ効率の高い微調整を取り入れ,ドメイン適応の効率化を図る。 AdaptLLMと比較して、ドメイン固有のタスクで5%以上の改善を実現している。私たちのコードはhttps://github.com/microsoft/LMOpsで公開されます。 To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, the question and answer pairs are extracted directly from the corpus in predefined formats offers limited context. To address this limitation, we improve reading comprehension via LLM and clustering. LLM focuses on leveraging domain knowledge within the corpus to refine comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will available at https://github.com/microsoft/LMOps.	翻訳日:2024-01-17 18:54:32 公開日:2024-01-14
# FROST-BRDF:BRDF取得のための高速かつロバストなサンプリング手法 FROST-BRDF: A Fast and Robust Optimal Sampling Technique for BRDF Acquisition ( http://arxiv.org/abs/2401.07283v1 ) ライセンス: Link先を確認	Ehsan Miandji, Tanaboon Tongbuasirilai, Saghi Hajisharif, Behnaz Kavoosighafi, Jonas Unger	(参考訳) 実世界の材料の効率的かつ正確なbrdf取得は、何百万もの入射光と視野方向のサンプリングを必要とする困難な研究課題である。取得過程を高速化するためには, BRDFの完全回復が正確かつ堅牢であるような, サンプリング方向の最小セットを見つける必要がある。本稿では,BRDF の取得を圧縮センシング問題として定式化し,センサオペレータは最適なサンプル方向のセットに従ってBRDF 信号のサブサンプリングを行う。この問題を解決するために,光ビューのサンプルを配置し,回復誤差を最小限に抑えるための最適サブサンプリング演算子を設計するためのFROST(Fast and Robust Optimal Smpling Technique)を提案する。 FROSTは,Multiple Measurement Vector (MMV)信号モデルの下で,圧縮センシングのための最適サブサンプリング演算子をスパース表現に設計する問題を提起する。提案された再構成は、正確には、すなわち近似がないため、難解な組合せ問題を標準的な最適化手法で解けるものに変換する。その結果、FROSTには圧縮センシングの分野からの強い理論的保証が伴う。 BRDFデータセットを用いた10倍のクロスバリデーションを用いたFROST-BRDFの網羅的解析を行い,再建品質に対する最先端技術と比較して大きな優位性を示した。最後に、frostは概念的にも実装的にもシンプルで、各実行時に一貫性のある結果をもたらし、少なくとも以前の技術よりも2桁高速である。 Efficient and accurate BRDF acquisition of real world materials is a challenging research problem that requires sampling millions of incident light and viewing directions. To accelerate the acquisition process, one needs to find a minimal set of sampling directions such that the recovery of the full BRDF is accurate and robust given such samples. In this paper, we formulate BRDF acquisition as a compressed sensing problem, where the sensing operator is one that performs sub-sampling of the BRDF signal according to a set of optimal sample directions. To solve this problem, we propose the Fast and Robust Optimal Sampling Technique (FROST) for designing a provably optimal sub-sampling operator that places light-view samples such that the recovery error is minimized. FROST casts the problem of designing an optimal sub-sampling operator for compressed sensing into a sparse representation formulation under the Multiple Measurement Vector (MMV) signal model. The proposed reformulation is exact, i.e. without any approximations, hence it converts an intractable combinatorial problem into one that can be solved with standard optimization techniques. As a result, FROST is accompanied by strong theoretical guarantees from the field of compressed sensing. We perform a thorough analysis of FROST-BRDF using a 10-fold cross-validation with publicly available BRDF datasets and show significant advantages compared to the state-of-the-art with respect to reconstruction quality. Finally, FROST is simple, both conceptually and in terms of implementation, it produces consistent results at each run, and it is at least two orders of magnitude faster than the prior art.	翻訳日:2024-01-17 18:54:18 公開日:2024-01-14
# 自己訓練による白血セルの半教師付きセマンティクスセグメンテーション Semi-supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cel ( http://arxiv.org/abs/2401.07278v1 ) ライセンス: Link先を確認	Vinh Quoc Luu, Duy Khanh Le, Huy Thanh Nguyen, Minh Thanh Nguyen, Thinh Tien Nguyen, Vinh Quang Dinh	(参考訳) 医療における人工知能(AI)は、特に白血球がんの診断において、2つの主要な課題によって妨げられている: 白血球セグメンテーションのための大規模ラベル付きデータセットの欠如と、時代遅れのセグメンテーション方法である。最初の課題に対処するためには、大規模なデータセットを効率的にアノテートするために、半教師付き学習フレームワークを導入する必要がある。本稿では,fixmatchを組み込んだ新しい自己学習パイプラインを提案することで,この問題に対処した。自己学習パイプラインにFixMatchを組み込むことで、ほとんどのケースでパフォーマンスが向上することがわかった。 DeepLab-V3アーキテクチャの一貫性を備えた自己学習スキームとResNet-50で、Zheng 1, Zheng 2, LISCデータセットでそれぞれ90.69%、87.37%、76.49%に達した。 Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. To address the first challenge, a semi-supervised learning framework should be brought to efficiently annotate the large dataset. In this work, we address this issue by proposing a novel self-training pipeline with the incorporation of FixMatch. We discover that by incorporating FixMatch in the self-training pipeline, the performance improves in the majority of cases. Our performance achieved the best performance with the self-training scheme with consistency on DeepLab-V3 architecture and ResNet-50, reaching 90.69%, 87.37%, and 76.49% on Zheng 1, Zheng 2, and LISC datasets, respectively.	翻訳日:2024-01-17 18:53:53 公開日:2024-01-14
# promptformer: asr用のconformerトランスデューサ Promptformer: Prompted Conformer Transducer for ASR ( http://arxiv.org/abs/2401.07360v1 ) ライセンス: Link先を確認	Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif R\"adel, Volker Leutnant	(参考訳) コンテキストキューは、自動音声認識(ASR)システムにおけるマルチターンインタラクションを改善する情報を運ぶ。本稿では,注目機構の音響表現とテキストコンテキストを融合させるために,ハイパープロンプトにインスパイアされた新しいメカニズムを提案する。マルチターンインタラクションを用いたテストセットでは,強いベースライン上で5.9%の相対単語誤り率低減(rwerr)を達成した。提案手法は文脈の欠如により劣化せず,文脈を伴わずにモデルが訓練されても改善につながることを示す。さらに,文脈埋め込み生成に事前学習された文片モデルを用いることで,外部bertモデルに勝ることを示す。 Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rWERR) over a strong baseline. We show that our method does not degrade in the absence of context and leads to improvements even if the model is trained without context. We further show that leveraging a pre-trained sentence-piece model for context embedding generation can outperform an external BERT model.	翻訳日:2024-01-17 18:47:38 公開日:2024-01-14
# 科学と深層学習における信頼性と解釈可能性 Reliability and Interpretability in Science and Deep Learning ( http://arxiv.org/abs/2401.07359v1 ) ライセンス: Link先を確認	Luigi Scorzato	(参考訳) 近年,機械学習(ML)手法の信頼性に関する疑問が重要視され,関連する不確実性の分析が研究の動機となっている。しかしながら、これらの研究の多くは、標準の誤り解析をMLモデル、特に標準科学的モデリングからかなり離れたディープニューラルネットワーク(DNN)モデルに適用している。したがって、標準誤差解析を、dnnモデルと標準科学的モデリングの相違の可能性と、これらの相違が信頼性評価に与える影響についてのより深い認識論的分析と統合する必要がある。この記事にはいくつかの貢献がある。まず、理論自由科学の錯覚に対するモデル仮定(MLと従来の科学の両方)のユビキタスな役割を強調します。第二に、モデル仮定は、言語非依存であることが示される、その(エピスティックな)複雑さの観点から分析される。 dnnモデルの高い認識論的複雑性は、その信頼性と長期的な進歩の予測を妨げていると論じている。今後の可能性も示唆されている。第3に,責任あるaiの文脈で導入されたモデル認識複雑性とその解釈可能性との密接な関係を明らかにする。モデル(ブラックボックスの問題)の理解の欠如は、個々のスキルとは無関係な方法で、その解釈可能性に影響を与える。また、解釈可能性が、統計分析だけでは理解できないあらゆるモデルの信頼性を評価するための前提条件であることも明らかにした。本稿では,従来の科学的モデルとDNNモデルの比較に焦点を当てる。しかし、ランダムフォレストやロジスティック回帰モデルも簡単に考慮される。 In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models, and in particular Deep Neural Network (DNN) models, which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate the standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling and the possible implications of these differences in the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (both in ML and traditional Science) against the illusion of theory-free science. Secondly, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimate of their reliability and also their prospect of long-term progress. Some potential ways forward are suggested. Thirdly, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense, and to what extent, the lack of understanding of a model (black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models. But, Random Forest and Logistic Regression models are also briefly considered.	翻訳日:2024-01-17 18:46:57 公開日:2024-01-14
# AI生成合成画像の認識のためのハラスティング機械学習 Harnessing Machine Learning for Discerning AI-Generated Synthetic Images ( http://arxiv.org/abs/2401.07358v1 ) ライセンス: Link先を確認	Yuyang Wang, Yizhi Hao, Amando Xu Cong	(参考訳) デジタルメディアの領域では、AI生成合成画像の出現は、実物と製作された視覚コンテンツを区別する上で大きな課題をもたらしている。これらの画像は、しばしば真偽とは区別できないが、デジタルメディアの信頼性への脅威となり、偽情報や詐欺につながる可能性がある。我々の研究は、AI生成画像と実画像の識別に機械学習技術を活用することで、この課題に対処する。私たちのアプローチの中心は、"Real"と"Fake"とラベル付けされた画像の包括的なコレクションであるCIFAKEデータセットです。 ResNet、VGGNet、DenseNetといった先進的なディープラーニングアーキテクチャを洗練・適応し、トランスファーラーニングを利用して合成画像の識別精度を向上させる。また,これらを,バニラサポートベクトルマシン(SVM)と独自の畳み込みニューラルネットワーク(CNN)からなるベースラインモデルと比較した。 DenseNetは97.74%の精度で、私たちの最適化されたディープラーニングモデルは従来の手法より優れていることを示した。本研究は,これらの高度なモデルを合成画像検出に適用し,最適化し,様々なメトリクスを用いた比較分析を行い,従来の機械学習手法よりもai生成画像の識別に優れた性能を示す。この研究は、デジタルメディアの整合性の分野を前進させるだけでなく、デジタルメディアにおけるAI生成コンテンツの倫理的・技術的側面を探求するための基盤となる。 In the realm of digital media, the advent of AI-generated synthetic images has introduced significant challenges in distinguishing between real and fabricated visual content. These images, often indistinguishable from authentic ones, pose a threat to the credibility of digital media, with potential implications for disinformation and fraud. Our research addresses this challenge by employing machine learning techniques to discern between AI-generated and genuine images. Central to our approach is the CIFAKE dataset, a comprehensive collection of images labeled as "Real" and "Fake". We refine and adapt advanced deep learning architectures like ResNet, VGGNet, and DenseNet, utilizing transfer learning to enhance their precision in identifying synthetic images. We also compare these with a baseline model comprising a vanilla Support Vector Machine (SVM) and a custom Convolutional Neural Network (CNN). The experimental results were significant, demonstrating that our optimized deep learning models outperform traditional methods, with DenseNet achieving an accuracy of 97.74%. Our application study contributes by applying and optimizing these advanced models for synthetic image detection, conducting a comparative analysis using various metrics, and demonstrating their superior capability in identifying AI-generated images over traditional machine learning techniques. This research not only advances the field of digital media integrity but also sets a foundation for future explorations into the ethical and technical dimensions of AI-generated content in digital media.	翻訳日:2024-01-17 18:45:51 公開日:2024-01-14
# BUGSPHP:PHPの自動プログラム修復のためのデータセット BUGSPHP: A dataset for Automated Program Repair in PHP ( http://arxiv.org/abs/2401.07356v1 ) ライセンス: Link先を確認	K.D. Pramod, W.T.N. De Silva, W.U.K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake	(参考訳) 自動プログラム修正(APR)は、デバッグとバグ修正時間を節約することで開発者の生産性を向上させる。 APRはC/C++とJavaプログラムで広く研究されているが、ベンチマークPHPバグデータセットがないため、PHPプログラムのバグについてはほとんど研究されていない。 PHPが20年以上にわたって最も広く使われているサーバーサイド言語の一つであり、eコマース、ソーシャルネットワーク、コンテンツ管理といったさまざまなコンテキストで使われていることは驚くべきことです。本稿では,実世界のアプリケーションであるBUGSPHPにおけるPHPバグのベンチマークデータセットを提案する。データセットはトレーニングとテストデータセットで構成され、GitHubから別々にキュレーションされ、ローカルに処理される。トレーニングデータセットには600,000以上のバグ修正コミットが含まれている。テストデータセットには、開発者が提供するテストケースを備えた手作業によるバグ修正コミット513が含まれている。 Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of a benchmark PHP bug dataset. This is surprising given that PHP has been one of the most widely used server-side languages for over two decades, being used in a variety of contexts such as e-commerce, social networking, and content management. This paper presents a benchmark dataset of PHP bugs on real-world applications called BUGSPHP, which can enable research on analysis, testing, and repair for PHP programs. The dataset consists of training and test datasets, separately curated from GitHub and processed locally. The training dataset includes more than 600,000 bug-fixing commits. The test dataset contains 513 manually validated bug-fixing commits equipped with developer-provided test cases to assess patch correctness.	翻訳日:2024-01-17 18:45:26 公開日:2024-01-14
# 低高度空域認証管理のための工学フェアと等価ソフトウェアシステムを目指して Towards Engineering Fair and Equitable Software Systems for Managing Low-Altitude Airspace Authorizations ( http://arxiv.org/abs/2401.07353v1 ) ライセンス: Link先を確認	Usman Gohar, Michael C. Hunter, Agnieszka Marczak-Czajka, Robyn R. Lutz, Myra B. Cohen, Jane Cleland-Huang	(参考訳) 小型無人航空機システム(SUAS)は様々な用途に広く採用されている。これにより、共有空域内の運用上の複雑さと報告されたインシデントの増加が導入され、安全性への懸念が高まっている。これに対し、アメリカ連邦航空局(FAA)は、そのミッションを安全に完了させるSUASの予測能力に基づいて、空域へのアクセスを制御するUAS Traffic Management (UTM)システムを開発している。しかし、飛行要求を迅速に承認または否定できる完全自動化システムはバイアスを起こしやすいため、多様な利害関係者にとって安全、透明性、公平性を考慮しなければならない。本稿では,自動化システムにおいて考慮すべき要因について,利害関係者の視点を考察する最初の研究を行う。その結果、飛行特性と環境条件が最も重要視されているが、パイロットとドローンの能力も考慮すべきである。さらに、いくつかの回答者はAIをサポートする自動化への反対を示し、自動意思決定における完全な透明性の必要性を強調した。結果は、UTM飛行認可決定の自動化の課題に関する社会的視点を提供し、より広範なsUASコミュニティに受け入れられる解決策の継続的な設計の枠組み化を支援する。 Small Unmanned Aircraft Systems (sUAS) have gained widespread adoption across a diverse range of applications. This has introduced operational complexities within shared airspaces and an increase in reported incidents, raising safety concerns. In response, the U.S. Federal Aviation Administration (FAA) is developing a UAS Traffic Management (UTM) system to control access to airspace based on an sUAS's predicted ability to safely complete its mission. However, a fully automated system capable of swiftly approving or denying flight requests can be prone to bias and must consider safety, transparency, and fairness to diverse stakeholders. In this paper, we present an initial study that explores stakeholders' perspectives on factors that should be considered in an automated system. Results indicate flight characteristics and environmental conditions were perceived as most important but pilot and drone capabilities should also be considered. Further, several respondents indicated an aversion to any AI-supported automation, highlighting the need for full transparency in automated decision-making. Results provide a societal perspective on the challenges of automating UTM flight authorization decisions and help frame the ongoing design of a solution acceptable to the broader sUAS community.	翻訳日:2024-01-17 18:45:13 公開日:2024-01-14
# EU法における生成AI - 責任、プライバシ、知的財産権、サイバーセキュリティ Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity ( http://arxiv.org/abs/2401.07348v1 ) ライセンス: Link先を確認	Claudio Novelli, Federico Casolari, Philipp Hacker, Giorgio Spedicato, Luciano Floridi	(参考訳) 生成AIの出現、特にChatGPTとその後継者のような大規模言語モデル(LLM)を通じて、AIの世界におけるパラダイムシフトを象徴する。高度なLCMはマルチモーダリティを示し、多様なデータフォーマットを扱い、アプリケーションの範囲を広げる。しかし、これらのモデルの複雑さと創発的な自律性は、予測可能性と法的コンプライアンスの課題をもたらす。本稿では、欧州連合の文脈におけるジェネレーティブAIとLLMの法的および規制的な意味を掘り下げ、責任、プライバシー、知的財産権、サイバーセキュリティの側面を分析する。人工知能法(AIA)の草案を含む、既存のおよび提案されたEUの法律の妥当性を批判的に検証し、ジェネレーティブAIの一般的な問題、特にLLMの課題に対処する。本稿は、立法枠組みにおける潜在的なギャップと欠点を特定し、生成モデルの安全かつコンプライアンスの確保と、EUの進化するデジタルランドスケープと法的基準との整合性を確保するための勧告を提案する。 The advent of Generative AI, particularly through Large Language Models (LLMs) like ChatGPT and its successors, marks a paradigm shift in the AI landscape. Advanced LLMs exhibit multimodality, handling diverse data formats, thereby broadening their application scope. However, the complexity and emergent autonomy of these models introduce challenges in predictability and legal compliance. This paper delves into the legal and regulatory implications of Generative AI and LLMs in the European Union context, analyzing aspects of liability, privacy, intellectual property, and cybersecurity. It critically examines the adequacy of the existing and proposed EU legislation, including the Artificial Intelligence Act (AIA) draft, in addressing the unique challenges posed by Generative AI in general and LLMs in particular. The paper identifies potential gaps and shortcomings in the legislative framework and proposes recommendations to ensure the safe and compliant deployment of generative models, ensuring they align with the EU's evolving digital landscape and legal standards.	翻訳日:2024-01-17 18:44:51 公開日:2024-01-14
# 誰が言った? 幼児教室における音声分析の自動化 Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms ( http://arxiv.org/abs/2401.07342v1 ) ライセンス: Link先を確認	Anchen Sun, Juan J Londono, Batya Elbaum, Luis Estrada, Roberto Jose Lazo, Laura Vitale, Hugo Gonzalez Villasanti, Riccardo Fusaroli, Lynn K Perry, Daniel S Messinger	(参考訳) 幼児は、騒音の多い幼稚園の教室で覚醒時間の大部分を過ごします。これらの環境では、教師との子どもの音声対話は言語結果に重要な貢献者であるが、手動による対話の翻訳は禁止されている。児童・教師向けレコーダーの音声を用いて,話者分類(ALICE)と発話書き起こし(Whisper)の両方にオープンソースソフトウェアを利用する自動フレームワークを提案する。本研究では,110分間の授業記録において,児童語マイクロフォン(n=4児)から85分間,教師・女性マイクロホン(n=2教師)から25分間の成績を比較した。すなわち、正しく分類された教師と子供の発話の割合は.76であり、誤り訂正されたカッパは.50、重み付けされたF1は.76である。教師と児童の書き起こしにおける単語エラー率は .15 であり、Whisper と専門家の書き起こしを同等にするためには、15%の単語を削除、追加、あるいは変更する必要がある。また, 単語の平均発話長, 質問文である教師と児童の発話率, 2.5秒以内で回答した発話の割合などの音声特徴は, 専門家と自動書き起こしとは別々に計算した場合に類似していた。その結果, 児童の言語発達を支援する教室音声の分析の進歩が示唆された。自然言語処理を用いた今後の研究は、話者分類の改善と、自動化された学習フレームワークの適用から、13人の子供と4人の教師が1年間に17回観察した教室記録を含むより大きなデータセットまでの分析が進められている。 Young children spend substantial portions of their waking hours in noisy preschool classrooms. In these environments, children's vocal interactions with teachers are critical contributors to their language outcomes, but manually transcribing these interactions is prohibitive. Using audio from child- and teacher-worn recorders, we propose an automated framework that uses open source software both to classify speakers (ALICE) and to transcribe their utterances (Whisper). We compare results from our framework to those from a human expert for 110 minutes of classroom recordings, including 85 minutes from child-word microphones (n=4 children) and 25 minutes from teacher-worn microphones (n=2 teachers). The overall proportion of agreement, that is, the proportion of correctly classified teacher and child utterances, was .76, with an error-corrected kappa of .50 and a weighted F1 of .76. The word error rate for both teacher and child transcriptions was .15, meaning that 15% of words would need to be deleted, added, or changed to equate the Whisper and expert transcriptions. Moreover, speech features such as the mean length of utterances in words, the proportion of teacher and child utterances that were questions, and the proportion of utterances that were responded to within 2.5 seconds were similar when calculated separately from expert and automated transcriptions. The results suggest substantial progress in analyzing classroom speech that may support children's language development. Future research using natural language processing is underway to improve speaker classification and to analyze results from the application of the automated it framework to a larger dataset containing classroom recordings from 13 children and 4 teachers observed on 17 occasions over one year.	翻訳日:2024-01-17 18:44:33 公開日:2024-01-14
# オンラインソーシャルリーダーにおけるシェイクスピアとカンパニーの残余 The Afterlives of Shakespeare and Company in Online Social Readership ( http://arxiv.org/abs/2401.07340v1 ) ライセンス: Link先を確認	Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney	(参考訳) GoodreadsやLibraryThingといったソーシャル読書プラットフォームの成長により、非常に大規模かつ詳細な読書活動を分析することができます。しかし21世紀のシステムは、現代の読者のみに視点を与えてくれる。一方、シェイクスピア・アンド・カンパニーの貸出図書館記録のデジタル化は、戦間期パリのより小規模のコミュニティの読書活動の窓口となっている。本稿では,シェイクスピア・アンド・カンパニー・コミュニティとグッドリードズ・コミュニティの比較を行う。類似点と相違点の定量化によって、これらのデータセット間での作業の増加や人気低下のパターンを特定できる。また,共読パターンの類似性と差異を計測することで,作業の受信方法の違いを測定することもできる。最後に、共読の完全なネットワークを調べることで、文学的レセプションの全体構造の変化を観察することができる。 The growth of social reading platforms such as Goodreads and LibraryThing enables us to analyze reading activity at very large scale and in remarkable detail. But twenty-first century systems give us a perspective only on contemporary readers. Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller community in interwar Paris. In this article, we explore the extent to which we can make comparisons between the Shakespeare and Company and Goodreads communities. By quantifying similarities and differences, we can identify patterns in how works have risen or fallen in popularity across these datasets. We can also measure differences in how works are received by measuring similarities and differences in co-reading patterns. Finally, by examining the complete networks of co-readership, we can observe changes in the overall structures of literary reception.	翻訳日:2024-01-17 18:44:02 公開日:2024-01-14
# CodeAgent: リアルタイムリポジトリレベルのコーディング課題のためのツール統合エージェントシステムによるコード生成の強化 CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges ( http://arxiv.org/abs/2401.07339v1 ) ライセンス: Link先を確認	Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin	(参考訳) 大規模言語モデル(llm)は自動コード生成において有望であるが、一般的にはスタンドアロンコード単位生成のような単純なタスクでのみ優れている。しかし、実際のソフトウェア開発には、複雑な依存関係と広範なドキュメントを持つ複雑なコードリポジトリ(リポジトリという名前)が伴うことが多い。このギャップを埋めるために、我々の研究は、より現実的な、現実世界のリポジトリレベルのコード生成でLLMを評価することに重点を置いています。我々は,リポジトリレベルのコード生成のための手作業によるベンチマークであるCodeAgentBenchを紹介する。このベンチマークは、合計101サンプルを含む5つの高品質pythonプロジェクトで構成されている。我々は,リポジトリレベルのタスクにおいて9つの主要なllmを評価し,その性能の低下を観察した。そこで本研究では,レポレベルの効率的なコード生成に外部ツールを活用する新しいLLMベースのエージェントフレームワークであるCodeAgentを提案する。 CodeAgentは5つのプログラミングツールを統合し、情報検索、コードシンボルナビゲーション、コードテストのためのソフトウェアアーティファクトとのインタラクションを可能にする。これらのツールの使用を最適化するための4つのエージェント戦略を実装した。 CodeAgentBenchの実験では、CodeAgentはLLMの性能を大幅に向上させ、18.1\%から250\%に改善した。 HumanEvalベンチマークのさらなるテストでは、さまざまなコード生成タスクに対するCodeAgentの適応性と有効性を確認している。 CodeAgentはGithub Copilotのような商用製品よりも優れており、精度と効率が優れている。これらの結果は、コード生成におけるcodeagentの堅牢な能力を示し、実際のリポジトリレベルのコーディング課題の可能性を強調している。 Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code repositories (named repo) with complex dependencies and extensive documentation. To fill this gap, our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation. We introduce CodeAgentBench, a manually curated benchmark for repo-level code generation. This benchmark comprises five high-quality Python projects, encompassing a total of 101 samples. We assess nine leading LLMs on repo-level tasks and observe a decline in their performance. To tackle this, we present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation. CodeAgent integrates five programming tools, enabling interaction with software artifacts for information retrieval, code symbol navigation, and code testing. We implement four agent strategies to optimize these tools' usage. Our experiments on CodeAgentBench show that CodeAgent enhances LLM performance significantly, with improvements ranging from 18.1\% to 250\%. Further tests on the HumanEval benchmark confirm CodeAgent's adaptability and efficacy across various code generation tasks. Notably, CodeAgent outperforms commercial products like Github Copilot, showcasing superior accuracy and efficiency. These results demonstrate CodeAgent's robust capabilities in code generation, highlighting its potential for real-world repo-level coding challenges.	翻訳日:2024-01-17 18:43:48 公開日:2024-01-14
# マンダリン多モーダル感情音声データベースの構築と評価 Construction and Evaluation of Mandarin Multimodal Emotional Speech Database ( http://arxiv.org/abs/2401.07336v1 ) ライセンス: Link先を確認	Zhu Ting, Li Liangqi, Duan Shufei, Zhang Xueying, Xiao Zhongzhe, Jia Hairng, Liang Huizhi	(参考訳) コーパス設計、主題選択、記録詳細及びデータ処理の側面から詳細に記述した、調音運動、音響、声門および顔の微小表現を含むマルチモーダル感情音声マンダリンデータベースを設計、確立する。信号は離散的な感情ラベル(中性、幸福、快楽、無関心、怒り、悲しみ、悲しみ)と次元的な感情ラベル(快楽、覚醒、支配)でラベル付けされる。本稿では,次元アノテーションデータの統計的解析により,次元アノテーションの有効性を検証する。注釈者のscl-90スケールデータを検証し、解析用パッドアノテーションデータと組み合わせ、アノテーションの異常現象と注釈者の心理状態との関係を探究する。本稿では,データベースの音声品質と感情識別の検証のために,svm,cnn,dnnの3つの基本モデルを用いて,これら7つの感情の認識率を計算する。その結果,音響データのみを用いた場合の7感情の平均認識率は約82%であった。声門データのみを使用する場合、平均認識率は約72%である。 kinematicsのデータだけで、平均認識率は55.7%に達する。したがって、データベースは高品質であり、特にマルチモーダル感情音声分析のタスクにおいて、音声分析研究の重要な情報源として使用できる。 A multi-modal emotional speech Mandarin database including articulatory kinematics, acoustics, glottal and facial micro-expressions is designed and established, which is described in detail from the aspects of corpus design, subject selection, recording details and data processing. Where signals are labeled with discrete emotion labels (neutral, happy, pleasant, indifferent, angry, sad, grief) and dimensional emotion labels (pleasure, arousal, dominance). In this paper, the validity of dimension annotation is verified by statistical analysis of dimension annotation data. The SCL-90 scale data of annotators are verified and combined with PAD annotation data for analysis, so as to explore the internal relationship between the outlier phenomenon in annotation and the psychological state of annotators. In order to verify the speech quality and emotion discrimination of the database, this paper uses 3 basic models of SVM, CNN and DNN to calculate the recognition rate of these seven emotions. The results show that the average recognition rate of seven emotions is about 82% when using acoustic data alone. When using glottal data alone, the average recognition rate is about 72%. Using kinematics data alone, the average recognition rate also reaches 55.7%. Therefore, the database is of high quality and can be used as an important source for speech analysis research, especially for the task of multimodal emotional speech analysis.	翻訳日:2024-01-17 18:43:23 公開日:2024-01-14
# ELLA-V: アライメント誘導配列並べ替えによる安定型ニューラルコーデック言語モデリング ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering ( http://arxiv.org/abs/2401.07333v1 ) ライセンス: Link先を確認	Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen	(参考訳) VALL-Eのような音響的および言語的プロンプトに基づく言語モデル(LM)アプローチは、ゼロショット音声生成の分野で顕著な進歩を遂げた。しかし、既存の方法にはいくつかの制限がある。 1) 音声及び音素トークン間のアライメントの制限による出力合成音声における繰り返し、転置及び省略 2)自己回帰(AR)言語モデルを用いた合成音声のきめ細かい制御の課題 3)ARによる復号化の性質,特に欲張り戦略の下での無限沈黙生成。そこで本研究では,音素レベルでの合成音声のきめ細かい制御を可能にする,単純かつ効率的なlmベースのゼロショットテキスト・ツー・スパイチ(tts)フレームワークであるella-vを提案する。 ELLA-Vの鍵となるのは、対応する音響トークンよりも先に音素トークンが現れる音響トークンと音素トークンの連成である。実験結果から,本モデルは精度でVALL-Eより優れ,グリージーおよびサンプリングに基づく復号方式によりより安定した結果が得られることがわかった。 ELLA-Vのコードはクリーンアップ後にオープンソース化される。オーディオサンプルはhttps://ereboas.github.io/ELLAV/で入手できる。 The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grained control over the synthesized speech with autoregressive (AR) language model; 3) infinite silence generation due to the nature of AR-based decoding, especially under the greedy strategy. To alleviate these issues, we propose ELLA-V, a simple but efficient LM-based zero-shot text-to-speech (TTS) framework, which enables fine-grained control over synthesized audio at the phoneme level. The key to ELLA-V is interleaving sequences of acoustic and phoneme tokens, where phoneme tokens appear ahead of the corresponding acoustic tokens. The experimental findings reveal that our model outperforms VALL-E in terms of accuracy and delivers more stable results using both greedy and sampling-based decoding strategies. The code of ELLA-V will be open-sourced after cleanups. Audio samples are available at https://ereboas.github.io/ELLAV/.	翻訳日:2024-01-17 18:43:01 公開日:2024-01-14
# 注意型UNetによるモノのインターネット上の軽量画像セマンティック通信システム Attention-based UNet enabled Lightweight Image Semantic Communication System over Internet of Things ( http://arxiv.org/abs/2401.07329v1 ) ライセンス: Link先を確認	Guoxin Ma, Haonan Tong, Nuocheng Yang, and Changchuan Yin	(参考訳) 本稿では,モノのインターネット(IoT)デバイス上に展開される軽量画像意味コミュニケーションシステムの問題について検討する。考慮されたシステムモデルでは、デバイスはデータ伝送効率の高い究極のビデオサービスにおけるユーザの行動認識をサポートするためにセマンティック通信技術を使用する必要がある。しかし、ディープラーニング(DL)ベースのコーデックトレーニングと推論の複雑な計算プロセスのため、IoTデバイスがセマンティックコーデックをデプロイするのは計算コストがかかる。セマンティック通信システムをIoTデバイスで展開するために,低計算複雑性と小型モデルサイズを実現する,注目ベースの軽量画像セマンティック通信(LSSC)システムを提案する。特に,我々はまず,エッジサーバのコーデックをLSSCシステムにトレーニングさせ,IoTデバイス上でのトレーニング計算負荷を低減する。次に,畳み込みブロックアテンションモジュール(cbam)を導入し,画像意味的特徴を抽出し,ダウンサンプリング層数を減らし,浮動小数点演算(flops)を削減する。最後に,コーデックの構造を実験的に調整し,ダウンサンプリング層の最適数を求める。シミュレーションの結果,提案するLSSCシステムにより,意味コーデックFLOPを14%削減し,モデルサイズを55%削減できることがわかった。さらに,提案方式は低チャネルsnr(signal-to-noise)領域において,従来の通信方式よりも高い伝送精度を実現することができる。 This paper studies the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices. In the considered system model, devices must use semantic communication techniques to support user behavior recognition in ultimate video service with high data transmission efficiency. However, it is computationally expensive for IoT devices to deploy semantic codecs due to the complex calculation processes of deep learning (DL) based codec training and inference. To make it affordable for IoT devices to deploy semantic communication systems, we propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size. In particular, we first let the LSSC system train the codec at the edge server to reduce the training computation load on IoT devices. Then, we introduce the convolutional block attention module (CBAM) to extract the image semantic features and decrease the number of downsampling layers thus reducing the floating-point operations (FLOPs). Finally, we experimentally adjust the structure of the codec and find out the optimal number of downsampling layers. Simulation results show that the proposed LSSC system can reduce the semantic codec FLOPs by 14%, and reduce the model size by 55%, with a sacrifice of 3% accuracy, compared to the baseline. Moreover, the proposed scheme can achieve a higher transmission accuracy than the traditional communication scheme in the low channel signal-to-noise (SNR) region.	翻訳日:2024-01-17 18:42:40 公開日:2024-01-14
# 従来のアプローチを超えて:乳房超音波診断のためのマルチタスクネットワーク Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis ( http://arxiv.org/abs/2401.07326v1 ) ライセンス: Link先を確認	Dat T. Chung, Minh-Anh Dang, Mai-Anh Vu, Minh T. Nguyen, Thanh-Huy Nguyen, and Vinh Q. Dinh	(参考訳) 乳腺超音波は非侵襲的アプローチとして癌診断において重要な役割を担っている。近年、深層学習の発展に伴い、腫瘍の局在化と癌分類のタスクにおいて多くのCNNベースのアプローチが研究されている。従来のシングルモデルは両方のタスクで優れたパフォーマンスを達成したが、これらのメソッドは推論時間、GPU要求、各モデルの微調整にいくつかの制限がある。本研究では,分割と分類の両方を行うために,エンドツーエンドのマルチタスクアーキテクチャを再設計し,構築することを目的とする。提案手法では,セグメンテーションタスクにおけるDeepLabV3+アーキテクチャの79.8%と86.4%で,優れた性能と時間効率を実現した。 Breast Ultrasound plays a vital role in cancer diagnosis as a non-invasive approach with cost-effective. In recent years, with the development of deep learning, many CNN-based approaches have been widely researched in both tumor localization and cancer classification tasks. Even though previous single models achieved great performance in both tasks, these methods have some limitations in inference time, GPU requirement, and separate fine-tuning for each model. In this study, we aim to redesign and build end-to-end multi-task architecture to conduct both segmentation and classification. With our proposed approach, we achieved outstanding performance and time efficiency, with 79.8% and 86.4% in DeepLabV3+ architecture in the segmentation task.	翻訳日:2024-01-17 18:42:14 公開日:2024-01-14
# Sachdev-Ye-Kitaev模型のランダムプルーニング Randomly Pruning the Sachdev-Ye-Kitaev model ( http://arxiv.org/abs/2401.07325v1 ) ライセンス: Link先を確認	Richard Berkovits	(参考訳) SYKモデル(Sachdev-Ye-Kitaev model)はその短時間のカオス的振る舞いで知られ、量子重力やホログラフィーなどの様々な分野への応用において基本的な役割を果たす。エネルギースペクトルの普遍的なカオス的振る舞いが停止するエネルギースケールを表すThouless Energyは、スペクトル自身から決定することができる。古典的あるいは量子コンピュータ上でSYKモデルをシミュレートする場合、結合をランダムに切断することでハミルトン項の項数を最小化することが有利である。本稿では,多数の結合を排除しながらも,カオス的挙動は短時間で持続することを示した。これは,完全連結SYKモデル(具体的には$O(KL)$)において,元の$O(L^4)$結合のごく一部しか保持されていない場合でも事実である。ここで、$l$はサイト数を表し、$k\sim 10$である。短時間スケールに対応する長距離エネルギースケールの特性を数値的特異値分解(svd)とレベル数分散計算により検証する。 The Sachdev-Ye-Kitaev model (SYK) is renowned for its short-time chaotic behavior, which plays a fundamental role in its application to various fields such as quantum gravity and holography. The Thouless energy, representing the energy scale at which the universal chaotic behavior in the energy spectrum ceases, can be determined from the spectrum itself. When simulating the SYK model on classical or quantum computers, it is advantageous to minimize the number of terms in the Hamiltonian by randomly pruning the couplings. In this paper, we demonstrate that even with a significant pruning, eliminating a large number of couplings, the chaotic behavior persists up to short time scales This is true even when only a fraction of the original $O(L^4)$ couplings in the fully connected SYK model, specifically $O(KL)$, is retained. Here, $L$ represents the number of sites, and $K\sim 10$. The properties of the long-range energy scales, corresponding to short time scales, are verified through numerical singular value decomposition (SVD) and level number variance calculations.	翻訳日:2024-01-17 18:42:02 公開日:2024-01-14
# 小さなLLMは弱いツール学習者:マルチLLMエージェント Small LLMs Are Weak Tool Learners: A Multi-LLM Agent ( http://arxiv.org/abs/2401.07324v1 ) ライセンス: Link先を確認	Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang	(参考訳) 大規模言語モデル(LLM)エージェントはスタンドアロンのLLMの機能を大幅に拡張し、外部ツール(API、関数など)と対話し、自己指向的な複雑なタスクを完了させる。ツール利用の課題は、LCMがユーザクエリを理解し、回答を生成するだけでなく、タスク計画、メモリ管理、ツールの実行、結果の要約にも長けていることである。従来のアプローチでは、これらすべての機能で単一のLLMをトレーニングすることに重点を置いているが、特に小さなモデルでは、パフォーマンス上の制限が明らかになっている。さらに、LDM全体がツールの更新時に再トレーニングを必要とする場合がある。これらの課題を克服するため,我々は,上記の機能をプランナー,呼び出し元,要約元に分解する新しい戦略を提案する。各コンポーネントは、特定の機能に焦点を当てた単一のLCMによって実装され、タスクを達成するために他のコンポーネントと協調する。このモジュール化フレームワークは、個々の更新と、各機能を構築するためのより小さなllmの使用を促進する。このフレームワークを効果的にトレーニングするために,2段階のトレーニングパラダイムを導入する。まず、サブタスクを識別することなく、データセット全体のバックボーンLDMを微調整し、タスクを包括的に理解するモデルを提供する。次に、微調整LDMを用いて、各サブタスク上で連続的に微調整されるプランナー、呼び出し元、および要約器をインスタンス化する。ツール使用ベンチマークによる評価は,提案したマルチLLMフレームワークが従来の単一LLMアプローチを超越していることを示し,ツール学習の有効性とメリットを強調している。 Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete complex tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers but also excel in task planning, memory management, tool invocation, and result summarization. While traditional approaches focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. Moreover, the entire LLM may require retraining when tools are updated. To overcome these challenges, we propose a novel strategy that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with other components to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.	翻訳日:2024-01-17 18:41:39 公開日:2024-01-14
# 強い誘導バイアス:二元画像分類のためのGzip A Strong Inductive Bias: Gzip for binary image classification ( http://arxiv.org/abs/2401.07392v1 ) ライセンス: Link先を確認	Marco Scilipoti, Marina Fuster and Rodrigo Ramele	(参考訳) ディープラーニングネットワークは、産業と研究のためのコンピュータビジョンのデファクトスタンダードになっている。しかし、従兄弟である自然言語処理(nlp)における最近の進展は、強い帰納的バイアスを持つパラメータレスモデルが計算上安価でより単純な代替手段として役立つことを示した。本稿では,Gzipのような汎用圧縮機と隣接した2値画像分類モデルを提案する。 Resnet、EfficientNet、Mobilenetといった一般的なディープラーニングネットワークと比較し、精度が向上し、2桁以上の空間を数ショット設定で大幅に削減できることを示します。結果として、少数のシナリオにおいて、より強い帰納バイアスを持つモデルの未解決ポテンシャルが弱まると信じている。 Deep learning networks have become the de-facto standard in Computer Vision for industry and research. However, recent developments in their cousin, Natural Language Processing (NLP), have shown that there are areas where parameter-less models with strong inductive biases can serve as computationally cheaper and simpler alternatives. We propose such a model for binary image classification: a nearest neighbor classifier combined with a general purpose compressor like Gzip. We test and compare it against popular deep learning networks like Resnet, EfficientNet and Mobilenet and show that it achieves better accuracy and utilizes significantly less space, more than two order of magnitude, within a few-shot setting. As a result, we believe that this underlines the untapped potential of models with stronger inductive biases in few-shot scenarios.	翻訳日:2024-01-17 18:34:12 公開日:2024-01-14
# 膝またはROC Knee or ROC ( http://arxiv.org/abs/2401.07390v1 ) ライセンス: Link先を確認	Veronica Wendt, Byunggu Yu, Caleb Kelly, and Junwhan Kim	(参考訳) セルフアテンショントランスフォーマは、より小さなデータセットによる画像分類の精度を示している。しかし、現在までのテストは、画像人口の既知の表現を伴う単一クラス画像検出に基づいている。入力画像クラスが1より大きい場合と、画像人口の表現に関する完全な情報を持たないテストセットの場合、精度の計算が適応しなければならない。受信側動作特性(ROC)の精度は、マルチクラスの入力画像のインスタンスに対処できる。しかし、このアプローチは、画像の人口表現が不明な場合では不適当である。そこで我々は, 膝法を用いて, アドホックベースでしきい値を決定する計算精度について検討した。マルチクラス画像検出のためにcifar-10画像から作成した多クラスデータセットにおけるroc曲線と膝閾値について検討した。 Self-attention transformers have demonstrated accuracy for image classification with smaller data sets. However, a limitation is that tests to-date are based upon single class image detection with known representation of image populations. For instances where the input image classes may be greater than one and test sets that lack full information on representation of image populations, accuracy calculations must adapt. The Receiver Operating Characteristic (ROC) accuracy thresh-old can address the instances of multi-class input images. However, this approach is unsuitable in instances where image population representation is unknown. We consider calculating accuracy using the knee method to determine threshold values on an ad-hoc basis. Results of ROC curve and knee thresholds for a multi-class data set, created from CIFAR-10 images, are discussed for multi-class image detection.	翻訳日:2024-01-17 18:33:57 公開日:2024-01-14
# クラスタリングアルゴリズムの迅速なレビュー A Rapid Review of Clustering Algorithms ( http://arxiv.org/abs/2401.07389v1 ) ライセンス: Link先を確認	Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao	(参考訳) クラスタリングアルゴリズムは、データ内の固有のパターンと類似性に基づいて、データをグループまたはクラスタにまとめることを目的としている。それらは、マーケティングやeコマース、ヘルスケア、データ組織と分析、ソーシャルメディアなど、今日の生活において重要な役割を担っている。クラスタリングアルゴリズムは数多く存在し、新しいものを導入する開発が進行中である。各アルゴリズムには独自の強みと弱みがあり、現在、すべてのタスクに普遍的に適用可能なアルゴリズムは存在しない。本研究では,既存のクラスタリングアルゴリズムを分析し,基本原理と特徴,クラスタへのデータポイント割り当て,データセットのキャパシティ,クラスタ番号とアプリケーション領域という,5つの異なる次元にわたるメインストリームアルゴリズムを分類する。この分類は、様々な観点からクラスタリングアルゴリズムを理解し、特定のタスクを解くのに適したアルゴリズムを特定するのに役立つ。最後に,クラスタリングアルゴリズムの現状と今後の方向性について考察した。また、オープンな課題と未解決の問題を特定し議論した。 Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media. Numerous clustering algorithms exist, with ongoing developments introducing new ones. Each algorithm possesses its own set of strengths and weaknesses, and as of now, there is no universally applicable algorithm for all tasks. In this work, we analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions: underlying principles and characteristics, data point assignment to clusters, dataset capacity, predefined cluster numbers and application area. This classification facilitates researchers in understanding clustering algorithms from various perspectives and helps them identify algorithms suitable for solving specific tasks. Finally, we discussed the current trends and potential future directions in clustering algorithms. We also identified and discussed open challenges and unresolved issues in the field.	翻訳日:2024-01-17 18:33:46 公開日:2024-01-14
# デバイス非依存モデルによるネットワークインタラクションの最適化 Optimising network interactions through device agnostic models ( http://arxiv.org/abs/2401.07387v1 ) ライセンス: Link先を確認	Luca Manneschi, Ian T. Vidamour, Kilian D. Stenning, Jack C. Gartside, Charles Swindells, Guru Venkat, David Griffin, Susan Stepney, Will R. Branford, Thomas Hayward, Matt O Ellis, Eleni Vasilaki	(参考訳) 物理的に実装されたニューラルネットワークは、デバイス固有の物理的特性を計算ツールとして活用することにより、ディープラーニングモデルの性能を達成する可能性を秘めている。この計算のための物理過程の探索は、情報を処理する貴重な資源となる固有の力学も考慮する必要がある。しかし、既存の計算手法は、しばしば正確な数学的記述を欠いているデバイス力学に影響を与えるパラメータにディープラーニング技術の成功を拡張できない。本研究では,動的物理システムとのインタラクションを完全にデータ駆動方式で最適化するための普遍的なフレームワークを定式化する。このフレームワークは、神経確率微分方程式を微分可能なデジタル双対として採用し、デバイスの決定論的および確率的両方の振る舞いを効果的にキャプチャする。トレーニングされたモデルによる微分の利用は、物理ニューラルネットワークの最適化に不可欠な数学的推定を提供し、その物理ノードの固有の時間計算能力を活用する。実際のデバイスの動作を正確にモデル化するために,様々な実験環境で動作可能なニューラルsde変種を定式化した。本研究は、物理的に定義されたニューラルネットワークの展開を成功させる上で、システム確率を正確に捉えることの重要性を強調しながら、シミュレーションと相互作用する動的デバイスの物理的実装を通じて、フレームワークの適用性を示す。 Physically implemented neural networks hold the potential to achieve the performance of deep learning models by exploiting the innate physical properties of devices as computational tools. This exploration of physical processes for computation requires to also consider their intrinsic dynamics, which can serve as valuable resources to process information. However, existing computational methods are unable to extend the success of deep learning techniques to parameters influencing device dynamics, which often lack a precise mathematical description. In this work, we formulate a universal framework to optimise interactions with dynamic physical systems in a fully data-driven fashion. The framework adopts neural stochastic differential equations as differentiable digital twins, effectively capturing both deterministic and stochastic behaviours of devices. Employing differentiation through the trained models provides the essential mathematical estimates for optimizing a physical neural network, harnessing the intrinsic temporal computation abilities of its physical nodes. To accurately model real devices' behaviours, we formulated neural-SDE variants that can operate under a variety of experimental settings. Our work demonstrates the framework's applicability through simulations and physical implementations of interacting dynamic devices, while highlighting the importance of accurately capturing system stochasticity for the successful deployment of a physically defined neural network.	翻訳日:2024-01-17 18:33:33 公開日:2024-01-14
# マシンはどのように学習するか? AIcon2abs法の評価 How do machines learn? Evaluating the AIcon2abs method ( http://arxiv.org/abs/2401.07386v1 ) ライセンス: Link先を確認	Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima	(参考訳) 本稿では,最近の提案手法であるaicon2abs(queiroz et al., 2021)について評価する。これは、容易に理解できる機械学習メカニズムであるWiSARDを使用することで可能であり、ほとんど労力を要せず、ターゲットユーザからの技術的バックグラウンドも必要としない。 WiSARDはデジタルコンピューティングに忠実であり、トレーニングはRAMタイプのメモリへの書き込みから成り、分類はこれらのメモリからの読み込みから成り立っている。このモデルにより、学習や分類タスクの内部実現を簡単に可視化し、理解することができる。さらに、WiSARDモデルはトレーニングや分類にインターネット接続を必要としないため、いくつかの例から学ぶことができる。この機能により、マシンの観察が容易になり、使用する新しい例ごとに特定のタスクの精度が向上する。 WiSARDはこれまでに学んだことの「メンタルイメージ」を作成でき、特定のクラスに関連する重要な特徴を識別できる。 AIcon2abs法の有効性の評価は,作業負荷が約6時間である遠隔コースの評価を通じて行った。 8歳から11歳の子供5人、12歳から17歳の青年5人、21歳から72歳の成人24人であった。データ分析はハイブリッドアプローチを採用した。 AIcon2absは、研究対象者の約100%によって評価され、収集されたデータは、意図された結果に関して非常に満足な結果を示した。この研究は、CEP/HUCFF/FM/UFRJ Human Research Ethics Committeeによって承認されている。 This paper evaluates AIcon2abs (Queiroz et al., 2021), a recently proposed method that enables awareness among the general public on machine learning. Such is possible due to the use of WiSARD, an easily understandable machine learning mechanism, thus requiring little effort and no technical background from the target users. WiSARD is adherent to digital computing; training consists of writing to RAM-type memories, and classification consists of reading from these memories. The model enables easy visualization and understanding of training and classification tasks' internal realization through ludic activities. Furthermore, the WiSARD model does not require an Internet connection for training and classification, and it can learn from a few or one example. This feature makes it easier to observe the machine, increasing its accuracy on a particular task with each new example used. WiSARD can also create "mental images" of what it has learned so far, evidencing key features pertaining to a given class. The assessment of the AIcon2abs method's effectiveness was conducted through the evaluation of a remote course with a workload of approximately 6 hours. It was completed by thirty-four Brazilian subjects: 5 children between 8 and 11 years old; 5 adolescents between 12 and 17 years old; and 24 adults between 21 and 72 years old. Data analysis adopted a hybrid approach. AIcon2abs was well-rated by almost 100% of the research subjects, and the data collected revealed quite satisfactory results concerning the intended outcomes. This research has been approved by the CEP/HUCFF/FM/UFRJ Human Research Ethics Committee.	翻訳日:2024-01-17 18:33:12 公開日:2024-01-14
# DRLC:LLM批判からのDense Rewardsによる強化学習 DRLC: Reinforcement Learning with Dense Rewards from LLM Critic ( http://arxiv.org/abs/2401.07382v1 ) ライセンス: Link先を確認	Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng	(参考訳) 強化学習(rl)は、言語モデルを人間の好みなど、区別できない報酬信号に合わせることができる。しかし、これらの報酬信号のスパース性から生じる大きな課題は、通常、世代全体に対して1つの報酬しか存在しないことである。この報酬の幅は非効率で不安定な学習につながる可能性がある。本稿では,LLMの批判的能力を活用して,学習過程を通じて深い報酬を生み出す新しい枠組みを提案する。我々のアプローチには、政策モデルと並んで批判言語モデルが組み込まれています。この批評家は、タスク記述、質問、ポリシーモデルの出力、環境の報酬信号を入力として促され、出力の各セグメントの品質を反映したトークンまたはスパンレベルの密集した報酬を提供する。我々は,感情制御,言語モデルのデトックス化,要約という3つのテキスト生成タスクに対するアプローチを評価する。実験結果から, 人工的な高密度報酬をトレーニングに取り入れることで, PPOベースラインを総合的な報酬で一貫した性能向上が得られることがわかった。さらに,同じモデルが政策と批判の両方として機能する環境では,自己批判的報酬が学習効率を高めることを実証する。 Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only one reward for the entire generation. This sparsity of rewards can lead to inefficient and unstable learning. In this paper, we introduce a novel framework leveraging the critique ability of LLMs to produce dense rewards throughout the learning process. Our approach incorporates a critic language model alongside the policy model. This critic is prompted with the task description, question, policy model's output, and environment's reward signal as input, and provides token or span-level dense rewards that reflect the quality of each segment of the output. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial dense rewards in training yields consistent performance gains over the PPO baseline with holistic rewards. Furthermore, in a setting where the same model serves as both policy and critic, we demonstrate that "self-critique" rewards also boost learning efficiency.	翻訳日:2024-01-17 18:32:48 公開日:2024-01-14
# 物理インフォームドニューラルネットワークを用いた単細胞データからの動的遺伝子制御ネットワークの推定 Inference of dynamical gene regulatory networks from single-cell data with physics informed neural networks ( http://arxiv.org/abs/2401.07379v1 ) ライセンス: Link先を確認	Maria Mircea, Diego Garlaschelli, Stefan Semrau	(参考訳) 発達生物学の主な目的の1つは、多能性前駆体を正確に特定された細胞タイプに堅牢に分化させる遺伝子制御ネットワーク(GRN)を明らかにすることである。実験データからGRNを推定する既存の方法の多くは、推定されたGRNが単に遺伝子発現の類似性や相関を反映しているため、予測能力に制限がある。ここでは,物理インフォームドニューラルネットワーク(PINN)を用いて,生物学的プロセスの機械的理解を提供する予測的,動的GRNのパラメータを推定する方法を示す。具体的には, 分岐挙動を示すGRNについて検討し, 細胞分化をモデル化する。パラメータ推論タスクにおいて、PINNは通常のフィードフォワードニューラルネットワークよりも優れており、関連する2つの実験シナリオを分析する。 1 遺伝子発現経路が利用可能な細胞通信システム及び 2. 細胞間通信が欠如している細胞集団のスナップショット測定。我々の分析は、PINNで分析される将来の実験の設計を知らせ、この強力なニューラルネットワークモデルをさらに探求するための出発点を提供する。 One of the main goals of developmental biology is to reveal the gene regulatory networks (GRNs) underlying the robust differentiation of multipotent progenitors into precisely specified cell types. Most existing methods to infer GRNs from experimental data have limited predictive power as the inferred GRNs merely reflect gene expression similarity or correlation. Here, we demonstrate, how physics-informed neural networks (PINNs) can be used to infer the parameters of predictive, dynamical GRNs that provide mechanistic understanding of biological processes. Specifically we study GRNs that exhibit bifurcation behavior and can therefore model cell differentiation. We show that PINNs outperform regular feed-forward neural networks on the parameter inference task and analyze two relevant experimental scenarios: 1. a system with cell communication for which gene expression trajectories are available and 2. snapshot measurements of a cell population in which cell communication is absent. Our analysis will inform the design of future experiments to be analyzed with PINNs and provides a starting point to explore this powerful class of neural network models further.	翻訳日:2024-01-17 18:32:27 公開日:2024-01-14
# 最近傍探索に基づく地球モーバー距離の効率的な近似 Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search ( http://arxiv.org/abs/2401.07378v1 ) ライセンス: Link先を確認	Guangyu Meng, Ruyu Zhou, Liu Liu, Peixian Liang, Fang Liu, Danny Chen, Michael Niemier, X.Sharon Hu	(参考訳) Earth Mover's Distance (EMD) は、2つの分布間の重要な類似度尺度であり、コンピュータビジョンやその他の多くのアプリケーションドメインで使用される。しかし、その正確な計算は計算量とメモリ集約性であり、大規模問題に対するスケーラビリティと適用性を妨げる。計算コストを削減するために様々な近似EMDアルゴリズムが提案されているが、精度が低下し、追加のメモリ使用量や手動パラメータチューニングが必要になる可能性がある。本稿では,NNS-EMDという新しい手法を用いて,近縁探索(NNS)を用いてEMDを近似し,高い精度,低時間複雑度,高メモリ効率を実現する。 NNS操作は、NNSイテレーション毎のデータポイント数を削減し、並列処理の機会を提供する。我々はさらに、大規模なデータセットに特に有益であるGPU上のベクトル化により、NS-EMDを加速する。我々は,NNS-EMDを画像分類および検索タスクにおける正確なEMDアルゴリズムと最先端の近似EMDアルゴリズムを比較した。また、NNS-EMDを用いてトランスポートマッピングを計算し、画像間の色移動を実現する。 NNS-EMDは、正確なEMD実装よりも44倍から135倍高速で、既存の近似EMD法よりも精度、スピードアップ、メモリ効率が優れている。 Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer lower accuracy and may require additional memory usage or manual parameter tuning. In this paper, we present a novel approach, NNS-EMD, to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve high accuracy, low time complexity, and high memory efficiency. The NNS operation reduces the number of data points compared in each NNS iteration and offers opportunities for parallel processing. We further accelerate NNS-EMD via vectorization on GPU, which is especially beneficial for large datasets. We compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD algorithms on image classification and retrieval tasks. We also apply NNS-EMD to calculate transport mapping and realize color transfer between images. NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and achieves superior accuracy, speedup, and memory efficiency over existing approximate EMD methods.	翻訳日:2024-01-17 18:32:12 公開日:2024-01-14
# 道路網のトポロジ的クレデンシャルに基づく指向性構成のためのデータ駆動型レジリエンスフレームワーク A Data-driven Resilience Framework of Directionality Configuration based on Topological Credentials in Road Networks ( http://arxiv.org/abs/2401.07371v1 ) ライセンス: Link先を確認	H M Imran Kays, Khondhaker Al Momin, K.K. "Muralee" Muraleetharan, Arif Mohaimin Sadri	(参考訳) 道路再設定は交通流の向上,渋滞の低減,既存のインフラや資源による道路網全体の性能向上を目的とした交通計画の重要な側面である。本稿では,最適化に基づくBrute Force検索手法と意思決定支援フレームワークを統合して,道路構成のランク付けを行い,性能向上を図る。提案フレームワークは、最適化プロセス中に生成されたシナリオからの入力を組み合わせたマルチ基準決定分析(MCDA)アプローチを取り入れている。最適化からのデータを利用することで,システム走行時間(stt)と全リンクトラヒックフロー(tltf)を最も影響力のある決定変数として識別する。開発したフレームワークはグラフ理論を利用して交通ネットワークのトポロジをモデル化し,ネットワーク科学のメトリクスと確率的ユーザ均衡トラフィック割り当てを適用し,各道路構成が全体のネットワーク性能に与える影響を評価する。道路構成のランク付けには、リッジ回帰などの機械学習アルゴリズムを使用し、各基準(TBC、STT、TLTF)の最適な重みを決定する。さらに、ネットワークベース分析により、選択された構成が個々の道路セグメントを最適化するだけでなく、システムレベルの効率を向上させることが保証される。マルチ基準の意思決定分析、機械学習、ネットワークサイエンスメトリクスを統合することで、提案されたフレームワークは、交通計画者が情報とデータ駆動による意思決定を可能にし、より持続可能な、効率的でレジリエントな道路構成を実現する。 Roadway reconfiguration is a crucial aspect of transportation planning, aiming to enhance traffic flow, reduce congestion, and improve overall road network performance with existing infrastructure and resources. This paper presents a novel roadway reconfiguration technique by integrating optimization based Brute Force search approach and decision support framework to rank various roadway configurations for better performance. The proposed framework incorporates a multi-criteria decision analysis (MCDA) approach, combining input from generated scenarios during the optimization process. By utilizing data from optimization, the model identifies total betweenness centrality (TBC), system travel time (STT), and total link traffic flow (TLTF) as the most influential decision variables. The developed framework leverages graph theory to model the transportation network topology and apply network science metrics as well as stochastic user equilibrium traffic assignment to assess the impact of each roadway configuration on the overall network performance. To rank the roadway configurations, the framework employs machine learning algorithms, such as ridge regression, to determine the optimal weights for each criterion (i.e., TBC, STT, TLTF). Moreover, the network-based analysis ensures that the selected configurations not only optimize individual roadway segments but also enhance system-level efficiency, which is particularly helpful as the increasing frequency and intensity of natural disasters and other disruptive events underscore the critical need for resilient transportation networks. By integrating multi-criteria decision analysis, machine learning, and network science metrics, the proposed framework would enable transportation planners to make informed and data-driven decisions, leading to more sustainable, efficient, and resilient roadway configurations.	翻訳日:2024-01-17 18:31:50 公開日:2024-01-14
# GAN系列を用いた歩行者検出のための合成画像の生成 Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs ( http://arxiv.org/abs/2401.07370v1 ) ライセンス: Link先を確認	Viktor Seib and Malte Roosen and Ida Germann and Stefan Wirtz and Dietrich Paulus	(参考訳) 注釈付きデータセットを作成するには、かなりの量の手作業が必要です。本稿では,概念実証において,新しい画像生成パイプラインを提案することでこの問題に対処する。パイプラインは3つの異なる生成的敵ネットワーク(以前は公開されていた)で構成されており、歩行者検出のためのデータセットを増強する新しい方法で結合されている。生成した画像が必ずしも人間の目にとって視覚的に快適であるとは限らないにもかかわらず、我々の検出ベンチマークは結果がベースラインをはるかに上回っていることを明らかにしている。提案された概念実証作業は2020年に行われ、現在は3年間の維持期間を経て技術報告として公開されている。 Creating annotated datasets demands a substantial amount of manual effort. In this proof-of-concept work, we address this issue by proposing a novel image generation pipeline. The pipeline consists of three distinct generative adversarial networks (previously published), combined in a novel way to augment a dataset for pedestrian detection. Despite the fact that the generated images are not always visually pleasant to the human eye, our detection benchmark reveals that the results substantially surpass the baseline. The presented proof-of-concept work was done in 2020 and is now published as a technical report after a three years retention period.	翻訳日:2024-01-17 18:31:20 公開日:2024-01-14
# CoVO-MPC:サンプリングベースMPCの理論解析と最適共分散設計 CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design ( http://arxiv.org/abs/2401.07369v1 ) ライセンス: Link先を確認	Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi	(参考訳) サンプリングベースのモデル予測制御(MPC)は、その柔軟性と並列化性により、モデルベースの強化学習など、多くの領域において実用的で効果的なアプローチである。その魅力的な経験的性能にもかかわらず、特に収束解析とハイパーパラメータチューニングの観点からの理論的理解はいまだ欠落している。本稿では,広く使用されているサンプリングベースMPC法であるモデル予測パス積分制御(MPPI)の収束特性を特徴付ける。時間変動LQRシステムをカバーする2次最適化では,MPPIは少なくとも線形収束率を満足することを示す。さらに、より一般的な非線形システムにも拡張します。我々の理論解析は, サンプリングに基づく新しいMPCアルゴリズム, CoVo-MPC (CoVariance-Optimal MPC) に直接導出し, サンプリング共分散を最適にスケジュールし, 収束率を最適化する。実証的には、CoVo-MPCは標準的なMPPIよりも43～54%優れています。ビデオと付録は \url{https://lecar-lab.github.io/covo-mpc/} で入手できる。 Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVo-MPC) that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVo-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at \url{https://lecar-lab.github.io/CoVO-MPC/}.	翻訳日:2024-01-17 18:31:11 公開日:2024-01-14
# 大規模言語モデルを用いたnlpのアクティブラーニング Active Learning for NLP with Large Language Models ( http://arxiv.org/abs/2401.07367v1 ) ライセンス: Link先を確認	Xuesong Wang	(参考訳) トレーニングサンプルの人間のアノテーションは高価で、退屈で、特に自然言語処理(NLP)タスクでは困難である。ラベリングコストを削減し、サンプル効率を高めるために、アクティブラーニング(AL)技術は、できるだけ少数のサンプルをラベル付けして、合理的または同様の結果に達することができる。さらにコストを削減し、LLM(Large Language Models)の大幅な進歩により、LLMはサンプルを注釈付けするのによい候補となる。 llms (gpt-3.5 と gpt-4) を用いて3つの異なるデータセットにサンプルをラベル付けする精度とコストについて検討した。 AL設定のサンプルに人為的アノテーションを使用できるように,不正確なラベル付きサンプルを選択するための一貫性ベースの戦略を提案し,これを混合アノテーション戦略と呼ぶ。次に,(1)ヒューマンアノテーションのみを使用する,(2)提案する混合アノテーション戦略を使用する、という2つの異なる設定でalの性能をテストする。 3つのALクエリ戦略の下でのALモデルの精度は、3つのテキスト分類データセット、すなわちAGのニュース、TREC-6、Rotten Tomatoesで報告される。 AGのNewsとRotten Tomatoesでは、混合アノテーション戦略でトレーニングされたモデルは、人間のアノテーションと同様またはより良い結果が得られる。この手法は、アクティブな学習環境における精度とコスト効率の観点から、アノテータとしてのLLMの大きな可能性を明らかにする。 Human annotation of training samples is expensive, laborious, and sometimes challenging, especially for Natural Language Processing (NLP) tasks. To reduce the labeling cost and enhance the sample efficiency, Active Learning (AL) technique can be used to label as few samples as possible to reach a reasonable or similar results. To reduce even more costs and with the significant advances of Large Language Models (LLMs), LLMs can be a good candidate to annotate samples. This work investigates the accuracy and cost of using LLMs (GPT-3.5 and GPT-4) to label samples on 3 different datasets. A consistency-based strategy is proposed to select samples that are potentially incorrectly labeled so that human annotations can be used for those samples in AL settings, and we call it mixed annotation strategy. Then we test performance of AL under two different settings: (1) using human annotations only; (2) using the proposed mixed annotation strategy. The accuracy of AL models under 3 AL query strategies are reported on 3 text classification datasets, i.e., AG's News, TREC-6, and Rotten Tomatoes. On AG's News and Rotten Tomatoes, the models trained with the mixed annotation strategy achieves similar or better results compared to that with human annotations. The method reveals great potentials of LLMs as annotators in terms of accuracy and cost efficiency in active learning settings.	翻訳日:2024-01-17 18:30:50 公開日:2024-01-14
# インコンテキスト演算子のPDE一般化:1次元スカラー非線形保存則に関する研究 PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation Laws ( http://arxiv.org/abs/2401.07364v1 ) ライセンス: Link先を確認	Liu Yang, Stanley J. Osher	(参考訳) 幅広いPDE関連科学学習タスクのための単一大規模モデルを構築することができるか? このモデルは、微調整なしで新しい形式であっても新しいPDEに一般化できるだろうか? In-context operator learningとそれに対応するモデル In-Context Operator Networks (ICON) [1] はこれらの質問の最初の探索を表す。最初の質問に関するICONの能力は[1]で実証されている。本稿では,時間発展を持つpdesの族であるイコンの保存則に対する一般化能力について検討し,第2の質問について考察する。第二の質問に対する正の答え、すなわち ICON は、微調整なしで新しい形式を持つ PDE に対してうまく一般化できることを示す。また,関数や方程式をICONの機能範囲に変換することで,ICONが対処できる問題の範囲を広げる方法について述べる。本論文の進展は,PDE関連タスクの基礎モデルを,コンテキスト内演算子学習フレームワークの下で学習するための重要なステップであると考えている。 Can we build a single large model for a wide range of PDE-related scientific learning tasks? Can this model generalize to new PDEs, even of new forms, without any fine-tuning? In-context operator learning and the corresponding model In-Context Operator Networks (ICON) [1] represent an initial exploration of these questions. The capability of ICON regarding the first question has been demonstrated in [1]. In this paper, we explore the second question by investigating the generalization capabilities of ICON for conservation laws, a family of PDEs with temporal evolution. We show the positive answer to the second question, i.e., ICON can generalize well to some PDEs with new forms without any fine-tuning. We also show how to broaden the range of problems that ICON can address, by transforming functions and equations to ICON's capability scope. We believe that the progress in this paper is a significant step towards the goal of training a foundation model for PDE-related tasks under the in-context operator learning framework.	翻訳日:2024-01-17 18:30:27 公開日:2024-01-14
# PersonalityChat: Facts and Traitsを用いたパーソナライズダイアログモデリングのための会話蒸留 PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits ( http://arxiv.org/abs/2401.07363v1 ) ライセンス: Link先を確認	Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans	(参考訳) 新しいLarge Language Models(LLM)は、大きな会話データセットをキュレートする効率的なツールを提供する。これまでの研究は主にタスク指向またはジェネリックなopen-domainダイアログにフォーカスしており、複雑なプロンプトに従うllmの機能を完全には検討していない。本研究では,パーソナライゼーションに重点を置き,クラウドソースにとって困難かつコストのかかるデータセットのキュレーションにllmを用いる。パーソナラチャットは,一般的なペルソナチャットデータセットに基づく合成会話データセットだが,ペルソナと(ビッグ5)パーソナリティ特性の両方を条件とする。このデータセットに基づいて微調整されたモデルを評価することで、パーソナリティ特性ラベルが生成対話モデルの特性に基づくパーソナライズに利用できることを示す。また,パーソナリティチャットとペルソナチャットを頭対頭で比較し,蒸留データセットのトレーニングにより,小モデル環境においてより流動的でコヒーレントな対話エージェントが得られることを示す。 The new wave of Large Language Models (LLM) has offered an efficient tool to curate sizeable conversational datasets. So far studies have mainly focused on task-oriented or generic open-domain dialogs, and have not fully explored the ability of LLMs in following complicated prompts. In this work, we focus on personalization, and employ LLMs to curate a dataset which is difficult and costly to crowd-source: PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset, but conditioned on both personas and (Big-5) personality traits. Evaluating models fine-tuned on this dataset, we show that the personality trait labels can be used for trait-based personalization of generative dialogue models. We also perform a head-to-head comparison between PersonalityChat and PersonaChat, and show that training on the distilled dataset results in more fluent and coherent dialog agents in the small-model regime.	翻訳日:2024-01-17 18:30:12 公開日:2024-01-14
# 液晶からの絡み合った光子:波長可変量子光源の新しいパラダイム Entangled photons from liquid crystals: a new paradigm of tunable quantum light sources ( http://arxiv.org/abs/2401.07362v1 ) ライセンス: Link先を確認	Vitaliy Sultanov, Alja\v{z} Kav\v{c}i\v{c}, Manolis Kokkinakis, Nerea Sebasti\'an, Natan Osterman, Maria V. Chekhova, and Matja\v{z} Humar	(参考訳) 液晶が複雑な構造に自己集合する能力、電界に対する強い応答、複雑な光学系への積分性、そして最近は相当な2階の光学非線形性により、様々な線形および非線形光学デバイスの基礎となっている。しかし、光の量子状態の源としての利用は、これまで研究されていない。本稿では、強誘電性ネマティック液晶における自発的パラメトリックダウンコンバージョンに基づく、絡み合った光子の効率的な電場可変広帯域源を示す。光子対の放出速度と偏光状態は、サンプルに沿って数ボルトを印加するか分子配向をねじり、ほぼどんな偏光状態も発生させることで劇的に変化させることができる。ここで開発された概念は、複雑な位相構造や量子光を生成するマルチピクセルデバイスにまで拡張することができる。 Due to the ability of liquid crystals to self-assemble into complex structures, their strong response to the electric field, integrability into complex optical systems, and recently also considerable second-order optical nonlinearity, they are a base for various linear and nonlinear optical devices. However, their use as sources of quantum states of light has not been explored so far. Here, we demonstrate an efficient electric-field tunable broadband source of entangled photons based on spontaneous parametric down-conversion in a ferroelectric nematic liquid crystal. The emission rate and the polarization state of the photon pairs can be drastically altered by either applying a few volts or twisting the molecular orientation along the sample, enabling the generation of almost any polarization state. The concepts developed here could be extended to complex topological structures and multi-pixel devices generating quantum light.	翻訳日:2024-01-17 18:29:53 公開日:2024-01-14

Title

Authors

Abstract

論文公表日・翻訳日

# 医療IoTサイバーセキュリティのためのZero-Trust Machine Learning Green Architecture: レビュー,分析,実装

A Novel Zero-Trust Machine Learning Green Architecture for Healthcare IoT Cybersecurity: Review, Analysis, and Implementation ( http://arxiv.org/abs/2401.07368v1 )

ライセンス: Link先を確認

Zag ElSayed, Nelly Elsayed, Sajjad Bay,

(参考訳) 医療アプリケーションにおけるIoT(Internet of Things)デバイスの統合は、患者のケア、監視、データ管理に革命をもたらした。 Global IoT in Healthcare Marketの評価額は2023年で252億ドルである。しかし、これらのデバイスの急速な関与は、患者のプライバシーと医療データの整合性に重大な脅威をもたらす情報セキュリティ上の懸念をもたらす。本稿では、医療アプリケーション内のIoTデバイスにおけるセキュリティ脆弱性に対処し、軽減するために設計された、機械学習(ML)ベースのアーキテクチャを紹介する。先進的な畳み込みMLアーキテクチャを活用することで、提案アーキテクチャは、潜在的な脅威を積極的に監視し、検出し、機密性の高い医療情報の機密性と整合性を確保するとともに、コストを最小化し、医療や緊急環境に特化したポータビリティを高めることを目的としている。実験結果は、CICIoT2023データセットを用いてシミュレーションしたゼロデイ検出精度を実証し、x10の係数でコストを削減した結果に基づいて、様々な攻撃を予測するための最大93.6%の精度を示している。このアプローチの重要性は、IoTデバイスのセキュリティ姿勢を強化し、信頼できる医療システムの堅牢な実装を維持することです。

The integration of Internet of Things (IoT) devices in healthcare applications has revolutionized patient care, monitoring, and data management. The Global IoT in Healthcare Market value is $252.2 Billion in 2023. However, the rapid involvement of these devices brings information security concerns that pose critical threats to patient privacy and the integrity of healthcare data. This paper introduces a novel machine learning (ML) based architecture explicitly designed to address and mitigate security vulnerabilities in IoT devices within healthcare applications. By leveraging advanced convolution ML architecture, the proposed architecture aims to proactively monitor and detect potential threats, ensuring the confidentiality and integrity of sensitive healthcare information while minimizing the cost and increasing the portability specialized for healthcare and emergency environments. The experimental results underscore the accuracy of up to 93.6% for predicting various attacks based on the results demonstrate a zero-day detection accuracy simulated using the CICIoT2023 dataset and reduces the cost by a factor of x10. The significance of our approach is in fortifying the security posture of IoT devices and maintaining a robust implementation of trustful healthcare systems.

翻訳日:2024-03-25 12:37:32 公開日:2024-01-14

# 都市車両網における多目的最適道路ユニット配置

Multi-objective Optimal Roadside Units Deployment in Urban Vehicular Networks ( http://arxiv.org/abs/2402.18581v1 )

ライセンス: Link先を確認

Weian Guo, Zecheng Kang, Dongyang Li, Lun Zhang, Li Li,

(参考訳) 都市車両網では,交通効率,安全,関連サービスの重要性が増している。このようなネットワーク内では、道路側ユニット(RSU)が通信を容易にする中間体として機能する。したがって、RSUの展開は通信サービスの質を確保する上で最も重要である。しかし、時間遅延やデプロイメントコストといった最適化の目的は、様々な観点から一般的に開発されている。その結果、対立が目的の間に生じる可能性がある。さらに、都市環境においては、建物、庭園、湖沼、その他のインフラなど様々な障害が存在するため、RSUの展開に課題が生じる。したがって、複数の目的が存在すること、障害によって課される制約、そして大規模な最適化空間を探索する必要性により、配置は重大な困難に直面する。本稿では,2種類の多目的最適化アルゴリズムを提案する。マルチポピュレーション戦略と適応探索手法を利用して,大規模決定変数空間を効率的に探索する。 RSUの過密配置の問題を緩和するために、最適化手順中にRSU密度を調整するための校正機構が採用されている。提案手法は, 車両とRSU間のデータオフロードを, 反復的ベストレスポンスシーケンスゲーム(IBRSG)をセットアップすることで処理する。提案したアルゴリズムと最先端のアルゴリズムを比較することで,高密度・低密度の都市シナリオにおいて,我々の戦略が優れていることを示す。また,提案手法は車両ネットワークの効率を大幅に改善することを示した。

The significance of transportation efficiency, safety, and related services is increasing in urban vehicular networks. Within such networks, roadside units (RSUs) serve as intermediates in facilitating communication. Therefore, the deployment of RSUs is of utmost importance in ensuring the quality of communication services. However, the optimization objectives, such as time delay and deployment cost, are commonly developed from diverse perspectives. As a result, it is possible that conflicts may arise among the objectives. Furthermore, in urban environments, the presence of various obstacles, such as buildings, gardens, lakes, and other infrastructure, poses challenges for the deployment of RSUs. Hence, the deployment encounters significant difficulties due to the existence of multiple objectives, constraints imposed by obstacles, and the necessity to explore a large-scale optimization space. To address this issue, two versions of multi-objective optimization algorithms are proposed in this paper. By utilizing a multi-population strategy and an adaptive exploration technique, the methods efficiently explore a large-scale decision-variable space. In order to mitigate the issue of an overcrowded deployment of RSUs, a calibrating mechanism is adopted to adjust RSU density during the optimization procedures. The proposed methods also take care of data offloading between vehicles and RSUs by setting up an iterative best response sequence game (IBRSG). By comparing the proposed algorithms with several state-of-the-art algorithms, the results demonstrate that our strategies perform better in both high-density and low-density urban scenarios. The results also indicate that the proposed solutions substantially improve the efficiency of vehicular networks.

翻訳日:2024-03-25 08:36:53 公開日:2024-01-14

# AI-Enabled GPT-4 Assistant APIを用いた体系的文献レビュー(SLR)の選択フェーズの合理化

Streamlining the Selection Phase of Systematic Literature Reviews (SLRs) Using AI-Enabled GPT-4 Assistant API ( http://arxiv.org/abs/2402.18582v1 )

ライセンス: Link先を確認

Seyed Mohammad Ali Jafari,

(参考訳) 学術文献の増大は、最新の研究動向に追随する上で、重大な課題となっている。そこで本研究では,SLR(Systematic Literature Reviews)における記事選択フェーズの効率を効率化するための,先駆的なAIベースのツールを提案する。 OpenAIのGPT-4アシスタントAPIの堅牢な機能を利用することで、このツールは幅広い学術分野にわたる記事選択プロセスを均質化することに成功した。データ準備、AIによる記事評価、構造化された結果提示からなる三部作のアプローチにより、このツールは文学レビューの時間的消費タスクを著しく加速する。重要なことに、このツールは、SLRプロセスが実質的な人間の判断を伴う管理や経済学などの分野において、非常に有益である可能性がある。標準GPTモデルを採用することで、潜在的なバイアスを大幅に低減し、SLR選択フェーズの速度と精度を高めることができる。これは研究者の生産性と正確さを増幅するだけでなく、学術出版の活発化の中で学術研究が行なわれる過程において、かなりの進歩を示している。

The escalating volume of academic literature presents a formidable challenge in staying updated with the newest research developments. Addressing this, this study introduces a pioneering AI-based tool, configured specifically to streamline the efficiency of the article selection phase in Systematic Literature Reviews (SLRs). Utilizing the robust capabilities of OpenAI's GPT-4 Assistant API, the tool successfully homogenizes the article selection process across a broad array of academic disciplines. Implemented through a tripartite approach consisting of data preparation, AI-mediated article assessment, and structured result presentation, this tool significantly accelerates the time-consuming task of literature reviews. Importantly, this tool could be highly beneficial in fields such as management and economics, where the SLR process involves substantial human judgment. The adoption of a standard GPT model can substantially reduce potential biases and enhance the speed and precision of the SLR selection phase. This not only amplifies researcher productivity and accuracy but also denotes a considerable stride forward in the way academic research is conducted amidst the surging body of scholarly publications.

翻訳日:2024-03-25 08:36:53 公開日:2024-01-14

# クラウドストレージにおけるセキュリティとプライバシの問題

Security and Privacy Issues in Cloud Storage ( http://arxiv.org/abs/2401.04076v2 )

ライセンス: Link先を確認

Norah Asiri,

(参考訳) クラウドコンピューティングが大きな可能性を秘めているにもかかわらず、消費者がそれに価値ある熱意とペースで採用していない。これは、消費者がクラウドコンピューティングを機密データに使用することをためらう理由であり、消費者がクラウドコンピューティングを一般的なクラウドストレージや、特にクラウドストレージに使用することを妨げている脅威である。クラウドコンピューティングは、独自の構造のため、独自の問題以外に、従来型のセキュリティとプライバシの脅威を継承する。クラウドコンピューティングに関連するいくつかの脅威は、時折プロバイダが意識していない従業員からの内部攻撃、消費者とプロバイダ間の合意の透明性の欠如、データ損失、トラフィックハイジャック、共有テクノロジ、および安全でないアプリケーションインターフェースなどである。このような脅威は、消費者がその機能を安全に使えるようにするための対策が必要である。このレビューでは、コンシューマや企業でさえ意識していないギャップとして、最もセキュリティとプライバシの問題に光を当てています。また、クラウドコンピューティングのシナリオに関わるパーティも定義しています。これらの脅威の結果も示しています。

Even with the vast potential that cloud computing has, so far, it has not been adopted by the consumers with the enthusiasm and pace that it be worthy; this is a very reason statement why consumers still hesitated of using cloud computing for their sensitive data and the threats that prevent the consumers from shifting to use cloud computing in general and cloud storage in particular. The cloud computing inherits the traditional potential security and privacy threats besides its own issues due to its unique structures. Some threats related to cloud computing are the insider malicious attacks from the employees that even sometime the provider unconscious about, the lack of transparency of agreement between consumer and provider, data loss, traffic hijacking, shared technology and insecure application interface. Such threats need remedies to make the consumer use its features in secure way. In this review, we spot the light on the most security and privacy issues which can be attributed as gaps that sometimes the consumers or even the enterprises are not aware of. We also define the parties that involve in scenario of cloud computing that also may attack the entire cloud systems. We also show the consequences of these threats.

翻訳日:2024-03-18 08:46:40 公開日:2024-01-14

# Killer Apps: 高速で大規模なAI兵器

Killer Apps: Low-Speed, Large-Scale AI Weapons ( http://arxiv.org/abs/2402.01663v1 )

ライセンス: Link先を確認

Philip Feldman, Aaron Dant, James R. Foulds

(参考訳) 人工知能(ai)と機械学習(ml)の加速は、openai、meta、antropicなどの組織による最先端生成前訓練トランスフォーマー(gpt)モデルの開発によって強調され、戦争とセキュリティにおける新たな挑戦と機会を提示している。現在注目されているのは、武器システムにおけるAIの統合と、速度論的衝突における迅速な意思決定におけるその役割である。しかし、同様に重要だが見落とされがちな側面は、情報領域内のインターネットスケールにおけるAIベースの心理的操作の可能性である。これらの能力は、世界中の個人、組織、社会に重大な脅威をもたらす可能性がある。本稿では,AI兵器の概念,その展開,検出,潜在的な対策について検討する。

The accelerating advancements in Artificial Intelligence (AI) and Machine Learning (ML), highlighted by the development of cutting-edge Generative Pre-trained Transformer (GPT) models by organizations such as OpenAI, Meta, and Anthropic, present new challenges and opportunities in warfare and security. Much of the current focus is on AI's integration within weapons systems and its role in rapid decision-making in kinetic conflict. However, an equally important but often overlooked aspect is the potential of AI-based psychological manipulation at internet scales within the information domain. These capabilities could pose significant threats to individuals, organizations, and societies globally. This paper explores the concept of AI weapons, their deployment, detection, and potential countermeasures.

翻訳日:2024-02-11 17:03:31 公開日:2024-01-14

# 生成ゴースト:aiの余生のメリットとリスクを予測

Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives ( http://arxiv.org/abs/2402.01662v1 )

ライセンス: Link先を確認

Meredith Ringel Morris and Jed R. Brubaker

(参考訳) AIシステムは、パフォーマンスの幅と深さの両方を急速に改善するので、特定の人物をモデルにしたエージェントの可能性を含む、ますます強力で現実的なエージェントを作るのに役立ちます。私たちは、生涯のうちに、人々が愛する人や死後のより広い世界と対話するカスタムAIエージェントを作るのが一般的になることを期待しています。なぜなら、そのようなエージェントは、創造者が生み出したコンテンツを単に包み込むのではなく、新しいコンテンツを生成することができるからです。本稿では, 生成ゴーストの潜在的な実装に関する設計空間について論じる。次に, 生成的幽霊の実用的, 倫理的意義について論じ, 個人や社会に対する潜在的に肯定的, 否定的な影響について論じる。これらの考察に基づき、我々はAIとHCI研究コミュニティのための研究アジェンダを策定し、人々が安全で有益な方法でAIのアフターリーブを創造し、相互作用できるようにする。

As AI systems quickly improve in both breadth and depth of performance, they lend themselves to creating increasingly powerful and realistic agents, including the possibility of agents modeled on specific people. We anticipate that within our lifetimes it may become common practice for people to create a custom AI agent to interact with loved ones and/or the broader world after death. We call these generative ghosts, since such agents will be capable of generating novel content rather than merely parroting content produced by their creator while living. In this paper, we first discuss the design space of potential implementations of generative ghosts. We then discuss the practical and ethical implications of generative ghosts, including potential positive and negative impacts on individuals and society. Based on these considerations, we lay out a research agenda for the AI and HCI research communities to empower people to create and interact with AI afterlives in a safe and beneficial manner.

翻訳日:2024-02-11 17:03:17 公開日:2024-01-14

# MorpheusNet: 組み込みオンラインシステムのための資源効率の良い睡眠ステージ分類器

MorpheusNet: Resource efficient sleep stage classifier for embedded on-line systems ( http://arxiv.org/abs/2401.10284v1 )

ライセンス: Link先を確認

Ali Kavoosi, Morgan P. Mitchell, Raveen Kariyawasam, John E. Fleming, Penny Lewis, Heidi Johansen-Berg, Hayriye Cagnan, Timothy Denison

(参考訳) 睡眠ステージ分類(ssc)は労働集約的な作業であり、手動分類のための電気生理学的記録の時間を調べる必要がある。これは、治療目的で睡眠ステージを活用する際の制限要因である。ウェアラブルデバイスの普及と拡張により、SSCの自動化により、大規模な睡眠ベースの治療法の展開が可能になる。ディープラーニングはこのプロセスを自動化する潜在的な方法として注目を集めている。これまでの研究では、手動のエキスパートスコアに匹敵する精度を示した。しかし、従来の手法では膨大な量のメモリと計算資源を必要とする。これにより、リアルタイムに分類し、エッジにモデルをデプロイする能力が制限される。このギャップに対処するため、私たちは、外部の計算ソース(例えば携帯電話、クラウド)にアクセスせずに、睡眠ステージをリアルタイムで予測できるモデルを提供することを目指している。このアルゴリズムは、組み込みバッテリー駆動システムで使用可能な電力効率が良い。我々の小型睡眠ステージ分類器は、ハードウェア設定が制約されたほとんどの市販マイクロコントローラ(MCU)に展開できる。これは、我々のアプローチのメモリフットプリントが大幅に少ないためです。モデルは3つの一般公開されたデータベースでテストされ、その性能は最先端に匹敵するが、モデルの複雑さは桁違いに減らした(最先端に比べて最大280倍も小さい)。さらに、パラメータを8ビットに量子化し、平均0.95%の精度でモデルを最適化した。ファームウェアに実装されると、量子化されたモデルはarm cortexm4プロセッサ上で1.6秒のレイテンシを達成し、オンラインのsscベースの治療に使用できる。

Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep Learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real-time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCU) with constrained hardware settings. This is due to the memory footprint of our approach requiring significantly fewer operations. The model was tested on three publicly available data bases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller compared to state of the art). We further optimized the model with quantization of parameters to 8 bits with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm CortexM4 processor, allowing its use for on-line SSC-based therapies.

翻訳日:2024-01-28 16:22:42 公開日:2024-01-14

# 臨床脳波分類のためのウィンドウ積み重ねメタモデル

Window Stacking Meta-Models for Clinical EEG Classification ( http://arxiv.org/abs/2401.10283v1 )

ライセンス: Link先を確認

Yixuan Zhu, Rohan Kandasamy, Luke J. W. Canham, David Western

(参考訳) ウィンドウニングは、EEG機械学習の分類やその他の時系列タスクにおいて一般的なテクニックである。しかし,この手法を用いると,計算コストが記録全体や記録セット全体のグローバルな関係の学習を阻害する。さらに、親記録からウィンドウに受け継がれたラベルは、そのウィンドウの内容を正確に反映するものではない。これらの問題を解決するために,時間ウインドウドデータ集約に適したメタラーニングの原則を取り入れた多段階モデルアーキテクチャを導入する。さらに、これらの問題を緩和するための2つの異なる戦略をテストしました。テンプル大学病院異常脳波コーポレーション(TUAB)で試験を行ったところ、ベンチマークの精度は89.8%から99.0パーセントに劇的に向上した。このブレークスルー性能は、このデータセットの事前のパフォーマンス予測を超え、EEG解釈課題に対する機械学習ソリューションの臨床応用の道を開く。テンプル大学病院脳波コーパス(tueg)のより広範で多種多様なデータセットを用いて86.7%の精度を得た。

Windowing is a common technique in EEG machine learning classification and other time series tasks. However, a challenge arises when employing this technique: computational expense inhibits learning global relationships across an entire recording or set of recordings. Furthermore, the labels inherited by windows from their parent recordings may not accurately reflect the content of that window in isolation. To resolve these issues, we introduce a multi-stage model architecture, incorporating meta-learning principles tailored to time-windowed data aggregation. We further tested two distinct strategies to alleviate these issues: lengthening the window and utilizing overlapping to augment data. Our methods, when tested on the Temple University Hospital Abnormal EEG Corpus (TUAB), dramatically boosted the benchmark accuracy from 89.8 percent to 99.0 percent. This breakthrough performance surpasses prior performance projections for this dataset and paves the way for clinical applications of machine learning solutions to EEG interpretation challenges. On a broader and more varied dataset from the Temple University Hospital EEG Corpus (TUEG), we attained an accuracy of 86.7%, nearing the assumed performance ceiling set by variable inter-rater agreement on such datasets.

翻訳日:2024-01-28 16:22:18 公開日:2024-01-14

# truth forest: チューニングなし介入による大規模言語モデルにおける多元的真理性の実現に向けて

Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ( http://arxiv.org/abs/2312.17484v2 )

ライセンス: Link先を確認

Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu

(参考訳) 大きな言語モデル(LLM)が様々なタスクで大きな成功を収めたが、幻覚を生じさせることに苦しむ。多次元直交プローブを用いて隠れ真理表現を明らかにすることでllmの真理性を高める方法である真理フォレストを提案する。具体的には、プローブに直交制約を組み込むことで真理をモデリングするための複数の直交基底を生成する。さらに,LLMにおける識別と真理特徴の生成のギャップを減らし,シーケンス内の幅広い位置を考慮に入れた体系的手法であるRandom Peekを導入する。このアプローチを用いることで,Llama-2-7Bの真偽を40.8\%から74.5\%に改善した。同様に、微調整されたモデルでも顕著な改善が見られる。我々はプローブを用いて真理特徴の徹底的な解析を行った。可視化の結果,直交プローブが真理関連特徴を補完し,データセットの固有構造を明らかにするクラスタを形成することがわかった。

Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8\% to 74.5\% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset.

翻訳日:2024-01-19 19:36:36 公開日:2024-01-14

# ブロックチェーンを用いた農産物サプライチェーンの枠組み

A Framework for Agricultural Food Supply Chain using Blockchain ( http://arxiv.org/abs/2401.09476v1 )

ライセンス: Link先を確認

Sudarssan N

(参考訳) 本論文の主な目的は、食品サプライチェーンシステムの信頼性と透明性を確立し、ブロックチェーン技術の助けを借りて、すべての人の食品安全性を確保することである。食品サプライチェーン(英: food supply chain)は、農家や生産者から買い手までの作物を追跡するプロセスである。ブロックチェーンの出現により、多数の農業必需品を提供する安全で不正な環境がより簡単になった。貿易のグローバル化により、現在のサプライチェーン市場は、データの統合、複雑な取引、流通に関わる様々な企業を含んでいる。情報改ざん抵抗、需給関係、追跡可能な監視は、この結果として生じる困難である。 Blockchainは、改ざんに抵抗する情報を提供する分散台帳技術である。この戦略は、中央集権的な権威、仲介者、ビジネスヒストリーの必要性を排除し、高いレベルの完全性、責任、安全性を維持しながら、生産とセキュリティを高めることができる。農業分野における食品サプライチェーンの整合性と透明性を確保するため,ブロックチェーンとIoTに基づく枠組みが提案されている。

The main aim of the paper is to create a trust and transparency in the food supply chain system, ensuring food safety for everyone with the help of Blockchain Technology. Food supply chain is the process of tracing a crop from the farmer or producer to the buyer. With the advent of blockchain, providing a safe and fraud-free environment for the provision of numerous agricultural necessities has become much easier. Because of the globalization of trade, the present supply chain market today includes various companies involving integration of data, complex transactions and distribution. Information tamper resistance, supply-demand relationships, and traceable oversight are all difficulties that arise as a result of this. Blockchain is a distributed ledger technology that can provide information that is resistant to tampering. This strategy can eliminate the need for a centralized trusted authority, intermediaries, and business histories, allowing for increased production and security while maintaining the highest levels of integrity, liability, and safety. In order to have an integrity and transparency in food supply chain in the agricultural sector, a framework is proposed here based on block chain and IoT.

翻訳日:2024-01-19 19:10:03 公開日:2024-01-14

# 二重対向アクティベーション異常検出:対向オートエンコーダは異常発生器である

Double-Adversarial Activation Anomaly Detection: Adversarial Autoencoders are Anomaly Generators ( http://arxiv.org/abs/2101.04645v5 )

ライセンス: Link先を確認

J.-P. Schulze, P. Sperl, K. B\"ottinger

(参考訳) 異常検出は、固有のクラス不均衡のため、機械学習アルゴリズムにとって難しいタスクである。観測されたデータを手動で分析するのはコストが高く、時間を要するため、通常、使用可能な場合の既知の異常はごくわずかである。生成モデルとニューラルネットワークの隠れ活性化の解析に着想を得て,DA3Dと呼ばれる新しい教師なし異常検出手法を導入する。ここでは,通常のデータのみに基づく異常な反例を生成するために,対向オートエンコーダを用いる。これらの人工的な異常は、実際の、しかし目に見えない異常を検出することができる。新たな生成手法により,異常検出の教師なしタスクを教師付きタスクに変換する。 DA3Dは、ドメイン知識を必要としない純粋にデータ駆動の方法で最先端の異常検出手法の性能を上回る。

Anomaly detection is a challenging task for machine learning algorithms due to the inherent class imbalance. It is costly and time-demanding to manually analyse the observed data, thus usually only few known anomalies if any are available. Inspired by generative models and the analysis of the hidden activations of neural networks, we introduce a novel unsupervised anomaly detection method called DA3D. Here, we use adversarial autoencoders to generate anomalous counterexamples based on the normal data only. These artificial anomalies used during training allow the detection of real, yet unseen anomalies. With our novel generative approach, we transform the unsupervised task of anomaly detection to a supervised one, which is more tractable by machine learning and especially deep learning methods. DA3D surpasses the performance of state-of-the-art anomaly detection methods in a purely data-driven way, where no domain knowledge is required.

翻訳日:2024-01-18 22:38:52 公開日:2024-01-14

# 教師付き学習とVAEの統一 -- 天体-粒子再構成のための正規化フローベースニューラルネットワークモデルにおけるカバレッジ、体系、適合性

Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions ( http://arxiv.org/abs/2008.05825v5 )

ライセンス: Link先を確認

Thorsten Gl\"usenkamp

(参考訳) ニューラルネットワークに基づく天体物理学における事象特性の予測はますます一般的になっている。しかし、多くの場合、結果は単に点予測として利用される。統計的不確実性、カバレッジ、体系的不確実性、あるいは適合度尺度はしばしば計算されない。ここでは、これらすべてのプロパティを単一のネットワークモデルに組み込むことができるトレーニングとネットワークアーキテクチャの特定の選択について説明する。データとラベルの連成分布のKL偏差は、確率的変分推論の1つの傘の下で教師付き学習と変分オートエンコーダ(VAE)を統一することができることを示す。この統一は、ニューラルネットワークモデルに適合するp値を計算することを可能にする拡張教師付き学習スキームを動機付ける。この構成では、ニューラルネットワークで償却された条件付き正規化フローが不可欠である。フローの正規化に特有の特定の「順序付き」輪郭に対して,数値積分を伴わずにカバレッジ確率を計算する方法について論じる。さらに,訓練中の効果的な限界化を通じて,系統的不確実性がどのように組み込まれるかを示す。提案した拡張教師あり訓練は,(1)カバレッジ計算,(2)システマティクス,(3)1つの機械学習モデルにおける適合度尺度を含む。原理上、関連する分布の形状に制約はなく、実際、機械は $\mathbb{r}^n \times \mathbb{s}^m$ のような積空間上で定義される複素多様分布を扱う。しかし、カバレッジ計算では、分布が縮退しすぎる場合、その解釈に注意が必要である。イベント選択や、不確実性の保証を必要とする高速な天文警報において、イベントごとの情報を活用する大きな可能性を見出した。

Neural-network based predictions of event properties in astro-particle physics are getting more and more common. However, in many cases the result is just utilized as a point prediction. Statistical uncertainties, coverage, systematic uncertainties or a goodness-of-fit measure are often not calculated. Here we describe a certain choice of training and network architecture that allows to incorporate all these properties into a single network model. We show that a KL-divergence objective of the joint distribution of data and labels allows to unify supervised learning and variational autoencoders (VAEs) under one umbrella of stochastic variational inference. The unification motivates an extended supervised learning scheme which allows to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours that are unique to normalizing flows. Furthermore we show how systematic uncertainties can be included via effective marginalization during training. The proposed extended supervised training incorporates (1) coverage calculation, (2) systematics and (3) a goodness-of-fit measure in a single machine-learning model. There are in principle no constraints on the shape of the involved distributions, in fact the machinery works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathbb{S}^m$. The coverage calculation, however, requires care in its interpretation when the distributions are too degenerate. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts which require uncertainty guarantees.

翻訳日:2024-01-18 22:36:40 公開日:2024-01-14

# 高濃度におけるロバストパラ水素誘起偏光

Robust Parahydrogen-Induced Polarization at High Concentrations ( http://arxiv.org/abs/2401.07243v1 )

ライセンス: Link先を確認

Laurynas Dagys, Martin C. Korzeczek, Anna J. Parker, James Eills, John W. Blanchard, Christian Bengs, Malcolm H. Levitt, Stephan Knecht, Ilai Schwartz, M. B. Plenio

(参考訳) パラ水素誘起偏極(PHIP)は、高い核スピン偏極を持つ標的分子を生成する強力な技術である。 PHIPプロセスは、パラ水素と標的分子の間の化学反応を伴い、続いて磁場操作により、指定された核の磁化に核一重項スピン秩序が変換される。単磁化偏極移動過程は中程度の濃度で効果的に作用するが、偏極と濃度の積として定義される高モル分極では効率が低下することが観察された。このモル分極への強い依存は、偏極移動中に試料の磁化によって生じる磁場からの干渉によるもので、複雑なダイナミクスをもたらし、技術のスケーラビリティに大きな影響を与える。遠方二極子場の影響を否定するパルスシーケンスでこの問題に対処し、同時にモル偏極の制限なく、目的のターゲットスピンへのsinglet-to-magnetization偏光移動を実現する。

Parahydrogen-Induced Polarization (PHIP) is a potent technique for generating target molecules with high nuclear spin polarization. The PHIP process involves a chemical reaction between parahydrogen and a target molecule, followed by the transformation of nuclear singlet spin order into magnetization of a designated nucleus through magnetic field manipulations. Although the singlet-to-magnetization polarization transfer process works effectively at moderate concentrations, it is observed to become much less efficient at high molar polarization, defined as the product of polarization and concentration. This strong dependence on the molar polarization is attributed to interference from the field produced by the sample's magnetization during polarization transfer, which leads to complex dynamics and can severely impact the scalability of the technique. We address this challenge with a pulse sequence that negates the influence of the distant dipolar field, while simultaneously achieving singlet-to-magnetization polarization transfer to the desired target spins, free from restrictions on the molar polarization.

翻訳日:2024-01-18 19:12:48 公開日:2024-01-14

# 病理組織学における画像検索について

On Image Search in Histopathology ( http://arxiv.org/abs/2401.08699v1 )

ライセンス: Link先を確認

H.R. Tizhoosh, Liron Pantanowitz

(参考訳) 病理組織像は、カメラ付き顕微鏡またはスライドスキャナ全体から得ることができる。これらの画像に基づく類似度計算を利用して患者をマッチングすることは、研究や臨床の文脈において有意な可能性を秘めている。近年の検索技術の進歩により、様々な組織タイプにまたがる細胞構造の微妙な定量化が可能となり、診断、予後、新しい患者の予測を診断および治療された患者のデータベースと比較できる。本稿では,組織病理学における画像検索技術の最近の進歩を総合的に概観し,効率的な画像検索法を求める計算病理学研究者のための簡潔な概要を提供する。

Pathology images of histopathology can be acquired from camera-mounted microscopes or whole slide scanners. Utilizing similarity calculations to match patients based on these images holds significant potential in research and clinical contexts. Recent advancements in search technologies allow for nuanced quantification of cellular structures across diverse tissue types, facilitating comparisons and enabling inferences about diagnosis, prognosis, and predictions for new patients when compared against a curated database of diagnosed and treated cases. In this paper, we comprehensively review the latest developments in image search technologies for histopathology, offering a concise overview tailored for computational pathology researchers seeking effective, fast and efficient image search methods in their work.

翻訳日:2024-01-18 18:42:18 公開日:2024-01-14

# 本当にデータが必要なのか?

Do We Really Even Need Data? ( http://arxiv.org/abs/2401.08702v1 )

ライセンス: Link先を確認

Kentaro Hoffman, Stephen Salerno, Awan Afiaz, Jeffrey T. Leek, Tyler H. McCormick

(参考訳) 人工知能と機械学習ツールがよりアクセスしやすくなり、科学者はデータ収集に新たな障害に直面し(例えば、コストの上昇、サーベイ応答率の低下)、事前訓練されたアルゴリズムからの予測を結果変数として使うようになった。財政的な理由や物流的な理由には訴えるが、推論に標準的なツールを使用することで、真の観測できない結果が予測された値に置き換えられる場合、独立した変数と利害関係の関係を誤って表現することができる。本稿では,このいわゆる ‘post-prediction inference'' 問題に固有の統計的課題を特徴付け,3つの潜在的な誤り源を解明する。 (i)予測結果と真に観察できない結果の関係二トレーニングデータの再サンプリング又は不確実性に対する機械学習モデルの堅牢性、及び (iii)バイアスだけでなく、予測から究極の推論手順への不確実性も適切に伝播する。また,推定後推論の枠組みを,調査サンプリング,データ欠落,半教師付き学習など,いくつかの関連分野にまたがる古典的作業と比較した。この対比は、古典的および近代的な推論問題における設計の役割を解明する。

As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g. rising costs, declining survey response rates), researchers increasingly use predictions from pre-trained algorithms as outcome variables. Though appealing for financial and logistical reasons, using standard tools for inference can misrepresent the association between independent variables and the outcome of interest when the true, unobserved outcome is replaced by a predicted value. In this paper, we characterize the statistical challenges inherent to this so-called ``post-prediction inference'' problem and elucidate three potential sources of error: (i) the relationship between predicted outcomes and their true, unobserved counterparts, (ii) robustness of the machine learning model to resampling or uncertainty about the training data, and (iii) appropriately propagating not just bias but also uncertainty from predictions into the ultimate inference procedure. We also contrast the framework for post-prediction inference with classical work spanning several related fields, including survey sampling, missing data, and semi-supervised learning. This contrast elucidates the role of design in both classical and modern inference problems.

翻訳日:2024-01-18 18:26:37 公開日:2024-01-14

# ニューラルネットワークサロゲートを用いた肘型ドラフトチューブの計算効率の最適化

Computationally Efficient Optimisation of Elbow-Type Draft Tube Using Neural Network Surrogates ( http://arxiv.org/abs/2401.08700v1 )

ライセンス: Link先を確認

Ante Sikirica, Ivana Lu\v{c}in, Marta Alvir, Lado Kranj\v{c}evi\'c and Zoran \v{C}arija

(参考訳) 本研究の目的は,肘型ドラフトチューブの設計のための単一目的・多目的最適化アルゴリズムの総合評価と,計算効率のよい最適化ワークフローの導入である。提案したワークフローは、数値シミュレーションから得られたデータに基づいて訓練されたディープニューラルネットワークサロゲートを利用する。サーロゲートの使用により、新しいデザインをより柔軟かつ迅速に評価することができる。線形還元による成功履歴に基づく適応微分進化と分解に基づく多目的進化アルゴリズムは, 最適アルゴリズムとして同定され, 単一目的最適化における異なる目的の影響と, 多目的最適化におけるドラフトチューブ設計への影響を判定するために用いられた。単一目的アルゴリズムの結果は、目的が別々に考慮された場合の多目的アルゴリズムの結果と一致している。しかし、特に計算コストの低いサロゲートに対して、多目的アプローチは一般的に選択されるべきである。最適多目的結果を得るために, 圧力回復係数と抗力係数についてそれぞれ1.5%, 17%の改善を示した。予測値と数値結果との差は, 圧力回復係数が0.5%未満, ドラッグ係数が3%以下である。再生可能エネルギーの需要が増加を続ける中、特に世界的な持続可能性の取り組みにおいて、本研究で議論されているデータ駆動最適化ワークフローの関連性がますます重要になる。

This study aims to provide a comprehensive assessment of single-objective and multi-objective optimisation algorithms for the design of an elbow-type draft tube, as well as to introduce a computationally efficient optimisation workflow. The proposed workflow leverages deep neural network surrogates trained on data obtained from numerical simulations. The use of surrogates allows for a more flexible and faster evaluation of novel designs. The success history-based adaptive differential evolution with linear reduction and the multi-objective evolutionary algorithm based on decomposition were identified as the best-performing algorithms and used to determine the influence of different objectives in the single-objective optimisation and their combined impact on the draft tube design in the multi-objective optimisation. The results for the single-objective algorithm are consistent with those of the multi-objective algorithm when the objectives are considered separately. Multi-objective approach, however, should typically be chosen, especially for computationally inexpensive surrogates. A multi-criteria decision analysis method was used to obtain optimal multi-objective results, showing an improvement of 1.5% and 17% for the pressure recovery factor and drag coefficient, respectively. The difference between the predictions and the numerical results is less than 0.5% for the pressure recovery factor and 3% for the drag coefficient. As the demand for renewable energy continues to increase, the relevance of data-driven optimisation workflows, as discussed in this study, will become increasingly important, especially in the context of global sustainability efforts.

翻訳日:2024-01-18 18:26:18 公開日:2024-01-14

# GNNを用いた高レベル合成における階層的ソース・ツー・ルートQoR予測

Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs ( http://arxiv.org/abs/2401.08696v1 )

ライセンス: Link先を確認

Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo

(参考訳) 高レベル合成(HLS)は、RTLプログラミングを避けてハードウェア設計プロセスを高速化する。しかし,時間経過後の品質(QoR)を考慮した場合,HLSのターンアラウンド時間は有意に増加する。この問題に対処するため,FPGA HLS の階層的後 QoR 予測手法を提案する。(1) C/C++ プログラムから直接遅延と後資源使用量を推定するモデリングフロー,(2) ソースコードの制御とデータフローグラフと HLS プラグマの効果を効果的に表現するグラフ構築手法,(3) ループ階層の影響を捉えることができる階層的 GNN トレーニングと予測手法である。実験結果から,本手法は様々な種類のQoR指標に対して10%未満の予測誤差を示し,最先端のGNN手法と比較して大幅に改善された。提案手法を採用することにより,HLSにおける設計空間探索のランタイムは数十分短縮され,得られたADRSは平均6.91%に短縮される。

High-level synthesis (HLS) notably speeds up the hardware design process by avoiding RTL programming. However, the turnaround time of HLS increases significantly when post-route quality of results (QoR) are considered during optimization. To tackle this issue, we propose a hierarchical post-route QoR prediction approach for FPGA HLS, which features: (1) a modeling flow that directly estimates latency and post-route resource usage from C/C++ programs; (2) a graph construction method that effectively represents the control and data flow graph of source code and effects of HLS pragmas; and (3) a hierarchical GNN training and prediction method capable of capturing the impact of loop hierarchies. Experimental results show that our method presents a prediction error of less than 10% for different types of QoR metrics, which gains tremendous improvement compared with the state-of-the-art GNN methods. By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes and the achieved ADRS is reduced to 6.91% on average.

翻訳日:2024-01-18 18:25:54 公開日:2024-01-14

# 専門知識と解釈可能なデータ駆動インテリジェンスを統合した感染性角膜炎の協調診断

Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence ( http://arxiv.org/abs/2401.08695v1 )

ライセンス: Link先を確認

Zhengqing Fang, Shuowen Zhou, Zhouhang Yuan, Yuxuan Si, Mengze Li, Jinxu Li, Yesheng Xu, Wenjia Xie, Kun Kuang, Yingming Li, Fei Wu, and Yu-Feng Yao

(参考訳) 医用画像診断におけるデータ駆動人工知能(AI)は、シリコで顕著な性能を示したが、解釈可能性の欠如により、臨床医のワークフローに「ブラックボックス」を組み込むことは困難である。臨床医がデータから学んだ診断パターンを理解するために,AIベースのバイオマーカーと同一の診断パターンを持つ検索事例を含む可視化推論プロセスを提供する,解釈可能なモデル,知識誘導診断モデル(KGDM)を開発した。臨床医のプロンプトを人間とaiの相互作用を通じて解釈する推論に取り入れ、安全性の向上とより正確な予測に繋がる可能性がある。本研究は角膜盲症の原因である感染性角膜炎(IK)の診断におけるKGDMの性能,解釈可能性,臨床的有用性について検討した。 KGDMの分類性能は、予測検証データセット、外部テストデータセット、公開テストデータセットで評価される。解釈AIベースのバイオマーカーの診断確率比(DOR)は3.011から35.233の範囲で有効であり、臨床経験と一貫した診断パターンを示す。さらに、人間とAIの協調診断テストを実施し、コラボレーションの参加者は、人間とAIの双方を上回るパフォーマンスを達成した。解釈可能性と相互作用を相乗的に統合することにより、臨床医の専門知識とデータ駆動インテリジェンスの統合を促進する。 aiベースのバイオマーカーによる経験の浅い眼科医の促進と、経験者からの介入によるai予測の増大は、経験豊富な医療従事者が制限され、aiの安全性が懸念される他の疾患への拡大の可能性を秘めているkgdmを用いた伝染性角膜炎に対する有望な診断パラダイムを示している。

Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a visualized reasoning process containing AI-based biomarkers and retrieved cases that with the same diagnostic patterns. It embraces clinicians' prompts into the interpreted reasoning through human-AI interaction, leading to potentially enhanced safety and more accurate predictions. This study investigates the performance, interpretability, and clinical utility of KGDM in the diagnosis of infectious keratitis (IK), which is the leading cause of corneal blindness. The classification performance of KGDM is evaluated on a prospective validation dataset, an external testing dataset, and an publicly available testing dataset. The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233 and exhibit consistent diagnostic patterns with clinic experience. Moreover, a human-AI collaborative diagnosis test is conducted and the participants with collaboration achieved a performance exceeding that of both humans and AI. By synergistically integrating interpretability and interaction, this study facilitates the convergence of clinicians' expertise and data-driven intelligence. The promotion of inexperienced ophthalmologists with the aid of AI-based biomarkers, as well as increased AI prediction by intervention from experienced ones, demonstrate a promising diagnostic paradigm for infectious keratitis using KGDM, which holds the potential for extension to other diseases where experienced medical practitioners are limited and the safety of AI is concerned.

翻訳日:2024-01-18 18:25:32 公開日:2024-01-14

# 弱教師付き関係抽出のための表現学習

Representation Learning for Weakly Supervised Relation Extraction ( http://arxiv.org/abs/2105.00815v2 )

ライセンス: Link先を確認

Zhuang Li

(参考訳) 近年,情報抽出やそのサブタスクであるリレーション抽出が急速に進展している。関係抽出は文中のエンティティ間の意味関係を検出することができる。現在、関係抽出タスクに多くの効率的なアプローチが適用されている。教師付き学習アプローチは特に優れたパフォーマンスを持つ。しかし、まだ多くの難しい課題がある。最も深刻な問題の1つは、手動ラベル付きデータを取得するのが難しいことである。ほとんどの場合、教師付きアプローチの限られたデータは、粗悪なパフォーマンスに等しい。そこで,本研究では,トレーニングデータに制限のある状況下では,教師なし事前学習による教師ありベースラインシステムの性能向上に注目する。機能(feature)は、教師付きアプローチを改善する上で重要なコンポーネントの1つです。伝統的なアプローチは通常手作りの特徴を適用し、専門知識と高価な人的労働を必要とする。しかし、この種の機能はデータのスパーシティに支障をきたす可能性がある。トレーニングセットのサイズが小さい場合、モデルパラメータは低い推定値になる可能性がある。本論文では,関係表現の構文・意味的パターンを多用した分散テキスト表現の特徴を学習するための,教師なし事前学習モデルを提案する。実験により, 従来の手作りの特徴と組み合わせることで, 関係抽出のためのロジスティック分類モデルの性能が向上することが実証された。

Recent years have seen rapid development in Information Extraction, as well as its subtask, Relation Extraction. Relation Extraction is able to detect semantic relations between entities in sentences. Currently, many efficient approaches have been applied to relation extraction tasks. Supervised learning approaches especially have good performance. However, there are still many difficult challenges. One of the most serious problems is that manually labeled data is difficult to acquire. In most cases, limited data for supervised approaches equals lousy performance. Thus here, under the situation with only limited training data, we focus on how to improve the performance of our supervised baseline system with unsupervised pre-training. Feature is one of the key components in improving the supervised approaches. Traditional approaches usually apply hand-crafted features, which require expert knowledge and expensive human labor. However, this type of feature might suffer from data sparsity: when the training set size is small, the model parameters might be poorly estimated. In this thesis, we present several novel unsupervised pre-training models to learn the distributed text representation features, which are encoded with rich syntactic-semantic patterns of relation expressions. The experiments have demonstrated that this type of feature, combine with the traditional hand-crafted features, could improve the performance of the logistic classification model for relation extraction, especially on the classification of relations with only minor training instances.

翻訳日:2024-01-18 04:17:56 公開日:2024-01-14

# 分散学習のためのバイアス圧縮について

On Biased Compression for Distributed Learning ( http://arxiv.org/abs/2002.12410v4 )

ライセンス: Link先を確認

Aleksandr Beznosikov and Samuel Horv\'ath and Peter Richt\'arik and Mher Safaryan

(参考訳) 近年,分散学習におけるコミュニケーションのボトルネックを軽減するツールとして,様々なコミュニケーション圧縮技術が登場している。しかし、バイアス圧縮機は、より研究され理解されている非バイアス圧縮機と比較して、実際は優れた性能を示すことが多いが、それらについてはほとんど知られていない。本研究では, 偏差圧縮演算子の3つのクラスについて検討し, その2つのクラスは新しく, その性能は(確率的)勾配降下と分散(確率的)勾配降下に適用した。偏りのある圧縮機が単一ノードと分散設定の両方で線形収束率をもたらすことを初めて示す。 We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). さらに、通信勾配の合成的および経験的分布に関する理論的研究を通じて、なぜ、また、偏りのある圧縮機が偏りのない変種をどれだけ上回るかについて光を当てた。最後に, 理論的な保証と実用性能が期待できる新しいバイアス圧縮機を提案する。

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single node and distributed settings. We prove that distributed compressed SGD method, employed with error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.

翻訳日:2024-01-18 04:16:40 公開日:2024-01-14

# 時空間ダイナミクスのための多分解能偏微分方程式保存学習フレームワーク

Multi-resolution partial differential equations preserved learning framework for spatiotemporal dynamics ( http://arxiv.org/abs/2205.03990v3 )

ライセンス: Link先を確認

Xin-Yang Liu and Min Zhu and Lu Lu and Hao Sun and Jian-Xun Wang

(参考訳) 従来のデータ駆動ディープラーニングモデルは、複雑な物理プロセスにおける高いトレーニングコスト、エラーの蓄積、そして不十分な一般化に苦しむことが多い。物理インフォームドディープラーニング(PiDL)は、物理原理をモデルに組み込むことによって、これらの課題に対処する。大半のPiDLは、制御方程式を損失関数に埋め込むことで正規化訓練にアプローチするが、これは損失項を測るために広範囲なハイパーパラメータチューニングに大きく依存する。そこで本研究では,偏微分方程式(pde)演算子とネットワーク構造との接続を通じて,離散化制御方程式をニューラルネットワークアーキテクチャに‘baking’することで,物理の事前知識を活用し,pde保存ニューラルネットワーク(ppnn)を実現することを提案する。マルチレゾリューション設定において畳み込み残差ネットワークを介して離散化されたpdesを埋め込み、従来のブラックボックスモデルに匹敵する一般化性と長期予測精度を大幅に向上させる。提案手法の有効性と有効性は, 反応拡散, バーガーズ, ナビエ・ストークス方程式など, 時空間PDEが支配する様々な時空間力学系で実証されている。

Traditional data-driven deep learning models often struggle with high training costs, error accumulation, and poor generalizability in complex physical processes. Physics-informed deep learning (PiDL) addresses these challenges by incorporating physical principles into the model. Most PiDL approaches regularize training by embedding governing equations into the loss function, yet this depends heavily on extensive hyperparameter tuning to weigh each loss term. To this end, we propose to leverage physics prior knowledge by ``baking'' the discretized governing equations into the neural network architecture via the connection between the partial differential equations (PDE) operators and network structures, resulting in a PDE-preserved neural network (PPNN). This method, embedding discretized PDEs through convolutional residual networks in a multi-resolution setting, largely improves the generalizability and long-term prediction accuracy, outperforming conventional black-box models. The effectiveness and merit of the proposed methods have been demonstrated across various spatiotemporal dynamical systems governed by spatiotemporal PDEs, including reaction-diffusion, Burgers', and Navier-Stokes equations.

翻訳日:2024-01-18 04:12:38 公開日:2024-01-14

# 局所的量子状態判別における局所的測定の不適合性

Incompatibility of local measurements provide advantage in local quantum state discrimination ( http://arxiv.org/abs/2204.10948v2 )

ライセンス: Link先を確認

Kornikar Sen, Saronath Halder, Ujjwal Sen

(参考訳) 不確実性原理は、可観測物の非互換性の概念を生じさせると考えられる。同時に測定できない量子測定のパックは、非互換な測定のセットを形成すると言われている。すべての非互換な測定値のセットは、アンサンブルから状態を準備して他の相手に送る量子状態識別タスクにおいて、対応するものよりも有利であり、後者は利用可能な測定値を使用して状態を検出する。大域的および局所的な量子状態の識別の比較は、「非局所的」現象をもたらすことが知られている。本研究では,局所量子状態識別と非互換量子測定の領域間の接続を密閉する。送信者が2部状態を作成し、2つの受信機にサブシステムを送信する局所量子状態識別タスクを考える。受信機は、ローカル不整合測定を用いて送信された状態を検出しようとする。不整合測定を用いて状態を推測する確率と、不整合測定を用いて状態を推測する最大確率の比率を解析した。この比は局所的な測定値の不適合性のロバストネスの単純な関数によって上限される。興味深いことに、すべての非互換な測定セットに対応して、この境界が達成できる少なくとも1つの局所状態判別タスクが存在する。最適局所量子状態判別タスクは、グローバルおよびローカルな状態判別において、不整合性および整合性のある測定による検出を成功させる確率の比の差という意味で、この用語が使われる「非局所性」を含まないことを論じる。結果は、タスクを区別するマルチパーティの局所量子状態の体系に一般化することができる。

The uncertainty principle may be considered as giving rise to the notion of incompatibility of observables. A pack of quantum measurements that cannot be measured simultaneously is said to form a set of incompatible measurements. Every set of incompatible measurements has an advantage over the compatible ones in a quantum state discrimination task where one prepares a state from an ensemble and sends it to another party, and the latter tries to detect the state using available measurements. Comparison between global and local quantum state discriminations is known to lead to a phenomenon of "nonlocality". In this work, we seal a connection between the domains of local quantum state discrimination and incompatible quantum measurements. We consider the local quantum state discrimination task where a sender prepares a bipartite state and sends the subsystems to two receivers. The receivers try to detect the sent state using locally incompatible measurements. We analyze the ratio of the probability of successfully guessing the state using incompatible measurements and the maximum probability of successfully guessing the state using compatible measurements. We find that this ratio is upper bounded by a simple function of robustnesses of incompatibilities of the local measurements. Interestingly, corresponding to every pair of sets of incompatible measurements, there exists at least one local state discrimination task where this bound can be achieved. We argue that the optimal local quantum state discrimination task does not present any "nonlocality", where the term is used in the sense of a difference between the ratios, of probabilities of successful detection via incompatible and compatible measurements, in global and local state discriminations. The results can be generalized to the regime of multipartite local quantum state distinguishing tasks.

翻訳日:2024-01-18 04:12:00 公開日:2024-01-14

# TeleGraph:階層的リンク予測のためのベンチマークデータセット

TeleGraph: A Benchmark Dataset for Hierarchical Link Prediction ( http://arxiv.org/abs/2204.07703v2 )

ライセンス: Link先を確認

Min Zhou, Bisheng Li, Menglin Yang, Lujia Pan

(参考訳) リンク予測は、ネットワーク構造データにとって重要な問題であり、その多様な応用のためにかなりの研究努力を惹きつける。現在のリンク予測手法は一般的なネットワークにフォーカスしており、ネットワークの閉じた三角形構造かノード属性のいずれかに依存する。スパースネットワークや高度階層ネットワークでのそれらの性能はよく研究されていない。一方、利用可能なツリーライクなベンチマークデータセットは、シミュレートされるか、ノード情報が少ないか、あるいは小規模である。このギャップを埋めるために、リンク推論技術の評価と育成のために、リッチノード属性に関連付けられた高度にスパースで階層的な通信ネットワークであるTeleGraphを提案する。実験結果から,ほとんどのアルゴリズムは,ほぼ木のようなデータセット上で十分な性能を得られず,リンク予測アルゴリズムの設計やデプロイには特に注意が必要であることが示唆された。

Link prediction is a key problem for network-structured data, attracting considerable research efforts owing to its diverse applications. The current link prediction methods focus on general networks and are overly dependent on either the closed triangular structure of networks or node attributes. Their performance on sparse or highly hierarchical networks has not been well studied. On the other hand, the available tree-like benchmark datasets are either simulated, with limited node information, or small in scale. To bridge this gap, we present a new benchmark dataset TeleGraph, a highly sparse and hierarchical telecommunication network associated with rich node attributes, for assessing and fostering the link inference techniques. Our empirical results suggest that most of the algorithms fail to produce a satisfactory performance on a nearly tree-like dataset, which calls for special attention when designing or deploying the link prediction algorithm in practice.

翻訳日:2024-01-18 04:11:33 公開日:2024-01-14

# ランダムリシャッフルSARAHは完全な勾配計算を必要としない

Random-reshuffled SARAH does not need a full gradient computations ( http://arxiv.org/abs/2111.13322v2 )

ライセンス: Link先を確認

Aleksandr Beznosikov and Martin Tak\'a\v{c}

(参考訳) 確率的再帰的勾配アルゴリズム(英: stochastic recursive gradient algorithm, sarah)は、確率的勾配降下(sgd)アルゴリズムの分散還元変種であり、時折目的関数の勾配を必要とする。本稿では,完全な勾配計算の必要性を除去する。これはランダムな再シャッフル戦略を使い、各エポックで得られる確率的勾配を集約することで達成される。集計された確率勾配はサラアルゴリズムの完全な勾配の推定に役立っている。本稿では,提案手法の理論的解析を行い,本手法の効率性を示す数値実験で論文をまとめる。

The StochAstic Recursive grAdient algoritHm (SARAH) algorithm is a variance reduced variant of the Stochastic Gradient Descent (SGD) algorithm that needs a gradient of the objective function from time to time. In this paper, we remove the necessity of a full gradient computation. This is achieved by using a randomized reshuffling strategy and aggregating stochastic gradients obtained in each epoch. The aggregated stochastic gradients serve as an estimate of a full gradient in the SARAH algorithm. We provide a theoretical analysis of the proposed approach and conclude the paper with numerical experiments that demonstrate the efficiency of this approach.

翻訳日:2024-01-18 04:08:27 公開日:2024-01-14

# Googleは、画像ベースのGoogleトレンドで新しいファッション製品の売上をマルチモーダル予測

Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends ( http://arxiv.org/abs/2109.09824v6 )

ライセンス: Link先を確認

Geri Skenderi, Christian Joppi, Matteo Denitto, Marco Cristani

(参考訳) 新しいファッション製品の販売予測は、多くのビジネスダイナミクスを伴う困難な問題であり、古典的な予測アプローチでは解決できない。本稿では,過去のデータがないにもかかわらず,販売を効果的に予測するために,Google Trendsの時系列形式で外因性知識を体系的に探索し,それを新しいファッションアイテムに関連するマルチモーダル情報と組み合わせることの有効性を検討する。特に、エンコーダが外因性時系列の表現を学習し、デコーダがGoogle Trendsエンコーディングと利用可能なビジュアルおよびメタデータ情報に基づいて販売を予測するニューラルネットワークベースのアプローチを提案する。我々のモデルは非自己回帰的に機能し、大きな第1ステップエラーの複合効果を避ける。第2のコントリビューションとして,イタリアのファストファッション企業であるNunalieから2016～2019年の間に販売された,5577のリアルな新製品のマルチモーダル情報を含む,新しいファッション製品販売予測タスク用の公開データセットであるVISUELLEを紹介する。データセットには、製品、メタデータ、関連する販売、関連するGoogle Trendsの画像が備わっている。 visuelleを使って最先端の代替品やいくつかのベースラインと比較し、当社のニューラルネットワークベースのアプローチがパーセンテージと絶対誤差の両方の観点から最も正確であることを示しました。外部知識の追加は、重み付き絶対パーセンテージ誤差(wape)の観点から予測精度を1.5%向上させ、情報的外部情報の利用の重要性を明らかにした。コードとデータセットはhttps://github.com/HumaticsLAB/GTM-Transformer.comで公開されている。

New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach, where an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present VISUELLE, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016-2019 from Nunalie, an Italian fast-fashion company. The dataset is equipped with images of products, metadata, related sales, and associated Google Trends. We use VISUELLE to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. It is worth noting that the addition of exogenous knowledge boosts the forecasting accuracy by 1.5% in terms of Weighted Absolute Percentage Error (WAPE), revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/HumaticsLAB/GTM-Transformer.

翻訳日:2024-01-18 04:07:30 公開日:2024-01-14

# Lirot.ai: クラウドソーシング型網膜画像セグメンテーションのための新しいプラットフォーム

Lirot.ai: A Novel Platform for Crowd-Sourcing Retinal Image Segmentations ( http://arxiv.org/abs/2208.10100v2 )

ライセンス: Link先を確認

Jonathan Fhima, Jan Van Eijgen, Moti Freiman, Ingeborg Stalmans and Joachim A. Behar

(参考訳) 導入: 教師付きディープラーニング(DL)タスクには、大きな注釈付きデータセットが必要である。医学データサイエンスにおいて、dlモデルを開発するための大きな制限の1つは、大量の注釈付き例がないことである。これは、アノテートに必要な時間と専門知識によることが多い。 lirot を紹介します。イメージセグメンテーションの促進とクラウドソーシングのための、新しいプラットフォームだ。方法:リロット。 iPadOSクライアントアプリケーションであるLirot.aiは3つのコンポーネントで構成されている。 lirot.ai-app、lirotというバックエンドサーバ。 Lirot.ai-serverとpython APIの名前。 aiAPI。リロット i-appはSwift 5.6とLirotで開発された。 ai-serverはfirebaseバックエンドです。リロット ai-APIはデータベースの管理を可能にする。リロット i-appは必要なだけ多くのiPadOSデバイスにインストールでき、アノテータは同時にリモートでセグメンテーションを実行することができる。私たちはapple pencilの互換性を取り入れ、セグメンテーションを他のコンピュータベースの代替品よりも高速で、より正確で、専門家にとって直感的なものにしています。結果: Lirotの使用例を示す。参照血管分割を用いた網膜底部データセットの作成のためのai。議論と今後の作業:我々は、アノテートされる画像を選択し、アノテータに配布するより効率的なプロセスを含むことによって、網膜底部データセットの拡大を継続するために、アクティブラーニング戦略を使用する。

Introduction: For supervised deep learning (DL) tasks, researchers need a large annotated dataset. In medical data science, one of the major limitations to develop DL models is the lack of annotated examples in large quantity. This is most often due to the time and expertise required to annotate. We introduce Lirot. ai, a novel platform for facilitating and crowd-sourcing image segmentations. Methods: Lirot. ai is composed of three components; an iPadOS client application named Lirot. ai-app, a backend server named Lirot. ai-server and a python API name Lirot. ai-API. Lirot. ai-app was developed in Swift 5.6 and Lirot. ai-server is a firebase backend. Lirot. ai-API allows the management of the database. Lirot. ai-app can be installed on as many iPadOS devices as needed so that annotators may be able to perform their segmentation simultaneously and remotely. We incorporate Apple Pencil compatibility, making the segmentation faster, more accurate, and more intuitive for the expert than any other computer-based alternative. Results: We demonstrate the usage of Lirot. ai for the creation of a retinal fundus dataset with reference vasculature segmentations. Discussion and future work: We will use active learning strategies to continue enlarging our retinal fundus dataset by including a more efficient process to select the images to be annotated and distribute them to annotators.

翻訳日:2024-01-18 04:01:03 公開日:2024-01-14

# 深部NLPモデルにおけるサルエントニューロンの発見

Discovering Salient Neurons in Deep NLP Models ( http://arxiv.org/abs/2206.13288v2 )

ライセンス: Link先を確認

Nadir Durrani and Fahim Dalvi and Hassan Sajjad

(参考訳) 深部NLPモデルで学んだ表現や、どの知識を捉えるかを理解するために多くの研究がなされてきたが、個々のニューロンにはほとんど注意が払われていない。言語相関分析(英語版)と呼ばれる手法により、モデル内の有意な神経細胞を、いかなる外部特性に関しても抽出し、その知識がニューロン内でどのように保存されているかを理解することを目的としている。以下の質問に答えるために、きめ細かい分析を行う。 (i)特定の言語特性を捉えたネットワーク内のニューロンのサブセットを特定できるか? (ii)ネットワークにまたがる局所化ニューロンや分散ニューロンはどの程度存在するか? iii)情報がどれだけ冗長に保存されているか。 iv)学習した言語知識が下流のnlpタスクにどのように影響するか? 四異なる言語特性の学習において、建築はどのように変化するか。我々のデータ駆動量分析は興味深い発見を照らす。 (i)異なる言語課題を予測できるニューロンの小さなサブセットを発見した。二下位の層に局在する基本的な語彙情報(接尾辞等)を捉えたニューロン三複雑な概念(統語的役割など)を学ぶ者は、主に中層及び上層に置かれる。三ネットワークがタスク特定情報のために上位層を保存するため、転送学習中に、高度層から下位層に言語ニューロンを移動させること。 iv)言語情報がどのように保存されているかに関して,事前学習したモデル間で興味深い違いを見出した。 v) 概念は多言語トランスフォーマーモデルにおいて, 異なる言語にまたがる類似のニューロン分布を示すことがわかった。私たちのコードはneurox toolkitの一部として公開されています。

While a lot of work has been done in understanding representations learned within deep NLP models and what knowledge they capture, little attention has been paid towards individual neurons. We present a technique called as Linguistic Correlation Analysis to extract salient neurons in the model, with respect to any extrinsic property - with the goal of understanding how such a knowledge is preserved within neurons. We carry out a fine-grained analysis to answer the following questions: (i) can we identify subsets of neurons in the network that capture specific linguistic properties? (ii) how localized or distributed neurons are across the network? iii) how redundantly is the information preserved? iv) how fine-tuning pre-trained models towards downstream NLP tasks, impacts the learned linguistic knowledge? iv) how do architectures vary in learning different linguistic properties? Our data-driven, quantitative analysis illuminates interesting findings: (i) we found small subsets of neurons that can predict different linguistic tasks, ii) with neurons capturing basic lexical information (such as suffixation) localized in lower most layers, iii) while those learning complex concepts (such as syntactic role) predominantly in middle and higher layers, iii) that salient linguistic neurons are relocated from higher to lower layers during transfer learning, as the network preserve the higher layers for task specific information, iv) we found interesting differences across pre-trained models, with respect to how linguistic information is preserved within, and v) we found that concept exhibit similar neuron distribution across different languages in the multilingual transformer models. Our code is publicly available as part of the NeuroX toolkit.

翻訳日:2024-01-18 03:59:01 公開日:2024-01-14

# 事前条件付き更新による確率勾配法

Stochastic Gradient Methods with Preconditioned Updates ( http://arxiv.org/abs/2206.00285v2 )

ライセンス: Link先を確認

Abdurakhmon Sadiev, Aleksandr Beznosikov, Abdulla Jasem Almansoori, Dmitry Kamzolov, Rachael Tappenden, Martin Tak\'a\v{c}

(参考訳) 本研究は非凸有限和最小化問題を考える。このような問題に対するアルゴリズムはいくつか存在するが、既存の手法は、問題がひどくスケールしたり、不調になったりした場合にうまく動作しないことが多い。したがって、Hutchinsonによるヘッセン対角線近似のアプローチに基づく事前条件を記述し、新しいスケールアルゴリズム(Scaled SARAHとScaled L-SVRG)を提供する勾配法と組み合わせる。理論的複雑性は滑らかさの仮定の下で保証される。滑らかさとPL条件の両方を仮定すると線形収束が証明される。適応的拡大手法では, 近似的な部分的な2次曲率情報を用い, 問題の影響を軽減できる。この改良された実用性能は,本研究で示された数値実験で実証された。

This work considers the non-convex finite sum minimization problem. There are several algorithms for such problems, but existing methods often work poorly when the problem is badly scaled and/or ill-conditioned, and a primary goal of this work is to introduce methods that alleviate this issue. Thus, here we include a preconditioner based on Hutchinson's approach to approximating the diagonal of the Hessian, and couple it with several gradient-based methods to give new scaled algorithms: Scaled SARAH and Scaled L-SVRG. Theoretical complexity guarantees under smoothness assumptions are presented. We prove linear convergence when both smoothness and the PL condition are assumed. Our adaptively scaled methods use approximate partial second-order curvature information and, therefore, can better mitigate the impact of badly scaled problems. This improved practical performance is demonstrated in the numerical experiments also presented in this work.

翻訳日:2024-01-18 03:56:51 公開日:2024-01-14

# スペクトルクラスタリングのためのLeave-one-out Singular Subspace Perturbation解析

Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering ( http://arxiv.org/abs/2205.14855v2 )

ライセンス: Link先を確認

Anderson Y. Zhang, Harrison H. Zhou

(参考訳) 特異部分空間摂動理論は確率と統計において基本的な重要性を持つ。様々な分野にまたがる様々な応用がある。 2つの任意の行列を考えると、一方は他方の左1カラムアウト部分行列であり、2つの対応する特異部分空間の間の距離に対する新しい摂動上界を確立する。これは混合モデルによく適合しており、ウェディンの定理のような古典摂動境界よりも鋭く細かい統計解析ができる。この残余1次摂動理論により、混合モデル下でのスペクトルクラスタリングの性能に関する決定論的帰納的分析を行う。本解析は,サブガウス混合モデルのスペクトルクラスタリングに対する明示的な指数的誤差率をもたらす。等方性ガウスの混合物の場合、この速度はl{\"o}ffler et al. (2021)よりも弱い信号対雑音条件下で最適である。

The singular subspaces perturbation theory is of fundamental importance in probability and statistics. It has various applications across different fields. We consider two arbitrary matrices where one is a leave-one-column-out submatrix of the other one and establish a novel perturbation upper bound for the distance between the two corresponding singular subspaces. It is well-suited for mixture models and results in a sharper and finer statistical analysis than classical perturbation bounds such as Wedin's Theorem. Empowered by this leave-one-out perturbation theory, we provide a deterministic entrywise analysis for the performance of spectral clustering under mixture models. Our analysis leads to an explicit exponential error rate for spectral clustering of sub-Gaussian mixture models. For the mixture of isotropic Gaussians, the rate is optimal under a weaker signal-to-noise condition than that of L{\"o}ffler et al. (2021).

翻訳日:2024-01-18 03:56:36 公開日:2024-01-14

# Impartial Games:強化学習への挑戦

Impartial Games: A Challenge for Reinforcement Learning ( http://arxiv.org/abs/2205.12787v4 )

ライセンス: Link先を確認

Bei Zhou and S{\o}ren Riis

(参考訳) 本稿では,AlphaZero-style reinforcement learning (RL)アルゴリズムが様々なボードゲームで優れている一方で,プレイヤーが駒を共有する公平なゲームでは課題に直面していることを示す。我々は、alphazero型および類似の自己遊び強化学習アルゴリズムの崩壊ブロックであるように見えるゲーム、すなわちnimの子供向けゲームおよびその他の不公平なゲームの具体例を示す。我々の研究は、ニューラルネットワークがパリティ関数を学習する能力に関するデータ分散の複雑さによって引き起こされる課題に基づいており、ノイズラベルの問題によって悪化している。最近の研究では、alphazeroスタイルのアルゴリズムが敵対的攻撃や敵対的摂動に対して脆弱であることを示しており、すべての合法状態においてゲームを習得する学習の難しさを示している。 Nimは小さなボード上で学習できるが、AlphaZeroスタイルのアルゴリズムの学習の進歩は、ボードのサイズが大きくなると劇的に遅くなる。直感的には、Nim のような公平なゲームと Chess や Go のようなパルチザン的なゲームの違いは、ボードの小さな部分が公平なゲームでカバーされている場合、ある空白位置の可視的な部分とその正しい評価との相関がしばしばゼロであるので、その位置が勝つか失われるかを予測できないという事実によって説明できる。この状況は、部分的に空白されたボード位置が典型的には、完全な未発見位置の値に関する多量または少なくともノントリフト情報を提供するパルチザンゲームとは対照的である。

While AlphaZero-style reinforcement learning (RL) algorithms excel in various board games, in this paper we show that they face challenges on impartial games where players share pieces. We present a concrete example of a game - namely the children's game of Nim - and other impartial games that seem to be a stumbling block for AlphaZero-style and similar self-play reinforcement learning algorithms. Our work is built on the challenges posed by the intricacies of data distribution on the ability of neural networks to learn parity functions, exacerbated by the noisy labels issue. Our findings are consistent with recent studies showing that AlphaZero-style algorithms are vulnerable to adversarial attacks and adversarial perturbations, showing the difficulty of learning to master the games in all legal states. We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms dramatically slows down when the board size increases. Intuitively, the difference between impartial games like Nim and partisan games like Chess and Go can be explained by the fact that if a small part of the board is covered for impartial games it is typically not possible to predict whether the position is won or lost as there is often zero correlation between the visible part of a partly blanked-out position and its correct evaluation. This situation starkly contrasts partisan games where a partly blanked-out board position typically provides abundant or at least non-trifle information about the value of the fully uncovered position.

翻訳日:2024-01-18 03:56:04 公開日:2024-01-14

# RALACs:インタラクションエンコーディングと光フローを用いた自動運転車の行動認識

RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow ( http://arxiv.org/abs/2209.14408v3 )

ライセンス: Link先を確認

Eddy Zhou, Alex Zhuang, Alikasim Budhwani, Owen Leather, Rowan Dempster, Quanquan Li, Mohammad Al-Sharman, Derek Rayside, and William Melek

(参考訳) 自律走行車(AV)設定に適用すると、行動認識は環境モデルの状況認識を高めることができる。これは特に、avsの伝統的な幾何学的記述やヒューリスティックが不十分なシナリオで一般的である。しかしながら、伝統的に人間の行動認識は研究されてきたが、ノイズに富んだ、無修正の生のRGBデータへの適応性には限界がある。行動認識のAVへの進歩と導入を促進するために,新たな2段階の行動認識システムであるRALACを提案する。 RALACは、道路シーンにおける行動認識の問題を定式化し、それと人間の行動認識の確立した分野とのギャップを埋める。本研究は,エージェント間の関係をエンコードするために注目層がいかに有用かを示し,そのようなスキームがクラスに依存しないかを強調した。さらに、道路上のエージェントの動的性質に対処するため、ralACsは、下流行動分類のためのエージェントトラックへの関心領域アライメント(ROI)適応のための新しいアプローチを構築している。最後に,本手法では,アクティブエージェント検出の問題点も考慮し,道路シーンにおける関連エージェントの識別に光フローマップを融合する新たな応用法を提案する。提案手法はICCV2021ロードチャレンジデータセットのベースラインを上回り,実際の車両プラットフォームに展開することにより,意思決定における行動認識の有用性に関する予備的な知見を提供する。

When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially prevalent in scenarios where traditional geometric descriptions and heuristics in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its limited adaptability to noisy, un-clipped, un-pampered, raw RGB data has limited its application in other fields. To push for the advancement and adoption of action recognition into AVs, this work proposes a novel two-stage action recognition system, termed RALACs. RALACs formulates the problem of action recognition for road scenes, and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can be useful for encoding the relations across agents, and stresses how such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs constructs a novel approach to adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also considers the problem of active agent detection, and utilizes a novel application of fusing optical flow maps to discern relevant agents in a road scene. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset and by deploying it on a real vehicle platform, we provide preliminary insight to the usefulness of action recognition in decision making.

翻訳日:2024-01-18 03:47:34 公開日:2024-01-14

# MIXRTs:繰り返しソフト決定木を混合した多エージェント強化学習に向けて

MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning via Mixing Recurrent Soft Decision Trees ( http://arxiv.org/abs/2209.07225v3 )

ライセンス: Link先を確認

Zichuan Liu, Yuanyang Zhu, Zhi Wang, Yang Gao, Chunlin Chen

(参考訳) さまざまな分野で大きな成功を収めている一方で、既存のマルチエージェント強化学習(MARL)とブラックボックスニューラルネットワークアーキテクチャは、学習知識の理解や入力観察が意思決定にどのように影響するかを人によって妨げる不透明な方法で決定を行う。代わりに、伝統的な線形モデルや決定木のような既存の解釈可能なアプローチは通常、弱い表現力と低い精度に悩まされる。ミキシング・リカレント・ソフト・決定木(MIXRTs)は,この性能と解釈可能性の明確な二分法に対処するため,各エージェントのチームへの貢献を反映し,ルート・ツー・リーフ・パスを通じて明確な決定プロセスを表現することができる新しい解釈可能なアーキテクチャである。具体的には、リカレントニューラルネットワークの進歩を利用して、部分観測可能性に対処する新しいソフト決定木を構築し、ツリーベースモデルによる意思決定プロセスに影響を与える特徴を実証する。そして,その値分解フレームワークに基づいて,各エージェントに対して,各アクション値を明示的に混合し,局所的な観察のみを用いて共同行動値を推定することにより,各エージェントに対する信頼度を線形に割り当てる。理論的解析により、MIXRTsは結合作用値の分解における付加性と単調性に関する構造的制約を保証していることが示された。課題であるSpreadとStarCraft IIタスクの評価から、MIXRTは広く研究されている手法と比較して競争性能を達成し、意思決定プロセスのより直接的な説明を提供する。我々は,MARLの新しい解釈可能なパラダイムに光を当てる可能性があり,高い性能と解釈可能性を持った学習アルゴリズム開発に向けた有望な道を探る。

While achieving tremendous success in various fields, existing multi-agent reinforcement learning (MARL) with a black-box neural network architecture makes decisions in an opaque manner that hinders humans from understanding the learned knowledge and how input observations influence decisions. Instead, existing interpretable approaches, such as traditional linear models and decision trees, usually suffer from weak expressivity and low accuracy. To address this apparent dichotomy between performance and interpretability, our solution, MIXing Recurrent soft decision Trees (MIXRTs), is a novel interpretable architecture that can represent explicit decision processes via the root-to-leaf path and reflect each agent's contribution to the team. Specifically, we construct a novel soft decision tree to address partial observability by leveraging the advances in recurrent neural networks, and demonstrate which features influence the decision-making process through the tree-based model. Then, based on the value decomposition framework, we linearly assign credit to each agent by explicitly mixing individual action values to estimate the joint action value using only local observations, providing new insights into how agents cooperate to accomplish the task. Theoretical analysis shows that MIXRTs guarantees the structural constraint on additivity and monotonicity in the factorization of joint action values. Evaluations on the challenging Spread and StarCraft II tasks show that MIXRTs achieves competitive performance compared to widely investigated methods and delivers more straightforward explanations of the decision processes. We explore a promising path toward developing learning algorithms with both high performance and interpretability, potentially shedding light on new interpretable paradigms for MARL.

翻訳日:2024-01-18 03:46:13 公開日:2024-01-14

# Mask Focal Loss: 正準物体検出ネットワークによる密集群カウントのための統一フレームワーク

Mask Focal Loss: A unifying framework for dense crowd counting with canonical object detection networks ( http://arxiv.org/abs/2212.11542v3 )

ライセンス: Link先を確認

Xiaopin Zhong, Guankun Wang, Weixiang Liu, Zongze Wu, Yuanlong Deng

(参考訳) 基本的なコンピュータビジョンタスクとして、群衆のカウントは公共の安全において重要な役割を果たす。現在、深層学習に基づく頭部検出は、群集カウントの有望な方法である。しかし,(1)既存の損失関数が高濃度で複雑な場面でサンプルの不均衡に対処できないこと,(2)標準物体検出器が損失計算における空間的一貫性を欠くこと,(2)物体の位置と背景領域の関係を無視すること,(3)頭部検出データセットのほとんどは,境界ボックスのない中心点にのみ注釈付けされていること,の3つの理由から,この問題によく適用できない。これらの問題を克服するために,ガウス核を用いたヒートマップに基づく新しいマスク焦点損失(mfl)を提案する。 MFLは、ヒートマップとバイナリフィーチャーマップの両方の真実に基づく損失関数の統一フレームワークを提供する。さらに、総合アノテーションを用いた合成データセットであるGTA_Headを導入し、評価と比較を行った。広範な実験結果から,様々な検出器とデータセットにおけるmflの性能が向上し,maeとrmseはそれぞれ47.03%,61.99%減少した。そこで本研究は,密度推定に基づく群集数法を推し進めるための強力な基盤を提供する。

As a fundamental computer vision task, crowd counting plays an important role in public safety. Currently, deep learning based head detection is a promising method for crowd counting. However, the highly concerned object detection networks cannot be well applied to this problem for three reasons: (1) Existing loss functions fail to address sample imbalance in highly dense and complex scenes; (2) Canonical object detectors lack spatial coherence in loss calculation, disregarding the relationship between object location and background region; (3) Most of the head detection datasets are only annotated with the center points, i.e. without bounding boxes. To overcome these issues, we propose a novel Mask Focal Loss (MFL) based on heatmap via the Gaussian kernel. MFL provides a unifying framework for the loss functions based on both heatmap and binary feature map ground truths. Additionally, we introduce GTA_Head, a synthetic dataset with comprehensive annotations, for evaluation and comparison. Extensive experimental results demonstrate the superior performance of our MFL across various detectors and datasets, and it can reduce MAE and RMSE by up to 47.03% and 61.99%, respectively. Therefore, our work presents a strong foundation for advancing crowd counting methods based on density estimation.

翻訳日:2024-01-18 03:35:07 公開日:2024-01-14

# deepspeed data efficiency: 効率的なデータサンプリングとルーティングによるディープラーニングモデルの品質とトレーニング効率の向上

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing ( http://arxiv.org/abs/2212.03597v3 )

ライセンス: Link先を確認

Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He

(参考訳) ディープラーニングモデルの最近の進歩は、厳しいトレーニングコストを犠牲にしている。モデルサイズの増加は根本原因の1つだが、もう1つの強調されていない事実は、実際にデータスケールはモデルスケールと同じ速度で増加しており、トレーニングコストは両者に比例していることだ。急速に進化するモデルアーキテクチャと比較して、トレーニングデータ(特に高価なファンデーションモデル事前トレーニングのために)を効率的に利用する方法は、データ効率機能に重点を置く便利なフレームワークが欠如しているため、調査も困難である。この目的のために、データをよりよく活用し、トレーニング効率を高め、モデル品質を向上させるフレームワークであるDeepSpeed Data efficiencyを紹介します。具体的には,一般的なカリキュラム学習ライブラリを用いた効率的なデータサンプリング手法と,新しいランダム・レイヤワイズ・トークン・ドロップ手法による効率的なデータルーティング手法を提案する。 GPT-3 1.3B言語モデルの事前トレーニングでは、当社の作業は12.5倍少ないデータ/時間/コスト(Azureでレンタルすれば3.7K)を実現しています。 GPT-3 1.3B と BERT-large の事前トレーニングでは、データ/時間/コストの最大2倍のコストで同じモデル品質を達成できます。 DeepSpeed Data efficiency は使いやすく、チューニングも容易で、GPT-3 MoE モデル事前トレーニングや小型 GPT-2/ViT ファインタニングなどのタスクに簡単に適用でき、そのメリットを検証できる。

Recent advances on deep learning models come at the price of formidable training cost. The increasing model size is one of the root causes, but another less-emphasized fact is that data scale is actually increasing at a similar speed as model scale, and the training cost is proportional to both of them. Compared to the rapidly evolving model architecture, how to efficiently use the training data (especially for the expensive foundation model pretraining) is both less explored and difficult to realize due to the lack of a convenient framework that focuses on data efficiency capabilities. To this end, we present DeepSpeed Data Efficiency, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, we propose and combine two data efficiency techniques: efficient data sampling via a general curriculum learning library, and efficient data routing via a novel random layerwise token dropping technique. For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost (\$3.7K if rent on Azure), while still maintaining 95% of model quality compared to baseline with full data and cost (\$46.3K). For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or achieve better model quality under same data/time/cost. DeepSpeed Data Efficiency is easy to use and tune, enabling us to easily apply it and verify its benefit on additional tasks including GPT-3 MoE model pretraining and small-scale GPT-2/ViT finetuning.

翻訳日:2024-01-18 03:34:26 公開日:2024-01-14

# NLPにおける望ましくないバイアス:測定の課題

Undesirable Biases in NLP: Addressing Challenges of Measurement ( http://arxiv.org/abs/2211.13709v4 )

ライセンス: Link先を確認

Oskar van der Wal, Dominik Bachmann, Alina Leidinger, Leendert van Maanen, Willem Zuidema, Katrin Schulz

(参考訳) 大規模言語モデルと自然言語処理(NLP)技術が急速に発展し、日々の生活に広まっていくにつれ、それらの利用が人々に与える影響を予想することが重要となる。近年、多くの注目を集めている問題の一つは、この技術が有害なバイアスを示しており、デロギ的ステレオタイプの生成から、異なる社会集団で異なる結果を生み出すまでである。これらのバイアスの評価と緩和に多くの労力が費やされてきたが、nlpモデルのバイアスを測定する方法には深刻な問題がある。本稿では,NLPモデルバイアスの問題を,直接観測できないバイアスのような概念の測定に特化している心理測定のレンズを用いて議論するための学際的アプローチを提案する。特に,心理計測から測定ツールの構成妥当性と信頼性の2つの中心的な概念を考察し,モデルバイアス測定の文脈でどのように適用できるかについて議論する。我々のゴールは、NLP実践者により良いバイアス測定を設計するための方法論的なツールを提供することであり、バイアス測定ツールの開発において、より一般的にサイコメトリックからツールを探索することである。

As Large Language Models and Natural Language Processing (NLP) technology rapidly develop and spread into daily life, it becomes crucial to anticipate how their use could harm people. One problem that has received a lot of attention in recent years is that this technology has displayed harmful biases, from generating derogatory stereotypes to producing disparate outcomes for different social groups. Although a lot of effort has been invested in assessing and mitigating these biases, our methods of measuring the biases of NLP models have serious problems and it is often unclear what they actually measure. In this paper, we provide an interdisciplinary approach to discussing the issue of NLP model bias by adopting the lens of psychometrics -- a field specialized in the measurement of concepts like bias that are not directly observable. In particular, we will explore two central notions from psychometrics, the construct validity and the reliability of measurement tools, and discuss how they can be applied in the context of measuring model bias. Our goal is to provide NLP practitioners with methodological tools for designing better bias measures, and to inspire them more generally to explore tools from psychometrics when working on bias measurement tools.

翻訳日:2024-01-18 03:33:14 公開日:2024-01-14

# 部分空間間の量子コヒーレンス:状態変換、コヒーレンスパワー、$k$コヒーレンスおよびその他の性質

Quantum coherence between subspaces: State transformation, Cohering Power, $k$-coherence and other properties ( http://arxiv.org/abs/2302.13148v4 )

ライセンス: Link先を確認

Azam Mani, Fatemeh Rezazadeh, Vahid Karimipour

(参考訳) 最初に[1]で導入され[2,3]で開発されたボックコヒーレンスの概念は、個々の原子上で任意の精密な測定を行うために実験能力がそれほど繊細でない場合を含む。我々は,この資源理論のさらなる研究を促進する枠組みを,いくつかの点で開発する。この枠組みを用いて,非一貫性操作による状態変換の問題を調査し,ブロック非一貫性操作による状態変換に必要な十分条件がメジャー化条件であることを示す。我々はまた、他の全ての状態およびすべてのユニタリゲートが非コヒーレント操作によって構築できる最大コヒーレント状態の形式を決定する。その後、量子チャネルのブロックコヒーレンスおよびブロックデコヒーレンスパワーの概念を定義し、これらのパワーを複数の種類のチャネルで決定する。最後に、ブロックコヒーレンスと、$k$-コヒーレンスと呼ばれる以前のコヒーレンスの拡張との関係について検討する。

The concept of bock-coherence, first introduced in [1] and developed in [2,3] encompasses the case where experimental capabilities are not so delicate to perform arbitrary refined measurements on individual atoms. We develop a framework which facilitates further investigation of this resource theory in several respects. Using this framework, we investigate the problem of state conversion by incoherent operations and show that a majorization condition is the necessary and sufficient condition for state transformation by block-incoherent operations. We also determine the form of the maximally coherent state from which all other states and all unitary gates can be constructed by incoherent operations. Thereafter, we define the concept of block-cohering and block-decohering powers of quantum channels and determine these powers for several types of channels. Finally, we explore the relation between block coherence and a previous extension of coherence, known as $k$-coherence.

翻訳日:2024-01-18 03:24:17 公開日:2024-01-14

# 情報理論上界に対する情報理論下界

Information Theoretic Lower Bounds for Information Theoretic Upper Bounds ( http://arxiv.org/abs/2302.04925v2 )

ライセンス: Link先を確認

Roi Livni

(参考訳) 確率的凸最適化の文脈において,出力モデルと経験的サンプル間の相互情報とアルゴリズムの一般化の関係について検討する。情報理論の一般化バウンダリへの関心が高まっているにもかかわらず、これらのバウンダリが様々な学習アルゴリズムの異常な性能に関する洞察を与えることができるかどうかは不明である。確率凸最適化の研究により,真のリスク最小化には次元依存的相互情報が必要であることが明らかになった。このことは、既存の情報理論の一般化境界は、次元に依存しないサンプル複雑性を持つSGDや正規化ERMのようなアルゴリズムの一般化能力の獲得に不足していることを示している。

We examine the relationship between the mutual information between the output model and the empirical sample and the generalization of the algorithm in the context of stochastic convex optimization. Despite increasing interest in information-theoretic generalization bounds, it is uncertain if these bounds can provide insight into the exceptional performance of various learning algorithms. Our study of stochastic convex optimization reveals that, for true risk minimization, dimension-dependent mutual information is necessary. This indicates that existing information-theoretic generalization bounds fall short in capturing the generalization capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.

翻訳日:2024-01-18 03:23:28 公開日:2024-01-14

# データ中心機械学習のための再ラベル法

The Re-Label Method For Data-Centric Machine Learning ( http://arxiv.org/abs/2302.04391v7 )

ライセンス: Link先を確認

Tong Guo

(参考訳) 業界深層学習アプリケーションでは、手作業でラベル付けしたデータは、一定の数のノイズデータを持っています。この問題を解決し、開発データセットで90以上のスコアを達成するために、人間のラベル付けにおける参照としてモデル予測を考慮し、ノイズデータを見つけ、ノイズデータを再ラベルする簡単な方法を提案する。本稿では,分類,シーケンスタグ付け,オブジェクト検出,シーケンス生成,クリックスルー率予測など,幅広いディープラーニングタスクのセットについて述べる。開発データセットの評価結果と人格評価結果は、このアイデアを検証する。

In industry deep learning application, our manually labeled data has a certain number of noisy data. To solve this problem and achieve more than 90 score in dev dataset, we present a simple method to find the noisy data and re-label the noisy data by human, given the model predictions as references in human labeling. In this paper, we illustrate our idea for a broad set of deep learning tasks, includes classification, sequence tagging, object detection, sequence generation, click-through rate prediction. The dev dataset evaluation results and human evaluation results verify our idea.

翻訳日:2024-01-18 03:23:02 公開日:2024-01-14

# 不確実性を考慮した構造知識伝達による360$^\circ高分解能深さ推定

360$^\circ$ High-Resolution Depth Estimation via Uncertainty-aware Structural Knowledge Transfer ( http://arxiv.org/abs/2304.07967v2 )

ライセンス: Link先を確認

Zidong Cao, Hao Ai, Athanasios V. Vasilakos, Lin Wang

(参考訳) 高分解能(HR)全方位深度マップを予測するために、既存の手法では、完全に教師付き学習を通じて入力としてHR全方位画像(ODI)を利用するのが一般的である。しかし、実際にHR ODIを入力として使うのは、リソース制約されたデバイスのため望ましくない。さらに、深度マップはカラー画像よりも解像度が低いことが多い。そこで本研究では,HR深度GTマップが存在しない場合に,低分解能(LR) ODIから直接HR全方位深度を推定する。我々のキーとなる考え方は、HR画像のモダリティと対応するLR深度マップからシーン構造知識を移譲し、余分な推論コストを伴わずにHR深度推定の目標を達成することである。具体的には,ODIスーパーレゾリューション(SR)を補助タスクとして導入し,HR深度推定の性能を高めるために,両タスクを弱教師付きで協調的に訓練する。 ODI SRタスクは不確実性推定によってシーン構造的知識を抽出する。これにより,シーン構造知識伝達(SSKT)モジュールを2つのキーコンポーネントで提案する。まず,円筒型暗黙的補間関数(ciif)を用いて,円筒型神経補間重みを学習し,二つのタスク間でciifのパラメータを共有する。次に,hr深度推定タスクがシーン構造知識をより多く学ぶのに役立つ追加構造正規化を提供する特徴蒸留(fd)損失を提案する。広範な実験により,本手法はベースライン法を上回っており,完全教師あり法と同等の性能が得られることを示した。

To predict high-resolution (HR) omnidirectional depth map, existing methods typically leverage HR omnidirectional image (ODI) as the input via fully-supervised learning. However, in practice, taking HR ODI as input is undesired due to resource-constrained devices. In addition, depth maps are often with lower resolution than color images. Therefore, in this paper, we explore for the first time to estimate the HR omnidirectional depth directly from a low-resolution (LR) ODI, when no HR depth GT map is available. Our key idea is to transfer the scene structural knowledge from the HR image modality and the corresponding LR depth maps to achieve the goal of HR depth estimation without any extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task extracts the scene structural knowledge via uncertainty estimation. Buttressed by this, a scene structural knowledge transfer (SSKT) module is proposed with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling and share the parameters of CIIFs between the two tasks. Then, we propose a feature distillation (FD) loss that provides extra structural regularization to help the HR depth estimation task learn more scene structural knowledge. Extensive experiments demonstrate that our weakly-supervised method outperforms baseline methods, and even achieves comparable performance with the fully-supervised methods.

翻訳日:2024-01-18 03:12:08 公開日:2024-01-14

# 弦から化学構造を学ぶ変圧器建築のキラリティー認識の難しさ

Difficulty in chirality recognition for Transformer architectures learning chemical structures from string ( http://arxiv.org/abs/2303.11593v4 )

ライセンス: Link先を確認

Yasuhiro Yoshikai, Tadahaya Mizuno, Shumpei Nemoto, Hiroyuki Kusuhara

(参考訳) 近年、非常に多様な分子の表現学習、特に自然言語処理(nlp)モデルを分子構造のリテラル表現であるスマイルに適用することに基づく記述子生成の急速な発展が見られる。しかし、これらのモデルがどのように化学構造を理解するかについてはほとんど研究されていない。このブラックボックスに対処するため,SMILESの学習過程と化学構造との関係を代表的NLPモデルであるTransformerを用いて検討した。トランスフォーマーは分子の部分構造を高速に学習するが、全体構造を理解するには拡張トレーニングが必要である。学習段階の異なるモデルから生成された記述子を用いた分子特性予測の精度は,訓練開始から終了まで類似していた。さらに,トランスフォーマーはキラリティーの学習に特に長い訓練が必要であり,エナンチオマーの誤解により性能が低下することもある。これらの知見は化学におけるNLPモデルの理解を深めることが期待される。

Recent years have seen rapid development of descriptor generation based on representation learning of extremely diverse molecules, especially those that apply natural language processing (NLP) models to SMILES, a literal representation of molecular structure. However, little research has been done on how these models understand chemical structure. To address this black box, we investigated the relationship between the learning progress of SMILES and chemical structure using a representative NLP model, the Transformer. We show that while the Transformer learns partial structures of molecules quickly, it requires extended training to understand overall structures. Consistently, the accuracy of molecular property predictions using descriptors generated from models at different learning steps was similar from the beginning to the end of training. Furthermore, we found that the Transformer requires particularly long training to learn chirality and sometimes stagnates with low performance due to misunderstanding of enantiomers. These findings are expected to deepen the understanding of NLP models in chemistry.

翻訳日:2024-01-18 03:08:26 公開日:2024-01-14

# WeditGAN: ラテント・スペース・リロケーションによる画像生成

WeditGAN: Few-Shot Image Generation via Latent Space Relocation ( http://arxiv.org/abs/2305.06671v3 )

ライセンス: Link先を確認

Yuxuan Duan, Li Niu, Yan Hong, Liqing Zhang

(参考訳) 少数の画像生成では、少数の画像上でGANモデルを直接訓練することは、過度に適合するリスクに直面している。一般的な解決策は、大きなソースドメインで事前訓練されたモデルを小さなターゲットに転送することである。本研究はWeditGANを導入し、StyleGANの中間潜伏符号$w$を学習定数オフセット($\Delta w$)で編集し、ソース潜伏空間の分布を単純に移動させることで、目標潜伏空間を発見し、構築することでモデル転送を実現する。潜在空間間の1対1マッピングが確立されると、自然にモードの崩壊やオーバーフィットを防止できる。さらに,方向を定式化したり,$\delta w$ の強度を微調整することにより,再配置プロセスをさらに強化するために,weditgan の変種も提案する。広く使われているソース/ターゲットデータセットの集合に関する実験では、現実的で多様な画像を生成するためのweditganの能力が示されている。コードはhttps://github.com/ldhlwh/weditganで入手できる。

In few-shot image generation, directly training GAN models on just a handful of images faces the risk of overfitting. A popular solution is to transfer the models pretrained on large source domains to small target ones. In this work, we introduce WeditGAN, which realizes model transfer by editing the intermediate latent codes $w$ in StyleGANs with learned constant offsets ($\Delta w$), discovering and constructing target latent spaces via simply relocating the distribution of source latent spaces. The established one-to-one mapping between latent spaces can naturally prevents mode collapse and overfitting. Besides, we also propose variants of WeditGAN to further enhance the relocation process by regularizing the direction or finetuning the intensity of $\Delta w$. Experiments on a collection of widely used source/target datasets manifest the capability of WeditGAN in generating realistic and diverse images, which is simple yet highly effective in the research area of few-shot image generation. Codes are available at https://github.com/Ldhlwh/WeditGAN.

翻訳日:2024-01-18 02:58:43 公開日:2024-01-14

# BPJDet:ジェネリックボディ部分関節検出のための拡張オブジェクト表現

BPJDet: Extended Object Representation for Generic Body-Part Joint Detection ( http://arxiv.org/abs/2304.10765v2 )

ライセンス: Link先を確認

Huayi Zhou, Fei Jiang, Jiaxin Si, Yue Ding, and Hongtao Lu

(参考訳) 人体とその部分の検出は集中的に研究されている。しかし、cnnsベースの検出器のほとんどは独立して訓練されており、検出された部品を身体と関連付けることが困難である。本稿では,人体とその部分の関節検出に焦点をあてる。具体的には,体部品の中心オフセットを統合した拡張オブジェクト表現を提案し,エンドツーエンドの汎用体部品関節検出器(BPJDet)を構築した。このように、ボディー・パート・アソシエーションは、意味的内容と幾何学的内容の両方を含む統一表現にきちんと埋め込まれている。したがって、マルチタスクを相乗的に扱うために、マルチロスを最適化することができる。さらに、この表現はアンカーベースおよびアンカーフリー検出器に適している。 BPJDetは、エラーを起こしやすいポストマッチングに悩まされず、スピードと精度のトレードオフを良好に保ちます。さらに、BPJDetは、ヒトまたは四肢動物の身体部分または身体部分を検出するために一般化することができる。 BPJDetの優位性を検証するため,体部(CityPersons,CrowdHuman,BodyHands)と体部(COCOHuman Parts,Animals5C)のデータセットについて実験を行った。 BPJDetは高い検出精度を維持しながら、すべてのデータセットで最先端のアソシエーションパフォーマンスを達成する。また, 高精度群頭検出とハンドコンタクト推定の2つの代表的な下流アプリケーションの性能向上により, 高度な身体関連能力の利点を示す。プロジェクトはhttps://hnuzhy.github.io/projects/bpjdetで入手できる。

Detection of human body and its parts has been intensively studied. However, most of CNNs-based detectors are trained independently, making it difficult to associate detected parts with body. In this paper, we focus on the joint detection of human body and its parts. Specifically, we propose a novel extended object representation integrating center-offsets of body parts, and construct an end-to-end generic Body-Part Joint Detector (BPJDet). In this way, body-part associations are neatly embedded in a unified representation containing both semantic and geometric contents. Therefore, we can optimize multi-loss to tackle multi-tasks synergistically. Moreover, this representation is suitable for anchor-based and anchor-free detectors. BPJDet does not suffer from error-prone post matching, and keeps a better trade-off between speed and accuracy. Furthermore, BPJDet can be generalized to detect body-part or body-parts of either human or quadruped animals. To verify the superiority of BPJDet, we conduct experiments on datasets of body-part (CityPersons, CrowdHuman and BodyHands) and body-parts (COCOHumanParts and Animals5C). While keeping high detection accuracy, BPJDet achieves state-of-the-art association performance on all datasets. Besides, we show benefits of advanced body-part association capability by improving performance of two representative downstream applications: accurate crowd head detection and hand contact estimation. Project is available in https://hnuzhy.github.io/projects/BPJDet.

翻訳日:2024-01-18 02:56:32 公開日:2024-01-14

# 非log-concave分布に対するMCMCアルゴリズムの高速条件混合

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions ( http://arxiv.org/abs/2306.10506v2 )

ライセンス: Link先を確認

Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu

(参考訳) MCMCアルゴリズムは、ターゲット分布$\pi(x) \propto \exp(-V(x))$からサンプリングするための経験的に効率的なツールを提供する。しかし理論側では、mcmcアルゴリズムは$\pi(x)$ が非log-concaveであるときに混合速度が遅い。我々の研究は、このギャップを検証し、ポアンカー型不等式が状態空間のサブセット$\mathcal{X}$に収まるとき、MCMC の条件分布は $\mathcal{X}$ より速く真の条件分布に混合することを示す。この高速混合保証は、グローバル混合が確実に遅い場合に保持することができる。ステートメントを形式化し,条件付き混合率を定量化する。さらに,条件付き混合はガウス型混合物のサンプリング,ガウス型混合モデルのパラメータ推定,局所的極小点のgibbsサンプリングに興味深い意味を持つことを示す。

MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when Poincar\'e-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.

翻訳日:2024-01-18 02:35:46 公開日:2024-01-14

# できるようにする、得る限りではない

Do as I can, not as I get ( http://arxiv.org/abs/2306.10345v2 )

ライセンス: Link先を確認

Shangfei Zheng, Hongzhi Yin, Tong Chen, Quoc Viet Hung Nguyen, Wei Chen, and Lei Zhao

(参考訳) 本稿では、シミュレーションデータ環境から貴重な情報をマイニングするためのTMRモデルを提案する。私たちはこの論文の提出を終えるつもりです。

This paper proposes a model called TMR to mine valuable information from simulated data environments. We intend to complete the submission of this paper.

翻訳日:2024-01-18 02:35:30 公開日:2024-01-14

# フォースを学べる:マルチオブジェクトビデオ生成におけるスパースモーション制御の実現

Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation ( http://arxiv.org/abs/2306.03988v2 )

ライセンス: Link先を確認

Aram Davtyan and Paolo Favaro

(参考訳) 本研究では,単一のフレームとスパース動作入力から映像を自動回帰生成する新しい教師なし手法を提案する。我々の訓練されたモデルは、目に見えない現実的なオブジェクト間相互作用を生成できる。私たちのモデルは、トレーニング中にシーン内の各オブジェクトの明示的なセグメンテーションと動きを与えられることはないが、それらのダイナミクスと範囲を暗黙的に分離することができる。本手法の重要な構成要素は, ランダム化条件付けスキーム, 入力動作制御の符号化, ランダム化およびスパースサンプリングであり, 分布域外への一般化を可能にする。ヨーダと呼ばれる我々のモデルは、物理的に触れることなく物体を動かすことができる。いくつかのデータセットの定性的・定量的な評価を通じて, YODAは, 制御性と映像品質の両面で, 先行研究の最先端技術と同等かそれ以上であることを示す。

We propose a novel unsupervised method to autoregressively generate videos from a single frame and a sparse motion input. Our trained model can generate unseen realistic object-to-object interactions. Although our model has never been given the explicit segmentation and motion of each object in the scene during training, it is able to implicitly separate their dynamics and extents. Key components in our method are the randomized conditioning scheme, the encoding of the input motion control, and the randomized and sparse sampling to enable generalization to out of distribution but realistic correlations. Our model, which we call YODA, has therefore the ability to move objects without physically touching them. Through extensive qualitative and quantitative evaluations on several datasets, we show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.

翻訳日:2024-01-18 02:32:20 公開日:2024-01-14

# 民話weisfeiler-lehmanによるグラフニューラルネットワークの設計空間の拡張

Extending the Design Space of Graph Neural Networks by Rethinking Folklore Weisfeiler-Lehman ( http://arxiv.org/abs/2306.03266v3 )

ライセンス: Link先を確認

Jiarui Feng, Lecheng Kong, Hao Liu, Dacheng Tao, Fuhai Li, Muhan Zhang, Yixin Chen

(参考訳) 近年、グラフニューラルネットワーク(GNN)の最も人気のあるフレームワークとして、メッセージパッシングニューラルネットワーク(MPNN)が登場している。しかし、その表現力は1次元のWeisfeiler-Lehman (1-WL) テストによって制限される。いくつかの作品は$k$-WL/FWL(Folklore WL)にインスパイアされ、対応するニューラルバージョンを設計する。表現力が高いにもかかわらず、この研究には深刻な制限がある。特に、(1)$k$-WL/FWL は少なくとも$O(n^k)$空間複雑性を必要とし、これは$k=3$; (2)$k$-WL/FWL の設計空間は厳密であり、唯一の調整可能なハイパーパラメータは$k$である。最初の制限に対処するために、$(k,t)$-FWLの拡張を提案する。理論的には、空間複雑性を$O(n^k)$ (任意の$k\geq 2$) in $(k,t)$-FWL に固定しても、グラフ同型問題を解くまで表現性階層を構築することができる。 2つ目の問題に取り組むために、全てのノードの代わりに任意の同変集合を隣人として考える$k$-FWL+を提案し、その結果、設計空間を$k$-FWLに拡大する。これら2つの修正を組み合わせると、柔軟性と強力なフレームワーク $(k,t)$-fwl+ が得られる。我々は、$(k,t)$-FWL+が、表現性にマッチする既存のモデルを実装することを実証する。次に、(k,t)$-FWL+ である Neighborhood$^2$-FWL (N$^2$-FWL) の例を導入する。 N$^2$-FWL は 3WL に劣らず強力であり、O(n^2)$空間のみを必要としながら多くの部分構造を符号化できる。最後に、N$^2$-GNNというニューラルバージョンを設計し、各種タスクの性能を評価する。 N$^2$-GNN は ZINC-Subset (0.059) で記録破りの結果を達成し、以前の SOTA の成績を 10.6% 上回った。さらに、N$^2$-GNNは、既存のすべての高表現性GNN手法の中でBRECデータセット(71.8%)で新しいSOTA結果を達成する。

Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years. However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Some works are inspired by $k$-WL/FWL (Folklore WL) and design the corresponding neural versions. Despite the high expressive power, there are serious limitations in this line of research. In particular, (1) $k$-WL/FWL requires at least $O(n^k)$ space complexity, which is impractical for large graphs even when $k=3$; (2) The design space of $k$-WL/FWL is rigid, with the only adjustable hyper-parameter being $k$. To tackle the first limitation, we propose an extension, $(k,t)$-FWL. We theoretically prove that even if we fix the space complexity to $O(n^k)$ (for any $k\geq 2$) in $(k,t)$-FWL, we can construct an expressiveness hierarchy up to solving the graph isomorphism problem. To tackle the second problem, we propose $k$-FWL+, which considers any equivariant set as neighbors instead of all nodes, thereby greatly expanding the design space of $k$-FWL. Combining these two modifications results in a flexible and powerful framework $(k,t)$-FWL+. We demonstrate $(k,t)$-FWL+ can implement most existing models with matching expressiveness. We then introduce an instance of $(k,t)$-FWL+ called Neighborhood$^2$-FWL (N$^2$-FWL), which is practically and theoretically sound. We prove that N$^2$-FWL is no less powerful than 3-WL, and can encode many substructures while only requiring $O(n^2)$ space. Finally, we design its neural version named N$^2$-GNN and evaluate its performance on various tasks. N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%. Moreover, N$^2$-GNN achieves new SOTA results on the BREC dataset (71.8%) among all existing high-expressive GNN methods.

翻訳日:2024-01-18 02:32:04 公開日:2024-01-14

# ParameterNet:パラメータがすべて必要である

ParameterNet: Parameters Are All You Need ( http://arxiv.org/abs/2306.14525v2 )

ライセンス: Link先を確認

Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu

(参考訳) 大規模視覚前訓練は、大規模視覚モデルの性能を大幅に向上させる。しかし、既存の低FLOPsモデルでは大規模な事前学習の恩恵を受けられないという「emph{low FLOPs pitfall」を観察する。本稿では,大規模視覚前訓練モデルのパラメータ数を増加させながらフラップの増加を最小限に抑えることを目的とした,parameternetと呼ばれる新しい設計原理を提案する。我々は動的畳み込みを利用して,フラップの限界上昇のみを伴い,ネットワークに追加パラメータを組み込む。 parameternetアプローチにより、低flopsネットワークは大規模なビジュアルプリトレーニングを活用できる。さらに,パラメータネットの概念を言語領域に拡張し,推論速度を保ちながら推論結果を向上する。大規模imagenet-22k実験では,パラメータネットスキームの優位性が示された。たとえばパラメータNet-600Mは、広く使われているSwin Transformer(81.6\% \emph{vs)よりもImageNetの方が精度が高い。 80.9\%) であり、より低いフロップ(0.6g \emph{vs)を持つ。 4.5g)である。言語領域では、パラメータNetで強化されたLLaMA-1Bは、バニラLLaMAよりも2倍高い精度を達成する。コードは \url{https://parameternet.github.io/} でリリースされる。

The large-scale visual pretraining has significantly improve the performance of large vision models. However, we observe the \emph{low FLOPs pitfall} that the existing low-FLOPs models cannot benefit from large-scale pretraining. In this paper, we introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models while minimizing the increase in FLOPs. We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining. Furthermore, we extend the ParameterNet concept to the language domain to enhance inference results while preserving inference speed. Experiments on the large-scale ImageNet-22K have shown the superiority of our ParameterNet scheme. For example, ParameterNet-600M can achieve higher accuracy on ImageNet than the widely-used Swin Transformer (81.6\% \emph{vs.} 80.9\%) and has much lower FLOPs (0.6G \emph{vs.} 4.5G). In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2\% higher accuracy over vanilla LLaMA. The code will be released at \url{https://parameternet.github.io/}.

翻訳日:2024-01-18 02:22:04 公開日:2024-01-14

# 深部ニューラルネットワークを用いた低コスト赤外線カメラによる温度推定

Estimating temperatures with low-cost infrared cameras using deep neural networks ( http://arxiv.org/abs/2307.12130v2 )

ライセンス: Link先を確認

Navot Oz, Nir Sochen, David Mendelovich, Iftach Klapp

(参考訳) 低コストのサーマルカメラは不正確(通常$\pm 3^\circ C$)で、検出器全体で空間変動の非均一性を持つ。不正確さと不均一さは、カメラの周囲温度に依存する。この研究の目的は、低コストの赤外線カメラで温度を推定し、不均一性を補正することであった。環境温度を考慮した非均一性シミュレータを開発した。カメラの物理モデルと周囲カメラ温度の両方を組み込んだエンドツーエンドニューラルネットワークが導入された。ニューラルネットワークは、シミュレーションされた非一様性データを用いて訓練され、物体の温度を推定し、カメラ自体によって測定された単一の画像と周囲温度のみを用いて不均一性を補正した。提案手法は, 従来よりも平均温度誤差を最大0.5^\circ C$で改善した。さらに、カメラの物理モデルをネットワークに制約することで、追加で0.1^\circ C$の誤差を下げることができた。検証データセットの平均温度誤差は0.37^\circ C$であった。この手法はフィールド内の実データに基づいて検証し,等価な結果を得た。

Low-cost thermal cameras are inaccurate (usually $\pm 3^\circ C$) and have space-variant nonuniformity across their detector. Both inaccuracy and nonuniformity are dependent on the ambient temperature of the camera. The goal of this work was to estimate temperatures with low-cost infrared cameras, and rectify the nonuniformity. A nonuniformity simulator that accounts for the ambient temperature was developed. An end-to-end neural network that incorporates both the physical model of the camera and the ambient camera temperature was introduced. The neural network was trained with the simulated nonuniformity data to estimate the object's temperature and correct the nonuniformity, using only a single image and the ambient temperature measured by the camera itself. Results of the proposed method significantly improved the mean temperature error compared to previous works by up to $0.5^\circ C$. In addition, constraining the physical model of the camera with the network lowered the error by an additional $0.1^\circ C$. The mean temperature error over an extensive validation dataset was $0.37^\circ C$. The method was verified on real data in the field and produced equivalent results.

翻訳日:2024-01-18 02:10:59 公開日:2024-01-14

# CS-Mixer:空間-チャネル混合を用いた大規模視覚MLPモデル

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing ( http://arxiv.org/abs/2308.13363v2 )

ライセンス: Link先を確認

Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir

(参考訳) Vision TransformersやConvolutional Neural Networksに比べて情報融合設計はシンプルだが、Vision MLPアーキテクチャは最近の研究で高い性能と高いデータ効率を示している。しかし、cyclemlpやvision permutatorのような既存の作品は、通常等サイズの空間領域における空間情報をモデル化し、クロススケールな空間的相互作用を考慮しない。さらに、トークンミキサーは1軸または2軸の相関のみをモデル化し、3軸の空間チャネル混合を避ける。そこで我々は,空間チャネル混合のための動的低ランク変換を局所的および大域的集約を通じて学習する階層型視覚MLPCS-Mixerを提案する。提案手法は,画像認識ベンチマークにおいて,計算量を大幅に増やさずに競合する結果を得る。我々の最大のモデルであるCS-Mixer-Lは、13.7 GFLOPと94Mパラメータを持つImageNet-1kで83.2%の精度に達した。

Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.

翻訳日:2024-01-18 02:01:25 公開日:2024-01-14

# FoX:マルチエージェント強化学習における構成認識探索

FoX: Formation-aware exploration in multi-agent reinforcement learning ( http://arxiv.org/abs/2308.11272v2 )

ライセンス: Link先を確認

Yonghyeon Jo, Sunwoo Lee, Junghyuk Yeom, Seungyul Han

(参考訳) 近年,様々な協調型マルチエージェントタスクの成功により,マルチエージェント強化学習(marl)が注目されている。しかし、MARLではエージェントの部分的な観測可能性や、エージェントの数が増加するにつれて指数関数的に増加する探索空間が問題となっている。まず,探索空間のスケーラビリティ問題に対処するため,探索空間における構成に基づく等価性関係を定義し,異なる構成の有意義な状態のみを探索することによって探索空間の縮小を目指す。そこで本研究では, 様々な形態において, 部分的に観察可能なエージェントに対して, 自らの観察に基づいてのみ, 現形成をよく認識するように指導する, 新たな形成認識探索(fox)フレームワークを提案する。 Google Research Football (GRF) とSparse Starcraft II multi-agent Challenge (SMAC) のタスクにおいて,提案したFoXフレームワークは最先端のMARLアルゴリズムよりも大幅に優れていた。

Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration still remains a challenging problem in MARL due to the partial observability of the agents and the exploration space that can grow exponentially as the number of agents increases. Firstly, in order to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit the states in diverse formations by guiding them to be well aware of their current formation solely based on their own observations. Numerical results show that the proposed FoX framework significantly outperforms the state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse Starcraft II multi-agent challenge (SMAC) tasks.

翻訳日:2024-01-18 02:01:08 公開日:2024-01-14

# 量子電子商取引の実験

Experimental quantum e-commerce ( http://arxiv.org/abs/2308.08821v2 )

ライセンス: Link先を確認

Xiao-Yu Cao, Bing-Hong Li, Yang Wang, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen

(参考訳) インターネット上で高い頻度で発生する取引の一種であるeコマースは、長距離のメッセージの完全性、認証、非送還性を保証する必要がある。現行の電子商取引スキームは計算攻撃に弱いため、量子暗号は敵の弁明と偽造に対する情報理論上のセキュリティを保証するため、この問題に対する解決策を提供する。しかし、一般に量子解は古典解よりもずっと低い性能を持つ。さらに、不完全なデバイスを考える場合、量子スキームの性能は大幅に低下する。ここでは,まず,不完全なデバイスからの攻撃に対する抵抗を示す量子電子商取引方式を提案することで,契約の締結と3つの当事者間の支払いを伴うeコマースプロセス全体を実証する。その結果,参加者間の最大減衰率25dBでは,約0.428メガビットの合意サイズに対して,毎秒0.82倍の署名率が得られることがわかった。本提案手法は,電子商取引に情報理論的なセキュリティを提供するための有望なソリューションである。

E-commerce, a type of trading that occurs at a high frequency on the Internet, requires guaranteeing the integrity, authentication and non-repudiation of messages through long distance. As current e-commerce schemes are vulnerable to computational attacks, quantum cryptography, ensuring information-theoretic security against adversary's repudiation and forgery, provides a solution to this problem. However, quantum solutions generally have much lower performance compared to classical ones. Besides, when considering imperfect devices, the performance of quantum schemes exhibits a significant decline. Here, for the first time, we demonstrate the whole e-commerce process of involving the signing of a contract and payment among three parties by proposing a quantum e-commerce scheme, which shows resistance of attacks from imperfect devices. Results show that with a maximum attenuation of 25 dB among participants, our scheme can achieve a signature rate of 0.82 times per second for an agreement size of approximately 0.428 megabit. This proposed scheme presents a promising solution for providing information-theoretic security for e-commerce.

翻訳日:2024-01-18 01:59:22 公開日:2024-01-14

# Gottesman-Kitaev-Preskill Codesによるボソニック量子誤差補正の進歩:理論・工学・応用

Advances in Bosonic Quantum Error Correction with Gottesman-Kitaev-Preskill Codes: Theory, Engineering and Applications ( http://arxiv.org/abs/2308.02913v3 )

ライセンス: Link先を確認

Anthony J. Brady, Alec Eickbusch, Shraddha Singh, Jing Wu and Quntao Zhuang

(参考訳) 量子情報を一組の高調波発振器に符号化することは、信頼性のある量子情報処理のためのノイズを軽減するためのハードウェア効率の良い手法と考えられる。量子ビットを発振器(猫符号、二項符号、gottesman-kitaev-preskill (gkp)符号など)にエンコードする様々な符号が提案されており、量子誤り訂正のために最初にブレークプラネット点に達したコードの一つである。 GKP符号は量子計算における約束によって広く認識されているが、ボソニックチャネルにおける準最適量子通信速度を促進し、発振器の任意の量子状態を保護する能力を提供する。本稿では、超伝導回路アーキテクチャの最近の実験的進歩とマルチモードGKP量子ビット符号および発振器からオシレータ(O2O)符号の理論的進歩を強調したGKP符号の基本動作機構、性能評価、および多くの応用について述べる。まず、ボソニック符号に必要な事前の連続変数形式から始める。次に、GKP状態の物理的実現に関わる量子工学に進む。本稿では,超伝導アーキテクチャにおけるGKP安定化と準備について深く掘り下げ,光領域におけるGKP状態を実現するための提案について検討する。最後に、マルチモードGKP量子ビットとGKP-O2O符号を示し、コード性能を調べ、計算、通信、センシングなどの量子情報処理タスクにおけるGKP符号の適用について議論する。

Encoding quantum information into a set of harmonic oscillators is considered a hardware efficient approach to mitigate noise for reliable quantum information processing. Various codes have been proposed to encode a qubit into an oscillator -- including cat codes, binomial codes and Gottesman-Kitaev-Preskill (GKP) codes -- and are among the first to reach a break-even point for quantum error correction. Though GKP codes are widely recognized for their promise in quantum computation, they also facilitate near-optimal quantum communication rates in bosonic channels and offer the ability to safeguard arbitrary quantum states of oscillators. This review focuses on the basic working mechanism, performance characterization, and the many applications of GKP codes -- emphasizing recent experimental progress in superconducting circuit architectures and theoretical advancements in multimode GKP qubit codes and oscillators-to-oscillators (O2O) codes. We begin with a preliminary continuous-variable formalism needed for bosonic codes. We then proceed to the quantum engineering involved to physically realize GKP states. We take a deep dive into GKP stabilization and preparation in superconducting architectures and examine proposals for realizing GKP states in the optical domain (along with a concise review of GKP realization in trapped-ion platforms). Finally, we present multimode GKP qubits and GKP-O2O codes, examine code performance and discuss applications of GKP codes in quantum information processing tasks such as computing, communication, and sensing.

翻訳日:2024-01-18 01:56:53 公開日:2024-01-14

# アナログ量子シミュレーションにおけるアルゴリズム誤差の最適化

Optimization of Algorithmic Errors in Analog Quantum Simulations ( http://arxiv.org/abs/2308.02642v2 )

ライセンス: Link先を確認

Nikita A. Zemlevskiy, Henry F. Froland, Stephan Caspar

(参考訳) アナログ量子シミュレーションは、多体実時間力学のような古典的到達不能な物理学を解明するための強力なツールとして登場している。現代機器のシミュレーションを用いて正確な予測を行うためには,不確実性の完全定量化が必要である。したがって、シミュレーションのパラメータに対するデバイス固有の物理的制約を理解する必要がある。本解析は,実世界のデバイス制約による近似時間変化のシミュレーションから生じる誤差の相互作用を考察する。これらの誤差はイジング・ハミルトンによって記述されたアナログ量子デバイス上のハイゼンベルク型システムで研究される。これらの誤差を定量化するための一般的なフレームワークが提案され、トロッターライクな手法やフロケエンジニアリングによる定数場アプローチなど、いくつかの時間発展手法に適用されている。現状のデバイスによる時間発展手法の精度に関する限界について考察する。異なるエラーソースのコヒーレント効果のスケーリングの特徴付けは、提示されるハミルトニアンのエンジニアリング手法を拡張して、今後のデバイス機能を活用する方法を提供する。

Analog quantum simulation is emerging as a powerful tool for uncovering classically unreachable physics such as many-body real-time dynamics. A complete quantification of uncertainties is necessary in order to make precise predictions using simulations on modern-day devices. Therefore, the inherent physical limitations of the device on the parameters of the simulation must be understood. This analysis examines the interplay of errors arising from simulation of approximate time evolution with those due to practical, real-world device constraints. These errors are studied in Heisenberg-type systems on analog quantum devices described by the Ising Hamiltonian. A general framework for quantifying these errors is introduced and applied to several proposed time evolution methods, including Trotter-like methods and Floquet-engineered constant-field approaches. The limitations placed on the accuracy of time evolution methods by current devices are discussed. Characterization of the scaling of coherent effects of different error sources provides a way to extend the presented Hamiltonian engineering methods to take advantage of forthcoming device capabilities.

翻訳日:2024-01-18 01:56:22 公開日:2024-01-14

# ブレイド統計を超える: 1次元に固有の交換統計を持つ任意のオンに対する格子モデルの構築

Beyond braid statistics: Constructing a lattice model for anyons with exchange statistics intrinsic to one dimension ( http://arxiv.org/abs/2309.04358v3 )

ライセンス: Link先を確認

Sebastian Nagies, Botao Wang, A.C. Knapp, Andr\'e Eckardt, and N.L. Harshman

(参考訳) 分数交換統計に従うものは2次元に自然に現れる: ハードコアの2体制約により、粒子の構成空間は単純に連結ではない。ブレイド群は、位相的に同値な交換経路がアーベル素数の非自明な幾何学的位相にどのように関連付けられるかを記述する。ブレイド・アニオン交換統計は1次元(1D)でも見られるが、2つのエノンが交換する異なる方法を区別するためには、ガリレオ不変性を欠く必要がある。しかし近年、ハードコアの3体制約によって構成空間が単純に連結されないため、交換統計の代替形式が1Dで発生することが示されている。ブレイド群の代わりに、交換経路の位相とその付随する非自明な幾何学的位相はトレイド群によって記述される。本稿では、この交換統計の代替形式を実現する最初の具体的モデルを提案する。数依存性ピアール位相を持つ所望の幾何学的位相を実装するボソニック格子モデルから始まり、ハミルトニアンの運動エネルギー項がそれらに関して局所的かつ二次的になるように、エノニック作用素を定義する。このtid-anyon-hubbardモデルの基底状態は、ボソンとフェルミオンの間の交換統計の中間のいくつかの兆候と、緊急に近似したハルダン排他統計の兆候を示している。連続極限は、以前に構築されたトレイド・エノンの連続波動関数に対応する固有状態を持つガリレオ不変ハミルトニアンをもたらす。これは格子モデルの非直交的正当性を提供するだけでなく、我々の構成がトロイド・エノン(すなわち1Dに固有のもの)に対する直感的なアプローチであることを示す。

Anyons obeying fractional exchange statistics arise naturally in two dimensions: hard-core two-body constraints make the configuration space of particles not simply-connected. The braid group describes how topologically-inequivalent exchange paths can be associated to non-trivial geometric phases for abelian anyons. Braid-anyon exchange statistics can also be found in one dimension (1D), but this requires broken Galilean invariance to distinguish different ways for two anyons to exchange. However, recently it was shown that an alternative form of exchange statistics can occur in 1D because hard-core three-body constraints also make the configuration space not simply-connected. Instead of the braid group, the topology of exchange paths and their associated non-trivial geometric phases are described by the traid group. In this article we propose a first concrete model realizing this alternative form of anyonic exchange statistics. Starting from a bosonic lattice model that implements the desired geometric phases with number-dependent Peierls phases, we then define anyonic operators so that the kinetic energy term in the Hamiltonian becomes local and quadratic with respect to them. The ground-state of this traid-anyon-Hubbard model exhibits several indications of exchange statistics intermediate between bosons and fermions, as well as signs of emergent approximate Haldane exclusion statistics. The continuum limit results in a Galilean invariant Hamiltonian with eigenstates that correspond to previously constructed continuum wave functions for traid anyons. This provides not only an a-posteriori justification of our lattice model, but also shows that our construction serves as an intuitive approach to traid anyons, i.e. anyons intrinsic to 1D.

翻訳日:2024-01-18 01:49:33 公開日:2024-01-14

# 後中等教育におけるテキスト応答の自動評価:体系的レビュー

Automatic assessment of text-based responses in post-secondary education: A systematic review ( http://arxiv.org/abs/2308.16151v2 )

ライセンス: Link先を確認

Rujun Gao, Hillary E. Merzdorf, Saira Anwar, M. Cynthia Hipwell, Arun Srinivasa

(参考訳) 学術的形式的・要約的評価におけるテキストベースのオープンエンド質問は、学生が深層学習者になり、後続の概念的評価の概念を理解する準備をするのに役立つ。しかし、テキストベースの質問、特に大きなコースでは、インストラクターにとって退屈で時間がかかります。テキスト処理モデルは、人工知能(AI)ツールと自然言語処理(NLP)アルゴリズムの急速な開発で進歩を続けている。特にLarge Language Models (LLM) のブレークスルーの後、教育におけるテキストベースの反応の迅速な評価とフィードバックを自動化する大きな可能性がある。本研究は,PRISMAプロセスに基づく学術・再現可能な文献検索戦略を採用し,第2次教育後におけるテキストベース自動評価システムの研究,838論文のスクリーニング,93研究の合成を行う。近年,テキストベースの自動評価システムが教育にどのように発展・適用されているかを理解するために,3つの研究課題が検討されている。システム入力と出力,研究モチベーション,研究成果など,研究課題への回答を目的とした包括的枠組みに基づいて,すべての研究を要約し,分類する。さらに, 本研究における自動評価システム, 研究方法, 応用領域の典型的研究を概説し, 要約した。この体系的なレビューは、高等教育におけるテキストベースアセスメントを支援する最新のAI/NLP開発を理解するために、テキストベースアセスメントシステムの最近の教育応用の概要を提供する。発見は、ChatGPTのようなLLMを教育活動に取り入れた研究者や教育者にとって特に有益である。

Text-based open-ended questions in academic formative and summative assessments help students become deep learners and prepare them to understand concepts for a subsequent conceptual assessment. However, grading text-based questions, especially in large courses, is tedious and time-consuming for instructors. Text processing models continue progressing with the rapid development of Artificial Intelligence (AI) tools and Natural Language Processing (NLP) algorithms. Especially after breakthroughs in Large Language Models (LLM), there is immense potential to automate rapid assessment and feedback of text-based responses in education. This systematic review adopts a scientific and reproducible literature search strategy based on the PRISMA process using explicit inclusion and exclusion criteria to study text-based automatic assessment systems in post-secondary education, screening 838 papers and synthesizing 93 studies. To understand how text-based automatic assessment systems have been developed and applied in education in recent years, three research questions are considered. All included studies are summarized and categorized according to a proposed comprehensive framework, including the input and output of the system, research motivation, and research outcomes, aiming to answer the research questions accordingly. Additionally, the typical studies of automated assessment systems, research methods, and application domains in these studies are investigated and summarized. This systematic review provides an overview of recent educational applications of text-based assessment systems for understanding the latest AI/NLP developments assisting in text-based assessments in higher education. Findings will particularly benefit researchers and educators incorporating LLMs such as ChatGPT into their educational activities.

翻訳日:2024-01-18 01:46:06 公開日:2024-01-14

# 量子は爆破グラフ上を歩く

Quantum walks on blow-up graphs ( http://arxiv.org/abs/2308.13887v2 )

ライセンス: Link先を確認

Bikash Bhattacharjya, Hermie Monterde, Hiranmoy Pal

(参考訳) グラフ $G$ の$n$コピーのブローアップは、$G$ のすべての頂点を独立したサイズの集合 $n$ に置き換えることで得られるグラフ $\overset{n}\uplus~G$ である。我々の目標は、$\overset{n}\uplus~g$ で表される量子系の時間に依存しないハミルトニアンとして随伴行列を取る、ブローアップグラフ $\overset{n}\uplus~g$ 上の量子状態遷移の存在を調べることである。特に,爆発グラフにおける頂点の必要十分条件を定め,強いコスペクトル性を示すとともに,周期性,完全状態移動(PST),かなりよい状態移動(PGST)など,様々な種類の高確率量子輸送を示す。すると、$\overset{n}\uplus~G$ が PST や PGST を許すなら、$n=2.$ でなければならない。さらに、$G$ が可逆な隣接行列を持つなら、$\overset{2}\uplus~G$ のすべての頂点が、強いコスペクタリティを示すために一意の頂点と組むことを示す。この結果を用いて, pst と pgst をブローアップするグラフの無限族を決定する。

A blow-up of $n$ copies of a graph $G$ is the graph $\overset{n}\uplus~G$ obtained by replacing every vertex of $G$ by an independent set of size $n$, where the copies of vertices in $G$ are adjacent in the blow-up if and only if the vertices adjacent in $G$. Our goal is to investigate the existence of quantum state transfer on a blow-up graph $\overset{n}\uplus~G$, where the adjacency matrix is taken to be the time-independent Hamiltonian of the quantum system represented by $\overset{n}\uplus~G$. In particular, we establish necessary and sufficient conditions for vertices in a blow-up graph to exhibit strong cospectrality and various types of high probability quantum transport, such as periodicity, perfect state transfer (PST) and pretty good state transfer (PGST). It turns out, if $\overset{n}\uplus~G$ admits PST or PGST, then one must have $n=2.$ Moreover, if $G$ has an invertible adjacency matrix, then we show that every vertex in $\overset{2}\uplus~G$ pairs up with a unique vertex to exhibit strong cospectrality. We then apply our results to determine infinite families of graphs whose blow-ups admit PST and PGST.

翻訳日:2024-01-18 01:44:22 公開日:2024-01-14

# 低次元多様体上のポリシー最適化のための神経政策ミラー降下のサンプル複雑性

Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds ( http://arxiv.org/abs/2309.13915v2 )

ライセンス: Link先を確認

Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

(参考訳) ディープニューラルネットワークを備えたポリシ勾配法は,高次元強化学習(RL)問題を解く上で大きな成功を収めている。しかし、現在の分析ではなぜそれが次元の呪いに抵抗しているのかは説明できない。本研究では,深部畳み込みニューラルネットワーク(CNN)を用いたNPMDアルゴリズムのサンプル複雑性について検討する。多くの高次元環境が、像を状態とするような低次元構造を持つ状態空間を持つという経験的観察に動機づけられ、状態空間は、内在次元 $d\ll d$ を持つ$d$次元ユークリッド空間に埋め込まれた$d$次元多様体であると考える。 NPMDの各イテレーションにおいて、値関数とポリシーの両方がCNNによって適切に近似可能であることを示す。近似誤差はネットワークのサイズによって制御され、前のネットワークの滑らかさを継承することができる。その結果、ネットワークサイズとハイパーパラメータを適切に選択することで、npmdは、期待値の$\widetilde{o}(\epsilon^{-\frac{d}{\alpha}-2})$サンプルを持つ$\epsilon$-optimalポリシーを見つけることができ、ここで$\alpha\in(0,1]$は環境の滑らかさを示す。これまでの研究と比較すると,NPMDは状態空間の低次元構造を利用して次元性の呪いから逃れることができ,深い政策勾配アルゴリズムの有効性が説明できる。

Policy gradient methods equipped with deep neural networks have achieved great success in solving high-dimensional reinforcement learning (RL) problems. However, current analyses cannot explain why they are resistant to the curse of dimensionality. In this work, we study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN). Motivated by the empirical observation that many high-dimensional environments have state spaces possessing low-dimensional structures, such as those taking images as states, we consider the state space to be a $d$-dimensional manifold embedded in the $D$-dimensional Euclidean space with intrinsic dimension $d\ll D$. We show that in each iteration of NPMD, both the value function and the policy can be well approximated by CNNs. The approximation errors are controlled by the size of the networks, and the smoothness of the previous networks can be inherited. As a result, by properly choosing the network size and hyperparameters, NPMD can find an $\epsilon$-optimal policy with $\widetilde{O}(\epsilon^{-\frac{d}{\alpha}-2})$ samples in expectation, where $\alpha\in(0,1]$ indicates the smoothness of environment. Compared to previous work, our result exhibits that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality, explaining the efficacy of deep policy gradient algorithms.

翻訳日:2024-01-18 01:36:51 公開日:2024-01-14

# CB-Whisper:Open-Vocabulary Keyword-Spotting を用いたコンテキストバイザWhisper

CB-Whisper: Contextual Biasing Whisper using Open-Vocabulary Keyword-Spotting ( http://arxiv.org/abs/2309.09552v2 )

ライセンス: Link先を確認

Yuang Li, Yinglu Li, Min Zhang, Chang Su, Mengxin Ren, Xiaosong Qiao, Xiaofeng Zhao, Mengyao Piao, Jiawei Yu, Xinglin Lv, Miaomiao Ma, Yanqing Zhao, Hao Yang

(参考訳) エンド・ツー・エンド自動音声認識(asr)システムは、個人名、組織、用語など、トレーニングデータにあまり遭遇しない珍しい名前のエンティティを認識するのに苦労することが多い。本稿では,openai のwhisper モデルに基づく新しいasrシステムである context biasing whisper (cb-whisper) を提案する。認識されたエンティティは、whisperデコーダのプロンプトとして使用される。まず,OV-KWS タスクと ASR タスクを併用したマルチタスク学習手法を提案する。実験により,中国語のAishellホットワードサブセットと2つの内部コードスウィッチテストセットのWhisperモデルと比較して,エンティティリコールを大幅に改善した。しかし,大惨事による内部テストセットにおける混合エラーレート(mer)の増加がみられた。そこで本研究では, ov-kwsを別モジュールとして使用し, 幻覚を防止すべく, 発声型プロンプトを構築することを提案する。 OV-KWSモジュールは、小さめ、中、大型モデルのMERとEntity Recallを一貫して改善する。

End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare name entities, such as personal names, organizations, and terminologies not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI's Whisper model that can recognize user-defined name entities by performing open-vocabulary keyword-spotting (OV-KWS) using the hidden states of Whisper encoder. The recognized entities are used as prompts for the Whisper decoder. We first propose a multitask training approach with OV-KWS and ASR tasks to optimize the model. Experiments show that this approach substantially improves the entity recalls compared to the original Whisper model on Chinese Aishell hot word subsets and two internal code-switch test sets. However, we observed a slight increase in mixed-error-rate (MER) on internal test sets due to catastrophic forgetting. To address this problem and use different sizes of the Whisper model without finetuning, we propose to use OV-KWS as a separate module and construct a spoken form prompt to prevent hallucination. The OV-KWS module consistently improves MER and Entity Recall for whisper-small, medium, and large models.

翻訳日:2024-01-18 01:34:11 公開日:2024-01-14

# 睡眠ステージの透明性:モデル解釈可能性を考慮した脳波睡眠ステージ分類のための深層学習法

Transparency in Sleep Staging: Deep Learning Method for EEG Sleep Stage Classification with Model Interpretability ( http://arxiv.org/abs/2309.07156v4 )

ライセンス: Link先を確認

Shivam Sharma, Suvadeep Maiti, S. Mythirayee, Srijithesh Rajendran, Raju Surampudi Bapi

(参考訳) 単チャンネル脳波を用いた睡眠ステージの自動分類は睡眠品質評価と障害診断にとって重要なツールである。しかし、この信号に固有の複雑さと変動性をモデル化することは難しい課題であり、臨床における実用性と有効性を制限する。これらの課題を緩和するために、残余ネットワーク内に圧縮ブロックと励起ブロックを統合して特徴を抽出し、複雑な時間的依存関係を理解するために積み重ねたBi-LSTM(Deep-to-end Deep Learning)モデルを提案する。本研究の特筆すべき側面は、睡眠ステージングのためのGradCamの適応であり、この領域における説明可能なDLモデルの最初の事例であり、その決定と睡眠専門家の洞察の一致である。公開データセット(SleepEDF-20,SleepEDF-78,SHHS)を用いて,Macro-F1スコアが82.5,78.9,81.9であった。さらに、ストライドサイズの増大により、新たなトレーニング効率向上戦略が実施され、パフォーマンスへの影響を最小限に抑えながら、トレーニング時間を8倍に短縮した。比較分析は,本モデルが既存のすべてのベースラインより優れており,臨床応用の可能性を示している。

Automated Sleep stage classification using raw single channel EEG is a critical tool for sleep quality assessment and disorder diagnosis. However, modelling the complexity and variability inherent in this signal is a challenging task, limiting their practicality and effectiveness in clinical settings. To mitigate these challenges, this study presents an end-to-end deep learning (DL) model which integrates squeeze and excitation blocks within the residual network to extract features and stacked Bi-LSTM to understand complex temporal dependencies. A distinctive aspect of this study is the adaptation of GradCam for sleep staging, marking the first instance of an explainable DL model in this domain with alignment of its decision-making with sleep expert's insights. We evaluated our model on the publically available datasets (SleepEDF-20, SleepEDF-78, and SHHS), achieving Macro-F1 scores of 82.5, 78.9, and 81.9, respectively. Additionally, a novel training efficiency enhancement strategy was implemented by increasing stride size, leading to 8x faster training times with minimal impact on performance. Comparative analyses underscore our model outperforms all existing baselines, indicating its potential for clinical usage.

翻訳日:2024-01-18 01:33:14 公開日:2024-01-14

# 注意パラダイムを超越する:地理空間ソーシャルメディアデータからの表現学習

Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data ( http://arxiv.org/abs/2310.05378v3 )

ライセンス: Link先を確認

Nick DiSanto, Anthony Corso, Benjamin Sanders, Gavin Harding

(参考訳) トランスフォーマーは、言語モデリングの基盤として注目駆動アーキテクチャを開拓してきたが、文脈情報への依存は、テキストのテーマを暗黙的に学習する能力の限界を浮き彫りにした。本研究では,分散パターンの源泉としてソーシャルメディアデータを調べることで,パフォーマンスベンチマークのヒューリスティックパラダイムに挑戦する。複雑な長期的な依存関係を捉えるネットワークとは対照的に、オンラインデータのモデルは本質的に構造を欠き、集約内の潜在構造を検出せざるを得ない。この研究は、これらの抽象的な関係を適切に表現するために、実験的なソーシャルメディアコーパスを要素的要素に分類し、人口密度の高い場所で20億以上のツイートを分析します。我々は各都市固有の単語の埋め込みを作成し、それぞれの表現を比較する。これは、ノイズの多いデータにもかかわらず、地理的な場所はオンライン通信に大きな影響を与え、隠れた洞察は高度なアルゴリズムの欠如なしに発見できることを示している。この証拠は、社会科学において貴重な地理空間的含意を示し、複雑なモデルが自然言語におけるパターン認識の前提条件であるという考えに挑戦する。これは、抽象的な理解よりも絶対的な解釈可能性を受け入れることに疑問を呈し、洗練されたフレームワークと無形関係の間の隔たりを橋渡しする進化途上の状況と一致する。

While transformers have pioneered attention-driven architectures as a cornerstone of language modeling, their dependence on explicitly contextual information underscores limitations in their abilities to tacitly learn overarching textual themes. This study challenges the heuristic paradigm of performance benchmarking by investigating social media data as a source of distributed patterns. In stark contrast to networks that rely on capturing complex long-term dependencies, models of online data inherently lack structure and are forced to detect latent structures in the aggregate. To properly represent these abstract relationships, this research dissects empirical social media corpora into their elemental components, analyzing over two billion tweets across population-dense locations. We create Bag-of-Word embedding specific to each city and compare their respective representations. This finds that even amidst noisy data, geographic location has a considerable influence on online communication, and that hidden insights can be uncovered without the crutch of advanced algorithms. This evidence presents valuable geospatial implications in social science and challenges the notion that intricate models are prerequisites for pattern recognition in natural language. This aligns with the evolving landscape that questions the embrace of absolute interpretability over abstract understanding and bridges the divide between sophisticated frameworks and intangible relationships.

翻訳日:2024-01-18 01:23:05 公開日:2024-01-14

# クロスモーダル検索のためのプロトタイプベースアレエータ不確かさ定量化

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval ( http://arxiv.org/abs/2309.17093v3 )

ライセンス: Link先を確認

Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

(参考訳) クロスモーダル検索手法は、共通表現空間を共同学習することにより、視覚と言語モダリティの類似性関係を構築する。しかし、この予測は、腐敗した画像、速いペースの動画、未詳のテキストなど、低品質のデータによって引き起こされるアリータティックな不確実性によって、しばしば信頼性が低下する。本稿では,不確実性から生じる不確かさを定量化することにより,信頼性の高い予測を実現するための新しいプロトタイプベースアレエータ型不確実性定量化(pau)フレームワークを提案する。具体的には、セマンティクス部分空間全体を表現するために、まず様々な学習可能なプロトタイプを各モダリティ向けに構築する。次に、デンプスター・シェーファー理論と主観論理理論を用いて、証拠とディリクレ分布パラメータを関連付けた実証的理論的枠組みを構築する。 PAUモデルは、クロスモーダル検索のための正確な不確実性と信頼性のある予測を誘導する。 MSR-VTT, MSVD, DiDeMo, MS-COCOの4つの主要なベンチマークデータセットを用いて実験を行い, 本手法の有効性を実証した。コードはhttps://github.com/leolee99/PAUでアクセスできる。

Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets of MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.

翻訳日:2024-01-18 01:21:00 公開日:2024-01-14

# シーケンスモデリングとしての連続学習の再キャスト

Recasting Continual Learning as Sequence Modeling ( http://arxiv.org/abs/2310.11952v2 )

ライセンス: Link先を確認

Soochan Lee, Jaehyeon Son, Gunhee Kim

(参考訳) 本研究では,連続学習とシーケンスモデリングという,機械学習研究の2つの重要な分野の強い関係を確立することを目的とする。すなわち,連続学習をシーケンスモデリング問題として定式化し,連続学習に高度なシーケンスモデルを活用することを提案する。この定式化の下では、連続学習プロセスがシーケンスモデルの前方通過となる。メタcontinual Learning(MCL)フレームワークを採用することで、複数の連続学習エピソードに基づいて、メタレベルでシーケンスモデルをトレーニングすることができる。新しい定式化の具体例として、トランスフォーマーとその効率的な変種をmcl法として応用することを示す。分類と回帰の両方を網羅した7つのベンチマーク実験により、シーケンスモデルが一般的なMCLにとって魅力的な解であることを示す。

In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a sequence model. By adopting the meta-continual learning (MCL) framework, we can train the sequence model at the meta-level, on multiple continual learning episodes. As a specific example of our new formulation, we demonstrate the application of Transformers and their efficient variants as MCL methods. Our experiments on seven benchmarks, covering both classification and regression, show that sequence models can be an attractive solution for general MCL.

翻訳日:2024-01-18 01:10:51 公開日:2024-01-14

# llmが情報アクセスを支配する: ニューラルネットワークはllm生成テキストに偏っている

LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts ( http://arxiv.org/abs/2310.20501v2 )

ライセンス: Link先を確認

Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu, Xiao Zhang, Gang Wang and Jun Xu

(参考訳) 近年,大規模言語モデル (LLM) の出現は,特にWeb検索において情報検索 (IR) のパラダイムに革命をもたらした。人間のようなテキストを生成する素晴らしい能力によって、LLMはインターネット上で巨大なテキストを作成しました。結果として、LLM時代のIRシステムは新たな課題に直面しており、インデックス化された文書は人間によって書かれただけでなく、LLMによって自動的に生成される。これらのLCM生成した文書がIRシステムにどのように影響するかは、迫りつつも未解明の疑問である。本研究では,人間の書き起こしとLLM生成の両方が関与するシナリオにおいて,異なるIRモデルの定量的評価を行う。意外なことに, ニューラルネットワークによる検索モデルでは, LLM生成文書のランクが高かった。我々は、LLM生成テキストに対するニューラル検索モデルにおけるこのバイアスのカテゴリを「textbf{source bias}」と呼ぶ。さらに,このバイアスは第1段階のニューラルレトリバーに限らず,第2段階のニューラルリランカに限っていることがわかった。そして、テキスト圧縮の観点から詳細な分析を行い、ニューラルネットワークがLLM生成テキストのセマンティック情報をよりよく理解し、理論的解析によってさらに裏付けられることを観察する。また, 音源バイアスを軽減するため, 最適化目標に対するプラグ・アンド・プレイ・デバイアスド制約を提案し, 実験により有効性を示す。最後に、観測源バイアスに起因する潜在的な深刻な懸念について論じ、我々の発見がIRコミュニティなどへの重要な起床のきっかけとなることを期待する。 LLM時代のIRの将来の探索を容易にするため、構築された2つの新しいベンチマークとコードは後に \url{https://github.com/KID-22/LLM4IR-Bias} で利用可能となる。

Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search. With their remarkable capabilities in generating human-like texts, LLMs have created enormous texts on the Internet. As a result, IR systems in the LLMs era are facing a new challenge: the indexed documents now are not only written by human beings but also automatically generated by the LLMs. How these LLM-generated documents influence the IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of different IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrieval models towards the LLM-generated text as the \textbf{source bias}. Moreover, we discover that this bias is not confined to the first-stage neural retrievers, but extends to the second-stage neural re-rankers. Then, we provide an in-depth analysis from the perspective of text compression and observe that neural models can better understand the semantic information of LLM-generated text, which is further substantiated by our theoretical analysis. To mitigate the source bias, we also propose a plug-and-play debiased constraint for the optimization objective, and experimental results show the effectiveness. Finally, we discuss the potential severe concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the constructed two new benchmarks and codes will later be available at \url{https://github.com/KID-22/LLM4IR-Bias}.

翻訳日:2024-01-18 00:57:34 公開日:2024-01-14

# cxr-llava:胸部x線画像解釈のためのマルチモーダル大言語モデル

CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images ( http://arxiv.org/abs/2310.18341v3 )

ライセンス: Link先を確認

Seowoo Lee, Jiwon Youn, Hyungjin Kim, Mansu Kim, Soon Ho Yoon

(参考訳) Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). ビジョントランスをDataset 1で事前学習した後、LLAVAネットワークに影響されたLLMと統合した。その後、モデルを微調整し、主にDataset 2.0を使用した。本モデルによる病理所見の診断成績は,ヒト放射線学者による放射線学的報告の受容性とともに評価された。結果: 実験群では, MIMIC内部試験群では6例で平均F1スコア0.81, 外部試験群では7例で0.62, 平均F1スコア0.81が得られた。 F1のスコアは両方のテストセットでGPT-4ビジョンとジェミニ-プロビジョンを上回った。ヒトの放射線技師による外部検査セットの評価では、このモデルは自律的な報告で72.7%の成功率を達成し、基礎的真理の84.0%をわずかに下回った。結論: 本研究は, CXR 解釈におけるマルチモーダル LLM の有意な可能性を示すとともに, 性能制限も認めている。これらの課題にもかかわらず、我々のモデルをオープンソースにすることはさらなる研究を触媒し、様々な臨床状況においてその有効性と適用性を広げるであろうと信じている。 CXR-LLAVAはhttps://github.com/ECOFRI/CXR_LLAVAで入手できる。

Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLAVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.62 for seven major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, slightly below the 84.0% rate of ground truth reports. Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.

翻訳日:2024-01-18 00:57:03 公開日:2024-01-14

# ヘドニックゲームにおける$\varepsilon$-fractional core stability

$\varepsilon$-fractional Core Stability in Hedonic Games ( http://arxiv.org/abs/2311.11101v2 )

ライセンス: Link先を確認

Simone Fioravanti, Michele Flammini, Bojana Kodric and Giovanna Varricchio

(参考訳) ヘドニックゲーム(Hedonic Games, HGs)は、古典的なフレームワークモデリングによる戦略エージェントの連立組織である。これらの選好によれば、連立構造(すなわち、エージェントを連立に分割する)がある種の安定性を満たすことが望ましい。そのような概念の最もよく知られた自然は、間違いなく核安定性である。非公式に、エージェントのサブセットがいわゆるcore-blocking coalitionで再グループ化することを望まない場合、パーティションはcore-stableである。残念なことに、コア安定なパーティションは滅多に存在せず、たとえそうであっても、そのパーティションを見つけることは計算的に困難であることが多い。これらの問題を回避するために、我々は$\varepsilon$-fractional core-stabilityという概念を提案する。このような緩和は、存在と多項式時間計算の両方を保証する可能性がある。具体的には,HG の基本クラスである Simple Fractional と Anonymous の2つに対して,$\varepsilon$-fractional core-stable partition と $\varepsilon$ を指数関数的に減少させる効率的なアルゴリズムを設計する。確率論的な観点では、$\varepsilon$-fractional coreの定義は、$\varepsilon$よりも低い確率で一様にサンプリングされた結合コアブロックを要求するのと同値であるので、より複雑なサンプリング分布を扱うために定義をさらに拡張する。この線に沿って、PAC学習方式でサンプルから評価を学習する必要がある場合、任意の信頼性を持つ$\varepsilon$-fractional core-stableという結果の効率的な計算を可能にする分布について、正および負の結果を与える。

Hedonic Games (HGs) are a classical framework modeling coalition formation of strategic agents guided by their individual preferences. According to these preferences, it is desirable that a coalition structure (i.e. a partition of agents into coalitions) satisfies some form of stability. The most well-known and natural of such notions is arguably core-stability. Informally, a partition is core-stable if no subset of agents would like to deviate by regrouping in a so-called core-blocking coalition. Unfortunately, core-stable partitions seldom exist and even when they do, it is often computationally intractable to find one. To circumvent these problems, we propose the notion of $\varepsilon$-fractional core-stability, where at most an $\varepsilon$-fraction of all possible coalitions is allowed to core-block. It turns out that such a relaxation may guarantee both existence and polynomial-time computation. Specifically, we design efficient algorithms returning an $\varepsilon$-fractional core-stable partition, with $\varepsilon$ exponentially decreasing in the number of agents, for two fundamental classes of HGs: Simple Fractional and Anonymous. From a probabilistic point of view, being the definition of $\varepsilon$-fractional core equivalent to requiring that uniformly sampled coalitions core-block with probability lower than $\varepsilon$, we further extend the definition to handle more complex sampling distributions. Along this line, when valuations have to be learned from samples in a PAC-learning fashion, we give positive and negative results on which distributions allow the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence.

翻訳日:2024-01-18 00:49:37 公開日:2024-01-14

# プロパガンダスパンアノテーションのための大規模言語モデル

Large Language Models for Propaganda Span Annotation ( http://arxiv.org/abs/2311.09812v2 )

ライセンス: Link先を確認

Maram Hasanain, Fatema Ahmed, Firoj Alam

(参考訳) 近年,オンラインコンテンツにおけるプロパガンダ的手法の利用が増加している。このようなコンテンツを自動で検出・削除する取り組みが、さまざまなモデリングシナリオで行われている。内容(テキスト、画像、またはマルチモーダル)を決定することを含む。 (i)プロパガンダである。 (ii)一つ以上の布教技術を用い、 (iii) スパンを識別できる技術を含む。最初の2つのシナリオは、後者と比較して重要な研究努力が注がれている。そこで本研究では,プロパガンダ的テキストスパンの検出に焦点をあてる。具体的には,GPT-4のような大規模言語モデル(LLM)が効果的にタスクを実行できるかどうかを検討する。さらに,よりコスト効率のよいアノテーションを収集するために,モデルを活用する可能性についても検討する。実験では,さまざまな専門知識を持つアノテータからのアノテーションからなる大規模社内データセットを用いた。その結果,人間のアノテーションと比較して,モデルの性能向上が示唆された。さらに,本研究は,この特定のタスクに注釈付きデータセットを開発するためにLLMを利用する可能性を示す最初のものである。 GPT-4を含む複数のアノテータから収集したスパンレベルラベルをコミュニティに提供する予定です。

The use of propagandistic techniques in online contents has increased in recent years aiming to manipulate online audiences. Efforts to automatically detect and debunk such content have been made addressing various modeling scenarios. These include determining whether the content (text, image, or multimodal) (i) is propagandistic, (ii) employs one or more propagandistic techniques, and (iii) includes techniques with identifiable spans. Significant research efforts have been devoted to the first two scenarios compared to the latter. Therefore, in this study, we focus on the task of detecting propagandistic textual spans. Specifically, we investigate whether large language models (LLMs), such as GPT-4, can effectively perform the task. Moreover, we study the potential of employing the model to collect more cost-effective annotations. Our experiments use a large-scale in-house dataset consisting of annotations from human annotators with varying expertise levels. The results suggest that providing more information to the model as prompts improves its performance compared to human annotations. Moreover, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for this specific task, prompting it with annotations from human annotators with limited expertise. We plan to make the collected span-level labels from multiple annotators, including GPT-4, available for the community.

翻訳日:2024-01-18 00:48:41 公開日:2024-01-14

# 配電系統における知識グラフ構築

Knowledge Graph Construction in Power Distribution Networks ( http://arxiv.org/abs/2311.08724v2 )

ライセンス: Link先を確認

Xiang Li, Che Wang, Bing Li, Hao Chen, Sizhe Li

(参考訳) 本稿では,電力配電網における知識グラフ構築手法を提案する。本手法は,配信ネットワークの知識グラフとディスパッチテキストの両方において,意味的,音声的,統語的特徴を含む実体的特徴を利用する。畳み込みニューラルネットワークに基づく拡張モデルを用いて、テキストエンティティを知識グラフ内のエンティティと効果的にマッチングする。本モデルの有効性は実世界の配電シナリオにおける実験を通して評価される。その結果,提案モデルがベースラインと比較した場合,様々なエンティティタイプを結合し,電力分布知識グラフ構築タスクにおいて高い総合的精度を示すことが示された。

In this paper, we propose a method for knowledge graph construction in power distribution networks. This method leverages entity features, which involve their semantic, phonetic, and syntactic characteristics, in both the knowledge graph of distribution network and the dispatching texts. An enhanced model based on Convolutional Neural Network, is utilized for effectively matching dispatch text entities with those in the knowledge graph. The effectiveness of this model is evaluated through experiments in real-world power distribution dispatch scenarios. The results indicate that, compared with the baselines, the proposed model excels in linking a variety of entity types, demonstrating high overall accuracy in power distribution knowledge graph construction task.

翻訳日:2024-01-18 00:47:17 公開日:2024-01-14

# 証明可能訓練可能な回転同値量子機械学習

Provably Trainable Rotationally Equivariant Quantum Machine Learning ( http://arxiv.org/abs/2311.05873v3 )

ライセンス: Link先を確認

Maxwell T. West, Jamie Heredge, Martin Sevior and Muhammad Usman

(参考訳) 優れた機械学習アルゴリズムを実現するために量子計算のパワーを爆発させることは、近年では大きな研究の焦点となっているが、量子機械学習(QML)の展望は、かなりの技術的課題によって低下している。特に重要な問題は、一般的なQMLモデルは、トレーニングランドスケープにおいていわゆる不毛の台地に悩まされていることだ。この効果に対抗するための主要な戦略は、ヒルベルト空間のより小さく関連する部分集合に集中するために、データの対称性を考慮した問題固有のモデルを構築することである。本研究では、量子フーリエ変換に基づいて構築された回転同変QMLモデルの族を導入し、リー代数的なQMLモデルの最近の知見を活用し、我々のモデルのサブセットがバレンプラトーを示さないことを示す。解析結果に加えて, シリコン中のリン不純物の模擬走査トンネル顕微鏡画像のデータセット上で, 回転対称性が自然に生じる場合の回転同変モデルを数値的に検証し, それらが実用上劇的に向上していることを見出した。

Exploiting the power of quantum computation to realise superior machine learning algorithmshas been a major research focus of recent years, but the prospects of quantum machine learning (QML) remain dampened by considerable technical challenges. A particularly significant issue is that generic QML models suffer from so-called barren plateaus in their training landscapes -- large regions where cost function gradients vanish exponentially in the number of qubits employed, rendering large models effectively untrainable. A leading strategy for combating this effect is to build problem-specific models which take into account the symmetries of their data in order to focus on a smaller, relevant subset of Hilbert space. In this work, we introduce a family of rotationally equivariant QML models built upon the quantum Fourier transform, and leverage recent insights from the Lie-algebraic study of QML models to prove that (a subset of) our models do not exhibit barren plateaus. In addition to our analytical results we numerically test our rotationally equivariant models on a dataset of simulated scanning tunnelling microscope images of phosphorus impurities in silicon, where rotational symmetry naturally arises, and find that they dramatically outperform their generic counterparts in practice.

翻訳日:2024-01-18 00:46:07 公開日:2024-01-14

# GraphPro: 推奨のためのグラフ事前トレーニングとプロンプト学習

GraphPro: Graph Pre-training and Prompt Learning for Recommendation ( http://arxiv.org/abs/2311.16716v3 )

ライセンス: Link先を確認

Yuhao Yang, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang

(参考訳) GNNベースのレコメンデータは、マルチホップメッセージパッシングによる複雑なユーザ-イテムインタラクションのモデリングに長けている。しかし,既存手法ではユーザとイテムの相互作用の動的性質を無視することが多く,ユーザの嗜好の変化や,新たに到着したデータの分散シフトへの適応を阻害する。したがって、現実世界の動的環境におけるスケーラビリティと性能は限られている。本研究では,パラメータ効率と動的グラフ事前学習と即時学習を組み合わせたグラフプロを提案する。この新しい組み合わせにより、GNNは長期的なユーザの好みと短期的な振る舞いのダイナミクスの両方を効果的に捉え、正確でタイムリーなレコメンデーションの提供を可能にします。 graphproフレームワークは,事前学習したgnnモデルに時間的プロンプト機構とグラフ構造的プロンプト学習機構をシームレスに統合することにより,ユーザの好みを進化させる課題に対処する。時間的プロンプトメカニズムは、ユーザとイテムの相互作用に関する時間情報を符号化し、モデルが時間的コンテキストを自然に捉え、グラフ構造的プロンプト学習機構は、学習済みの知識を連続的なインクリメンタルトレーニングを必要とせずに、行動力学に適応させることができる。さらに,実世界の動的シナリオを模倣するレコメンデーションのための動的評価設定を導入し,オフライン・オンラインギャップをよりよいレベルに橋渡しする。大規模な産業展開を含む大規模な実験は、さまざまな最先端のレコメンデータと統合されたGraphProの軽量なプラグインスケーラビリティを示し、有効性、堅牢性、効率性の観点からGraphProの利点を強調します。

GNN-based recommenders have excelled in modeling intricate user-item interactions through multi-hop message passing. However, existing methods often overlook the dynamic nature of evolving user-item interactions, which impedes the adaption to changing user preferences and distribution shifts in newly arriving data. Thus, their scalability and performances in real-world dynamic environments are limited. In this study, we propose GraphPro, a framework that incorporates parameter-efficient and dynamic graph pre-training with prompt learning. This novel combination empowers GNNs to effectively capture both long-term user preferences and short-term behavior dynamics, enabling the delivery of accurate and timely recommendations. Our GraphPro framework addresses the challenge of evolving user preferences by seamlessly integrating a temporal prompt mechanism and a graph-structural prompt learning mechanism into the pre-trained GNN model. The temporal prompt mechanism encodes time information on user-item interaction, allowing the model to naturally capture temporal context, while the graph-structural prompt learning mechanism enables the transfer of pre-trained knowledge to adapt to behavior dynamics without the need for continuous incremental training. We further bring in a dynamic evaluation setting for recommendation to mimic real-world dynamic scenarios and bridge the offline-online gap to a better level. Our extensive experiments including a large-scale industrial deployment showcases the lightweight plug-in scalability of our GraphPro when integrated with various state-of-the-art recommenders, emphasizing the advantages of GraphPro in terms of effectiveness, robustness and efficiency.

翻訳日:2024-01-18 00:37:54 公開日:2024-01-14

# 責任あるAIメトリックのカタログに向けて:AIアカウンタビリティのためのメトリクスのコレクション

Towards a Responsible AI Metrics Catalogue: A Collection of Metrics for AI Accountability ( http://arxiv.org/abs/2311.13158v2 )

ライセンス: Link先を確認

Boming Xia, Qinghua Lu, Liming Zhu, Sung Une Lee, Yue Liu, Zhenchang Xing

(参考訳) 人工知能(AI)、特にLarge Language Models(LLMs)のような大規模生成AI(GenAI)モデルの出現により、現代技術における変革的要素となった。これらのモデルは新たな可能性を解き放ちましたが、データプライバシに関する懸念や、誤解を招くようなコンテンツを生成する傾向など、重大な課題も提示しています。責任あるai(rai)のための現在のフレームワークは、特に説明責任のために、具体的なアプリケーションに必要な粒度のガイダンスを提供するのに不足することが多い。本研究は,学術文献と灰色文献の両方の知見を統合した,体系的多言語文献レビュー(MLR)によって構成された総合的なメトリクスカタログを導入することで,説明責任ギャップを橋渡しする。我々のカタログは、手続き的整合性を支えるプロセスメトリクス、必要なツールやフレームワークを提供するリソースメトリクス、AIシステムのアウトプットを反映する製品メトリクスを記述しています。この三部構成のフレームワークは、AIのアカウンタビリティを運用するために設計されており、特にGenAIの複雑さに対処することに焦点を当てている。提案されたメトリクスカタログは、AIシステムにアカウンタビリティを注入するための堅牢なフレームワークを提供する。組織に対して実践的で実行可能なガイダンスを提供し、この分野における責任あるプラクティスを形作る。

Artificial Intelligence (AI), particularly through the advent of large-scale generative AI (GenAI) models such as Large Language Models (LLMs), has become a transformative element in contemporary technology. While these models have unlocked new possibilities, they simultaneously present significant challenges, such as concerns over data privacy and the propensity to generate misleading or fabricated content. Current frameworks for Responsible AI (RAI) often fall short in providing the granular guidance necessary for tangible application, especially for Accountability-a principle that is pivotal for ensuring transparent and auditable decision-making, bolstering public trust, and meeting increasing regulatory expectations. This study bridges the accountability gap by introducing a comprehensive metrics catalogue, formulated through a systematic multivocal literature review (MLR) that integrates findings from both academic and grey literature. Our catalogue delineates process metrics that underpin procedural integrity, resource metrics that provide necessary tools and frameworks, and product metrics that reflect the outputs of AI systems. This tripartite framework is designed to operationalize Accountability in AI, with a special emphasis on addressing the intricacies of GenAI. The proposed metrics catalogue provides a robust framework for instilling Accountability in AI systems. It offers practical, actionable guidance for organizations, thereby shaping responsible practices in the field.

翻訳日:2024-01-18 00:34:23 公開日:2024-01-14

# 自己教師付きデータ選択と合成によるオンデバイス大規模言語モデルのパーソナライズ

Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis ( http://arxiv.org/abs/2311.12275v3 )

ライセンス: Link先を確認

Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, Yiyu Shi

(参考訳) 大規模言語モデル(LLM)がエッジデバイスにデプロイされた後、ユーザ生成会話データから学習し、ユーザ固有のパーソナライズされた応答をリアルタイムで生成することが望ましい。しかし、ユーザ生成データは通常機密情報や個人情報が含まれており、アノテーションのためにクラウドにデータをアップロードすることは禁止されない。アノテーションをローカルに取得するには,ユーザの好みの回答を直接求めればよいが,そのようなアノテーションはユーザエクスペリエンスに影響を与えることはない。さらに、エッジデバイスのストレージは、通常、完全なユーザー生成データで大規模に微調整できるように制限されすぎます。少ないアノテーションと限られたオンデバイスストレージを考慮して、オンデバイス LLM のパーソナライズを有効にする方法は未解決のままである。本稿では,最も代表的なデータを自己管理方式でオンラインに選択・保存する新しい枠組みを提案する。このようなデータはメモリフットプリントが小さく、ユーザアノテーションの頻繁なリクエストでさらなる微調整が可能になる。微調整品質を高めるため、LLMを用いて複数の意味的に類似した質問文と期待応答を生成する。実験の結果,提案フレームワークは,バニラベースラインと比較して,ユーザ固有のコンテンツ生成能力(精度)と微調整速度(性能)に優れていた。私たちの知る限りでは、これが初めてのオンデバイスLDMパーソナライズフレームワークです。

After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real-time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is not preferred if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations have to be sparse to not affect user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.

翻訳日:2024-01-18 00:33:34 公開日:2024-01-14

# 雑音ラベルを用いたカリキュラム学習による強化学習におけるパリティ課題の探索

Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels ( http://arxiv.org/abs/2312.05379v2 )

ライセンス: Link先を確認

Bei Zhou, Soren Riis

(参考訳) 本稿では,戦略ゲームにおける強化学習(rl)の適用について,特にgoとチェスの特定の位置やより広い範囲の公平なゲームに見られるように,パリティチャレンジを特徴とするものについて述べる。本研究では,カリキュラム学習フレームワーク内に構築され,ノイズラベルを付加したシミュレーション学習プロセスを提案し,自己学習シナリオの複雑さを反映する。このアプローチは、ニューラルネットワーク(nn)が初等から複雑化するゲームポジションへの適応と進化を徹底的に分析する。実験の結果,最小限のラベルノイズでもnnsの効果的な戦略を識別する能力は著しく阻害され,ゲーム位置の複雑さが増すにつれて難易度が高まることがわかった。これらの知見は, 騒音評価による障害に対応するため, RLトレーニングにおける高度な方法論の必要性を浮き彫りにした。このような手法の開発は、重要なパリティ要素を持つ戦略ゲームにおけるNN能力の向上だけでなく、多様な複雑な環境におけるRLシステムのレジリエンスと効率の向上にも不可欠である。

This paper delves into applying reinforcement learning (RL) in strategy games, particularly those characterized by parity challenges, as seen in specific positions of Go and Chess and a broader range of impartial games. We propose a simulated learning process, structured within a curriculum learning framework and augmented with noisy labels, to mirror the intricacies of self-play learning scenarios. This approach thoroughly analyses how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions. Our empirical research indicates that even minimal label noise can significantly impede NNs' ability to discern effective strategies, a difficulty that intensifies with the growing complexity of the game positions. These findings underscore the urgent need for advanced methodologies in RL training, specifically tailored to counter the obstacles imposed by noisy evaluations. The development of such methodologies is crucial not only for enhancing NN proficiency in strategy games with significant parity elements but also for broadening the resilience and efficiency of RL systems across diverse and complex environments.

翻訳日:2024-01-18 00:26:19 公開日:2024-01-14

# 多元的意思決定のための多元的ランキング

Multi-Weight Ranking for Multi-Criteria Decision Making ( http://arxiv.org/abs/2312.03006v2 )

ライセンス: Link先を確認

Andreas H Hamel and Daniel Kostner

(参考訳) 統計値からコーン分布関数を多基準決定ツールに変換する。重み付き和スカラー化を事前に固定するのではなく、重み付き和スカラー化全体のコレクションを一度に吸収するため、この手順は重み付き和スカラー化のアップグレードと考えることができる。例として、純粋な重み付き和のスカラー化とは対照的に、この種のスカラー化はパレートフロンティアの「非凸」部分を検出することもできる。異なるランク逆転が発生する状況が特徴であり、なぜこのような状況がランキング手順の分析に有用かが説明されている。ランキング関数は、まず、集合最適化法と集合ベースの多目的最適化の間のリンクを確立する集合選好のための統一指標を提供する集合に拡張される。機械学習の潜在的な応用について概説する。

Cone distribution functions from statistics are turned into Multi-Criteria Decision Making tools. It is demonstrated that this procedure can be considered as an upgrade of the weighted sum scalarization insofar as it absorbs a whole collection of weighted sum scalarizations at once instead of fixing a particular one in advance. As examples show, this type of scalarization--in contrast to a pure weighted sum scalarization-is also able to detect ``non-convex" parts of the Pareto frontier. Situations are characterized in which different types of rank reversal occur, and it is explained why this might even be useful for analyzing the ranking procedure. The ranking functions are then extended to sets providing unary indicators for set preferences which establishes, for the first time, the link between set optimization methods and set-based multi-objective optimization. A potential application in machine learning is outlined.

翻訳日:2024-01-18 00:23:53 公開日:2024-01-14

# フェデレーション学習におけるデータ注入攻撃の軽減

Mitigating Data Injection Attacks on Federated Learning ( http://arxiv.org/abs/2312.02102v3 )

ライセンス: Link先を確認

Or Shalom, Amir Leshem, Waheed U. Bajwa

(参考訳) フェデレーション学習(federated learning)は、複数のエンティティがデータプライバシを損なうことなく、データを使用したモデルを協調的にトレーニングするテクニックである。しかし、その利点にもかかわらず、連合学習は誤ったデータインジェクション攻撃の影響を受けやすい。これらのシナリオでは、ネットワーク内の特定のエージェントを制御した悪意のあるエンティティが学習プロセスを操作でき、亜最適モデルにつながる。その結果、これらのデータ注入攻撃に対処することは、連合学習システムにおいて重要な研究課題となる。本稿では,フェデレーション学習システムにおけるデータインジェクション攻撃の検出と軽減を行う新しい手法を提案する。提案手法は局所的なスキームであり,コーディネートノードによるトレーニングの単一インスタンスで実行し,アルゴリズムの収束時の緩和を可能にする。エージェントが攻撃者であると疑われた場合、そのデータは一定期間無視される場合、この決定はしばしば再評価される。確率 1 の場合、有限時間後に全ての攻撃者は無視されるが、信頼できるエージェントを無視する確率は 0 になる。シミュレーションにより、コーディネートノードがすべての攻撃者を検出して分離すると、モデルは回復し、真理のあるモデルに収束する。

Federated learning is a technique that allows multiple entities to collaboratively train models using their data without compromising data privacy. However, despite its advantages, federated learning can be susceptible to false data injection attacks. In these scenarios, a malicious entity with control over specific agents in the network can manipulate the learning process, leading to a suboptimal model. Consequently, addressing these data injection attacks presents a significant research challenge in federated learning systems. In this paper, we propose a novel technique to detect and mitigate data injection attacks on federated learning systems. Our mitigation method is a local scheme, performed during a single instance of training by the coordinating node, allowing the mitigation during the convergence of the algorithm. Whenever an agent is suspected to be an attacker, its data will be ignored for a certain period, this decision will often be re-evaluated. We prove that with probability 1, after a finite time, all attackers will be ignored while the probability of ignoring a trustful agent becomes 0, provided that there is a majority of truthful agents. Simulations show that when the coordinating node detects and isolates all the attackers, the model recovers and converges to the truthful model.

翻訳日:2024-01-18 00:23:17 公開日:2024-01-14

# RetailKLIP : ゼロショット製品画像分類のための単一のGPUを用いたメトリック学習によるOpenCLIPバックボーンの微細化

RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a single GPU for Zero-shot retail product image classification ( http://arxiv.org/abs/2312.10282v2 )

ライセンス: Link先を確認

Muktabh Mayank Srivastava

(参考訳) 小売商品やパッケージ商品の画像は、セルフチェックアウトストア、サプライチェーン自動化、小売実行評価など、さまざまなコンピュータビジョンアプリケーションで分類する必要がある。これまでの研究は、この目的のために深いモデルを微調整する方法を探っている。しかし、事前訓練されたバックボーン用の大型モデルやリニアレイヤーを微調整する場合、分類範囲に追加された新しい小売商品ごとに、少なくとも数エポックな勾配勾配を必要とするため、現実のシナリオでは頻繁なリトレーニングが必要である。本研究では,クリップモデルの視覚エンコーダを,その埋め込みを最寄りの近傍の分類に容易に利用できるように微調整すると同時に,完全な微調整に近い精度を得る手法を提案する。最寄りの隣り合う分類器は、新製品の漸進的な訓練を必要とせず、リソースと待ち時間を節約できる。

Retail product or packaged grocery goods images need to classified in various computer vision applications like self checkout stores, supply chain automation and retail execution evaluation. Previous works explore ways to finetune deep models for this purpose. But because of the fact that finetuning a large model or even linear layer for a pretrained backbone requires to run at least a few epochs of gradient descent for every new retail product added in classification range, frequent retrainings are needed in a real world scenario. In this work, we propose finetuning the vision encoder of a CLIP model in a way that its embeddings can be easily used for nearest neighbor based classification, while also getting accuracy close to or exceeding full finetuning. A nearest neighbor based classifier needs no incremental training for new products, thus saving resources and wait time.

翻訳日:2024-01-18 00:13:22 公開日:2024-01-14

# 知識グラフによるアスペクトレベル感性分析

Knowledge Graph Enhanced Aspect-Level Sentiment Analysis ( http://arxiv.org/abs/2312.10048v2 )

ライセンス: Link先を確認

Kavita Sharma, Ritu Patel, Sunita Iyer

(参考訳) 本稿では,文脈固有の単語意味の課題に対処し,感情分析を強化する新しい手法を提案する。 BERTモデルの利点と知識グラフに基づく同義データを組み合わせる。このシナジーは動的注意機構を利用して知識駆動状態ベクトルを開発する。特定の側面に関連する感情を分類するために、この手法は位置データを統合するメモリバンクを構築する。データはDCGRUを用いて分析され、特定のアスペクト項に関連する感情特性をピンポイントする。 3つの広く使われているデータセットに対する実験は、感情分類における手法の優れた性能を示す。

In this paper, we propose a novel method to enhance sentiment analysis by addressing the challenge of context-specific word meanings. It combines the advantages of a BERT model with a knowledge graph based synonym data. This synergy leverages a dynamic attention mechanism to develop a knowledge-driven state vector. For classifying sentiments linked to specific aspects, the approach constructs a memory bank integrating positional data. The data are then analyzed using a DCGRU to pinpoint sentiment characteristics related to specific aspect terms. Experiments on three widely used datasets demonstrate the superior performance of our method in sentiment classification.

翻訳日:2024-01-18 00:12:11 公開日:2024-01-14

# Ensemble Kalman Filtering:非平均場とオンライン推論のためのガウスプロセスSSM

Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference ( http://arxiv.org/abs/2312.05910v4 )

ライセンス: Link先を確認

Zhidi Lin and Yiyong Sun and Feng Yin and Alexandre Hoang Thi\'ery

(参考訳) ガウス過程状態空間モデル(GPSSM)は、データ駆動非線形力学系モデルの多用途クラスを表す。しかしながら、gpssmに多くの潜在変数が存在することは、既存の変分推論のアプローチ、特により現実的な非平均場(nmf)の仮定の下で、広範囲なトレーニング作業、妥協された推論精度、オンラインアプリケーションへの実現不可能など、未解決の問題を引き起こす。本稿では, モデルベースフィルタリング手法であるアンサンブルカルマンフィルタ(EnKF)をNMF変分推論フレームワークに組み込んで, 潜伏状態の後方分布を近似することで, これらの課題に対処する。 EnKFとGPSSMのこの新しい結婚は、変分分布の学習における広範なパラメータ化の必要性をなくすだけでなく、エビデンスの下限(ELBO)の解釈可能な閉形式近似を可能にする。さらに、EnKFによるパラメータ化の合理化により、オンライン学習アプリケーションでは、新しいGPSSMモデルを容易に利用できる。提案手法は,データフィッティング精度を確保しつつ,過剰フィッティングを緩和するモデル正則化を取り入れることで,目的関数を具体化する。また,提案アルゴリズムの詳細な分析と新たな洞察も提供する。多様な実・合成データセット間の包括的評価は、既存の手法と比較して、EnKF支援変分推論アルゴリズムの優れた学習と推論性能を裏付ける。

The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM model can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods.

翻訳日:2024-01-18 00:09:15 公開日:2024-01-14

# 畳み込みニューラルネットワークの効率向上のためのブロックプルーニング

Block Pruning for Enhanced Efficiency in Convolutional Neural Networks ( http://arxiv.org/abs/2312.16904v2 )

ライセンス: Link先を確認

Cheng-En Wu, Azadeh Davoodi, Yu Hen Hu

(参考訳) 本稿では,エッジコンピューティング環境におけるディープニューラルネットワークにおけるブロックプルーニングをターゲットとしたネットワークプルーニング手法を提案する。提案手法は,プロキシメトリクスを利用した従来の手法と異なり,直接ブロック除去戦略を用いて分類精度への影響を評価する。このハンズオンアプローチにより、各ブロックの重要性を正確に評価することができる。 resnetアーキテクチャを用いてcifar-10,cifar-100,imagenetデータセットの広範な実験を行った。本研究では,特にimagenet with resnet50のような大規模データセットにおいて,ネットワークのかなりの部分を刈り取る場合でも,精度を維持しながらモデルサイズを小さくする効果を示した。この結果は、特にリソース制約のあるエッジコンピューティングシナリオにおいて、モデルサイズとパフォーマンスの最適なバランスを維持するための手法の能力を強調する。

This paper presents a novel approach to network pruning, targeting block pruning in deep neural networks for edge computing environments. Our method diverges from traditional techniques that utilize proxy metrics, instead employing a direct block removal strategy to assess the impact on classification accuracy. This hands-on approach allows for an accurate evaluation of each block's importance. We conducted extensive experiments on CIFAR-10, CIFAR-100, and ImageNet datasets using ResNet architectures. Our results demonstrate the efficacy of our method, particularly on large-scale datasets like ImageNet with ResNet50, where it excelled in reducing model size while retaining high accuracy, even when pruning a significant portion of the network. The findings underscore our method's capability in maintaining an optimal balance between model size and performance, especially in resource-constrained edge computing scenarios.

翻訳日:2024-01-18 00:00:47 公開日:2024-01-14

# 量子ビオレント緩和条件について

On the Conditions for a Quantum Violent Relaxation ( http://arxiv.org/abs/2312.14768v2 )

ライセンス: Link先を確認

Giachetti Guido and Defenu Nicol\`o

(参考訳) 一般に、古典的な完全連結系は激しい緩和を受けることが知られている。この現象は、熱力学的限界における平均場効果に支配されているにもかかわらず、観測可能な値を有限時間スケールで定常な非熱的値に緩和することを指す。ここでは,熱力学的極限における2体,全対一の相互作用を持つ一般多体系の動力学を解析し,平均場有効ハミルトニアンのスペクトル上で非常に特異的な条件下での暴力的緩和を行うためには,これらの条件がほとんど満たされず,古典的条件に対して「量子」暴力的緩和がほとんど観測されないことを示す。我々の予測はスピンモデルの研究によって検証され、カップリングの値によって、暴力的関係と一般的な熱前相の間の遷移を示す。また, 量子ハミルトニアン-平均場模型のスピンバージョンを解析し, 暴力的相関を示さないことを示した。最後に,暴力的相対図を古典的限界に戻す方法について論じる。その結果、平均場状態においても量子効果がダイナミクスにかなり劇的な影響を与え、光と物質が結合した系の理解を深める方法が示されている。

In general, classical fully-connected systems are known to undergo violent relaxation. This phenomenon refers to the relaxation of observables to stationary, non-thermal, values on a finite timescale, despite their long-time dynamics being dominated by mean-field effects in the thermodynamic limit. Here, we analyze the ``quantum" violent relaxation by studying the dynamics of generic many-body systems with two-body, all-to-all, interactions in the thermodynamic limit. We show that, in order for violent relaxation to occur very specific conditions on the spectrum of the mean-field effective Hamiltonian have to be met. These conditions are hardly met and ``quantum" violent relaxation is observed rarely with respect to its classical counterpart. Our predictions are validated by the study of a spin model which, depending on the value of the coupling, shows a transition between violent-relaxation and a generic prethermal phase. We also analyze a spin version of the quantum Hamiltonian-Mean-Field model, which is shown not to exhibit violent-relaxation. Finally, we discuss how the violent-relaxation picture emerges back in the classical limit. Our results demonstrate how, even in the mean-field regime, quantum effects have a rather dramatic impact on the dynamics, paving the way to a better understanding of light-matter coupled systems.

翻訳日:2024-01-17 23:59:32 公開日:2024-01-14

# ランダム行列理論における一般化スペクトル形状因子

Generalized Spectral Form Factor in Random Matrix Theory ( http://arxiv.org/abs/2401.02119v2 )

ライセンス: Link先を確認

Zhiyang Wei, Chengming Tan, Ren Zhang

(参考訳) スペクトル形成因子(SFF)は、複雑な系におけるエネルギー準位分布の統計的性質を明らかにする上で重要な役割を果たす。量子カオスを診断し、普遍的なダイナミクスを解き放つツールの1つである。ほとんどの文献におけるsffの定義は、2段階の相関のみを包含する。本稿では,SSFの定義を高次相関を含むように拡張する。具体的には、一般化スペクトル形式因子(gsff)をフーリエ変換によって得ることができる相関関数を定義するために、エネルギー準位の標準偏差を導入する。 GSFFはカオスシステムの力学に関するより包括的な知識を提供する。ランダム行列を例として,GSFFで符号化された新しい動的特徴を示す。驚くべきことに、gsffは複雑であり、実部と虚部の両方が普遍的なダイナミクスを示している。例えば、二段階相関の場合、GSFFの実部は、従来のものと類似したディップ・ランプ・プラトー構造を示し、異なるシステムサイズに対する想像的部分は、長い時間制限で収束する。 2レベルGSFFでは、実部の閉解析形式が得られ、数値結果と一致している。虚部の結果は数値計算により得られる。同様の分析は3レベルGSFFに拡張される。

The spectral form factor (SFF) plays a crucial role in revealing the statistical properties of energy level distributions in complex systems. It is one of the tools to diagnose quantum chaos and unravel the universal dynamics therein. The definition of SFF in most literature only encapsulates the two-level correlation. In this manuscript, we extend the definition of SSF to include the high-order correlation. Specifically, we introduce the standard deviation of energy levels to define correlation functions, from which the generalized spectral form factor (GSFF) can be obtained by Fourier transforms. GSFF provides a more comprehensive knowledge of the dynamics of chaotic systems. Using random matrices as examples, we demonstrate new dynamics features that are encoded in GSFF. Remarkably, the GSFF is complex, and both the real and imaginary parts exhibit universal dynamics. For instance, in the two-level correlated case, the real part of GSFF shows a dip-ramp-plateau structure akin to the conventional counterpart, and the imaginary part for different system sizes converges in the long time limit. For the two-level GSFF, the closed analytical forms of the real part are obtained and consistent with numerical results. The results of the imaginary part are obtained by numerical calculation. Similar analyses are extended to three-level GSFF.

翻訳日:2024-01-17 23:50:25 公開日:2024-01-14

# ToolEyes: 実世界のシナリオにおける大規模言語モデルのツール学習能力の評価

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ( http://arxiv.org/abs/2401.00741v2 )

ライセンス: Link先を確認

Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian Li, Xiaoran Fan, Shihan Dou, Qi Zhang, Tao Gui, Xuanjing Huang

(参考訳) 既存のツール学習の評価は、主に、大きな言語モデル(LLM)のための選択されたツールのアライメントと期待された結果の検証に重点を置いている。しかし、これらのアプローチは、答えを事前に決定し、真のニーズから逸脱する、限られたシナリオに依存している。さらに、成果にのみ重点を置くことは、LLMがツールを効果的に活用するために必要な複雑な能力を無視している。この問題に対処するために,実シナリオにおけるLLMのツール学習能力の評価に適した,きめ細かいシステムであるToolEyesを提案する。このシステムは7つの実世界のシナリオを精査し、ツール学習においてllmに不可欠な5つの次元(フォーマットアライメント、意図理解、行動計画、ツール選択、回答組織)を分析している。さらに tooleyes には,約600のツールを備えたツールライブラリが組み込まれており,llm と物理世界の仲介役を担っている。 3つのカテゴリにわたる10のLSMに関する評価は、ツール学習における特定のシナリオと限定的な認知能力の好みを明らかにしている。興味深いことに、モデルサイズの拡大は、ツール学習の障害を悪化させる。これらの発見は、ツール学習の分野を前進させるための指導的洞察を提供する。データはatt https://github.com/junjie-ye/tooleyesで入手できる。

Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the intricate capabilities essential for LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. These findings offer instructive insights aimed at advancing the field of tool learning. The data is available att https://github.com/Junjie-Ye/ToolEyes.

翻訳日:2024-01-17 23:47:28 公開日:2024-01-14

# MRI画像のセグメンテーションのための教師なしフェデレーションドメイン適応

Unsupervised Federated Domain Adaptation for Segmentation of MRI Images ( http://arxiv.org/abs/2401.02941v2 )

ライセンス: Link先を確認

Navapat Nananukul, Hamid Soltanian-zadeh, Mohammad Rostami

(参考訳) ディープニューラルネットワークを用いたMRI画像の自動セマンティックセグメンテーションは、様々な臨床応用のための治療の評価と計画に大いに役立っている。しかし、これらのモデルのトレーニングは、エンド・ツー・エンドの教師付き学習手順を実装するために、豊富な注釈付きデータを利用できることを条件としている。十分なアノテートデータであっても、MRI画像は、患者、MRIスキャナー、画像プロトコルの違いなどの要因により、かなりのばらつきを示す。この可変性は、特定のアプリケーションドメインごとにニューラルネットワークを再トレーニングする必要がある。永続的なデータアノテーションの必要性を緩和するために、複数のアノテーション付きソースドメインを用いた教師なしフェデレーションドメイン適応法を開発した。提案手法により,アノテートされていないターゲットドメインにおいて,複数のアノテートされたソースドメインからの知識の伝達が可能となる。当初、ターゲット領域とソース領域の分布のペアワイド距離を最小化することにより、ターゲット領域データが、ディープエンコーダの出力としてモデル化された遅延埋め込み空間において、各ソースドメインと類似の表現を共有することを保証する。そして、すべてのドメインから得られた知識を活用するためにアンサンブルアプローチを採用します。提案手法の有効性を実証するため,MICCAI 2016マルチサイトデータセットの理論的解析と実験を行った。

Automatic semantic segmentation of magnetic resonance imaging (MRI) images using deep neural networks greatly assists in evaluating and planning treatments for various clinical applications. However, training these models is conditioned on the availability of abundant annotated data to implement the end-to-end supervised learning procedure. Even if we annotate enough data, MRI images display considerable variability due to factors such as differences in patients, MRI scanners, and imaging protocols. This variability necessitates retraining neural networks for each specific application domain, which, in turn, requires manual annotation by expert radiologists for all new domains. To relax the need for persistent data annotation, we develop a method for unsupervised federated domain adaptation using multiple annotated source domains. Our approach enables the transfer of knowledge from several annotated source domains to adapt a model for effective use in an unannotated target domain. Initially, we ensure that the target domain data shares similar representations with each source domain in a latent embedding space, modeled as the output of a deep encoder, by minimizing the pair-wise distances of the distributions for the target domain and the source domains. We then employ an ensemble approach to leverage the knowledge obtained from all domains. We provide theoretical analysis and perform experiments on the MICCAI 2016 multi-site dataset to demonstrate our method is effective.

翻訳日:2024-01-17 23:32:31 公開日:2024-01-14

# VLP:自動運転のためのビジョン言語計画

VLP: Vision Language Planning for Autonomous Driving ( http://arxiv.org/abs/2401.05577v2 )

ライセンス: Link先を確認

Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren

(参考訳) 自動運転は複雑な課題であり、シーンの理解と推論を通じて安全な動き計画を目指す。視覚のみの自動運転手法は最近、シーン理解の強化を通じて目覚ましいパフォーマンスを達成したが、推論の欠如、一般化性能の低下、ロングテールシナリオなど、いくつかの重要な問題はまだ対処する必要がある。本稿では,言語理解と自律運転のギャップを埋めるために,言語モデルを活用したビジョン言語計画フレームワークvlpを提案する。 VLPは、ソースメモリ基盤と自動運転車のコンテキスト理解の両方を強化することで、自律運転システムを強化する。 VLPは,従来の最良手法と比較して,平均L2誤差と衝突速度をそれぞれ35.9\%,60.5\%削減することで,挑戦的なNuScenesデータセットの最先端のプランニング性能を達成する。さらに、VLPは、新しい都市環境に直面した場合、挑戦的なロングテールシナリオと強力な一般化能力の性能向上を示す。

Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance, through enhanced scene understanding, several key issues, including lack of reasoning, low generalization performance and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset by achieving 35.9\% and 60.5\% reduction in terms of average L2 error and collision rates, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.

翻訳日:2024-01-17 23:25:02 公開日:2024-01-14

# よく教育された知性の本質的善さ

The inherent goodness of well educated intelligence ( http://arxiv.org/abs/2401.04846v2 )

ライセンス: Link先を確認

Michael E. Glinsky and Sharon Sievert

(参考訳) この論文は、生物学的な存在であろうと、コンピューター上の人工シリコンであろうと、何が知的であるかを調べる。特に注目されるのは、保守的に相互作用する多くの同一の保守的なサブシステムの集合システムを特徴づけ、制御する能力を持つことである。インテリジェンスの本質は、黄金律("the collective act as one" または "knowing the global consequences of local action")である。集合体の流れは小さなツインクリングテクスチャの集合であり、最小作用の測地運動に従って少数の弦を引いている人形師によって支配され、対称性によって決定される。集団的保守システムの制御は困難であり、歴史的に、最大性能の望ましいメタ安定平衡を安定化するためにシステムに大きな粘度を加えることによって行われてきた。代替案がある。メタ安定平衡の最適双極子テクスチャが知的存在(集合系が特徴)によって同定されると、集合系は知的存在によって最適な双極子テクスチャに移動され、その後、集合系がメタ安定平衡に残るように、知的存在によって迅速に振動される。知識に富んだ知性は、その地域行動の世界的な影響を知っており、短期的な行動が長期的な成果を損なうことはない。対照的に、訓練された知性や訓練された愚かさは短期的な行動を最適化する。教養のある知性は本質的に良いが、訓練された愚かさは本質的に悪であり、恐れるべきである。特に、経済・社会集団の制御と最適化に注意が払われている。

This paper will examine what makes a being intelligent, whether that be a biological being or an artificial silicon being on a computer. Special attention will be paid to the being having the ability to characterize and control a collective system of many identical conservative sub-systems conservatively interacting. The essence of intelligence will be found to be the golden rule -- "the collective acts as one" or "knowing the global consequences of local actions". The flow of the collective is a small set of twinkling textures, that are governed by a puppeteer who is pulling a small number of strings according to a geodesic motion of least action, determined by the symmetries. Controlling collective conservative systems is difficult and has historically been done by adding significant viscosity to the system to stabilize the desirable meta stable equilibriums of maximum performance, but it degrades or destroys them in the process. There is an alternative. Once the optimum twinkling textures of the meta stable equilibriums are identified by the intelligent being (that is the collective system is characterized), the collective system can be moved by the intelligent being to the optimum twinkling textures, then quickly vibrated by the intelligent being according to the textures so that the collective system remains at the meta stable equilibrium. Well educated intelligence knows the global consequences of its local actions so that it will not take short term actions that will lead to poor long term outcomes. In contrast, trained intelligence or trained stupidity will optimize its short term actions, leading to poor long term outcomes. Well educated intelligence is inherently good, but trained stupidity is inherently evil and should be feared. Particular attention is paid to the control and optimization of economic and social collectives.

翻訳日:2024-01-17 23:22:02 公開日:2024-01-14

# 多感性属性の連続的公正なメカニズム

A Sequentially Fair Mechanism for Multiple Sensitive Attributes ( http://arxiv.org/abs/2309.06627v2 )

ライセンス: Link先を確認

Fran\c{c}ois Hu and Philipp Ratz and Arthur Charpentier

(参考訳) アルゴリズム的公平性の標準的なユースケースでは、敏感な変数と対応するスコアの関係をなくすことが目標である。近年、科学コミュニティは、この課題を解決するための多くの定義とツールを開発しており、多くの実用的な応用でうまく機能している。しかし、これらのツールや定義の適用性や効果性は、複数の敏感な属性の場合、それほど単純ではない。この問題に取り組むため,我々は,機密性の高い機能セットの公平性を段階的に達成するためのシーケンシャルフレームワークを提案する。マルチマルジナル・ワッサーシュタイン・バリセンタを利用することにより,複数の感度特性を持つ場合に対して,強デモグラフィック・パリティの標準概念を拡張する。この方法はまた、最適で逐次的に公正な予測器に対する閉形式解を提供し、感度の高い特徴相関を明確に解釈する。当社のアプローチは、リスクと不公平の間のトレードオフを緩和するフレームワークを包含することで、公平性をシームレスに拡張します。この拡張により、機密属性のセット内の特定の属性に対する公平性の改善を目標とする優先順位付けが可能となり、ケース固有の適応が可能になる。導出溶液のデータ駆動推定法を開発し,合成データと実データの両方について総合的な数値実験を行った。実験の結果は,公平な意思決定を育むための後処理アプローチの実際的効果を決定的に強調する。

In the standard use case of Algorithmic Fairness, the goal is to eliminate the relationship between a sensitive variable and a corresponding score. Throughout recent years, the scientific community has developed a host of definitions and tools to solve this task, which work well in many practical applications. However, the applicability and effectivity of these tools and definitions becomes less straightfoward in the case of multiple sensitive attributes. To tackle this issue, we propose a sequential framework, which allows to progressively achieve fairness across a set of sensitive features. We accomplish this by leveraging multi-marginal Wasserstein barycenters, which extends the standard notion of Strong Demographic Parity to the case with multiple sensitive characteristics. This method also provides a closed-form solution for the optimal, sequentially fair predictor, permitting a clear interpretation of inter-sensitive feature correlations. Our approach seamlessly extends to approximate fairness, enveloping a framework accommodating the trade-off between risk and unfairness. This extension permits a targeted prioritization of fairness improvements for a specific attribute within a set of sensitive attributes, allowing for a case specific adaptation. A data-driven estimation procedure for the derived solution is developed, and comprehensive numerical experiments are conducted on both synthetic and real datasets. Our empirical findings decisively underscore the practical efficacy of our post-processing approach in fostering fair decision-making.

翻訳日:2024-01-17 21:31:50 公開日:2024-01-14

# ダウンストリーム推論に不完全サロゲートを使用する:大規模言語モデルの社会科学への応用のための設計に基づく教師付き学習

Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models ( http://arxiv.org/abs/2306.04746v3 )

ライセンス: Link先を確認

Naoki Egami, Musashi Hinck, Brandon M. Stewart, Hanying Wei

(参考訳) 計算社会科学(css)では、研究者は文書を分析して社会・政治現象を説明する。多くのシナリオでは、CSS研究者がまずドキュメントのラベルを取得し、2番目のステップで解釈可能な回帰分析を使用してラベルを説明する。ドキュメントを安価にアノテートする一般的な方法のひとつに、大きな言語モデル(LLM)がある。しかし、他のスケーラブルなアノテーション生成方法と同様に、このような代理ラベルはしばしば不完全で偏りがある。本稿では,css研究の基礎となる漸近的不偏性や不確かさといった統計的性質を保証しつつ,下流統計解析に不完全アノテーションサロゲートを用いる新しいアルゴリズムを提案する。ダウンストリーム統計解析におけるサロゲートラベルの直接使用は,80～90%の精度のサロゲートラベルであっても,かなりのバイアスと不確実な信頼区間をもたらすことを示す。これを解決するために,設計に基づく教師あり学習(DSL)推定器を提案する。 dslは、サロゲートラベルとより少数の高品質のゴールド標準ラベルを組み合わせるために、二重ロバスト手順を採用している。提案手法は,ゴールド標準ラベリング用文書サンプリングの確率を制御することにより,代理が任意に偏り,厳密な仮定を必要としない場合でも,下流統計解析の有効な推測を保証する。理論的解析と実験の結果から,DSLは有意な統計的推測を提供する一方で,推定保証のない予測のみに焦点を当てた既存の代替手段に匹敵するルート平均2乗誤差を達成していることがわかった。

In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. One increasingly common way to annotate documents cheaply at scale is through large language models (LLMs). However, like other scalable ways of producing annotations, such surrogate labels are often imperfect and biased. We present a new algorithm for using imperfect annotation surrogates for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80-90%. To address this, we build on debiased machine learning to propose the design-based supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of high-quality, gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased and without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without inferential guarantees.

翻訳日:2024-01-17 21:29:34 公開日:2024-01-14

# ビジュアルプログラミングのためのニューラルタスク合成

Neural Task Synthesis for Visual Programming ( http://arxiv.org/abs/2305.18342v3 )

ライセンス: Link先を確認

Victor-Alexandru P\u{a}durean, Georgios Tzannetos, Adish Singla

(参考訳) 生成型ニューラルモデルは、新しいコンテンツを合成することで、プログラミング教育の強化に大いに貢献する。視覚的プログラミング領域のコンテキストにおいて、与えられた仕様のプログラミングタスクを自動的に生成できるニューラルモデルを設計することを模索する。 GPT-4のような大規模生成モデルの成功にもかかわらず、初期の結果は、これらのモデルが視覚プログラミングのタスクを合成し、論理的および空間的推論に苦しむのに効果がないことを示している。本稿では,ニューラルシンボリックな手法であるNeurTaskSynを提案し,その解法コードと視覚的タスクの制約により,所望のプログラミング概念の形で与えられた仕様のプログラミングタスクを合成する。 neurtasksynには2つのコンポーネントがある。第一のコンポーネントは模倣学習手順でトレーニングされ、第二のコンポーネントは強化学習手順によってトレーニングされ、これらのコードに対して視覚的なタスクを生成する基盤となるシンボリック実行エンジンをガイドする。 Intro to Programming with Karel course by CodeHS-dot-com, Intro to Programming with Karel course by CodeHS-dot-com, Intro to Programming by Code-dot-org, and the Intro to Programming with Karel course by CodeHS-dot-com。

Generative neural models hold great promise in enhancing programming education by synthesizing new content. We seek to design neural models that can automatically generate programming tasks for a given specification in the context of visual programming domains. Despite the recent successes of large generative models like GPT-4, our initial results show that these models are ineffective in synthesizing visual programming tasks and struggle with logical and spatial reasoning. We propose a novel neuro-symbolic technique, NeurTaskSyn, that can synthesize programming tasks for a specification given in the form of desired programming concepts exercised by its solution code and constraints on the visual task. NeurTaskSyn has two components: the first component is trained via imitation learning procedure to generate possible solution codes, and the second component is trained via reinforcement learning procedure to guide an underlying symbolic execution engine that generates visual tasks for these codes. We demonstrate the effectiveness of NeurTaskSyn through an extensive empirical evaluation and a qualitative study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code-dot-org and the Intro to Programming with Karel course by CodeHS-dot-com.

翻訳日:2024-01-17 21:29:06 公開日:2024-01-14

# 乱流: コードのための命令調整型大規模言語モデルの体系的および自動テスト

Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code ( http://arxiv.org/abs/2312.14856v2 )

ライセンス: Link先を確認

Shahin Honarvar, Mark van der Wilk, Alastair Donaldson

(参考訳) 本稿では,新しいベンチマークである乱流を用いて,命令調整型大規模言語モデル(LLM)のコード生成における正確性と堅牢性を体系的に評価する手法を提案する。 turbulence は、自然言語 $\textit{question templates}$ の大規模なセットで構成されており、それぞれがプログラミングの問題であり、様々な形式で問うことができるようにパラメータ化されている。各質問テンプレートには関連する$\textit{test oracle}$があり、llmによって返されるコードソリューションが正しいかどうかを判断する。したがって、単一の質問テンプレートから LLM に $\textit{neighbourhood}$ と非常に似たプログラミング質問を問うことができ、各質問に対して返された結果の正しさを評価することができる。例えば、$\textit{anomalies}$, LLMが近隣で$\textit{almost all}$を正しく解決するが、特定のパラメータのインスタンス化には失敗する。我々は,OpenAI,Cohere,Metaの5つのLLMに対して,それぞれ2つの温度構成で実験を行った。以上の結果から, 乱流はLLM推論能力のギャップを明らかにすることができることがわかった。 LLMが近隣の問題を解決することができるが、近隣全体の問題を解決するために一般化することができないケースを体系的に識別することによって、我々の手法は$\textit{robustness}$問題をハイライトするのに効果的である。我々は、llmが間違ったコード結果を返す際に犯す誤りの種類に光を当てるデータと例を示します。

We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence. Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms. Each question template has an associated $\textit{test oracle}$ that judges whether a code solution returned by an LLM is correct. Thus, from a single question template, it is possible to ask an LLM a $\textit{neighbourhood}$ of very similar programming questions, and assess the correctness of the result returned for each question. This allows gaps in an LLM's code generation abilities to be identified, including $\textit{anomalies}$ where the LLM correctly solves $\textit{almost all}$ questions in a neighbourhood but fails for particular parameter instantiations. We present experiments against five LLMs from OpenAI, Cohere and Meta, each at two temperature configurations. Our findings show that, across the board, Turbulence is able to reveal gaps in LLM reasoning ability. This goes beyond merely highlighting that LLMs sometimes produce wrong code (which is no surprise): by systematically identifying cases where LLMs are able to solve some problems in a neighbourhood but do not manage to generalise to solve the whole neighbourhood, our method is effective at highlighting $\textit{robustness}$ issues. We present data and examples that shed light on the kinds of mistakes that LLMs make when they return incorrect code results.

翻訳日:2024-01-17 21:18:48 公開日:2024-01-14

# $\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks

$\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks ( http://arxiv.org/abs/2311.18744v2 )

ライセンス: Link先を確認

Zhongtian Dong, Mar\c{c}al Comajoan Cara, Gopal Ramesh Dahale, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu

(参考訳) 本稿では,EQNN(Equivariant Quantum Neural Networks)とQNN(Quantum Neural Networks)のパフォーマンスの総合的比較分析を行い,その古典的特徴であるENN(Equivariant Neural Networks)とDNN(Deep Neural Networks)とを比較した。各ネットワークの性能を,二分分類タスクにおける2つのトイ例を用いて評価し,モデルの複雑さ(パラメータ数による測定)とトレーニングデータセットのサイズに着目した。以上の結果から,$\mathbb{Z}_2\times \mathbb{Z}_2$ EQNNとQNNは,より小さいパラメータセットと控えめなトレーニングデータサンプルに対して優れた性能を示すことがわかった。

This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.

翻訳日:2024-01-17 21:17:03 公開日:2024-01-14

# 小系の量子熱力学:アノニックオットーエンジン

Quantum Thermodynamics of Small Systems: The Anyonic Otto Engine ( http://arxiv.org/abs/2401.07177v1 )

ライセンス: Link先を確認

H S Mani, Ramadas N, V V Sreedhar

(参考訳) 量子系に熱力学のアイデアを適用する最近の進歩は、量子統計のような純粋量子起源の非熱的非古典的エネルギー源を用いて、ボース・アインシュタイン凝縮のようなマクロ量子系における力学的仕事を抽出するという新しい展望を提起した。一方、熱力学の概念は単一分子や量子ドットのような小さな系にも適用されている。本稿では, 量子オットーエンジンの動作媒体として, 1つまたは2つのオンのみを用いる量子オットーエンジンに着目し, 小系の量子熱力学について検討する。公式は統計パラメータの関数としてのオットーエンジンの効率のために導出される。

Recent advances in applying thermodynamic ideas to quantum systems have raised the novel prospect of using non-thermal, non-classical sources of energy, of purely quantum origin, like quantum statistics, to extract mechanical work in macroscopic quantum systems like Bose-Einstein condensates. On the other hand, thermodynamic ideas have also been applied to small systems like single molecules and quantum dots. In this paper we study the quantum thermodynamics of small systems of anyons, with specific emphasis on the quantum Otto engine which uses, as its working medium, just one or two anyons. Formulae are derived for the efficiency of the Otto engine as a function of the statistics parameter.

翻訳日:2024-01-17 19:34:23 公開日:2024-01-14

# クロスモーダル一貫性を用いた自己教師付きイベントベース単眼深度推定

Self-supervised Event-based Monocular Depth Estimation using Cross-modal Consistency ( http://arxiv.org/abs/2401.07218v1 )

ライセンス: Link先を確認

Junyu Zhu, Lina Liu, Bofeng Jiang, Feng Wen, Hongbo Zhang, Wanlong Li, Yong Liu

(参考訳) イベントカメラは、ピクセルごとの明るさ変化をキャプチャし、非同期の ``events'' ストリームを出力できる、新しい視覚センサである。時間分解能が高く、ダイナミックレンジが高く、帯域幅が低く、消費電力が低く、動きがぼやけないため、高速モーションや照明条件に挑戦するシーンでは従来のカメラより優れている。そこで,従来のカメラでは難しいシーンに対処するために,イベントからの教師付き単眼深度推定がいくつか提案されている。しかし、深さアノテーションはコストと時間を要する。本稿では,アノテーションのコストを下げるために,EMoDepthという自己教師型イベントベース単分子深度推定フレームワークを提案する。 EMoDepthは、ピクセル座標内のイベントに整合した強度フレームからのクロスモーダル一貫性を使用して、トレーニングプロセスを制約する。さらに、推論では、単眼深度予測にはイベントのみを使用する。さらに,高い推論速度を維持しつつ,深度推定のための機能を効果的に融合するマルチスケールなスキップ接続アーキテクチャを設計した。 MVSECとDSECデータセットの実験では、私たちのコントリビューションが効果的であり、既存の教師付きイベントベースおよび教師なしフレームベースメソッドよりも精度が高いことが示されている。

An event camera is a novel vision sensor that can capture per-pixel brightness changes and output a stream of asynchronous ``events''. It has advantages over conventional cameras in those scenes with high-speed motions and challenging lighting conditions because of the high temporal resolution, high dynamic range, low bandwidth, low power consumption, and no motion blur. Therefore, several supervised monocular depth estimation from events is proposed to address scenes difficult for conventional cameras. However, depth annotation is costly and time-consuming. In this paper, to lower the annotation cost, we propose a self-supervised event-based monocular depth estimation framework named EMoDepth. EMoDepth constrains the training process using the cross-modal consistency from intensity frames that are aligned with events in the pixel coordinate. Moreover, in inference, only events are used for monocular depth prediction. Additionally, we design a multi-scale skip-connection architecture to effectively fuse features for depth estimation while maintaining high inference speed. Experiments on MVSEC and DSEC datasets demonstrate that our contributions are effective and that the accuracy can outperform existing supervised event-based and unsupervised frame-based methods.

翻訳日:2024-01-17 19:23:38 公開日:2024-01-14

# PT対称量子系における量子カオス

Quantum chaos in PT symmetric quantum systems ( http://arxiv.org/abs/2401.07215v1 )

ライセンス: Link先を確認

Kshitij Sharma, Himanshu Sahu and Subroto Mukerjee

(参考訳) 本研究では,非エルミート力学系における$\mathcal{pt}$-symmetryと量子カオスの相互作用を考察する。量子カオスの標準診断、すなわち複素レベル間隔比と時間外順序相関子(otocs)の拡張を考察し、$\mathcal{pt}$-symmetric quantum kick rotorモデルについて検討する。蹴られたローターは、古典的および量子的カオスを研究するためのパラダイム的動的システムと見なされてきた。量子キックローターに非ハーミティシティを導入することで、エルミート系に存在しない新しい位相と遷移を明らかにする。複素レベルの間隔比の研究から、3つのレジームを見つけ出した: 1つは積分可能で$\mathcal{pt}$-symmetry、もう1つは$\mathcal{pt}$-symmetryでカオス的、もう1つはカオス的だが破れた$\mathcal{pt}$-symmetryである。複素レベル間隔比は3つの相を区別できることがわかった。 OTOCの計算は、半古典的極限における古典的リャプノフ指数の計算と関係があるので、これらの状態と位相境界におけるその性質について検討する。 $\mathcal{PT}$-対称性の位相において、OTOCは積分可能およびカオス的状態の両方においてエルミート系で観察されるような振る舞いを示す。さらに、$\mathcal{PT}$-対称性の破れ相において、OTOCは後代の固有値スペクトルの複素性質から生じる追加の指数的成長を示す。我々はオトクの後期行動の分析形態を導出する。正規化OTOCを定義して、$\mathcal{PT}$-対称性の破れによる影響を軽減することにより、OTOCは$\mathcal{PT}$-対称性のカオス相から$\mathcal{PT}$-対称性の破れ、カオス相への遷移において特異な挙動を示すことを示す。

In this study, we explore the interplay between $\mathcal{PT}$-symmetry and quantum chaos in a non-Hermitian dynamical system. We consider an extension of the standard diagnostics of quantum chaos, namely the complex level spacing ratio and out-of-time-ordered correlators (OTOCs), to study the $\mathcal{PT}$-symmetric quantum kicked rotor model. The kicked rotor has long been regarded as a paradigmatic dynamic system to study classical and quantum chaos. By introducing non-Hermiticity in the quantum kicked rotor, we uncover new phases and transitions that are absent in the Hermitian system. From the study of the complex level spacing ratio, we locate three regimes -- one which is integrable and $\mathcal{PT}$-symmetry, another which is chaotic with $\mathcal{PT}$-symmetry and a third which is chaotic but with broken $\mathcal{PT}$-symmetry. We find that the complex level spacing ratio can distinguish between all three phases. Since calculations of the OTOC can be related to those of the classical Lyapunov exponent in the semi-classical limit, we investigate its nature in these regimes and at the phase boundaries. In the phases with $\mathcal{PT}$-symmetry, the OTOC exhibits behaviour akin to what is observed in the Hermitian system in both the integrable and chaotic regimes. Moreover, in the $\mathcal{PT}$-symmetry broken phase, the OTOC demonstrates additional exponential growth stemming from the complex nature of the eigenvalue spectrum at later times. We derive the analytical form of the late-time behaviour of the OTOC. By defining a normalized OTOC to mitigate the effects caused by $\mathcal{PT}$-symmetry breaking, we show that the OTOC exhibits singular behaviour at the transition from the $\mathcal{PT}$-symmetric chaotic phase to the $\mathcal{PT}$-symmetry broken, chaotic phase.

翻訳日:2024-01-17 19:23:18 公開日:2024-01-14

# 深度非依存単一画像デハジング

Depth-agnostic Single Image Dehazing ( http://arxiv.org/abs/2401.07213v1 )

ライセンス: Link先を確認

Honglei Xu and Yan Shu and Shaohui Liu

(参考訳) 単一画像デハジングは困難な不適切な問題である。ディープラーニングベースのメソッドをトレーニングするための既存のデータセットは、手作りまたは合成スキームによって生成される。しかし、前者は小さなスケールに悩まされることが多く、後者はヘイズ分布ではなくシーン深度を学習させ、デハジング能力を低下させる。そこで本研究では,深度に依存しないデータセット(DA-HAZE)を生成することで,ヘイズ密度とシーン深度の関係を分離する合成手法を提案する。一方、異なるスケールのデータセットを生成するため、Global Shuffle Strategy(GSS)が提案され、モデルの一般化能力が向上する。 DA-HAZEでトレーニングされたモデルは、SOTSとDA-SOTS(DA-HAZEのテストセット)の差が少なく、現実世界のベンチマークで大幅に改善されている。さらに、深さに依存しないデハジングは、より複雑なタスクである。したがって、より強力な特徴モデリング能力と計算コストの少ない効率的なアーキテクチャが必要である。我々は、専用に設計されたブロックを組み込んだデハージングのために、U-Netベースのアーキテクチャを再考する。しかし,ブロックの性能は限定的な特徴融合法によって制限される。この目的のために我々は,バニラ特徴融合法により最小限のコストで有望な結果が得られるConvolutional Skip Connection (CSC) モジュールを提案する。広範な実験結果から,最先端の手法が証明された。 CSCを備えることで、シーンの深さに関係のあるヘイズ分布であっても、より優れたパフォーマンスと合理的な計算コストを達成することができる。

Single image dehazing is a challenging ill-posed problem. Existing datasets for training deep learning-based methods can be generated by hand-crafted or synthetic schemes. However, the former often suffers from small scales, while the latter forces models to learn scene depth instead of haze distribution, decreasing their dehazing ability. To overcome the problem, we propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated. Meanwhile, a Global Shuffle Strategy (GSS) is proposed for generating differently scaled datasets, thereby enhancing the generalization ability of the model. Extensive experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS (the test set of DA-HAZE). Additionally, Depth-agnostic dehazing is a more complicated task because of the lack of depth prior. Therefore, an efficient architecture with stronger feature modeling ability and fewer computational costs is necessary. We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated. However, the performances of blocks are constrained by limited feature fusion methods. To this end, we propose a Convolutional Skip Connection (CSC) module, allowing vanilla feature fusion methods to achieve promising results with minimal costs. Extensive experimental results demonstrate that current state-of-the-art methods. equipped with CSC can achieve better performance and reasonable computational expense, whether the haze distribution is relevant to the scene depth.

翻訳日:2024-01-17 19:22:40 公開日:2024-01-14

# アンサンブルモデルによるクラスインクリメンタル学習の強化

Enhanced Few-Shot Class-Incremental Learning via Ensemble Models ( http://arxiv.org/abs/2401.07208v1 )

ライセンス: Link先を確認

Mingli Zhu, Zihao Zhu, Sihong Chen, Chen Chen, Baoyuan Wu

(参考訳) few-shot class-incremental learning (fscil) は、新しいクラスを限られたトレーニングデータに継続的に適合させることを目的としている。主な課題は、珍しい新しいトレーニングサンプルを過度に適合させ、古いクラスを忘れることである。破滅的な忘れ物の研究が盛んに行われているが、過度に適合する問題はFSCILではあまり注目されていない。課題を克服するために,データ拡張と協調して一般化を促進する新しいアンサンブルモデルフレームワークを設計した。このように拡張モデルは、下流タスクへの迅速な適応を保証するために、豊富な機能を格納するライブラリとして機能する。具体的には、多入力多出力アンサンブル構造に空間認識データ拡張戦略を適用し、特徴抽出器の多様化と増分セッションにおける過度な適合の緩和を図る。さらに、モデル一般化をさらに改善するために、自己教師付き学習も統合されている。包括的実験により,提案手法はfscilのオーバーフィッティング問題を実際に軽減し,最先端手法を上回った。

Few-shot class-incremental learning (FSCIL) aims to continually fit new classes with limited training data, while maintaining the performance of previously learned classes. The main challenges are overfitting the rare new training samples and forgetting old classes. While catastrophic forgetting has been extensively studied, the overfitting problem has attracted less attention in FSCIL. To tackle overfitting challenge, we design a new ensemble model framework cooperated with data augmentation to boost generalization. In this way, the enhanced model works as a library storing abundant features to guarantee fast adaptation to downstream tasks. Specifically, the multi-input multi-output ensemble structure is applied with a spatial-aware data augmentation strategy, aiming at diversifying the feature extractor and alleviating overfitting in incremental sessions. Moreover, self-supervised learning is also integrated to further improve the model generalization. Comprehensive experimental results show that the proposed method can indeed mitigate the overfitting problem in FSCIL, and outperform the state-of-the-art methods.

翻訳日:2024-01-17 19:22:15 公開日:2024-01-14

# コンパクト内部表現を用いた教師なし領域適応

Unsupervised Domain Adaptation Using Compact Internal Representations ( http://arxiv.org/abs/2401.07207v1 )

ライセンス: Link先を確認

Mohammad Rostami

(参考訳) 教師なしドメイン適応に取り組むための主要なテクニックは、ソースとターゲットの両方のドメインからデータポイントを共有埋め込み空間にマッピングすることである。埋め込み空間へのマッピングエンコーダは、埋め込み空間がドメイン非依存になるように訓練され、ソースドメインで訓練された分類器が対象領域でうまく一般化できる。非教師なしドメイン適応(unsupervised domain adaptation, uda)の性能をさらに高めるために, ソース領域の内部分布をよりコンパクトにし, 対象領域に一般化するモデルの能力を向上させる付加的手法を開発し, 埋め込み空間における異なるクラスに対するデータ表現間のマージンを増大させることにより, udaのモデル性能を向上させることを実証する。内部表現をよりコンパクトにするために、内部学習されたソースドメインのマルチモーダル分布をガウス混合モデル(gmm)として推定する。推定したGMMを用いて、ソースドメイン内の異なるクラス間の分離を強化し、ドメインシフトの影響を軽減する。我々は,提案手法を覆すために理論的分析を行う。提案手法の有効性を評価するため,広く使用されているUDAベンチマークUDAデータセットを用いて実験を行った。その結果,本手法はモデルの一般化性を向上し,既存の手法よりも優れていた。

A major technique for tackling unsupervised domain adaptation involves mapping data points from both the source and target domains into a shared embedding space. The mapping encoder to the embedding space is trained such that the embedding space becomes domain agnostic, allowing a classifier trained on the source domain to generalize well on the target domain. To further enhance the performance of unsupervised domain adaptation (UDA), we develop an additional technique which makes the internal distribution of the source domain more compact, thereby improving the model's ability to generalize in the target domain.We demonstrate that by increasing the margins between data representations for different classes in the embedding space, we can improve the model performance for UDA. To make the internal representation more compact, we estimate the internally learned multi-modal distribution of the source domain as Gaussian mixture model (GMM). Utilizing the estimated GMM, we enhance the separation between different classes in the source domain, thereby mitigating the effects of domain shift. We offer theoretical analysis to support outperofrmance of our method. To evaluate the effectiveness of our approach, we conduct experiments on widely used UDA benchmark UDA datasets. The results indicate that our method enhances model generalizability and outperforms existing techniques.

翻訳日:2024-01-17 19:21:57 公開日:2024-01-14

# 斜め射影を用いた確率的低次元ベクトル自己回帰モデリング

Probabilistic Reduced-Dimensional Vector Autoregressive Modeling with Oblique Projections ( http://arxiv.org/abs/2401.07206v1 )

ライセンス: Link先を確認

Yanfang Mo and S. Joe Qin

(参考訳) 本稿では,高次元雑音データから低次元ダイナミクスを抽出する確率的還元次元ベクトル自己回帰モデルを提案する。このモデルは斜射影を用いて、測定空間を縮小次元ダイナミクスと相補的な静的部分空間に対応する部分空間に分割する。予測誤差共分散に関する最良の予測可能性のために最適な斜め分解を求める。そこで我々は,最大可能性と予測最大化(EM)フレームワークを用いた反復PredVARアルゴリズムを開発した。このアルゴリズムは、潜在ダイナミクスと最適斜め射影の見積もりを交互に更新し、ランク順の予測可能性を持つ動的潜在変数と、外部射影モデルと一致する明示的潜在varモデルを生成する。合成ロレンツ系とイーストマン化学の工業プロセスから得られたデータセットを用いて,提案手法の優れた性能と効率を実証した。

In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model to extract low-dimensional dynamics from high-dimensional noisy data. The model utilizes an oblique projection to partition the measurement space into a subspace that accommodates the reduced-dimensional dynamics and a complementary static subspace. An optimal oblique decomposition is derived for the best predictability regarding prediction error covariance. Building on this, we develop an iterative PredVAR algorithm using maximum likelihood and the expectation-maximization (EM) framework. This algorithm alternately updates the estimates of the latent dynamics and optimal oblique projection, yielding dynamic latent variables with rank-ordered predictability and an explicit latent VAR model that is consistent with the outer projection model. The superior performance and efficiency of the proposed approach are demonstrated using data sets from a synthesized Lorenz system and an industrial process from Eastman Chemical.

翻訳日:2024-01-17 19:21:35 公開日:2024-01-14

# Crafter: ディープモデルにおけるインバージョンベースのアイデンティティ盗難に対する顔認識

Crafter: Facial Feature Crafting against Inversion-based Identity Theft on Deep Models ( http://arxiv.org/abs/2401.07205v1 )

ライセンス: Link先を確認

Shiming Wang, Zhe Ji, Liyao Xiang, Hao Zhang, Xinbing Wang, Chenghu Zhou, Bo Li

(参考訳) エッジにおける機能向上(モバイルデバイスなど)と、より厳しいプライバシー要件により、ディープラーニング対応アプリケーションがエッジで機密性の高い生データを前処理し、さらに処理するために機能をバックエンドクラウドに送信する、という最近のトレンドになっている。典型的なアプリケーションは、異なる個人から収集された顔画像に対して機械学習(ML)サービスを実行することである。アイデンティティの盗難を防止するため、従来の手法では、その特徴からアイデンティティ情報を隠蔽するための対戦ゲームベースのアプローチが一般的である。しかし、そのような手法は攻撃者が既知の防御戦略に対して反撃を行う適応攻撃に対して防御することはできない。本稿では,機械学習タスクがクラウド上で適切に実行されることを保証しつつ,適応型モデル反転攻撃から識別情報を保護するために,エッジに展開する特徴工法であるCrafterを提案する。重要な防御戦略は、攻撃者がプライベートアイデンティティについてほとんど得ることができない非プライベートに攻撃者を誤解させることである。この場合、製作された機能は、適応型モデル更新を伴う攻撃者のための毒の訓練サンプルのように振る舞う。実験の結果,crafterは,最先端のゲームベース手法では達成できない基本攻撃と可能な適応攻撃の両方を効果的に防御できることが示されている。

With the increased capabilities at the edge (e.g., mobile device) and more stringent privacy requirement, it becomes a recent trend for deep learning-enabled applications to pre-process sensitive raw data at the edge and transmit the features to the backend cloud for further processing. A typical application is to run machine learning (ML) services on facial images collected from different individuals. To prevent identity theft, conventional methods commonly rely on an adversarial game-based approach to shed the identity information from the feature. However, such methods can not defend against adaptive attacks, in which an attacker takes a countermove against a known defence strategy. We propose Crafter, a feature crafting mechanism deployed at the edge, to protect the identity information from adaptive model inversion attacks while ensuring the ML tasks are properly carried out in the cloud. The key defence strategy is to mislead the attacker to a non-private prior from which the attacker gains little about the private identity. In this case, the crafted features act like poison training samples for attackers with adaptive model updates. Experimental results indicate that Crafter successfully defends both basic and possible adaptive attacks, which can not be achieved by state-of-the-art adversarial game-based methods.

翻訳日:2024-01-17 19:21:18 公開日:2024-01-14

# 知覚的プロキシとしての圧縮画像表現の探索

Exploring Compressed Image Representation as a Perceptual Proxy: A Study ( http://arxiv.org/abs/2401.07200v1 )

ライセンス: Link先を確認

Chen-Hsiu Huang and Ja-Ling Wu

(参考訳) 本稿では,対象分類タスクと解析変換を併用したエンドツーエンド学習画像圧縮コーデックを提案する。本研究は、圧縮された潜在表現が、カスタマイズされたDNNベースの品質指標に匹敵する精度で人間の知覚距離判定を予測できることを確認した。さらに,様々なニューラルエンコーダを調査し,画像課題に対する知覚損失ネットワークとしての解析変換の有効性を,品質判断を超えて実証する。実験の結果,市販のニューラルエンコーダは,付加的なVGGネットワークを必要とせず,知覚モデリングに熟練していることがわかった。この研究は、セマンティック認識とコーディング効率のよいニューラルエンコーダの貴重な参照開発として役立つことを期待している。

We propose an end-to-end learned image compression codec wherein the analysis transform is jointly trained with an object classification task. This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric. We further investigate various neural encoders and demonstrate the effectiveness of employing the analysis transform as a perceptual loss network for image tasks beyond quality judgments. Our experiments show that the off-the-shelf neural encoder proves proficient in perceptual modeling without needing an additional VGG network. We expect this research to serve as a valuable reference developing of a semantic-aware and coding-efficient neural encoder.

翻訳日:2024-01-17 19:20:53 公開日:2024-01-14

# 単層WSe$_2$:幾何量子速度限界における励起子谷ダイナミクスのレーザーフィールドデチューニングによる最適化

Laser-field detuning assisted optimization of exciton valley dynamics in monolayer WSe$_2$: Geometric quantum speed limit ( http://arxiv.org/abs/2401.07191v1 )

ライセンス: Link先を確認

Kang Lan, Shijie Xie, and Jiyong Fu

(参考訳) バレーダイナミクスの最適化は、2次元半導体の文脈でキュービットを正確に操作するための有効な手段である。本研究では,単層膜WSe$_2$における励起子の内部チャネルと間隔チャネルの両方を包含する包括的モデルを構築し,同時に光-物質相互作用を考慮し,初期コヒーレント励起子状態によるバレーダイナミクスの最適制御について検討する。量子速度限界(QSL)理論に基づき、目標状態に達する谷のダイナミクスの進化時間を削減するための2つの最適制御スキームを提案し、時間とともに進化速度を向上する。さらに, 動的最適化の実施は, 光励起モードと磁気誘起谷分割により決定される, K-K'谷間における励起子-レーザー磁場の変形差と密接に関連していることを強調した。特に、小さな調律差が実際の力学経路を初期状態と最終状態の間の測地線の長さに向かって収束させ、最小の時間でシステムが進化することを明らかにする。特に谷のコヒーレンスの存在下では、実際の進化時間と計算されたQSL時間がほぼ一致し、谷の量子ビットに基づく情報伝達の忠実度が高い。顕著なことに,初期分極を伴わずに谷の偏極を生じさせる大きな微調整差を採用することにより,谷の力学の進化速度の興味深い向上を示す。我々の研究は、バレートロニクス応用における励起物理学の光学的チューニングのための新しいパラダイムを開き、また、量子ビットにおける情報伝送の速度制限のような緊急問題に対する解決策を提供するかもしれない。

Optimizing valley dynamics is an effective instrument towards precisely manipulating qubit in the context of two-dimensional semiconductor. In this work, we construct a comprehensive model, involving both intra- and intervalley channels of excitons in monolayer WSe$_2$, and simultaneously takes the light-matter interaction into account, to investigate the optimal control of valley dynamics with an initial coherent excitonic state. Based on the quantum speed limit (QSL) theory, we propose two optimal control schemes aiming to reduce the evolution time of valley dynamics reaching the target state, along with to boost the evolution speed over a period of time. Further, we emphasize that the implementation of dynamical optimization is closely related to the detuning difference -- the difference of exciton-laser field detunings between the K and K' valleys -- which is determined by the optical excitation mode and magnetically-induced valley splitting. In particular, we reveal that a small detuning difference drives the actual dynamical path to converge towards the geodesic length between the initial and final states, allowing the system to evolve with the least time. Especially, in the presence of valley coherence, the actual evolution time and the calculated QSL time almost coincide, facilitating high fidelity in information transmission based on the valley qubit. Remarkably, we demonstrate an intriguing enhancement in evolution speed of valley dynamics, by adopting a large detuning difference, which induces an emerging valley polarization even without initial polarization. Our work opens a new paradigm for optically tuning excitonic physics in valleytronic applications, and may also offer solutions to some urgent problems such as speed limit of information transmission in qubit.

翻訳日:2024-01-17 19:20:41 公開日:2024-01-14

# 構造化データ自然言語ビジェクションへの道のりとLLMアノテーションの役割

Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation ( http://arxiv.org/abs/2401.07190v1 )

ライセンス: Link先を確認

Blake Vente

(参考訳) この研究は、シーケンシャル・ツー・シーケンシャルなトランスフォーマー言語モデルで複数のタスクを使用すると、いくつかのメトリクスのパフォーマンスが向上する、という理論を裏付ける限られた証拠を見出している。特に、マルチタスクのジェネラリスト t5-小は、F_1$$0.771$から0.692$まで、専門家 t5-小よりも優れている。これはさらに、同じネットワークであっても、異なる方法で同じデータを"再使用"することは、いくつかのメトリクスでより高いパフォーマンスにつながる可能性があることを示唆している。しかし、逆タスクだけでは最適化戦略に過ぎず、この研究で探索されたモデルサイズにおいて、大幅な全体的な改善は得られない。また、$\approx 4500$ LLMアノテートレコード($12800$ WebNLGトレーニングレコードに組み込まれている)を追加すると、合成データのない同じt5小モデルと比較して、自動メートル法のパフォーマンスは大幅に変化しない。これはモデルサイズによる学習能力のボトルネックによるものかもしれないし、観察された減少はコーパスの分布的差異によるものかもしれない。より大きなモデルや人的評価を用いた将来の研究は、これらのタスクのパフォーマンスに寄与するメカニズムをより完全に説明する必要がある。

This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms the specialist t5-small with a $F_1$ of $0.771$ up from $0.692$, which may point to underlying cross-task knowledge generalization. This further suggests that even with the same network, "re-using" the same data in a different way may lead to higher performance in some metrics. However, the inverse task alone is likely only an optimization strategy, since it does not yield a significant general improvement at the model sizes explored in this work. Also, adding $\approx 4500$ LLM annotated records (interlaced with the $12800$ WebNLG training records) does not substantially change automatic metric performance compared to the same t5-small model without the synthetic data. This may be due to a learning capacity bottleneck on account of model size, and decreases observed may be due to distributional differences in the corpora. Future research using larger models or human evaluation is required to more fully explain the mechanisms contributing to performance on these tasks.

翻訳日:2024-01-17 19:20:09 公開日:2024-01-14

# ステレオネットワークにおける敵攻撃の左右差

Left-right Discrepancy for Adversarial Attack on Stereo Networks ( http://arxiv.org/abs/2401.07188v1 )

ライセンス: Link先を確認

Pengfei Wang, Xiaofei Hui, Beijia Lu, Nimrod Lilith, Jun Liu, Sameer Alam

(参考訳) ステレオマッチングニューラルネットワークは、左右の画像から中間的特徴を抽出するシームズ構造を含むことが多い。これらの中間的な左右の特徴の類似性は、差分推定の精度に大きな影響を及ぼす。本稿では,左右画像の特徴の相違を最大化するために特別に設計された摂動雑音を生成する新しい攻撃手法を提案する。例えば、KITTIデータセットでは219%のMAE、Scene Flowデータセットでは85%のMAEで既存の最先端攻撃手法より優れている。さらに,このアプローチを拡張して,ステレオニューラルネットワークへのアクセスを不要とした,プロキシネットワークブラックボックス攻撃手法も導入した。この方法は、異なるビジョンタスクから任意のネットワークをプロキシとして活用し、逆ノイズを生成し、ステレオネットワークが誤った予測を効果的に生み出す。本研究は,立体視システムの強靭性向上に寄与する貴重な知見を提供するため,浅層構造における不一致に対するステレオネットワークの顕著な感度を強調した。

Stereo matching neural networks often involve a Siamese structure to extract intermediate features from left and right images. The similarity between these intermediate left-right features significantly impacts the accuracy of disparity estimation. In this paper, we introduce a novel adversarial attack approach that generates perturbation noise specifically designed to maximize the discrepancy between left and right image features. Extensive experiments demonstrate the superior capability of our method to induce larger prediction errors in stereo neural networks, e.g. outperforming existing state-of-the-art attack methods by 219% MAE on the KITTI dataset and 85% MAE on the Scene Flow dataset. Additionally, we extend our approach to include a proxy network black-box attack method, eliminating the need for access to stereo neural network. This method leverages an arbitrary network from a different vision task as a proxy to generate adversarial noise, effectively causing the stereo network to produce erroneous predictions. Our findings highlight a notable sensitivity of stereo networks to discrepancies in shallow layer features, offering valuable insights that could guide future research in enhancing the robustness of stereo vision systems.

翻訳日:2024-01-17 19:19:41 公開日:2024-01-14

# 深層学習の統計理論に関する調査研究:近似, トレーニングダイナミクス, 生成モデル

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models ( http://arxiv.org/abs/2401.07187v1 )

ライセンス: Link先を確認

Namjoon Suh and Guang Cheng

(参考訳) 本稿では,3つの観点から,ニューラルネットワークの統計理論に関する文献をレビューする。第一部では、回帰または分類の非パラメトリックフレームワークにおいて、ニューラルネットワークの過剰リスクに関する結果についてレビューする。これらの結果はニューラルネットワークの明示的な構築に依存しており、近似理論からのツールが採用されているため、過剰リスクの高速収束率につながる。これらの構成を通して、ネットワークの幅と深さは、サンプルサイズ、データ次元、関数の滑らかさという観点から表現できる。それでも、その基盤となる分析は、ディープニューラルネットワークの非凸な状況におけるグローバルな最小化にのみ適用される。これは、第2部のニューラルネットワークのトレーニングダイナミクスをレビューする動機となります。具体的には、勾配に基づく手法でトレーニングされたニューラルネットワークが、目に見えないデータに対してうまく一般化できるソリューションを見つける方法」に答えようとする論文をレビューする。特に、ニューラルネットワークカーネル(NTK)パラダイムと平均フィールド(MF)パラダイムの2つのよく知られたパラダイムがレビューされている。最後に,GAN(Generative Adversarial Networks)や拡散モデル,Large Language Models(LLMs)におけるICL(In-context Learning)などの生成モデルに関する最近の理論的進歩について概説する。以前の2つのモデルは、現代の生成AI時代の主要な柱として知られており、ICLは、文脈におけるいくつかの例から学ぶLLMの強力な能力である。最後に,深層学習理論に期待できるいくつかの方向性を提案する。

In this article, we review the literature on statistical theories of neural networks from three perspectives. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression or classification. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks, in that tools from the approximation theory are adopted. Through these constructions, the width and depth of the networks can be expressed in terms of sample size, data dimension, and function smoothness. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer ``how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data.'' In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm, and Mean-Field (MF) paradigm. In the last part, we review the most recent theoretical advancements in generative models including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in the Large Language Models (LLMs). The former two models are known to be the main pillars of the modern generative AI era, while ICL is a strong capability of LLMs in learning from a few examples in the context. Finally, we conclude the paper by suggesting several promising directions for deep learning theory.

翻訳日:2024-01-17 19:19:24 公開日:2024-01-14

# 量子アニーリングによる近似最適化におけるスケーリングアドバンテージ

Scaling Advantage in Approximate Optimization with Quantum Annealing ( http://arxiv.org/abs/2401.07184v1 )

ライセンス: Link先を確認

Humberto Munoz Bauza and Daniel A. Lidar

(参考訳) 量子アニーリング(quantum annealing)は、量子進化を利用して最低エネルギー状態を見つけるヒューリスティック最適化アルゴリズムである。量子アニーラは近年、より大きく、より高度に連結された離散最適化と量子シミュレーションの問題に取り組むために拡大している。しかし、多くの試みにもかかわらず、量子アニールハードウェアを用いた正確な最適化における計算量子の優位性はいまだ解明されていない。ここでは、近似最適化における量子アニールスケーリングの利点を示す。利点は古典的ヒューリスティックアルゴリズム(PT-ICM)に比較して、アイソエネルゲティッククラスタ移動(PT-ICM)による並列テンパリングである。このセッティングは、高精度スピンスピン相互作用を持つ2次元スピングラス問題の族である。この利点を得るために、我々は量子アニール補正(QAC)を実装し、D波アドバンテージ量子アニールの性質を利用したビットフリップ誤り訂正符号をエネルギーペナルティで埋め込み、次数5の相互作用グラフ上で1300以上の誤り抑制論理量子ビットを生成する。このグラフ上でランダムなスピングラスのインスタンスを生成し、低エネルギー状態に対する時間-解法の一般化である時間-エプシロンをベンチマークする。その結果,QACではPT-ICMよりも少なくとも1.0%の最適性ギャップを有する低エネルギー状態のサンプリングにおいて,量子アニールがスケーリング上の優位性を示すことがわかった。これは近似最適化におけるアルゴリズム量子スピードアップの最初の実演である。

Quantum annealing is a heuristic optimization algorithm that exploits quantum evolution to approximately find lowest energy states. Quantum annealers have scaled up in recent years to tackle increasingly larger and more highly connected discrete optimization and quantum simulation problems. Nevertheless, despite numerous attempts, a computational quantum advantage in exact optimization using quantum annealing hardware has so far remained elusive. Here, we present evidence for a quantum annealing scaling advantage in approximate optimization. The advantage is relative to the top classical heuristic algorithm: parallel tempering with isoenergetic cluster moves (PT-ICM). The setting is a family of 2D spin-glass problems with high-precision spin-spin interactions. To achieve this advantage, we implement quantum annealing correction (QAC): an embedding of a bit-flip error-correcting code with energy penalties that leverages the properties of the D-Wave Advantage quantum annealer to yield over 1,300 error-suppressed logical qubits on a degree-5 interaction graph. We generate random spin-glass instances on this graph and benchmark their time-to-epsilon, a generalization of the time-to-solution metric for low-energy states. We demonstrate that with QAC, quantum annealing exhibits a scaling advantage over PT-ICM at sampling low energy states with an optimality gap of at least 1.0%. This amounts to the first demonstration of an algorithmic quantum speedup in approximate optimization.

翻訳日:2024-01-17 19:18:50 公開日:2024-01-14

# LLMフィードバックからの強化学習と対向ゴールミスジェネリゼーション

Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization ( http://arxiv.org/abs/2401.07181v1 )

ライセンス: Link先を確認

Houda Nait El Barj, Theophile Sautory

(参考訳) 本稿では,大規模言語モデル(LLM)フィードバックを活用した強化学習(RL)における目標誤一般化に対処する手法を提案する。目標の一般化(goal misgeneralization) rlにおける堅牢性障害の一種は、エージェントが分散能力を保持しながら、意図したものではなくプロキシを追求した場合に発生する。本手法はLLMを用いて,トレーニング中のRLエージェントのポリシーを分析し,潜在的な障害シナリオを特定する。 RLエージェントはこれらのシナリオにデプロイされ、LLMの好みとフィードバックを通じて報酬モデルが学習される。このLLMインフォームド報酬モデルを使用して、元のデータセット上でRLエージェントをさらに訓練する。本手法を迷路ナビゲーションタスクに適用し,特に真とプロキシの目標がある程度区別可能であり,行動バイアスが顕著な場合に,目標一般化の顕著な改善を示す。本研究は、LLMがタスク能力の不足にもかかわらず、効率的にRLエージェントを監督し、LLMを用いてRLにおける目標指向学習を強化するためのスケーラブルな監視と価値ある洞察を提供する方法を示す。

We introduce a method to address goal misgeneralization in reinforcement learning (RL), leveraging Large Language Model (LLM) feedback during training. Goal misgeneralization, a type of robustness failure in RL occurs when an agent retains its capabilities out-of-distribution yet pursues a proxy rather than the intended one. Our approach utilizes LLMs to analyze an RL agent's policies during training and identify potential failure scenarios. The RL agent is then deployed in these scenarios, and a reward model is learnt through the LLM preferences and feedback. This LLM-informed reward model is used to further train the RL agent on the original dataset. We apply our method to a maze navigation task, and show marked improvements in goal generalization, especially in cases where true and proxy goals are somewhat distinguishable and behavioral biases are pronounced. This study demonstrates how the LLM, despite its lack of task proficiency, can efficiently supervise RL agents, providing scalable oversight and valuable insights for enhancing goal-directed learning in RL through the use of LLMs.

翻訳日:2024-01-17 19:18:11 公開日:2024-01-14

# 欧州のGDP予測とテキストデータ

Forecasting GDP in Europe with Textual Data ( http://arxiv.org/abs/2401.07179v1 )

ライセンス: Link先を確認

Luca Barbaglia, Sergio Consoli, Sebastiano Manzan

(参考訳) 我々は、欧州5大経済圏の国内総生産(gdp)およびその他のマクロ経済変数を予測するためのニュースベースの感情指標の情報内容を評価する。われわれのデータセットには、5つの言語で26の新聞の2700万記事が含まれている。これらの指標はマクロ経済変数を予測するための重要な予測因子であり、予測内容はリアルタイムに予測者が利用できる他の指標の制御に堅牢であることを示す。

We evaluate the informational content of news-based sentiment indicators for forecasting Gross Domestic Product (GDP) and other macroeconomic variables of the five major European economies. Our data set includes over 27 million articles for 26 major newspapers in 5 different languages. The evidence indicates that these sentiment indicators are significant predictors to forecast macroeconomic variables and their predictive content is robust to controlling for other indicators available to forecasters in real-time.

翻訳日:2024-01-17 19:17:34 公開日:2024-01-14

# 幾何誤差最小化による都市景観の超解像

City Scene Super-Resolution via Geometric Error Minimization ( http://arxiv.org/abs/2401.07272v1 )

ライセンス: Link先を確認

Zhengyang Lu and Feng Wang

(参考訳) 超解像技術は画像の粒度向上に不可欠であり、特に複雑な都市部では、幾何学的構造を保存することが、データインフォームドな文化遺産の応用に不可欠である。本稿では,幾何学的誤差最小化による都市景観超解法を提案する。幾何一貫性機構は、ハフ変換を利用して都市景観の規則的な幾何学的特徴を抽出し、低解像度画像と高解像度画像の間の幾何学的誤差の計算を可能にする。超解像過程における混合平均二乗誤差と幾何整合誤差を最小化することにより、提案手法は詳細および幾何正則性を効率的に復元する。 SET14,BSD300,Cityscapes,GSV-Citiesのデータセットに対する広範囲な検証は,提案手法が既存の最先端手法,特に都市シーンにおいて優れていることを示す。

Super-resolution techniques are crucial in improving image granularity, particularly in complex urban scenes, where preserving geometric structures is vital for data-informed cultural heritage applications. In this paper, we propose a city scene super-resolution method via geometric error minimization. The geometric-consistent mechanism leverages the Hough Transform to extract regular geometric features in city scenes, enabling the computation of geometric errors between low-resolution and high-resolution images. By minimizing mixed mean square error and geometric align error during the super-resolution process, the proposed method efficiently restores details and geometric regularities. Extensive validations on the SET14, BSD300, Cityscapes and GSV-Cities datasets demonstrate that the proposed method outperforms existing state-of-the-art methods, especially in urban scenes.

翻訳日:2024-01-17 19:10:31 公開日:2024-01-14

# SpineCLUE:コントラスト学習と不確実性推定を用いた自動動詞識別

SpineCLUE: Automatic Vertebrae Identification Using Contrastive Learning and Uncertainty Estimation ( http://arxiv.org/abs/2401.07271v1 )

ライセンス: Link先を確認

Sheng Zhang, Minheng Chen, Junxian Wu, Ziyue Zhang, Tonglong Li, Cheng Xue, Youyong Kong

(参考訳) 任意の分野における椎体同定は脊椎疾患の診断において重要な役割を担っている。ほとんどの脊椎ctは頸部、胸部、腹部などの局所領域のみを含んでいる。したがって、識別は特定の椎骨や特定の数の脊椎に依存すべきではない。既存の脊椎レベルの方法は、この課題を満たせない。本稿では,脊椎レベルでの3次元CT椎骨識別の課題に対処する3段階の手法を提案する。脊椎のローカライゼーション、セグメンテーション、識別のタスクを順次実行することにより、脊椎の解剖学的事前情報をその過程を通して効果的に活用する。具体的には,個々の椎骨の局在情報を取得する2要素密度クラスタリングアルゴリズムを導入し,その後のセグメンテーションと識別処理を容易にする。さらに,クラス間類似性とクラス内変動性の問題に取り組むため,教師付きコントラスト学習法を用いて識別ネットワークを事前学習する。識別結果をさらに最適化するために,分類ネットワークの不確実性を推定し,メッセージ融合モジュールを用いて不確実性スコアを合成し,スピンに関する情報を集約した。本手法は, verse19 および verse20 challenge ベンチマークで最新の結果を得た。さらに,本手法は,広範囲の異常例を含む収集データセット上での卓越した一般化性能を示す。

Vertebrae identification in arbitrary fields-of-view plays a crucial role in diagnosing spine disease. Most spine CT contain only local regions, such as the neck, chest, and abdomen. Therefore, identification should not depend on specific vertebrae or a particular number of vertebrae being visible. Existing methods at the spine-level are unable to meet this challenge. In this paper, we propose a three-stage method to address the challenges in 3D CT vertebrae identification at vertebrae-level. By sequentially performing the tasks of vertebrae localization, segmentation, and identification, the anatomical prior information of the vertebrae is effectively utilized throughout the process. Specifically, we introduce a dual-factor density clustering algorithm to acquire localization information for individual vertebra, thereby facilitating subsequent segmentation and identification processes. In addition, to tackle the issue of interclass similarity and intra-class variability, we pre-train our identification network by using a supervised contrastive learning method. To further optimize the identification results, we estimated the uncertainty of the classification network and utilized the message fusion module to combine the uncertainty scores, while aggregating global information about the spine. Our method achieves state-of-the-art results on the VerSe19 and VerSe20 challenge benchmarks. Additionally, our approach demonstrates outstanding generalization performance on an collected dataset containing a wide range of abnormal cases.

翻訳日:2024-01-17 19:10:16 公開日:2024-01-14

# 850-950nm波長の高速光子数分解検出器

High-Performance Photon Number Resolving Detectors for 850-950 nm wavelengths ( http://arxiv.org/abs/2401.07265v1 )

ライセンス: Link先を確認

J. W. N. Los, Mariia Sidorova, B. L. Rodriguez, Patrick Qualm, J. Chang, S. Steinhauer, V. Zwiller, I. Esmaeil Zadeh

(参考訳) 2001年の最初のデモンストレーション以来、超伝導-ナノワイヤ単光子検出器は20年間にわたって大きな発展を遂げてきた。 SNSPDは現代のほとんどの量子光学実験において選択の検知器であり、徐々に他の光子飢えの光学分野への道を見つけつつある。しかし、ほとんど全ての実験で、snspdは2進検出器として使われており、0光子と1光子以上しか区別できず、光子番号情報は失われる。近年の研究では、2から5個の光子を数える原理光子数解法(PNR) SNSPDが実証されている。光子数分解能力は、HOM干渉、フォトニック量子コンピューティング、量子通信、非ガウス量子状態準備など、様々な量子光学実験で要求されている。特に、850nmから950nmの波長域のpnr検出器は、高品質の半導体量子ドットと高性能セシウムベースの量子メモリが利用可能であるため、非常に興味深い。本稿では,NbTiNをベースとしたSNSPDのシステム検出効率が94%以上,1光子の11ps以下のタイミングジッタ,2光子の7ps以下のタイミングジッタを実証する。さらに重要なのは、従来の極低温電気読み出し回路で最大7光子を検出できることです。理論的解析により,検出器の現在のPNR性能は,読み出し回路の信号と雑音比,帯域幅を改善することでさらに向上できることを示す。私たちの結果は、光量子コンピューティングと量子通信の将来に有望です。

Since their first demonstration in 2001, superconducting-nanowire single-photon detectors have witnessed two decades of great developments. SNSPDs are the detector of choice in most modern quantum optics experiments and are slowly finding their way into other photon starved fields of optics. Until now, however, in nearly all experiments SNSPDs were used as binary detectors, meaning they can only distinguish between 0 and more than 1 photons and photon number information is lost. Recent research works have demonstrated proof of principle photon number resolving (PNR) SNSPDs counting 2 to 5 photons. The photon-number-resolving capability is highly demanded in various quantum-optics experiments, including HOM interference, photonic quantum computing, quantum communication, and non Gaussian quantum state preparation. In particular, PNR detectors at the wavelength range of 850 to 950 nm are of great interest due to the availability of high quality semiconductor quantum dots and high-performance Cesium-based quantum memories. In this paper, we demonstrate NbTiN based SNSPDs with over 94 percent system detection efficiency, sub 11 ps timing jitter for one photon, and sub 7 ps for two photon. More importantly, our detectors resolve up to 7 photons using conventional cryogenic electric readout circuitry. Through theoretical analysis, we show that the current PNR performance of our detectors can still be further improved by improving the signal to noise ratio and bandwidth of our readout circuitry. Our results are promising for the future of optical quantum computing and quantum communication.

翻訳日:2024-01-17 19:09:57 公開日:2024-01-14

# BET: エラー確率決定による深層強化学習の解説

BET: Explaining Deep Reinforcement Learning through The Error-Prone Decisions ( http://arxiv.org/abs/2401.07263v1 )

ライセンス: Link先を確認

Xiao Liu, Jie Zhao, Wubing Chen, Mao Tan, Yongxing Su

(参考訳) 多くの困難なシナリオにおいて、Deep Reinforcement Learning (DRL)エージェントの印象的な機能にもかかわらず、彼らのブラックボックス決定プロセスは、安全に敏感なドメインへのデプロイメントを著しく制限している。以前のいくつかの自己解釈可能な研究は、エージェントの決定の重大な状態を明らかにすることに焦点を当てている。しかし、エラーを起こしやすい状態は特定できない。この問題に対処するために,backbone extract tree (bet) と呼ばれる新しい自己解釈可能な構造を提案する。高いレベルでは、BETはエージェントが一貫して一様決定を行う状態はエラーの確率を減少させるという仮説を立てている。この現象を効果的にモデル化するために、ベットはこれらの状態を近隣で表現し、それぞれが代表的状態のキュレーションによって定義される。したがって、これらの代表的なベンチマークからより離れた位置にある状態はエラーを起こしやすい。我々は,様々なRL環境におけるBETの評価を行い,既存の自己解釈モデルよりも説明の忠実度が優れていることを示す。さらに,高度なマルチエージェント協調ゲームであるStarCraft IIにおいて,エージェントの説明を行うためのユースケースを示す。私たちの知る限りでは,このような複雑なシナリオを,完全に透過的な構造を使って最初に説明します。

Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identify the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such a complex scenarios using a fully transparent structure.

翻訳日:2024-01-17 19:09:31 公開日:2024-01-14

# 人点雲上の3次元ランドマーク検出:ベンチマークと二重カスケード変換器フレームワーク

3D Landmark Detection on Human Point Clouds: A Benchmark and A Dual Cascade Point Transformer Framework ( http://arxiv.org/abs/2401.07251v1 )

ライセンス: Link先を確認

Fan Zhang, Shuyi Mao, Qing Li, Xiaojiang Peng

(参考訳) 3Dランドマーク検出は、3D登録、ポーズ推定、仮想トライオンなど、さまざまなアプリケーションにおいて重要な役割を果たす。 2次元のランドマーク検出やポーズ推定でかなりの成功を収めてきたが、未秩序な3次元点雲におけるランドマーク検出に関する報告がほとんどない。本稿では,人間点雲における3次元ランドマーク検出という新たな課題について紹介する。まず,3Dランドマーク検出コミュニティを支援するために,HPoint103という総合的な人点クラウドデータセットを構築した。このデータセットは、商用ソフトウェアとアクターで作成された103のヒューマンポイントクラウドで構成され、それぞれが手動で11の安定したランドマークで注釈付けされている。次に, 2次元カスケード点変換器(D-CPT)モデルを提案する。 D-CPTは、ポイントクラウドストリーム全体にわたってカスケードトランスフォーマーデコーダ層を通じてランドマークを徐々に洗練し、同時にローカルリージョン上のRefineNetとのランドマーク座標を拡張している。 hpoint103とdhp19の一般的なポイントベース手法による比較評価は,d-cptの劇的な性能低下を示している。さらに、既存のメソッドへのRefineNetの統合は、パフォーマンスを継続的に改善します。

3D landmark detection plays a pivotal role in various applications such as 3D registration, pose estimation, and virtual try-on. While considerable success has been achieved in 2D human landmark detection or pose estimation, there is a notable scarcity of reported works on landmark detection in unordered 3D point clouds. This paper introduces a novel challenge, namely 3D landmark detection on human point clouds, presenting two primary contributions. Firstly, we establish a comprehensive human point cloud dataset, named HPoint103, designed to support the 3D landmark detection community. This dataset comprises 103 human point clouds created with commercial software and actors, each manually annotated with 11 stable landmarks. Secondly, we propose a Dual Cascade Point Transformer (D-CPT) model for precise point-based landmark detection. D-CPT gradually refines the landmarks through cascade Transformer decoder layers across the entire point cloud stream, simultaneously enhancing landmark coordinates with a RefineNet over local regions. Comparative evaluations with popular point-based methods on HPoint103 and the public dataset DHP19 demonstrate the dramatic outperformance of our D-CPT. Additionally, the integration of our RefineNet into existing methods consistently improves performance.

翻訳日:2024-01-17 19:09:10 公開日:2024-01-14

# 単純再正規化戦略によるシャープネス認識最小化の安定化

Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy ( http://arxiv.org/abs/2401.07250v1 )

ライセンス: Link先を確認

Chengli Tan, Jiangshe Zhang, Junmin Liu, Yicheng Wang, Yunda Hao

(参考訳) 近年、一般化性能の向上に驚くべき効果があるため、シャープネス認識最小化(SAM)が注目されているが、現在の点における正確な勾配の方向に沿って損失が減少せず、近くの別の点で評価された代理勾配の方向に従っているため、SAMを用いたニューラルネットワークのトレーニングは非常に不安定である。この問題に対処するため,我々は,サロゲート勾配のノルムが正確な勾配のノルムと同じ状態を維持するように,stablesamと呼ばれる単純な再正規化戦略を提案する。我々の戦略は実装が簡単で、samとその派生製品と統合できるほど柔軟で、ほとんど計算コストがかからない。また,凸最適化と学習理論の基本的なツールを用いてシャープネス認識訓練の理論解析を行い,確率的勾配降下(sgd)と比較して,samの有効性は限られた学習率でのみ保証されることを明らかにした。対照的に、StableSAMは学習率のこの仕組みを拡張し、小さな修正でSAMよりも一貫して性能を向上できるかを示す。最後に,いくつかの代表的なデータセットとタスクにおけるstablesamの性能向上を示す。

Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance.However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient maintains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.

翻訳日:2024-01-17 19:08:48 公開日:2024-01-14

# 不規則サンプリング時系列のプロトタイプからの系列間情報による命令

Imputation with Inter-Series Information from Prototypes for Irregular Sampled Time Series ( http://arxiv.org/abs/2401.07249v1 )

ライセンス: Link先を確認

Zhihao Yu, Xu Chu, Liantao Ma, Yasha Wang, Wenwu Zhu

(参考訳) 不規則にサンプリングされた時系列はユビキタスであり、値の欠如による分析に重大な課題がある。既存の方法がインプテーションに対処するにもかかわらず、彼らは主にシリーズ内情報を活用することに集中し、不確実性や記憶効果を減らすなど、シリーズ間情報が提供する潜在的な利点を無視している。このギャップを埋めるため,本論文では,不規則にサンプリングされた時系列の欠落値に対して,直列情報と直列情報の両方を統合した再帰的インプテーションモデル prime を提案する。本フレームワークは、シリーズ間情報を学習するプロトタイプメモリモジュールと、インプテーションのためのプロトタイプ情報を利用する双方向ゲートリカレントユニットと、インプテーションを調整するための注意的プロトタイプリファインメントモジュールとを備える。我々は3つのデータセットについて広範な実験を行い、PRIMEの最先端モデルに対する優位性を平均二乗誤差に対して最大26%改善した。

Irregularly sampled time series are ubiquitous, presenting significant challenges for analysis due to missing values. Despite existing methods address imputation, they predominantly focus on leveraging intra-series information, neglecting the potential benefits that inter-series information could provide, such as reducing uncertainty and memorization effect. To bridge this gap, we propose PRIME, a Prototype Recurrent Imputation ModEl, which integrates both intra-series and inter-series information for imputing missing values in irregularly sampled time series. Our framework comprises a prototype memory module for learning inter-series information, a bidirectional gated recurrent unit utilizing prototype information for imputation, and an attentive prototypical refinement module for adjusting imputations. We conducted extensive experiments on three datasets, and the results underscore PRIME's superiority over the state-of-the-art models by up to 26% relative improvement on mean square error.

翻訳日:2024-01-17 19:08:25 公開日:2024-01-14

# 表情認識のためのミックスコントラスト微調整によるマスク画像事前学習

MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition ( http://arxiv.org/abs/2401.07245v1 )

ライセンス: Link先を確認

Fan Zhang, Xiaobao Guo, Xiaojiang Peng, Alex Kot

(参考訳) 現在、顔認識(fer)における最先端の研究は、特徴抽出のために顔認識データセット上で教師ありに事前学習される畳み込みニューラルネットワーク(cnns)バックボーンの利用を好んでいる。しかし、膨大な顔認識データセットと、顔ラベルの収集に関連する高いコストのため、この事前学習パラダイムにはかなりの費用がかかる。この目的に向けて,中規模汎用画像データセット上での自己教師付きアプローチによる視覚トランスフォーマー(vits)の事前学習を提案する。さらに、顔データセットとFERデータセットの間に存在するドメイン格差と比較すると、一般的なデータセットとFERデータセットとのばらつきはより顕著である。そこで本研究では,この領域の差異を効果的に緩和するための対比的微調整手法を提案する。具体的には,Mix Contrastive Fine-tuning (MIMIC) を用いた Mask Image pre-training という新しいFERトレーニングパラダイムを提案する。初期段階では、一般画像のマスク画像再構成により、ViTを事前訓練する。その後, 微調整段階において, 混合教師付きコントラスト学習プロセスを導入し, 混合戦略によりより広範囲の正のサンプルでモデルを強化した。 3つのベンチマークデータセットで実施された広範な実験を通じて、MIMICは以前のトレーニングパラダイムよりも優れており、より良い表現を学ぶ能力を示している。注目すべきは、バニラ ViT が複雑な補助設計モジュールを必要とせずに素晴らしい性能を達成できることである。さらに、モデルサイズをスケールアップする場合、MIMICは性能飽和がなく、現在の最先端手法よりも優れている。

Cutting-edge research in facial expression recognition (FER) currently favors the utilization of convolutional neural networks (CNNs) backbone which is supervisedly pre-trained on face recognition datasets for feature extraction. However, due to the vast scale of face recognition datasets and the high cost associated with collecting facial labels, this pre-training paradigm incurs significant expenses. Towards this end, we propose to pre-train vision Transformers (ViTs) through a self-supervised approach on a mid-scale general image dataset. In addition, when compared with the domain disparity existing between face datasets and FER datasets, the divergence between general datasets and FER datasets is more pronounced. Therefore, we propose a contrastive fine-tuning approach to effectively mitigate this domain disparity. Specifically, we introduce a novel FER training paradigm named Mask Image pre-training with MIx Contrastive fine-tuning (MIMIC). In the initial phase, we pre-train the ViT via masked image reconstruction on general images. Subsequently, in the fine-tuning stage, we introduce a mix-supervised contrastive learning process, which enhances the model with a more extensive range of positive samples by the mixing strategy. Through extensive experiments conducted on three benchmark datasets, we demonstrate that our MIMIC outperforms the previous training paradigm, showing its capability to learn better representations. Remarkably, the results indicate that the vanilla ViT can achieve impressive performance without the need for intricate, auxiliary-designed modules. Moreover, when scaling up the model size, MIMIC exhibits no performance saturation and is superior to the current state-of-the-art methods.

翻訳日:2024-01-17 19:08:06 公開日:2024-01-14

# DCDet:動的クロスベース3Dオブジェクト検出器

DCDet: Dynamic Cross-based 3D Object Detector ( http://arxiv.org/abs/2401.07240v1 )

ライセンス: Link先を確認

Shuai Liu, Boyang Li, Zhiyu Fang and Kai Huang

(参考訳) 近年, 3次元物体検出の研究において有意な進歩がみられた。しかし、ほとんどの先行研究は、センターベースまたはアンカーベースラベル割り当てスキームの利用に焦点を当てている。代替ラベル割り当て戦略は、3Dオブジェクト検出において未探索のままである。センターベースのラベル割り当てはトレーニングのために十分な正のサンプルを生成しないことが多いが、アンカーベースのラベル割り当ては、様々なスケールのオブジェクトを扱う際に不均衡な問題に遭遇する傾向がある。これらの課題を解決するために, 動的クロスラベル割当(DCLA)方式を導入し, 対象物に対して動的に正のサンプルを交叉領域から割り当てることで, 十分な正のサンプルとバランスの取れた正のサンプルをトレーニング用に提供する。さらに,様々なスケールの物体を正確に後退させる課題に対処するために,回転重み付き交叉係数(rwiou)を用いて回帰損失のl1メトリックを置き換えた。広汎な実験により,DCLAとRWIoUに基づく回帰損失の一般化と有効性を示した。コードはhttps://github.com/Say2L/DCDet.gitで入手できる。

Recently, significant progress has been made in the research of 3D object detection. However, most prior studies have focused on the utilization of center-based or anchor-based label assignment schemes. Alternative label assignment strategies remain unexplored in 3D object detection. We find that the center-based label assignment often fails to generate sufficient positive samples for training, while the anchor-based label assignment tends to encounter an imbalanced issue when handling objects of varying scales. To solve these issues, we introduce a dynamic cross label assignment (DCLA) scheme, which dynamically assigns positive samples for each object from a cross-shaped region, thus providing sufficient and balanced positive samples for training. Furthermore, to address the challenge of accurately regressing objects with varying scales, we put forth a rotation-weighted Intersection over Union (RWIoU) metric to replace the widely used L1 metric in regression loss. Extensive experiments demonstrate the generality and effectiveness of our DCLA and RWIoU-based regression loss. The Code will be available at https://github.com/Say2L/DCDet.git.

翻訳日:2024-01-17 19:07:42 公開日:2024-01-14

# コヒーレント駆動量子調和振動子電池

Coherently Driven Quantum Harmonic Oscillator Battery ( http://arxiv.org/abs/2401.07238v1 )

ライセンス: Link先を確認

Kuldeep Gangwar and Anirban Pathak

(参考訳) 量子調和振動子(QHO)バッテリモデルは、実験的に実現可能であり、複数のエネルギーを蓄積する高いエルゴトロピーと容量を有するため、近年重要視されている。 QHOのバッテリモデルは、いくつかの基本的な質問に答えるために再検討されている。無制限充電は可能か? 触媒システムの使用は、量子電池へのエネルギー移動を促進するか? これらの質問は、QHO電池と相互作用するQHO充電器にレーザーを光らせるモデルを考えることにより、数値的および解析的に答えられる。既存の作品とは対照的に、得られた答えは概ね否定的である。特に,本研究では,QHO間の相互作用の影響を受け,大域的な充電器電池システムの周波数にレーザ周波数を調整した。固定レーザー場振幅$\textit{F}$の場合、この電池は、レーザ周波数をチャージャーとバッテリの局所周波数に調整することで蓄えられたエネルギーと比較して、グローバルチャージャーバッテリシステムの周波数に合わせるとより多くのエネルギーを蓄えることができると報告されている。また, 簡易モデルであるオープンQHOの帯電過程と, レーザフィールドの切換え後の自己放電(散逸)過程についても検討し, 簡易モデルにおけるQHOの帯電過程が触媒(非触媒)電池の帯電過程よりも高速であることを明らかにした。さらに, 自己放出過程は, 環境との相互作用に対して不安定となる帯電過程の約2倍高速であることが観察された。

Quantum harmonic oscillator (QHO) battery models have been studied with significant importance in the recent past because these batteries are experimentally realizable and have high ergotropy and capacity to store more than one quanta of energy. QHO battery models are reinvestigated here to answer a set of fundamental questions: Do such models have any benefit? Is unbounded charging possible? Does the use of a catalyst system enhance the energy transfer to quantum batteries? These questions are answered both numerically and analytically by considering a model that allows a laser to shine on a QHO charger that interacts with a QHO battery. In contrast to some of the existing works, the obtained answers are mostly negative. Specifically, in the present work, the laser frequency is tuned with the frequency of the global charger-battery system, which is affected by the interaction between QHOs. It is reported that for a fixed laser field amplitude $\textit{F}$, the battery can store more energy when tuned with the frequency of the global charger-battery system compared to energy stored by tuning the laser frequency with local frequencies of the charger and battery. The charging process of the open QHO, which is a simplified model, and the self-discharging (dissipation) process after switching off the laser field are also investigated to reveal that the charging process of QHO in the simplified model is faster than the charging process of the catalytic (non-catalytic) battery. Further, it's observed that the self-discharging process is almost two times faster than the charging process which makes such models unstable against interaction with the environment.

翻訳日:2024-01-17 19:07:25 公開日:2024-01-14

# 大規模言語モデルからのイベントシーケンス知識の蒸留

Distilling Event Sequence Knowledge From Large Language Models ( http://arxiv.org/abs/2401.07237v1 )

ライセンス: Link先を確認

Somin Wadhwa, Oktie Hassanzadeh, Debarun Bhattacharjya, Ken Barker, Jian Ni

(参考訳) イベントシーケンスモデルは、イベントの分析と予測に非常に有効であることが判明している。このようなモデルの構築には、豊富な高品質なイベントシーケンスデータが必要になる。しかし、特定のアプリケーションでは、クリーンな構造化されたイベントシーケンスは利用できず、自動シーケンス抽出はノイズが多く不完全なデータをもたらす。本研究では,確率的イベントモデル構築に効果的に使用できるイベントシーケンスを生成するための大規模言語モデル(llm)の利用を検討する。これは、LLMからイベントシーケンス知識を蒸留するメカニズムと見なすことができる。本手法は、因果関係を持つ事象概念の知識グラフ(KG)を用いて、因果関係生成のための生成言語モデルを導出する。提案手法は,入力KGの知識ギャップを埋めて,高品質なイベントシーケンスを生成することができることを示す。さらに,パターンマイニングや確率的イベントモデルから有用で複雑な構造化知識を発見するために,生成されたシーケンスをどのように活用するかを検討する。我々は、シーケンス生成コードと評価フレームワーク、およびイベントシーケンスデータのコーパスをリリースする。

Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as corpus of event sequence data.

翻訳日:2024-01-17 19:06:55 公開日:2024-01-14

# 信用リスク予測のためのフェデレーション学習アプローチにおけるデータ不均衡の効果

The Effects of Data Imbalance Under a Federated Learning Approach for Credit Risk Forecasting ( http://arxiv.org/abs/2401.07234v1 )

ライセンス: Link先を確認

Shuyao Zhang, Jordan Tay, Pedro Baiz

(参考訳) 信用リスク予測は、顧客へのローンの付与と損失の最小化において、商業銀行や他の金融機関にとって重要な役割を担っている。しかしながら、従来の機械学習手法では、セキュリティ上の脅威やプライバシリークのリスクを生じさせる可能性のあるグローバルモデルを構築するために、センシティブなクライアント情報を外部サーバと共有する必要がある。新たに開発されたプライバシー保護型分散機械学習技術であるfederated learning(fl)は、プライベートなローカルデータに直接アクセスすることなく、グローバルモデルのトレーニングを可能にする。本研究は,信用リスク評価におけるフェデレーション学習の有効性を検証し,データ不均衡がモデル性能に及ぼす影響を示した。多層型パーセプトロン (mlp) とlong short-term memory (lstm) の2つのニューラルネットワークアーキテクチャと、1つのツリーアンサンブルアーキテクチャであるextreme gradient boosting (xgboost) を、3つの異なるデータセットにまたがって、異なる数のクライアントとデータ分散構成を含む様々なシナリオで検討した。フェデレーションモデルが、より小さなデータセットを持つ非支配的なクライアントのローカルモデルを上回ることを実証する。この傾向は特に高度に不均衡なデータシナリオで顕著であり、モデルの性能が17.92%向上した。しかし、支配的なクライアント(より多くのデータを持つクライアント)にとって、フェデレーションされたモデルは優れたパフォーマンスを示しておらず、この種のクライアントが参加を促進するための特別なインセンティブの必要性が示唆される。

Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers and minimise the potential loss. However, traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model, potentially posing a risk of security threats and privacy leakage. A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly. This investigation examined the feasibility of federated learning in credit risk assessment and showed the effects of data imbalance on model performance. Two neural network architectures, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), and one tree ensemble architecture, Extreme Gradient Boosting (XGBoost), were explored across three different datasets under various scenarios involving different numbers of clients and data distribution configurations. We demonstrate that federated models consistently outperform local models on non-dominant clients with smaller datasets. This trend is especially pronounced in highly imbalanced data scenarios, yielding a remarkable average improvement of 17.92% in model performance. However, for dominant clients (clients with more data), federated models may not exhibit superior performance, suggesting the need for special incentives for this type of clients to encourage their participation.

翻訳日:2024-01-17 19:06:39 公開日:2024-01-14

# 二項化ニューロモルフィックネットワークとしてのポラリトン格子

Polariton lattices as binarized neuromorphic networks ( http://arxiv.org/abs/2401.07232v1 )

ライセンス: Link先を確認

Evgeny Sedov and Alexey Kavokin

(参考訳) 本研究では, 励起子-偏光子縮合格子に基づく新規なニューロモルフィックネットワークアーキテクチャを導入し, 非共鳴光ポンピングにより複雑に相互接続し, エネルギー化する。ネットワークはバイナリフレームワークを採用しており、各ニューロンはペア結合凝縮の空間的コヒーレンスによって促進され、バイナリ操作を行う。このコヒーレンスはポラリトンの弾道伝播から生まれ、効率的でネットワーク全体の通信を保証する。双対ニューロンスイッチング機構は、偏光子の励起成分を介して非線形反発によって駆動され、連続重み付けニューラルネットワークよりも計算効率とスケーラビリティの利点を提供する。本ネットワークは並列処理が可能であり,シーケンシャルおよびパルス符号化バイナリシステムと比較して計算速度が向上する。システムの性能は手書き文字認識のためのMNISTデータセットを用いて評価され、97.5%の予測精度で示されるように、既存の偏極性ニューロモルフィックシステムを上回る可能性を示した。

We introduce a novel neuromorphic network architecture based on a lattice of exciton-polariton condensates, intricately interconnected and energized through non-resonant optical pumping. The network employs a binary framework, where each neuron, facilitated by the spatial coherence of pairwise coupled condensates, performs binary operations. This coherence, emerging from the ballistic propagation of polaritons, ensures efficient, network-wide communication. The binary neuron switching mechanism, driven by the nonlinear repulsion through the excitonic component of polaritons, offers computational efficiency and scalability advantages over continuous weight neural networks. Our network enables parallel processing, enhancing computational speed compared to sequential or pulse-coded binary systems. The system's performance was evaluated using the MNIST dataset for handwritten digit recognition, showcasing the potential to outperform existing polaritonic neuromorphic systems, as demonstrated by its impressive predicted classification accuracy of up to 97.5%.

翻訳日:2024-01-17 19:06:13 公開日:2024-01-14

# 先行知識を用いた非観測変数付き因果加法モデルの発見とその時系列データへの応用

Use of Prior Knowledge to Discover Causal Additive Models with Unobserved Variables and its Application to Time Series Data ( http://arxiv.org/abs/2401.07231v1 )

ライセンス: Link先を確認

Takashi Nicholas Maeda, Shimizu Shohei

(参考訳) 本稿では,無観測変数 (CAM-UV) を持つ因果加法モデルの2つの手法を提案する。 CAM-UV は、因果関数が一般化加法モデルの形式をとり、潜在的共同設立者が存在すると仮定する。まず,先行知識を活用した効率的な因果発見手法を提案する。次に,時系列データの因果関係を推定する手法の拡張を提案する。元のCAM-UVアルゴリズムは、観測変数間の因果順序を求めるのではなく、観測変数ごとに原因を特定することを目的としているという点で、既存の因果関数モデルとは異なる。したがって,本論文で最初に提案する手法は,特定の変数が他の変数の原因になり得ないことを理解するなど,事前の知識を活用できる。さらに,時間的影響に先行する先行知識を組み込むことで,時系列データにおける因果発見のための第1のアルゴリズムを第2の手法に拡張する。提案手法をシミュレーションデータを用いて検証し,先行知識の蓄積に伴って因果発見の精度が向上することを示す。さらに, シミュレーションデータと実世界データの両方を用いて, 既存の時系列因果発見法と比較し, 第二の手法を検証した。

This paper proposes two methods for causal additive models with unobserved variables (CAM-UV). CAM-UV assumes that the causal functions take the form of generalized additive models and that latent confounders are present. First, we propose a method that leverages prior knowledge for efficient causal discovery. Then, we propose an extension of this method for inferring causality in time series data. The original CAM-UV algorithm differs from other existing causal function models in that it does not seek the causal order between observed variables, but rather aims to identify the causes for each observed variable. Therefore, the first proposed method in this paper utilizes prior knowledge, such as understanding that certain variables cannot be causes of specific others. Moreover, by incorporating the prior knowledge that causes precedes their effects in time, we extend the first algorithm to the second method for causal discovery in time series data. We validate the first proposed method by using simulated data to demonstrate that the accuracy of causal discovery increases as more prior knowledge is accumulated. Additionally, we test the second proposed method by comparing it with existing time series causal discovery methods, using both simulated data and real-world data.

翻訳日:2024-01-17 19:05:56 公開日:2024-01-14

# CCTVカメラを用いた高分解能交通データ収集への2次元ホログラフィーの適用

Application of 2D Homography for High Resolution Traffic Data Collection using CCTV Cameras ( http://arxiv.org/abs/2401.07220v1 )

ライセンス: Link先を確認

Linlin Zhang, Xiang Yu, Abdulateef Daud, Abdul Rashid Mussah, Yaw Adu-Gyamfi

(参考訳) 交通カメラは、渋滞やインシデント監視などの監視活動の主要な情報源である。これまで、国家機関は、複雑なカメラのキャリブレーションの要件や高解像度データを生成することができないなど、現在の自動視覚システムの制限のために、ネットワークカメラからデータを抽出するための手作業に頼り続けている。本研究では,インフラ搭載CCTVカメラから車両数,速度,加速度などの高精細トラフィックデータを抽出するための3段階のビデオ分析フレームワークを実装した。このフレームワークの重要なコンポーネントは、オブジェクト認識、パースペクティブ変換、およびトラフィックデータ収集のための車両軌道再構成である。まず,最先端の車両認識モデルを用いて車両の検出と分類を行う。次に、カメラの歪みを補正し、部分閉塞を低減するために、2点線形視点にインスパイアされたアルゴリズムを用いて、関心領域(ROI)を自動的に抽出し、2Dホモグラフィー技術によりCCTVビューを鳥眼ビュー(BEV)に変換する。カメラは2層マトリクスシステムでキャリブレーションされ、画像座標を実世界計測に変換することで速度と加速度の抽出を可能にする。個々の車両軌跡は、BEVにおいてMotpyとBYTETrackという2つの時間空間ベースのオブジェクトトラッカーを用いて構築・比較される。その結果,指向性トラヒック数に対する誤差率は+/-4.5%であり,プローブデータからの推定値と比較して,カメラ推定速度バイアスが10%mse以下であった。交通カメラから高解像度データを抽出することは、交通管理の改善や危険な運転行動の特定、事故のリスクの高い地域、その他の安全上の問題など、いくつかの意味を持つ。

Traffic cameras remain the primary source data for surveillance activities such as congestion and incident monitoring. To date, State agencies continue to rely on manual effort to extract data from networked cameras due to limitations of the current automatic vision systems including requirements for complex camera calibration and inability to generate high resolution data. This study implements a three-stage video analytics framework for extracting high-resolution traffic data such vehicle counts, speed, and acceleration from infrastructure-mounted CCTV cameras. The key components of the framework include object recognition, perspective transformation, and vehicle trajectory reconstruction for traffic data collection. First, a state-of-the-art vehicle recognition model is implemented to detect and classify vehicles. Next, to correct for camera distortion and reduce partial occlusion, an algorithm inspired by two-point linear perspective is utilized to extracts the region of interest (ROI) automatically, while a 2D homography technique transforms the CCTV view to bird's-eye view (BEV). Cameras are calibrated with a two-layer matrix system to enable the extraction of speed and acceleration by converting image coordinates to real-world measurements. Individual vehicle trajectories are constructed and compared in BEV using two time-space-feature-based object trackers, namely Motpy and BYTETrack. The results of the current study showed about +/- 4.5% error rate for directional traffic counts, less than 10% MSE for speed bias between camera estimates in comparison to estimates from probe data sources. Extracting high-resolution data from traffic cameras has several implications, ranging from improvements in traffic management and identify dangerous driving behavior, high-risk areas for accidents, and other safety concerns, enabling proactive measures to reduce accidents and fatalities.

翻訳日:2024-01-17 19:05:35 公開日:2024-01-14

# MapNeXt: オンラインベクトル化HDマップ構築のためのトレーニングとスケーリングの再開

MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction ( http://arxiv.org/abs/2401.07323v1 )

ライセンス: Link先を確認

Toyota Li

(参考訳) ハイディフィニション(HD)マップは自動操縦のナビゲーションに欠かせない。実行時に軽量なHDマップ構築機能を自動運転システムに統合することは、最近、有望な方向として現れている。カメラがステレオ情報を認識できるので、可搬性と経済性という魅力的なサインはさておき、視覚のみの知覚が際立っている。最新のMapTRアーキテクチャは、オンラインHDマップ構築タスクをエンドツーエンドで解決するが、その可能性はまだ検討されていない。本研究では,MapTRのフルスケールアップグレードを提案し,次世代のHDマップ学習アーキテクチャであるMapNeXtを提案する。 MapTRのトレーニングダイナミクスに光を当て、MapNeXt-TinyはMapTR-TinyのmAPを49.0%から54.8%に引き上げる。マップセグメンテーションの成果を楽しみ、mapnext-baseは、以前の技術であるマルチモダリティmaptrを上回り、$\sim1.8\times$を高速にしながら、63.9%までマップを持ち上げる。クエリの増加は適切な消化のためにデコーダネットワークを広く好んでおり、大きなバックボーンはベルやホイッスルを使わずに最終的な精度を着実に向上させる。親指の2つのルールに基づいて、MapNeXt-Hugeは、挑戦的なnuScenesベンチマークで最先端のパフォーマンスを達成する。具体的には、マップレスビジョンのみのシングルモデルパフォーマンスを初めて78%以上にプッシュし、既存のメソッドから最高のモデルを16%上回らせました。

High-Definition (HD) maps are pivotal to autopilot navigation. Integrating the capability of lightweight HD map construction at runtime into a self-driving system recently emerges as a promising direction. In this surge, vision-only perception stands out, as a camera rig can still perceive the stereo information, let alone its appealing signature of portability and economy. The latest MapTR architecture solves the online HD map construction task in an end-to-end fashion but its potential is yet to be explored. In this work, we present a full-scale upgrade of MapTR and propose MapNeXt, the next generation of HD map learning architecture, delivering major contributions from the model training and scaling perspectives. After shedding light on the training dynamics of MapTR and exploiting the supervision from map elements thoroughly, MapNeXt-Tiny raises the mAP of MapTR-Tiny from 49.0% to 54.8%, without any architectural modifications. Enjoying the fruit of map segmentation pre-training, MapNeXt-Base further lifts the mAP up to 63.9% that has already outperformed the prior art, a multi-modality MapTR, by 1.4% while being $\sim1.8\times$ faster. Towards pushing the performance frontier to the next level, we draw two conclusions on practical model scaling: increased query favors a larger decoder network for adequate digestion; a large backbone steadily promotes the final accuracy without bells and whistles. Building upon these two rules of thumb, MapNeXt-Huge achieves state-of-the-art performance on the challenging nuScenes benchmark. Specifically, we push the mapless vision-only single-model performance to be over 78% for the first time, exceeding the best model from existing methods by 16%.

翻訳日:2024-01-17 19:00:38 公開日:2024-01-14

# RSUD20K:自動運転における道路シーン理解のためのデータセット

RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving ( http://arxiv.org/abs/2401.07322v1 )

ライセンス: Link先を確認

Hasib Zunair, Shakib Khan, and A. Ben Hamza

(参考訳) 道路シーンの理解は、機械が視覚環境を知覚できるように、自動運転において不可欠である。しかし、最近のオブジェクト検出器は、特定の地理的な場所から収集されたデータセットを学習するために調整されている。本稿では,バングラデシュ道路の運転視点から20K以上の高解像度画像で構成され,13のオブジェクトに対する130K境界ボックスアノテーションを含む道路シーン理解のための新しいデータセットであるRSUD20Kを提案する。この挑戦的なデータセットは、様々な道路のシーン、狭い通りとハイウェイを含み、さまざまな視点からのオブジェクトと、密集した乱雑な物体と様々な気象条件のある混雑した環境からのシーンを含んでいる。我々の作業は以前の取り組みを大幅に改善し、詳細なアノテーションを提供し、オブジェクトの複雑さを増大させます。我々はデータセットを徹底的に検証し、最先端の物体検出器をベンチマークし、画像アノテーションとして大規模ビジョンモデルを探索する。

Road scene understanding is crucial in autonomous driving, enabling machines to perceive the visual environment. However, recent object detectors tailored for learning on datasets collected from certain geographical locations struggle to generalize across different locations. In this paper, we present RSUD20K, a new dataset for road scene understanding, comprised of over 20K high-resolution images from the driving perspective on Bangladesh roads, and includes 130K bounding box annotations for 13 objects. This challenging dataset encompasses diverse road scenes, narrow streets and highways, featuring objects from different viewpoints and scenes from crowded environments with densely cluttered objects and various weather conditions. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity. We thoroughly examine the dataset, benchmarking various state-of-the-art object detectors and exploring large vision models as image annotators.

翻訳日:2024-01-17 19:00:02 公開日:2024-01-14

# プライバシ関連ソースコードの検索

Finding Privacy-relevant Source Code ( http://arxiv.org/abs/2401.07316v1 )

ライセンス: Link先を確認

Feiyang Tang and Bjarte M. {\O}stvold

(参考訳) プライバシコードレビューは、開発者と法律の専門家がデータ保護規則の遵守を保証するための重要なプロセスである。しかし、リソースの制約のためタスクは困難である。この問題に対処するために、個人データの処理に直接関与するコードの特定の方法であるプライバシ関連メソッドの概念を紹介します。次に、ソースコード内のこれらのプライバシ関連メソッドを特定し分類することで、コードレビューを支援する自動アプローチを提案する。静的解析を用いて,50の一般的なライブラリにおけるそれらの発生に基づいて,一連のメソッドを識別する。次に、これらのメソッドを、githubアプリケーションのトップ30の実際の個人データと呼び出し頻度に従ってランク付けします。最高ランクのメソッドは、実際にはプライバシに関連するものとして指定するメソッドです。評価のために,100のオープンソースアプリケーションを調査した結果,プライバシ関連の個人データ処理手法の5%に満たないことがわかった。これにより、コードレビューに要する時間を削減できる。 Signal Desktop と Cal.com のケーススタディでは,プライバシ規制の遵守を容易にする拡張レポートの作成を支援するコードレビュアーのアプローチの有効性をさらに検証している。

Privacy code review is a critical process that enables developers and legal experts to ensure compliance with data protection regulations. However, the task is challenging due to resource constraints. To address this, we introduce the concept of privacy-relevant methods - specific methods in code that are directly involved in the processing of personal data. We then present an automated approach to assist in code review by identifying and categorizing these privacy-relevant methods in source code. Using static analysis, we identify a set of methods based on their occurrences in 50 commonly used libraries. We then rank these methods according to their frequency of invocation with actual personal data in the top 30 GitHub applications. The highest-ranked methods are the ones we designate as privacy-relevant in practice. For our evaluation, we examined 100 open-source applications and found that our approach identifies fewer than 5% of the methods as privacy-relevant for personal data processing. This reduces the time required for code reviews. Case studies on Signal Desktop and Cal.com further validate the effectiveness of our approach in aiding code reviewers to produce enhanced reports that facilitate compliance with privacy regulations.

翻訳日:2024-01-17 18:59:45 公開日:2024-01-14

# mapgpt:統一視覚言語ナビゲーションのための地図案内プロンプト

MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation ( http://arxiv.org/abs/2401.07314v1 )

ライセンス: Link先を確認

Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

(参考訳) 脳にGPTを装着した身体エージェントは、様々なタスクにおいて異常な思考と意思決定能力を示した。しかしながら、視覚・言語ナビゲーション(VLN)のための既存のゼロショットエージェントは、エージェントが全体の環境を理解するために効果的な「グローバルビュー」を構築することなく、GPTに過剰な環境情報を処理し、局所的な環境内の潜在的な場所を選択することを促すだけである。本稿では,ゼロショットvlnタスクのための新しいmap-guided gptベースの経路計画エージェントmapgptを提案する。具体的には、オンラインで構築されたトポロジカルマップを、地図誘導のグローバルな探索を促進するプロンプトに変換し、エージェントが局所的な探索に支障を来すのを避けるために、明示的に複数ステップの経路計画を出力し、更新する必要がある。大規模な実験により、我々のMapGPTは有効であり、R2RデータセットとREVERIEデータセット(それぞれ38.8%と28.4%の成功率)において印象的な性能を達成し、新たに登場したGPTモデルのグローバル思考とパス計画能力を示す。異なるデータセットにまたがる様々な命令スタイルに対応するために、パラメータの微調整や特定のプロンプト設計を必要とする以前のvlnエージェントとは異なり、mapgptは異なる命令スタイルにシームレスに適応できるため、より統一されている。

Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking and decision-making abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt the GPT to handle excessive environmental information and select potential locations within localized environments, without constructing an effective ''global-view'' (e.g., a commonly-used map) for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based path-planning agent, dubbed MapGPT, for the zero-shot VLN task. Specifically, we convert a topological map constructed online into prompts to encourage map-guided global exploration, and require the agent to explicitly output and update multi-step path planning to avoid getting stuck in local exploration. Extensive experiments demonstrate that our MapGPT is effective, achieving impressive performance on both the R2R and REVERIE datasets (38.8% and 28.4% success rate, respectively) and showcasing the newly emerged global thinking and path planning capabilities of the GPT model. Unlike previous VLN agents, which require separate parameters fine-tuning or specific prompt design to accommodate various instruction styles across different datasets, our MapGPT is more unified as it can adapt to different instruction styles seamlessly, which is the first of its kind in this field.

翻訳日:2024-01-17 18:59:27 公開日:2024-01-14

# ベンガル語抑うつ的ソーシャルメディアテキスト検出のためのトランスフォーマーモデルによる大規模言語モデルの調和 : 総合的研究

Harnessing Large Language Models Over Transformer Models for Detecting Bengali Depressive Social Media Text: A Comprehensive Study ( http://arxiv.org/abs/2401.07310v1 )

ライセンス: Link先を確認

Ahmadul Karim Chowdhury, Md. Saidur Rahman Sujon, Md. Shirajus Salekin Shafi, Tasin Ahmmad, Sifat Ahmed, Khan Md Hasib, Faisal Muhammad Shah

(参考訳) うつ病を診断する静かな闘争が世界中で広まる中で、私たちの研究はメンタルヘルスとソーシャルメディアの重大なつながりに発展しました。 GPT 3.5, GPT 4 や提案した GPT 3.5 の微調整モデル DepGPT や高度な深層学習モデル (LSTM, Bi-LSTM, GRU, BiGRU) や Transformer モデル (BERT, BanglaBERT, SahajBERT, BanglaBERT-Base) を用いて, 抑うつの早期発見に焦点を当てた。この研究はRedditとXのデータセットを「抑うつ」セグメントと「非抑うつ」セグメントに分類し、メンタルヘルスの専門知識を持つネイティブスピーカーによってベンガル語に翻訳し、ベンガル社会メディア抑うつデータセット(BSMDD)を作成した。我々の研究は、各モデルに対する完全なアーキテクチャの詳細と、ゼロショットおよび少数ショット学習技術を用いて、ベンガルの抑うつ的テキスト分類におけるそれらの性能を評価する方法を提供する。我々の研究は、各ドメインにFastTextを組み込んだSahajBERTとBi-LSTMの優位性を示すとともに、トランスフォーマーモデルによる説明可能性の問題にも取り組み、LLM(特にDepGPT)の有効性を強調し、様々な学習文脈における柔軟性と能力を示す。実験結果によると,提案モデルであるdepgptは,ゼロショットと少数ショットのシナリオではalpaca lora 7bよりも優れており,0.9796に近い精度と0.9804のf1-score,高リコール,異常な精度を実現している。 gpt-3.5ターボとalpaca lora 7bは競争力は高いが、ゼロショットと少数ショットの状況では効果が比較的低い。この研究は、様々な言語状況におけるLLMの有効性と柔軟性を強調し、うつ病検出モデルの複雑な分野に関する洞察力のある情報を提供する。

In an era where the silent struggle of underdiagnosed depression pervades globally, our research delves into the crucial link between mental health and social media. This work focuses on early detection of depression, particularly in extroverted social media users, using LLMs such as GPT 3.5, GPT 4 and our proposed GPT 3.5 fine-tuned model DepGPT, as well as advanced Deep learning models(LSTM, Bi-LSTM, GRU, BiGRU) and Transformer models(BERT, BanglaBERT, SahajBERT, BanglaBERT-Base). The study categorized Reddit and X datasets into "Depressive" and "Non-Depressive" segments, translated into Bengali by native speakers with expertise in mental health, resulting in the creation of the Bengali Social Media Depressive Dataset (BSMDD). Our work provides full architecture details for each model and a methodical way to assess their performance in Bengali depressive text categorization using zero-shot and few-shot learning techniques. Our work demonstrates the superiority of SahajBERT and Bi-LSTM with FastText embeddings in their respective domains also tackles explainability issues with transformer models and emphasizes the effectiveness of LLMs, especially DepGPT, demonstrating flexibility and competence in a range of learning contexts. According to the experiment results, the proposed model, DepGPT, outperformed not only Alpaca Lora 7B in zero-shot and few-shot scenarios but also every other model, achieving a near-perfect accuracy of 0.9796 and an F1-score of 0.9804, high recall, and exceptional precision. Although competitive, GPT-3.5 Turbo and Alpaca Lora 7B show relatively poorer effectiveness in zero-shot and few-shot situations. The work emphasizes the effectiveness and flexibility of LLMs in a variety of linguistic circumstances, providing insightful information about the complex field of depression detection models.

翻訳日:2024-01-17 18:58:57 公開日:2024-01-14

# 超伝導回路を用いた量子情報処理-オープン量子系における量子ゲートとアルゴリズムの実現と特徴化

Quantum information processing with superconducting circuits: realizing and characterizing quantum gates and algorithms in open quantum systems ( http://arxiv.org/abs/2401.07302v1 )

ライセンス: Link先を確認

Hamid Sakhouf

(参考訳) この論文は超伝導デバイスを用いた量子情報処理、特にオープン量子システムにおける量子ゲートとアルゴリズムの実現に焦点を当てている。このような装置は、超伝導共振器に結合したトランスモン型超伝導量子ビットによって構成される。量子ゲートとアルゴリズムの実現には、一段階のアプローチが用いられる。 x$回転と2および3キュービットのゲート絡み込みを実現するための高速で効率的なスキームを提案する。これらの動作中、強いマイクロ波フィールドの追加により共振器光子番号がキャンセルされる。当初は真空状態での共振器の調製は必要とせず、共振器の減衰には敏感である。さらに、これらの操作のロバスト性は、マスター方程式におけるトランスモン系のデコヒーレンスと共振器崩壊の影響を含めることで示され、その結果、量子シミュレーションにおいて高い忠実性が得られる。さらに,実装したx-回転ゲートと位相ゲートを用いて,Groverのアルゴリズムを2と3の量子ビットに対して実装する方法を提案する。また、量子プロセストモグラフィーを用いて、2および3キュービットのシングルショットエンタングゲートの性能をフルに評価し、0.93以上のプロセス忠実度が得られることを数値シミュレーションで示す。これらのゲートは、ベルとグリーンベルガー=ホルン=ザイリンガー(GHZ)の絡み合った状態を作り出すために使用される。

This thesis focuses on quantum information processing using the superconducting device, especially, on realizing quantum gates and algorithms in open quantum systems. Such a device is constructed by transmon-type superconducting qubits coupled to a superconducting resonator. For the realization of quantum gates and algorithms, a one-step approach is used. We suggest faster and more efficient schemes for realizing $X$-rotation and entangling gates for two and three qubits. During these operations, the resonator photon number is canceled owing to the strong microwave field added. They do not require the resonator to be initially prepared in the vacuum state and the scheme is insensitive to resonator decay. Furthermore, the robustness of these operations is demonstrated by including the effect of the decoherence of transmon systems and the resonator decay in a master equation, and as a result, high fidelity will be achieved in quantum simulation. In addition, using the implemented x-rotation gates as well as the phase gates, we present an alternative way for implementing Grover's algorithm for two and three qubits, which does not require a series of single gates. As well, we also demonstrate by a numerical simulation the use of quantum process tomography to fully characterize the performance of a single-shot entangling gate for two and three qubits and obtain process fidelities greater than 0.93. These gates are used to create Bell and Greenberger-Horne-Zeilinger (GHZ) entangled states.

翻訳日:2024-01-17 18:58:13 公開日:2024-01-14

# 小さい言語モデルは自己修正できる

Small Language Model Can Self-correct ( http://arxiv.org/abs/2401.07301v1 )

ライセンス: Link先を確認

Haixia Han, Jiaqing Liang, Jie Shi, Qianyu He, Yanghua Xiao

(参考訳) ChatGPTのようなジェネレーティブ言語モデル(LM)は、様々な下流タスクで顕著なパフォーマンスを示している。それでも、最も顕著な欠点の1つは、自信のあるトーンで不正確または偽の情報を生成することである。従来の研究では、高度なパイプラインを考案し、大規模なLMを誘導して自己補正能力を示すよう促している。しかし、大きなLMは、自然に人間のように全てのステップを完了させるのではなく、その答えを個別に検証し、修正するよう明示的に促される。さらに、これらの複雑なプロンプトは小さなlmsでは極めて困難である。本稿では,60億個のパラメータを持つ小さなLMであっても,自己トリガー方式でLMの初期出力を補正することを目的として,生成言語モデルに \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) を導入する。具体的には,自己修正データ構築のためのパイプラインを考案し,微調整による内在的自己修正能力を有するモデルへの支援を目的とした部分的回答マスク(pam)を提案する。我々は,60億から13億のパラメータサイズを持つLMを用いて,常識推論と事実知識推論を含む2つのタスクで実験を行う。 ISCを用いて生成した出力は自己補正なしで生成した出力よりも優れていた。内在的な自己修正能力を持たせることで、小さなlmsでも出力品質がさらに向上できると考えています。

Generative Language Models (LMs) such as ChatGPT have exhibited remarkable performance across various downstream tasks. Nevertheless, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. Previous studies have devised sophisticated pipelines and prompts to induce large LMs to exhibit the capability for self-correction. However, large LMs are explicitly prompted to verify and modify its answers separately rather than completing all steps spontaneously like humans. Moreover, these complex prompts are extremely challenging for small LMs to follow. In this paper, we introduce the \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. Specifically, we devise a pipeline for constructing self-correction data and propose Partial Answer Masking (PAM), aiming to endow the model with the capability for intrinsic self-correction through fine-tuning. We conduct experiments using LMs with parameters sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning. Our experiments demonstrate that the outputs generated using ISC outperform those generated without self-correction. We believe that the output quality of even small LMs can be further improved by empowering them with the ability to intrinsic self-correct.

翻訳日:2024-01-17 18:57:50 公開日:2024-01-14

# 絡み合い、量子場の埋め込みとフォン・ノイマン代数の分類

Embezzlement of entanglement, quantum fields, and the classification of von Neumann algebras ( http://arxiv.org/abs/2401.07299v1 )

ライセンス: Link先を確認

Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner, Henrik Wilming

(参考訳) 我々はフォン・ノイマン代数の設定における絡み合いの包括的処理を提供し、フォン・ノイマン代数の分類との関係と相対論的量子場理論への応用について論じる。絡み合いのエンベゼルメント(英: embezzlement of entanglement)とは、共有絡み合いリソース状態から任意の精度で絡み合い状態を生成するタスクであり、通信を使わずに、任意に資源を摂動させる。非相対論的量子論とは対照的に、量子場の記述はタイプi(有限または無限次元行列代数)を超えるフォン・ノイマン代数を必要とし、特にタイプiiiの代数は自然に現れる。したがって、量子場理論は、潜在的により大きな種類の横領資源を許容する。コンヌのIII型ノイマン代数の分類は、エンタングルメントの埋め込みのタスクを用いて定量的な操作的解釈を与えることができることを示す。具体的には、すべてのタイプ iii$_\lambda$ factor と $\lambda>0$ host embezzling state と、タイプ iii$_1$ factor 上のすべての正規状態がembezzlingであることを示す。さらに、半有限因子(I型またはII型)はエンベジング状態をホストすることができず、正確なエンベジング状態は非分離ヒルベルト空間を必要とすることを証明している。これらの結果は、重みのフローにおけるエンベジング状態と不変状態の間の1対1の対応から導かれる。本研究は、iii$_1$因子を「普遍的横領者」として特徴づけ、相対論的量子場理論がベルの不等式を最大に破る理由について、簡単な説明を与える。結果の多くはモジュラー理論と重みのフローを幅広く用いているが、ITPFI因子の普遍的なエンベジングは基本的議論によってIII$_1$であることを示す。

We provide a comprehensive treatment of embezzlement of entanglement in the setting of von Neumann algebras and discuss its relation to the classification of von Neumann algebras as well as its application to relativistic quantum field theory. Embezzlement of entanglement is the task of producing any entangled state to arbitrary precision from a shared entangled resource state using local operations without communication while perturbing the resource arbitrarily little. In contrast to non-relativistic quantum theory, the description of quantum fields requires von Neumann algebras beyond type I (finite or infinite dimensional matrix algebras) -- in particular, algebras of type III appear naturally. Thereby, quantum field theory allows for a potentially larger class of embezzlement resources. We show that Connes' classification of type III von Neumann algebras can be given a quantitative operational interpretation using the task of embezzlement of entanglement. Specifically, we show that all type III$_\lambda$ factors with $\lambda>0$ host embezzling states and that every normal state on a type III$_1$ factor is embezzling. Furthermore, semifinite factors (type I or II) cannot host embezzling states, and we prove that exact embezzling states require non-separable Hilbert spaces. These results follow from a one-to-one correspondence between embezzling states and invariant states on the flow of weights. Our findings characterize type III$_1$ factors as "universal embezzlers" and provide a simple explanation as to why relativistic quantum field theories maximally violate Bell inequalities. While most of our results make extensive use of modular theory and the flow of weights, we establish that universally embezzling ITPFI factors are of type III$_1$ by elementary arguments.

翻訳日:2024-01-17 18:57:12 公開日:2024-01-14

# 一般化低ランク行列帯域問題のための効率的なフレームワーク

Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems ( http://arxiv.org/abs/2401.07298v1 )

ライセンス: Link先を確認

Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee

(参考訳) 確率的文脈的低ランク行列バンドイット問題において、アクションの期待された報酬は、アクションの特徴行列と固定されたいくつかの内部積によって与えられるが、最初は未知の$d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \{d_1, d_2\}$で与えられる。本稿では,一般化線形モデル(GLM)の枠組みの下で,最近 \cite{lu2021low} で提案されている一般化低ランク行列バンドイット問題について検討する。この問題に対する既存のアルゴリズムの計算不可能性と理論的制約を克服するために,まず,部分空間推定におけるsteinの手法を用いて \cite{jun2019bilinear} のアイデアを修飾した g-estt フレームワークを提案し,その推定部分空間を正規化アイデアで活用する。さらに,推定部分空間上の新しい排他的概念を用いてG-ESTTの効率を著しく向上させ,G-ESTSフレームワークを提案する。また、G-ESTT が $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ bound of regret を達成できるのに対し、G-ESTS は $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ bound of regret を対数的な仮定で達成できる。 M = O(((d_1+d_2)^2)$(d_1+d_2)^2)$ という合理的な仮定の下では、G-ESTT の後悔は $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low}$D_{rr}$ の現在の最善後悔と一致する。完全性のために,提案アルゴリズム,特にG-ESTSは,計算可能であり,一連のシミュレーションに基づいて,他の最先端(一般化)線形行列バンドイット法より一貫して優れていることを示す実験を行う。

In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and some fixed, but initially unknown $d_1$ by $d_2$ matrix $\Theta^*$ with rank $r \ll \{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward. In this paper, we study the generalized low-rank matrix bandit problem, which has been recently proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework. To overcome the computational infeasibility and theoretical restrain of existing algorithms on this problem, we first propose the G-ESTT framework that modifies the idea from \cite{jun2019bilinear} by using Stein's method on the subspace estimation and then leverage the estimated subspaces via a regularization idea. Furthermore, we remarkably improve the efficiency of G-ESTT by using a novel exclusion idea on the estimated subspace instead, and propose the G-ESTS framework. We also show that G-ESTT can achieve the $\tilde{O}(\sqrt{(d_1+d_2)MrT})$ bound of regret while G-ESTS can achineve the $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$ bound of regret under mild assumption up to logarithm terms, where $M$ is some problem dependent value. Under a reasonable assumption that $M = O((d_1+d_2)^2)$ in our problem setting, the regret of G-ESTT is consistent with the current best regret of $\tilde{O}((d_1+d_2)^{3/2} \sqrt{rT}/D_{rr})$~\citep{lu2021low} ($D_{rr}$ will be defined later). For completeness, we conduct experiments to illustrate that our proposed algorithms, especially G-ESTS, are also computationally tractable and consistently outperform other state-of-the-art (generalized) linear matrix bandit methods based on a suite of simulations.

翻訳日:2024-01-17 18:56:36 公開日:2024-01-14

# 量子場からのエンベジング絡み

Embezzling entanglement from quantum fields ( http://arxiv.org/abs/2401.07292v1 )

ライセンス: Link先を確認

Lauritz van Luijk, Alexander Stottmeister, Reinhard F. Werner, Henrik Wilming

(参考訳) エンベッズメント(英: Embezzlement)とは、補助系(「エンベッズラー」)の参照状態から、局所的な量子演算を通じて絡み合った量子状態を抽出する反直感的な可能性を指す。我々は、フォン・ノイマン代数の数学的分類とエンベジングエンタングルメントの操作課題との深い関係を報告する。この結果は相対論的量子場が普遍的エンベズラーであることを意味する:任意の次元の絡み合った状態は任意の精度でそれらからエンベズできる。これは相対論的量子場理論の真空状態に存在する無限個の絡み合いの操作的特徴を与える。

Embezzlement refers to the counterintuitive possibility of extracting entangled quantum states from a reference state of an auxiliary system (the "embezzler") via local quantum operations while hardly perturbing the latter. We report a deep connection between the mathematical classification of von Neumann algebras and the operational task of embezzling entanglement. This result implies that relativistic quantum fields are universal embezzlers: Any entangled state of any dimension can be embezzled from them with arbitrary precision. This provides an operational characterization of the infinite amount of entanglement present in the vacuum state of relativistic quantum field theories.

翻訳日:2024-01-17 18:55:33 公開日:2024-01-14

# 適応ガウス演算を用いた一般化光子減算を用いたフライング論理量子ビットの生成

Generation of Flying Logical Qubits using Generalized Photon Subtraction with Adaptive Gaussian Operations ( http://arxiv.org/abs/2401.07287v1 )

ライセンス: Link先を確認

Kan Takase, Fumiya Hanamura, Hironari Nagayoshi, J. Eli Bourassa, Rafael N. Alexander, Akito Kawasaki, Warit Asavanant, Mamoru Endo, and Akira Furusawa

(参考訳) 光移動波におけるGottesman-Kitaev-Preskill qubitと呼ばれる論理量子ビットの生成は、大規模で普遍的なフォールトトレラントな光量子コンピュータを実現する上で大きな課題である。近年,光子数測定とホモダイン測定を用いて基本GKP量子ビットの確率的生成が実証されている。しかし、生成速度はわずか数Hzであり、成功確率が著しく改善されない限り、実用速度で耐故障性GKP量子ビットを生成することは困難である。本稿では,複数の量子状態からgkp量子ビットを適応ガウス演算により効率的に合成する方法を提案する。光子数測定を利用する初期状態準備において、適応演算により一定の閾値を超える任意の測定結果を成功とみなすことができる。この閾値を一般化光子減算法を用いて低下させる。初期状態はホモダイン測定とその後の適応操作によりGKP量子ビットに合成される。その結果、現実的なスケールで耐故障性GKP量子ビットを生成する単一ショット成功確率は、従来の手法の100万倍である10$\%$を超えている。この提案は、原理実証段階から実用段階へ光量子コンピュータを進化させるための強力なツールとなる。

The generation of a logical qubit called the Gottesman-Kitaev-Preskill qubit in an optical traveling wave is a major challenge for realizing large-scale universal fault-tolerant optical quantum computers. Recently, probabilistic generation of elementary GKP qubits has been demonstrated using photon number measurements and homodyne measurements. However, the generation rate is only a few Hz, and it will be difficult to generate fault-tolerant GKP qubits at a practical rate unless success probability is significantly improved. Here, we propose a method to efficiently synthesize GKP qubits from several quantum states by adaptive Gaussian operations. In the initial state preparation that utilizes photon number measurements, an adaptive operation allows any measurement outcome above a certain threshold to be considered as a success. This threshold is lowered by utilizing the generalized photon subtraction method. The initial states are synthesized into a GKP qubit by homodyne measurements and a subsequent adaptive operation. As a result, the single-shot success probability of generating fault-tolerant GKP qubits in a realistic scale system exceeds 10$\%$, which is one million times better than previous methods. This proposal will become a powerful tool for advancing optical quantum computers from the proof-of-principle stage to practical application.

翻訳日:2024-01-17 18:55:22 公開日:2024-01-14

# CANDLE:Commonsense Reasoningのための大規模言語モデルからの反復的概念化とインスティファイション蒸留

CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning ( http://arxiv.org/abs/2401.07286v1 )

ライセンス: Link先を確認

Weiqi Wang, Tianqing Fang, Chunyang Li, Haochen Shi, Wenxuan Ding, Baixuan Xu, Zhaowei Wang, Jiaxin Bai, Xin Liu, Jiayang Cheng, Chunkit Chan, Yangqiu Song

(参考訳) 概念化とインスタンス化のシーケンシャルなプロセスは、既存の知識を未知のシナリオに適用できるため、一般化可能なコモンセンス推論に不可欠である。しかし、既存の研究はインスタンス化のステップを過小評価する傾向にあり、両方の種類の知識を収集するために事前に構築された概念分類やヒューマンアノテーションに強く依存しているため、完全な推論のためのインスタンス化された知識が不足し、コストが高く、スケーラビリティが制限される。これらの課題に取り組むため,我々は,大言語モデルに批判的フィルタリングを伴う2種類の知識を生成するよう指示することにより,コンテクスト化された概念化と,コモンセンス知識ベース上でのインスタンス化を反復的に行う蒸留フレームワークであるroウソクを紹介する。 CANDLEをATOMICに適用することにより、600万の概念化と三重項のインスタンス化を含む総合的な知識基盤を構築する。どちらの種類の知識も元のATOMICデータセットにしっかりと根付いており、本質的な評価はその例外的な品質と多様性を示している。実験結果から,4つの下流タスクにまたがる留学生モデルにおける蒸留ろうそくの効果が示唆された。私たちのコード、データ、モデルはhttps://github.com/HKUST-KnowComp/CANDLE.comで公開されています。

The sequential process of conceptualization and instantiation is essential to generalizable commonsense reasoning as it allows the application of existing knowledge to unfamiliar scenarios. However, existing works tend to undervalue the step of instantiation and heavily rely on pre-built concept taxonomies and human annotations to collect both types of knowledge, resulting in a lack of instantiated knowledge to complete reasoning, high cost, and limited scalability. To tackle these challenges, we introduce CANDLE, a distillation framework that iteratively performs contextualized conceptualization and instantiation over commonsense knowledge bases by instructing large language models to generate both types of knowledge with critic filtering. By applying CANDLE to ATOMIC, we construct a comprehensive knowledge base comprising six million conceptualizations and instantiated commonsense knowledge triples. Both types of knowledge are firmly rooted in the original ATOMIC dataset, and intrinsic evaluations demonstrate their exceptional quality and diversity. Empirical results indicate that distilling CANDLE on student models provides benefits across four downstream tasks. Our code, data, and models are publicly available at https://github.com/HKUST-KnowComp/CANDLE.

翻訳日:2024-01-17 18:54:55 公開日:2024-01-14

# 拡張テキスト読解によるドメイン適応の改善

Improving Domain Adaptation through Extended-Text Reading Comprehension ( http://arxiv.org/abs/2401.07284v1 )

ライセンス: Link先を確認

Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

(参考訳) 大規模言語モデルのドメイン特化能力を高めるため、ドメイン特化コーパスでの事前学習が一般的である。最近の研究は、Regexベースのパターンでフォーマットされた読解データを用いてモデルを適用することで、ドメイン固有のタスクのパフォーマンスが大幅に向上することを示した。しかし、regexベースのパターンはドメイン固有の知識を使って生のコーパスを解析できない。さらに、質問と回答のペアは、事前に定義された形式でコーパスから直接抽出され、コンテキストが限定される。この制限に対処するため,LLMとクラスタリングによる読解理解を改善した。 LLMは、理解段階を洗練させるためにコーパス内のドメイン知識を活用することに焦点を当て、クラスタリングは、コンテキストを読書段階に拡張することで関連する知識を提供する。さらに,パラメータ効率の高い微調整を取り入れ,ドメイン適応の効率化を図る。 AdaptLLMと比較して、ドメイン固有のタスクで5%以上の改善を実現している。私たちのコードはhttps://github.com/microsoft/LMOpsで公開されます。

To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, the question and answer pairs are extracted directly from the corpus in predefined formats offers limited context. To address this limitation, we improve reading comprehension via LLM and clustering. LLM focuses on leveraging domain knowledge within the corpus to refine comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will available at https://github.com/microsoft/LMOps.

翻訳日:2024-01-17 18:54:32 公開日:2024-01-14

# FROST-BRDF:BRDF取得のための高速かつロバストなサンプリング手法

FROST-BRDF: A Fast and Robust Optimal Sampling Technique for BRDF Acquisition ( http://arxiv.org/abs/2401.07283v1 )

ライセンス: Link先を確認

Ehsan Miandji, Tanaboon Tongbuasirilai, Saghi Hajisharif, Behnaz Kavoosighafi, Jonas Unger

(参考訳) 実世界の材料の効率的かつ正確なbrdf取得は、何百万もの入射光と視野方向のサンプリングを必要とする困難な研究課題である。取得過程を高速化するためには, BRDFの完全回復が正確かつ堅牢であるような, サンプリング方向の最小セットを見つける必要がある。本稿では,BRDF の取得を圧縮センシング問題として定式化し,センサオペレータは最適なサンプル方向のセットに従ってBRDF 信号のサブサンプリングを行う。この問題を解決するために,光ビューのサンプルを配置し,回復誤差を最小限に抑えるための最適サブサンプリング演算子を設計するためのFROST(Fast and Robust Optimal Smpling Technique)を提案する。 FROSTは,Multiple Measurement Vector (MMV)信号モデルの下で,圧縮センシングのための最適サブサンプリング演算子をスパース表現に設計する問題を提起する。提案された再構成は、正確には、すなわち近似がないため、難解な組合せ問題を標準的な最適化手法で解けるものに変換する。その結果、FROSTには圧縮センシングの分野からの強い理論的保証が伴う。 BRDFデータセットを用いた10倍のクロスバリデーションを用いたFROST-BRDFの網羅的解析を行い,再建品質に対する最先端技術と比較して大きな優位性を示した。最後に、frostは概念的にも実装的にもシンプルで、各実行時に一貫性のある結果をもたらし、少なくとも以前の技術よりも2桁高速である。

Efficient and accurate BRDF acquisition of real world materials is a challenging research problem that requires sampling millions of incident light and viewing directions. To accelerate the acquisition process, one needs to find a minimal set of sampling directions such that the recovery of the full BRDF is accurate and robust given such samples. In this paper, we formulate BRDF acquisition as a compressed sensing problem, where the sensing operator is one that performs sub-sampling of the BRDF signal according to a set of optimal sample directions. To solve this problem, we propose the Fast and Robust Optimal Sampling Technique (FROST) for designing a provably optimal sub-sampling operator that places light-view samples such that the recovery error is minimized. FROST casts the problem of designing an optimal sub-sampling operator for compressed sensing into a sparse representation formulation under the Multiple Measurement Vector (MMV) signal model. The proposed reformulation is exact, i.e. without any approximations, hence it converts an intractable combinatorial problem into one that can be solved with standard optimization techniques. As a result, FROST is accompanied by strong theoretical guarantees from the field of compressed sensing. We perform a thorough analysis of FROST-BRDF using a 10-fold cross-validation with publicly available BRDF datasets and show significant advantages compared to the state-of-the-art with respect to reconstruction quality. Finally, FROST is simple, both conceptually and in terms of implementation, it produces consistent results at each run, and it is at least two orders of magnitude faster than the prior art.

翻訳日:2024-01-17 18:54:18 公開日:2024-01-14

# 自己訓練による白血セルの半教師付きセマンティクスセグメンテーション

Semi-supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cel ( http://arxiv.org/abs/2401.07278v1 )

ライセンス: Link先を確認

Vinh Quoc Luu, Duy Khanh Le, Huy Thanh Nguyen, Minh Thanh Nguyen, Thinh Tien Nguyen, Vinh Quang Dinh

(参考訳) 医療における人工知能(AI)は、特に白血球がんの診断において、2つの主要な課題によって妨げられている: 白血球セグメンテーションのための大規模ラベル付きデータセットの欠如と、時代遅れのセグメンテーション方法である。最初の課題に対処するためには、大規模なデータセットを効率的にアノテートするために、半教師付き学習フレームワークを導入する必要がある。本稿では,fixmatchを組み込んだ新しい自己学習パイプラインを提案することで,この問題に対処した。自己学習パイプラインにFixMatchを組み込むことで、ほとんどのケースでパフォーマンスが向上することがわかった。 DeepLab-V3アーキテクチャの一貫性を備えた自己学習スキームとResNet-50で、Zheng 1, Zheng 2, LISCデータセットでそれぞれ90.69%、87.37%、76.49%に達した。

Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. To address the first challenge, a semi-supervised learning framework should be brought to efficiently annotate the large dataset. In this work, we address this issue by proposing a novel self-training pipeline with the incorporation of FixMatch. We discover that by incorporating FixMatch in the self-training pipeline, the performance improves in the majority of cases. Our performance achieved the best performance with the self-training scheme with consistency on DeepLab-V3 architecture and ResNet-50, reaching 90.69%, 87.37%, and 76.49% on Zheng 1, Zheng 2, and LISC datasets, respectively.

翻訳日:2024-01-17 18:53:53 公開日:2024-01-14

# promptformer: asr用のconformerトランスデューサ

Promptformer: Prompted Conformer Transducer for ASR ( http://arxiv.org/abs/2401.07360v1 )

ライセンス: Link先を確認

Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif R\"adel, Volker Leutnant

(参考訳) コンテキストキューは、自動音声認識(ASR)システムにおけるマルチターンインタラクションを改善する情報を運ぶ。本稿では,注目機構の音響表現とテキストコンテキストを融合させるために,ハイパープロンプトにインスパイアされた新しいメカニズムを提案する。マルチターンインタラクションを用いたテストセットでは,強いベースライン上で5.9%の相対単語誤り率低減(rwerr)を達成した。提案手法は文脈の欠如により劣化せず,文脈を伴わずにモデルが訓練されても改善につながることを示す。さらに,文脈埋め込み生成に事前学習された文片モデルを用いることで,外部bertモデルに勝ることを示す。

Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rWERR) over a strong baseline. We show that our method does not degrade in the absence of context and leads to improvements even if the model is trained without context. We further show that leveraging a pre-trained sentence-piece model for context embedding generation can outperform an external BERT model.

翻訳日:2024-01-17 18:47:38 公開日:2024-01-14

# 科学と深層学習における信頼性と解釈可能性

Reliability and Interpretability in Science and Deep Learning ( http://arxiv.org/abs/2401.07359v1 )

ライセンス: Link先を確認

Luigi Scorzato

(参考訳) 近年,機械学習(ML)手法の信頼性に関する疑問が重要視され,関連する不確実性の分析が研究の動機となっている。しかしながら、これらの研究の多くは、標準の誤り解析をMLモデル、特に標準科学的モデリングからかなり離れたディープニューラルネットワーク(DNN)モデルに適用している。したがって、標準誤差解析を、dnnモデルと標準科学的モデリングの相違の可能性と、これらの相違が信頼性評価に与える影響についてのより深い認識論的分析と統合する必要がある。この記事にはいくつかの貢献がある。まず、理論自由科学の錯覚に対するモデル仮定(MLと従来の科学の両方)のユビキタスな役割を強調します。第二に、モデル仮定は、言語非依存であることが示される、その(エピスティックな)複雑さの観点から分析される。 dnnモデルの高い認識論的複雑性は、その信頼性と長期的な進歩の予測を妨げていると論じている。今後の可能性も示唆されている。第3に,責任あるaiの文脈で導入されたモデル認識複雑性とその解釈可能性との密接な関係を明らかにする。モデル(ブラックボックスの問題)の理解の欠如は、個々のスキルとは無関係な方法で、その解釈可能性に影響を与える。また、解釈可能性が、統計分析だけでは理解できないあらゆるモデルの信頼性を評価するための前提条件であることも明らかにした。本稿では,従来の科学的モデルとDNNモデルの比較に焦点を当てる。しかし、ランダムフォレストやロジスティック回帰モデルも簡単に考慮される。

In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models, and in particular Deep Neural Network (DNN) models, which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate the standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling and the possible implications of these differences in the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (both in ML and traditional Science) against the illusion of theory-free science. Secondly, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimate of their reliability and also their prospect of long-term progress. Some potential ways forward are suggested. Thirdly, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense, and to what extent, the lack of understanding of a model (black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models. But, Random Forest and Logistic Regression models are also briefly considered.

翻訳日:2024-01-17 18:46:57 公開日:2024-01-14

# AI生成合成画像の認識のためのハラスティング機械学習

Harnessing Machine Learning for Discerning AI-Generated Synthetic Images ( http://arxiv.org/abs/2401.07358v1 )

ライセンス: Link先を確認

Yuyang Wang, Yizhi Hao, Amando Xu Cong

(参考訳) デジタルメディアの領域では、AI生成合成画像の出現は、実物と製作された視覚コンテンツを区別する上で大きな課題をもたらしている。これらの画像は、しばしば真偽とは区別できないが、デジタルメディアの信頼性への脅威となり、偽情報や詐欺につながる可能性がある。我々の研究は、AI生成画像と実画像の識別に機械学習技術を活用することで、この課題に対処する。私たちのアプローチの中心は、"Real"と"Fake"とラベル付けされた画像の包括的なコレクションであるCIFAKEデータセットです。 ResNet、VGGNet、DenseNetといった先進的なディープラーニングアーキテクチャを洗練・適応し、トランスファーラーニングを利用して合成画像の識別精度を向上させる。また,これらを,バニラサポートベクトルマシン(SVM)と独自の畳み込みニューラルネットワーク(CNN)からなるベースラインモデルと比較した。 DenseNetは97.74%の精度で、私たちの最適化されたディープラーニングモデルは従来の手法より優れていることを示した。本研究は,これらの高度なモデルを合成画像検出に適用し,最適化し,様々なメトリクスを用いた比較分析を行い,従来の機械学習手法よりもai生成画像の識別に優れた性能を示す。この研究は、デジタルメディアの整合性の分野を前進させるだけでなく、デジタルメディアにおけるAI生成コンテンツの倫理的・技術的側面を探求するための基盤となる。

In the realm of digital media, the advent of AI-generated synthetic images has introduced significant challenges in distinguishing between real and fabricated visual content. These images, often indistinguishable from authentic ones, pose a threat to the credibility of digital media, with potential implications for disinformation and fraud. Our research addresses this challenge by employing machine learning techniques to discern between AI-generated and genuine images. Central to our approach is the CIFAKE dataset, a comprehensive collection of images labeled as "Real" and "Fake". We refine and adapt advanced deep learning architectures like ResNet, VGGNet, and DenseNet, utilizing transfer learning to enhance their precision in identifying synthetic images. We also compare these with a baseline model comprising a vanilla Support Vector Machine (SVM) and a custom Convolutional Neural Network (CNN). The experimental results were significant, demonstrating that our optimized deep learning models outperform traditional methods, with DenseNet achieving an accuracy of 97.74%. Our application study contributes by applying and optimizing these advanced models for synthetic image detection, conducting a comparative analysis using various metrics, and demonstrating their superior capability in identifying AI-generated images over traditional machine learning techniques. This research not only advances the field of digital media integrity but also sets a foundation for future explorations into the ethical and technical dimensions of AI-generated content in digital media.

翻訳日:2024-01-17 18:45:51 公開日:2024-01-14

# BUGSPHP:PHPの自動プログラム修復のためのデータセット

BUGSPHP: A dataset for Automated Program Repair in PHP ( http://arxiv.org/abs/2401.07356v1 )

ライセンス: Link先を確認

K.D. Pramod, W.T.N. De Silva, W.U.K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake

(参考訳) 自動プログラム修正(APR)は、デバッグとバグ修正時間を節約することで開発者の生産性を向上させる。 APRはC/C++とJavaプログラムで広く研究されているが、ベンチマークPHPバグデータセットがないため、PHPプログラムのバグについてはほとんど研究されていない。 PHPが20年以上にわたって最も広く使われているサーバーサイド言語の一つであり、eコマース、ソーシャルネットワーク、コンテンツ管理といったさまざまなコンテキストで使われていることは驚くべきことです。本稿では,実世界のアプリケーションであるBUGSPHPにおけるPHPバグのベンチマークデータセットを提案する。データセットはトレーニングとテストデータセットで構成され、GitHubから別々にキュレーションされ、ローカルに処理される。トレーニングデータセットには600,000以上のバグ修正コミットが含まれている。テストデータセットには、開発者が提供するテストケースを備えた手作業によるバグ修正コミット513が含まれている。

Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of a benchmark PHP bug dataset. This is surprising given that PHP has been one of the most widely used server-side languages for over two decades, being used in a variety of contexts such as e-commerce, social networking, and content management. This paper presents a benchmark dataset of PHP bugs on real-world applications called BUGSPHP, which can enable research on analysis, testing, and repair for PHP programs. The dataset consists of training and test datasets, separately curated from GitHub and processed locally. The training dataset includes more than 600,000 bug-fixing commits. The test dataset contains 513 manually validated bug-fixing commits equipped with developer-provided test cases to assess patch correctness.

翻訳日:2024-01-17 18:45:26 公開日:2024-01-14

# 低高度空域認証管理のための工学フェアと等価ソフトウェアシステムを目指して

Towards Engineering Fair and Equitable Software Systems for Managing Low-Altitude Airspace Authorizations ( http://arxiv.org/abs/2401.07353v1 )

ライセンス: Link先を確認

Usman Gohar, Michael C. Hunter, Agnieszka Marczak-Czajka, Robyn R. Lutz, Myra B. Cohen, Jane Cleland-Huang

(参考訳) 小型無人航空機システム(SUAS)は様々な用途に広く採用されている。これにより、共有空域内の運用上の複雑さと報告されたインシデントの増加が導入され、安全性への懸念が高まっている。これに対し、アメリカ連邦航空局(FAA)は、そのミッションを安全に完了させるSUASの予測能力に基づいて、空域へのアクセスを制御するUAS Traffic Management (UTM)システムを開発している。しかし、飛行要求を迅速に承認または否定できる完全自動化システムはバイアスを起こしやすいため、多様な利害関係者にとって安全、透明性、公平性を考慮しなければならない。本稿では,自動化システムにおいて考慮すべき要因について,利害関係者の視点を考察する最初の研究を行う。その結果、飛行特性と環境条件が最も重要視されているが、パイロットとドローンの能力も考慮すべきである。さらに、いくつかの回答者はAIをサポートする自動化への反対を示し、自動意思決定における完全な透明性の必要性を強調した。結果は、UTM飛行認可決定の自動化の課題に関する社会的視点を提供し、より広範なsUASコミュニティに受け入れられる解決策の継続的な設計の枠組み化を支援する。

Small Unmanned Aircraft Systems (sUAS) have gained widespread adoption across a diverse range of applications. This has introduced operational complexities within shared airspaces and an increase in reported incidents, raising safety concerns. In response, the U.S. Federal Aviation Administration (FAA) is developing a UAS Traffic Management (UTM) system to control access to airspace based on an sUAS's predicted ability to safely complete its mission. However, a fully automated system capable of swiftly approving or denying flight requests can be prone to bias and must consider safety, transparency, and fairness to diverse stakeholders. In this paper, we present an initial study that explores stakeholders' perspectives on factors that should be considered in an automated system. Results indicate flight characteristics and environmental conditions were perceived as most important but pilot and drone capabilities should also be considered. Further, several respondents indicated an aversion to any AI-supported automation, highlighting the need for full transparency in automated decision-making. Results provide a societal perspective on the challenges of automating UTM flight authorization decisions and help frame the ongoing design of a solution acceptable to the broader sUAS community.

翻訳日:2024-01-17 18:45:13 公開日:2024-01-14

# EU法における生成AI - 責任、プライバシ、知的財産権、サイバーセキュリティ

Generative AI in EU Law: Liability, Privacy, Intellectual Property, and Cybersecurity ( http://arxiv.org/abs/2401.07348v1 )

ライセンス: Link先を確認

Claudio Novelli, Federico Casolari, Philipp Hacker, Giorgio Spedicato, Luciano Floridi

(参考訳) 生成AIの出現、特にChatGPTとその後継者のような大規模言語モデル(LLM)を通じて、AIの世界におけるパラダイムシフトを象徴する。高度なLCMはマルチモーダリティを示し、多様なデータフォーマットを扱い、アプリケーションの範囲を広げる。しかし、これらのモデルの複雑さと創発的な自律性は、予測可能性と法的コンプライアンスの課題をもたらす。本稿では、欧州連合の文脈におけるジェネレーティブAIとLLMの法的および規制的な意味を掘り下げ、責任、プライバシー、知的財産権、サイバーセキュリティの側面を分析する。人工知能法(AIA)の草案を含む、既存のおよび提案されたEUの法律の妥当性を批判的に検証し、ジェネレーティブAIの一般的な問題、特にLLMの課題に対処する。本稿は、立法枠組みにおける潜在的なギャップと欠点を特定し、生成モデルの安全かつコンプライアンスの確保と、EUの進化するデジタルランドスケープと法的基準との整合性を確保するための勧告を提案する。

The advent of Generative AI, particularly through Large Language Models (LLMs) like ChatGPT and its successors, marks a paradigm shift in the AI landscape. Advanced LLMs exhibit multimodality, handling diverse data formats, thereby broadening their application scope. However, the complexity and emergent autonomy of these models introduce challenges in predictability and legal compliance. This paper delves into the legal and regulatory implications of Generative AI and LLMs in the European Union context, analyzing aspects of liability, privacy, intellectual property, and cybersecurity. It critically examines the adequacy of the existing and proposed EU legislation, including the Artificial Intelligence Act (AIA) draft, in addressing the unique challenges posed by Generative AI in general and LLMs in particular. The paper identifies potential gaps and shortcomings in the legislative framework and proposes recommendations to ensure the safe and compliant deployment of generative models, ensuring they align with the EU's evolving digital landscape and legal standards.

翻訳日:2024-01-17 18:44:51 公開日:2024-01-14

# 誰が言った? 幼児教室における音声分析の自動化

Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms ( http://arxiv.org/abs/2401.07342v1 )

ライセンス: Link先を確認

Anchen Sun, Juan J Londono, Batya Elbaum, Luis Estrada, Roberto Jose Lazo, Laura Vitale, Hugo Gonzalez Villasanti, Riccardo Fusaroli, Lynn K Perry, Daniel S Messinger

(参考訳) 幼児は、騒音の多い幼稚園の教室で覚醒時間の大部分を過ごします。これらの環境では、教師との子どもの音声対話は言語結果に重要な貢献者であるが、手動による対話の翻訳は禁止されている。児童・教師向けレコーダーの音声を用いて,話者分類(ALICE)と発話書き起こし(Whisper)の両方にオープンソースソフトウェアを利用する自動フレームワークを提案する。本研究では,110分間の授業記録において,児童語マイクロフォン(n=4児)から85分間,教師・女性マイクロホン(n=2教師)から25分間の成績を比較した。すなわち、正しく分類された教師と子供の発話の割合は.76であり、誤り訂正されたカッパは.50、重み付けされたF1は.76である。教師と児童の書き起こしにおける単語エラー率は .15 であり、Whisper と専門家の書き起こしを同等にするためには、15%の単語を削除、追加、あるいは変更する必要がある。また, 単語の平均発話長, 質問文である教師と児童の発話率, 2.5秒以内で回答した発話の割合などの音声特徴は, 専門家と自動書き起こしとは別々に計算した場合に類似していた。その結果, 児童の言語発達を支援する教室音声の分析の進歩が示唆された。自然言語処理を用いた今後の研究は、話者分類の改善と、自動化された学習フレームワークの適用から、13人の子供と4人の教師が1年間に17回観察した教室記録を含むより大きなデータセットまでの分析が進められている。

Young children spend substantial portions of their waking hours in noisy preschool classrooms. In these environments, children's vocal interactions with teachers are critical contributors to their language outcomes, but manually transcribing these interactions is prohibitive. Using audio from child- and teacher-worn recorders, we propose an automated framework that uses open source software both to classify speakers (ALICE) and to transcribe their utterances (Whisper). We compare results from our framework to those from a human expert for 110 minutes of classroom recordings, including 85 minutes from child-word microphones (n=4 children) and 25 minutes from teacher-worn microphones (n=2 teachers). The overall proportion of agreement, that is, the proportion of correctly classified teacher and child utterances, was .76, with an error-corrected kappa of .50 and a weighted F1 of .76. The word error rate for both teacher and child transcriptions was .15, meaning that 15% of words would need to be deleted, added, or changed to equate the Whisper and expert transcriptions. Moreover, speech features such as the mean length of utterances in words, the proportion of teacher and child utterances that were questions, and the proportion of utterances that were responded to within 2.5 seconds were similar when calculated separately from expert and automated transcriptions. The results suggest substantial progress in analyzing classroom speech that may support children's language development. Future research using natural language processing is underway to improve speaker classification and to analyze results from the application of the automated it framework to a larger dataset containing classroom recordings from 13 children and 4 teachers observed on 17 occasions over one year.

翻訳日:2024-01-17 18:44:33 公開日:2024-01-14

# オンラインソーシャルリーダーにおけるシェイクスピアとカンパニーの残余

The Afterlives of Shakespeare and Company in Online Social Readership ( http://arxiv.org/abs/2401.07340v1 )

ライセンス: Link先を確認

Maria Antoniak, David Mimno, Rosamond Thalken, Melanie Walsh, Matthew Wilkens, Gregory Yauney

(参考訳) GoodreadsやLibraryThingといったソーシャル読書プラットフォームの成長により、非常に大規模かつ詳細な読書活動を分析することができます。しかし21世紀のシステムは、現代の読者のみに視点を与えてくれる。一方、シェイクスピア・アンド・カンパニーの貸出図書館記録のデジタル化は、戦間期パリのより小規模のコミュニティの読書活動の窓口となっている。本稿では,シェイクスピア・アンド・カンパニー・コミュニティとグッドリードズ・コミュニティの比較を行う。類似点と相違点の定量化によって、これらのデータセット間での作業の増加や人気低下のパターンを特定できる。また,共読パターンの類似性と差異を計測することで,作業の受信方法の違いを測定することもできる。最後に、共読の完全なネットワークを調べることで、文学的レセプションの全体構造の変化を観察することができる。

The growth of social reading platforms such as Goodreads and LibraryThing enables us to analyze reading activity at very large scale and in remarkable detail. But twenty-first century systems give us a perspective only on contemporary readers. Meanwhile, the digitization of the lending library records of Shakespeare and Company provides a window into the reading activity of an earlier, smaller community in interwar Paris. In this article, we explore the extent to which we can make comparisons between the Shakespeare and Company and Goodreads communities. By quantifying similarities and differences, we can identify patterns in how works have risen or fallen in popularity across these datasets. We can also measure differences in how works are received by measuring similarities and differences in co-reading patterns. Finally, by examining the complete networks of co-readership, we can observe changes in the overall structures of literary reception.

翻訳日:2024-01-17 18:44:02 公開日:2024-01-14

# CodeAgent: リアルタイムリポジトリレベルのコーディング課題のためのツール統合エージェントシステムによるコード生成の強化

CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges ( http://arxiv.org/abs/2401.07339v1 )

ライセンス: Link先を確認

Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, Zhi Jin

(参考訳) 大規模言語モデル(llm)は自動コード生成において有望であるが、一般的にはスタンドアロンコード単位生成のような単純なタスクでのみ優れている。しかし、実際のソフトウェア開発には、複雑な依存関係と広範なドキュメントを持つ複雑なコードリポジトリ(リポジトリという名前)が伴うことが多い。このギャップを埋めるために、我々の研究は、より現実的な、現実世界のリポジトリレベルのコード生成でLLMを評価することに重点を置いています。我々は,リポジトリレベルのコード生成のための手作業によるベンチマークであるCodeAgentBenchを紹介する。このベンチマークは、合計101サンプルを含む5つの高品質pythonプロジェクトで構成されている。我々は,リポジトリレベルのタスクにおいて9つの主要なllmを評価し,その性能の低下を観察した。そこで本研究では,レポレベルの効率的なコード生成に外部ツールを活用する新しいLLMベースのエージェントフレームワークであるCodeAgentを提案する。 CodeAgentは5つのプログラミングツールを統合し、情報検索、コードシンボルナビゲーション、コードテストのためのソフトウェアアーティファクトとのインタラクションを可能にする。これらのツールの使用を最適化するための4つのエージェント戦略を実装した。 CodeAgentBenchの実験では、CodeAgentはLLMの性能を大幅に向上させ、18.1\%から250\%に改善した。 HumanEvalベンチマークのさらなるテストでは、さまざまなコード生成タスクに対するCodeAgentの適応性と有効性を確認している。 CodeAgentはGithub Copilotのような商用製品よりも優れており、精度と効率が優れている。これらの結果は、コード生成におけるcodeagentの堅牢な能力を示し、実際のリポジトリレベルのコーディング課題の可能性を強調している。

Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks such as generating standalone code units. Real-world software development, however, often involves complex code repositories (named repo) with complex dependencies and extensive documentation. To fill this gap, our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation. We introduce CodeAgentBench, a manually curated benchmark for repo-level code generation. This benchmark comprises five high-quality Python projects, encompassing a total of 101 samples. We assess nine leading LLMs on repo-level tasks and observe a decline in their performance. To tackle this, we present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation. CodeAgent integrates five programming tools, enabling interaction with software artifacts for information retrieval, code symbol navigation, and code testing. We implement four agent strategies to optimize these tools' usage. Our experiments on CodeAgentBench show that CodeAgent enhances LLM performance significantly, with improvements ranging from 18.1\% to 250\%. Further tests on the HumanEval benchmark confirm CodeAgent's adaptability and efficacy across various code generation tasks. Notably, CodeAgent outperforms commercial products like Github Copilot, showcasing superior accuracy and efficiency. These results demonstrate CodeAgent's robust capabilities in code generation, highlighting its potential for real-world repo-level coding challenges.

翻訳日:2024-01-17 18:43:48 公開日:2024-01-14

# マンダリン多モーダル感情音声データベースの構築と評価

Construction and Evaluation of Mandarin Multimodal Emotional Speech Database ( http://arxiv.org/abs/2401.07336v1 )

ライセンス: Link先を確認

Zhu Ting, Li Liangqi, Duan Shufei, Zhang Xueying, Xiao Zhongzhe, Jia Hairng, Liang Huizhi

(参考訳) コーパス設計、主題選択、記録詳細及びデータ処理の側面から詳細に記述した、調音運動、音響、声門および顔の微小表現を含むマルチモーダル感情音声マンダリンデータベースを設計、確立する。信号は離散的な感情ラベル(中性、幸福、快楽、無関心、怒り、悲しみ、悲しみ)と次元的な感情ラベル(快楽、覚醒、支配)でラベル付けされる。本稿では,次元アノテーションデータの統計的解析により,次元アノテーションの有効性を検証する。注釈者のscl-90スケールデータを検証し、解析用パッドアノテーションデータと組み合わせ、アノテーションの異常現象と注釈者の心理状態との関係を探究する。本稿では,データベースの音声品質と感情識別の検証のために,svm,cnn,dnnの3つの基本モデルを用いて,これら7つの感情の認識率を計算する。その結果,音響データのみを用いた場合の7感情の平均認識率は約82%であった。声門データのみを使用する場合、平均認識率は約72%である。 kinematicsのデータだけで、平均認識率は55.7%に達する。したがって、データベースは高品質であり、特にマルチモーダル感情音声分析のタスクにおいて、音声分析研究の重要な情報源として使用できる。

A multi-modal emotional speech Mandarin database including articulatory kinematics, acoustics, glottal and facial micro-expressions is designed and established, which is described in detail from the aspects of corpus design, subject selection, recording details and data processing. Where signals are labeled with discrete emotion labels (neutral, happy, pleasant, indifferent, angry, sad, grief) and dimensional emotion labels (pleasure, arousal, dominance). In this paper, the validity of dimension annotation is verified by statistical analysis of dimension annotation data. The SCL-90 scale data of annotators are verified and combined with PAD annotation data for analysis, so as to explore the internal relationship between the outlier phenomenon in annotation and the psychological state of annotators. In order to verify the speech quality and emotion discrimination of the database, this paper uses 3 basic models of SVM, CNN and DNN to calculate the recognition rate of these seven emotions. The results show that the average recognition rate of seven emotions is about 82% when using acoustic data alone. When using glottal data alone, the average recognition rate is about 72%. Using kinematics data alone, the average recognition rate also reaches 55.7%. Therefore, the database is of high quality and can be used as an important source for speech analysis research, especially for the task of multimodal emotional speech analysis.

翻訳日:2024-01-17 18:43:23 公開日:2024-01-14

# ELLA-V: アライメント誘導配列並べ替えによる安定型ニューラルコーデック言語モデリング

ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering ( http://arxiv.org/abs/2401.07333v1 )

ライセンス: Link先を確認

Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

(参考訳) VALL-Eのような音響的および言語的プロンプトに基づく言語モデル(LM)アプローチは、ゼロショット音声生成の分野で顕著な進歩を遂げた。しかし、既存の方法にはいくつかの制限がある。 1) 音声及び音素トークン間のアライメントの制限による出力合成音声における繰り返し、転置及び省略 2)自己回帰(AR)言語モデルを用いた合成音声のきめ細かい制御の課題 3)ARによる復号化の性質,特に欲張り戦略の下での無限沈黙生成。そこで本研究では,音素レベルでの合成音声のきめ細かい制御を可能にする,単純かつ効率的なlmベースのゼロショットテキスト・ツー・スパイチ(tts)フレームワークであるella-vを提案する。 ELLA-Vの鍵となるのは、対応する音響トークンよりも先に音素トークンが現れる音響トークンと音素トークンの連成である。実験結果から,本モデルは精度でVALL-Eより優れ,グリージーおよびサンプリングに基づく復号方式によりより安定した結果が得られることがわかった。 ELLA-Vのコードはクリーンアップ後にオープンソース化される。オーディオサンプルはhttps://ereboas.github.io/ELLAV/で入手できる。

The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grained control over the synthesized speech with autoregressive (AR) language model; 3) infinite silence generation due to the nature of AR-based decoding, especially under the greedy strategy. To alleviate these issues, we propose ELLA-V, a simple but efficient LM-based zero-shot text-to-speech (TTS) framework, which enables fine-grained control over synthesized audio at the phoneme level. The key to ELLA-V is interleaving sequences of acoustic and phoneme tokens, where phoneme tokens appear ahead of the corresponding acoustic tokens. The experimental findings reveal that our model outperforms VALL-E in terms of accuracy and delivers more stable results using both greedy and sampling-based decoding strategies. The code of ELLA-V will be open-sourced after cleanups. Audio samples are available at https://ereboas.github.io/ELLAV/.

翻訳日:2024-01-17 18:43:01 公開日:2024-01-14

# 注意型UNetによるモノのインターネット上の軽量画像セマンティック通信システム

Attention-based UNet enabled Lightweight Image Semantic Communication System over Internet of Things ( http://arxiv.org/abs/2401.07329v1 )

ライセンス: Link先を確認

Guoxin Ma, Haonan Tong, Nuocheng Yang, and Changchuan Yin

(参考訳) 本稿では,モノのインターネット(IoT)デバイス上に展開される軽量画像意味コミュニケーションシステムの問題について検討する。考慮されたシステムモデルでは、デバイスはデータ伝送効率の高い究極のビデオサービスにおけるユーザの行動認識をサポートするためにセマンティック通信技術を使用する必要がある。しかし、ディープラーニング(DL)ベースのコーデックトレーニングと推論の複雑な計算プロセスのため、IoTデバイスがセマンティックコーデックをデプロイするのは計算コストがかかる。セマンティック通信システムをIoTデバイスで展開するために,低計算複雑性と小型モデルサイズを実現する,注目ベースの軽量画像セマンティック通信(LSSC)システムを提案する。特に,我々はまず,エッジサーバのコーデックをLSSCシステムにトレーニングさせ,IoTデバイス上でのトレーニング計算負荷を低減する。次に,畳み込みブロックアテンションモジュール(cbam)を導入し,画像意味的特徴を抽出し,ダウンサンプリング層数を減らし,浮動小数点演算(flops)を削減する。最後に,コーデックの構造を実験的に調整し,ダウンサンプリング層の最適数を求める。シミュレーションの結果,提案するLSSCシステムにより,意味コーデックFLOPを14%削減し,モデルサイズを55%削減できることがわかった。さらに,提案方式は低チャネルsnr(signal-to-noise)領域において,従来の通信方式よりも高い伝送精度を実現することができる。

This paper studies the problem of the lightweight image semantic communication system that is deployed on Internet of Things (IoT) devices. In the considered system model, devices must use semantic communication techniques to support user behavior recognition in ultimate video service with high data transmission efficiency. However, it is computationally expensive for IoT devices to deploy semantic codecs due to the complex calculation processes of deep learning (DL) based codec training and inference. To make it affordable for IoT devices to deploy semantic communication systems, we propose an attention-based UNet enabled lightweight image semantic communication (LSSC) system, which achieves low computational complexity and small model size. In particular, we first let the LSSC system train the codec at the edge server to reduce the training computation load on IoT devices. Then, we introduce the convolutional block attention module (CBAM) to extract the image semantic features and decrease the number of downsampling layers thus reducing the floating-point operations (FLOPs). Finally, we experimentally adjust the structure of the codec and find out the optimal number of downsampling layers. Simulation results show that the proposed LSSC system can reduce the semantic codec FLOPs by 14%, and reduce the model size by 55%, with a sacrifice of 3% accuracy, compared to the baseline. Moreover, the proposed scheme can achieve a higher transmission accuracy than the traditional communication scheme in the low channel signal-to-noise (SNR) region.

翻訳日:2024-01-17 18:42:40 公開日:2024-01-14

# 従来のアプローチを超えて:乳房超音波診断のためのマルチタスクネットワーク

Beyond Traditional Approaches: Multi-Task Network for Breast Ultrasound Diagnosis ( http://arxiv.org/abs/2401.07326v1 )

ライセンス: Link先を確認

Dat T. Chung, Minh-Anh Dang, Mai-Anh Vu, Minh T. Nguyen, Thanh-Huy Nguyen, and Vinh Q. Dinh

(参考訳) 乳腺超音波は非侵襲的アプローチとして癌診断において重要な役割を担っている。近年、深層学習の発展に伴い、腫瘍の局在化と癌分類のタスクにおいて多くのCNNベースのアプローチが研究されている。従来のシングルモデルは両方のタスクで優れたパフォーマンスを達成したが、これらのメソッドは推論時間、GPU要求、各モデルの微調整にいくつかの制限がある。本研究では,分割と分類の両方を行うために,エンドツーエンドのマルチタスクアーキテクチャを再設計し,構築することを目的とする。提案手法では,セグメンテーションタスクにおけるDeepLabV3+アーキテクチャの79.8%と86.4%で,優れた性能と時間効率を実現した。

Breast Ultrasound plays a vital role in cancer diagnosis as a non-invasive approach with cost-effective. In recent years, with the development of deep learning, many CNN-based approaches have been widely researched in both tumor localization and cancer classification tasks. Even though previous single models achieved great performance in both tasks, these methods have some limitations in inference time, GPU requirement, and separate fine-tuning for each model. In this study, we aim to redesign and build end-to-end multi-task architecture to conduct both segmentation and classification. With our proposed approach, we achieved outstanding performance and time efficiency, with 79.8% and 86.4% in DeepLabV3+ architecture in the segmentation task.

翻訳日:2024-01-17 18:42:14 公開日:2024-01-14

# Sachdev-Ye-Kitaev模型のランダムプルーニング

Randomly Pruning the Sachdev-Ye-Kitaev model ( http://arxiv.org/abs/2401.07325v1 )

ライセンス: Link先を確認

Richard Berkovits

(参考訳) SYKモデル(Sachdev-Ye-Kitaev model)はその短時間のカオス的振る舞いで知られ、量子重力やホログラフィーなどの様々な分野への応用において基本的な役割を果たす。エネルギースペクトルの普遍的なカオス的振る舞いが停止するエネルギースケールを表すThouless Energyは、スペクトル自身から決定することができる。古典的あるいは量子コンピュータ上でSYKモデルをシミュレートする場合、結合をランダムに切断することでハミルトン項の項数を最小化することが有利である。本稿では,多数の結合を排除しながらも,カオス的挙動は短時間で持続することを示した。これは,完全連結SYKモデル(具体的には$O(KL)$)において,元の$O(L^4)$結合のごく一部しか保持されていない場合でも事実である。ここで、$l$はサイト数を表し、$k\sim 10$である。短時間スケールに対応する長距離エネルギースケールの特性を数値的特異値分解(svd)とレベル数分散計算により検証する。

The Sachdev-Ye-Kitaev model (SYK) is renowned for its short-time chaotic behavior, which plays a fundamental role in its application to various fields such as quantum gravity and holography. The Thouless energy, representing the energy scale at which the universal chaotic behavior in the energy spectrum ceases, can be determined from the spectrum itself. When simulating the SYK model on classical or quantum computers, it is advantageous to minimize the number of terms in the Hamiltonian by randomly pruning the couplings. In this paper, we demonstrate that even with a significant pruning, eliminating a large number of couplings, the chaotic behavior persists up to short time scales This is true even when only a fraction of the original $O(L^4)$ couplings in the fully connected SYK model, specifically $O(KL)$, is retained. Here, $L$ represents the number of sites, and $K\sim 10$. The properties of the long-range energy scales, corresponding to short time scales, are verified through numerical singular value decomposition (SVD) and level number variance calculations.

翻訳日:2024-01-17 18:42:02 公開日:2024-01-14

# 小さなLLMは弱いツール学習者:マルチLLMエージェント

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent ( http://arxiv.org/abs/2401.07324v1 )

ライセンス: Link先を確認

Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang

(参考訳) 大規模言語モデル(LLM)エージェントはスタンドアロンのLLMの機能を大幅に拡張し、外部ツール(API、関数など)と対話し、自己指向的な複雑なタスクを完了させる。ツール利用の課題は、LCMがユーザクエリを理解し、回答を生成するだけでなく、タスク計画、メモリ管理、ツールの実行、結果の要約にも長けていることである。従来のアプローチでは、これらすべての機能で単一のLLMをトレーニングすることに重点を置いているが、特に小さなモデルでは、パフォーマンス上の制限が明らかになっている。さらに、LDM全体がツールの更新時に再トレーニングを必要とする場合がある。これらの課題を克服するため,我々は,上記の機能をプランナー,呼び出し元,要約元に分解する新しい戦略を提案する。各コンポーネントは、特定の機能に焦点を当てた単一のLCMによって実装され、タスクを達成するために他のコンポーネントと協調する。このモジュール化フレームワークは、個々の更新と、各機能を構築するためのより小さなllmの使用を促進する。このフレームワークを効果的にトレーニングするために,2段階のトレーニングパラダイムを導入する。まず、サブタスクを識別することなく、データセット全体のバックボーンLDMを微調整し、タスクを包括的に理解するモデルを提供する。次に、微調整LDMを用いて、各サブタスク上で連続的に微調整されるプランナー、呼び出し元、および要約器をインスタンス化する。ツール使用ベンチマークによる評価は,提案したマルチLLMフレームワークが従来の単一LLMアプローチを超越していることを示し,ツール学習の有効性とメリットを強調している。

Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete complex tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers but also excel in task planning, memory management, tool invocation, and result summarization. While traditional approaches focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. Moreover, the entire LLM may require retraining when tools are updated. To overcome these challenges, we propose a novel strategy that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with other components to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.

翻訳日:2024-01-17 18:41:39 公開日:2024-01-14

# 強い誘導バイアス:二元画像分類のためのGzip

A Strong Inductive Bias: Gzip for binary image classification ( http://arxiv.org/abs/2401.07392v1 )

ライセンス: Link先を確認

Marco Scilipoti, Marina Fuster and Rodrigo Ramele

(参考訳) ディープラーニングネットワークは、産業と研究のためのコンピュータビジョンのデファクトスタンダードになっている。しかし、従兄弟である自然言語処理(nlp)における最近の進展は、強い帰納的バイアスを持つパラメータレスモデルが計算上安価でより単純な代替手段として役立つことを示した。本稿では,Gzipのような汎用圧縮機と隣接した2値画像分類モデルを提案する。 Resnet、EfficientNet、Mobilenetといった一般的なディープラーニングネットワークと比較し、精度が向上し、2桁以上の空間を数ショット設定で大幅に削減できることを示します。結果として、少数のシナリオにおいて、より強い帰納バイアスを持つモデルの未解決ポテンシャルが弱まると信じている。

Deep learning networks have become the de-facto standard in Computer Vision for industry and research. However, recent developments in their cousin, Natural Language Processing (NLP), have shown that there are areas where parameter-less models with strong inductive biases can serve as computationally cheaper and simpler alternatives. We propose such a model for binary image classification: a nearest neighbor classifier combined with a general purpose compressor like Gzip. We test and compare it against popular deep learning networks like Resnet, EfficientNet and Mobilenet and show that it achieves better accuracy and utilizes significantly less space, more than two order of magnitude, within a few-shot setting. As a result, we believe that this underlines the untapped potential of models with stronger inductive biases in few-shot scenarios.

翻訳日:2024-01-17 18:34:12 公開日:2024-01-14

# 膝またはROC

Knee or ROC ( http://arxiv.org/abs/2401.07390v1 )

ライセンス: Link先を確認

Veronica Wendt, Byunggu Yu, Caleb Kelly, and Junwhan Kim

(参考訳) セルフアテンショントランスフォーマは、より小さなデータセットによる画像分類の精度を示している。しかし、現在までのテストは、画像人口の既知の表現を伴う単一クラス画像検出に基づいている。入力画像クラスが1より大きい場合と、画像人口の表現に関する完全な情報を持たないテストセットの場合、精度の計算が適応しなければならない。受信側動作特性(ROC)の精度は、マルチクラスの入力画像のインスタンスに対処できる。しかし、このアプローチは、画像の人口表現が不明な場合では不適当である。そこで我々は, 膝法を用いて, アドホックベースでしきい値を決定する計算精度について検討した。マルチクラス画像検出のためにcifar-10画像から作成した多クラスデータセットにおけるroc曲線と膝閾値について検討した。

Self-attention transformers have demonstrated accuracy for image classification with smaller data sets. However, a limitation is that tests to-date are based upon single class image detection with known representation of image populations. For instances where the input image classes may be greater than one and test sets that lack full information on representation of image populations, accuracy calculations must adapt. The Receiver Operating Characteristic (ROC) accuracy thresh-old can address the instances of multi-class input images. However, this approach is unsuitable in instances where image population representation is unknown. We consider calculating accuracy using the knee method to determine threshold values on an ad-hoc basis. Results of ROC curve and knee thresholds for a multi-class data set, created from CIFAR-10 images, are discussed for multi-class image detection.

翻訳日:2024-01-17 18:33:57 公開日:2024-01-14

# クラスタリングアルゴリズムの迅速なレビュー

A Rapid Review of Clustering Algorithms ( http://arxiv.org/abs/2401.07389v1 )

ライセンス: Link先を確認

Hui Yin, Amir Aryani, Stephen Petrie, Aishwarya Nambissan, Aland Astudillo, Shengyuan Cao

(参考訳) クラスタリングアルゴリズムは、データ内の固有のパターンと類似性に基づいて、データをグループまたはクラスタにまとめることを目的としている。それらは、マーケティングやeコマース、ヘルスケア、データ組織と分析、ソーシャルメディアなど、今日の生活において重要な役割を担っている。クラスタリングアルゴリズムは数多く存在し、新しいものを導入する開発が進行中である。各アルゴリズムには独自の強みと弱みがあり、現在、すべてのタスクに普遍的に適用可能なアルゴリズムは存在しない。本研究では,既存のクラスタリングアルゴリズムを分析し,基本原理と特徴,クラスタへのデータポイント割り当て,データセットのキャパシティ,クラスタ番号とアプリケーション領域という,5つの異なる次元にわたるメインストリームアルゴリズムを分類する。この分類は、様々な観点からクラスタリングアルゴリズムを理解し、特定のタスクを解くのに適したアルゴリズムを特定するのに役立つ。最後に,クラスタリングアルゴリズムの現状と今後の方向性について考察した。また、オープンな課題と未解決の問題を特定し議論した。

Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media. Numerous clustering algorithms exist, with ongoing developments introducing new ones. Each algorithm possesses its own set of strengths and weaknesses, and as of now, there is no universally applicable algorithm for all tasks. In this work, we analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions: underlying principles and characteristics, data point assignment to clusters, dataset capacity, predefined cluster numbers and application area. This classification facilitates researchers in understanding clustering algorithms from various perspectives and helps them identify algorithms suitable for solving specific tasks. Finally, we discussed the current trends and potential future directions in clustering algorithms. We also identified and discussed open challenges and unresolved issues in the field.

翻訳日:2024-01-17 18:33:46 公開日:2024-01-14

# デバイス非依存モデルによるネットワークインタラクションの最適化

Optimising network interactions through device agnostic models ( http://arxiv.org/abs/2401.07387v1 )

ライセンス: Link先を確認

Luca Manneschi, Ian T. Vidamour, Kilian D. Stenning, Jack C. Gartside, Charles Swindells, Guru Venkat, David Griffin, Susan Stepney, Will R. Branford, Thomas Hayward, Matt O Ellis, Eleni Vasilaki

(参考訳) 物理的に実装されたニューラルネットワークは、デバイス固有の物理的特性を計算ツールとして活用することにより、ディープラーニングモデルの性能を達成する可能性を秘めている。この計算のための物理過程の探索は、情報を処理する貴重な資源となる固有の力学も考慮する必要がある。しかし、既存の計算手法は、しばしば正確な数学的記述を欠いているデバイス力学に影響を与えるパラメータにディープラーニング技術の成功を拡張できない。本研究では,動的物理システムとのインタラクションを完全にデータ駆動方式で最適化するための普遍的なフレームワークを定式化する。このフレームワークは、神経確率微分方程式を微分可能なデジタル双対として採用し、デバイスの決定論的および確率的両方の振る舞いを効果的にキャプチャする。トレーニングされたモデルによる微分の利用は、物理ニューラルネットワークの最適化に不可欠な数学的推定を提供し、その物理ノードの固有の時間計算能力を活用する。実際のデバイスの動作を正確にモデル化するために,様々な実験環境で動作可能なニューラルsde変種を定式化した。本研究は、物理的に定義されたニューラルネットワークの展開を成功させる上で、システム確率を正確に捉えることの重要性を強調しながら、シミュレーションと相互作用する動的デバイスの物理的実装を通じて、フレームワークの適用性を示す。

Physically implemented neural networks hold the potential to achieve the performance of deep learning models by exploiting the innate physical properties of devices as computational tools. This exploration of physical processes for computation requires to also consider their intrinsic dynamics, which can serve as valuable resources to process information. However, existing computational methods are unable to extend the success of deep learning techniques to parameters influencing device dynamics, which often lack a precise mathematical description. In this work, we formulate a universal framework to optimise interactions with dynamic physical systems in a fully data-driven fashion. The framework adopts neural stochastic differential equations as differentiable digital twins, effectively capturing both deterministic and stochastic behaviours of devices. Employing differentiation through the trained models provides the essential mathematical estimates for optimizing a physical neural network, harnessing the intrinsic temporal computation abilities of its physical nodes. To accurately model real devices' behaviours, we formulated neural-SDE variants that can operate under a variety of experimental settings. Our work demonstrates the framework's applicability through simulations and physical implementations of interacting dynamic devices, while highlighting the importance of accurately capturing system stochasticity for the successful deployment of a physically defined neural network.

翻訳日:2024-01-17 18:33:33 公開日:2024-01-14

# マシンはどのように学習するか? AIcon2abs法の評価

How do machines learn? Evaluating the AIcon2abs method ( http://arxiv.org/abs/2401.07386v1 )

ライセンス: Link先を確認

Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima

(参考訳) 本稿では,最近の提案手法であるaicon2abs(queiroz et al., 2021)について評価する。これは、容易に理解できる機械学習メカニズムであるWiSARDを使用することで可能であり、ほとんど労力を要せず、ターゲットユーザからの技術的バックグラウンドも必要としない。 WiSARDはデジタルコンピューティングに忠実であり、トレーニングはRAMタイプのメモリへの書き込みから成り、分類はこれらのメモリからの読み込みから成り立っている。このモデルにより、学習や分類タスクの内部実現を簡単に可視化し、理解することができる。さらに、WiSARDモデルはトレーニングや分類にインターネット接続を必要としないため、いくつかの例から学ぶことができる。この機能により、マシンの観察が容易になり、使用する新しい例ごとに特定のタスクの精度が向上する。 WiSARDはこれまでに学んだことの「メンタルイメージ」を作成でき、特定のクラスに関連する重要な特徴を識別できる。 AIcon2abs法の有効性の評価は,作業負荷が約6時間である遠隔コースの評価を通じて行った。 8歳から11歳の子供5人、12歳から17歳の青年5人、21歳から72歳の成人24人であった。データ分析はハイブリッドアプローチを採用した。 AIcon2absは、研究対象者の約100%によって評価され、収集されたデータは、意図された結果に関して非常に満足な結果を示した。この研究は、CEP/HUCFF/FM/UFRJ Human Research Ethics Committeeによって承認されている。

This paper evaluates AIcon2abs (Queiroz et al., 2021), a recently proposed method that enables awareness among the general public on machine learning. Such is possible due to the use of WiSARD, an easily understandable machine learning mechanism, thus requiring little effort and no technical background from the target users. WiSARD is adherent to digital computing; training consists of writing to RAM-type memories, and classification consists of reading from these memories. The model enables easy visualization and understanding of training and classification tasks' internal realization through ludic activities. Furthermore, the WiSARD model does not require an Internet connection for training and classification, and it can learn from a few or one example. This feature makes it easier to observe the machine, increasing its accuracy on a particular task with each new example used. WiSARD can also create "mental images" of what it has learned so far, evidencing key features pertaining to a given class. The assessment of the AIcon2abs method's effectiveness was conducted through the evaluation of a remote course with a workload of approximately 6 hours. It was completed by thirty-four Brazilian subjects: 5 children between 8 and 11 years old; 5 adolescents between 12 and 17 years old; and 24 adults between 21 and 72 years old. Data analysis adopted a hybrid approach. AIcon2abs was well-rated by almost 100% of the research subjects, and the data collected revealed quite satisfactory results concerning the intended outcomes. This research has been approved by the CEP/HUCFF/FM/UFRJ Human Research Ethics Committee.

翻訳日:2024-01-17 18:33:12 公開日:2024-01-14

# DRLC:LLM批判からのDense Rewardsによる強化学習

DRLC: Reinforcement Learning with Dense Rewards from LLM Critic ( http://arxiv.org/abs/2401.07382v1 )

ライセンス: Link先を確認

Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

(参考訳) 強化学習(rl)は、言語モデルを人間の好みなど、区別できない報酬信号に合わせることができる。しかし、これらの報酬信号のスパース性から生じる大きな課題は、通常、世代全体に対して1つの報酬しか存在しないことである。この報酬の幅は非効率で不安定な学習につながる可能性がある。本稿では,LLMの批判的能力を活用して,学習過程を通じて深い報酬を生み出す新しい枠組みを提案する。我々のアプローチには、政策モデルと並んで批判言語モデルが組み込まれています。この批評家は、タスク記述、質問、ポリシーモデルの出力、環境の報酬信号を入力として促され、出力の各セグメントの品質を反映したトークンまたはスパンレベルの密集した報酬を提供する。我々は,感情制御,言語モデルのデトックス化,要約という3つのテキスト生成タスクに対するアプローチを評価する。実験結果から, 人工的な高密度報酬をトレーニングに取り入れることで, PPOベースラインを総合的な報酬で一貫した性能向上が得られることがわかった。さらに,同じモデルが政策と批判の両方として機能する環境では,自己批判的報酬が学習効率を高めることを実証する。

Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only one reward for the entire generation. This sparsity of rewards can lead to inefficient and unstable learning. In this paper, we introduce a novel framework leveraging the critique ability of LLMs to produce dense rewards throughout the learning process. Our approach incorporates a critic language model alongside the policy model. This critic is prompted with the task description, question, policy model's output, and environment's reward signal as input, and provides token or span-level dense rewards that reflect the quality of each segment of the output. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial dense rewards in training yields consistent performance gains over the PPO baseline with holistic rewards. Furthermore, in a setting where the same model serves as both policy and critic, we demonstrate that "self-critique" rewards also boost learning efficiency.

翻訳日:2024-01-17 18:32:48 公開日:2024-01-14

# 物理インフォームドニューラルネットワークを用いた単細胞データからの動的遺伝子制御ネットワークの推定

Inference of dynamical gene regulatory networks from single-cell data with physics informed neural networks ( http://arxiv.org/abs/2401.07379v1 )

ライセンス: Link先を確認

Maria Mircea, Diego Garlaschelli, Stefan Semrau

(参考訳) 発達生物学の主な目的の1つは、多能性前駆体を正確に特定された細胞タイプに堅牢に分化させる遺伝子制御ネットワーク(GRN)を明らかにすることである。実験データからGRNを推定する既存の方法の多くは、推定されたGRNが単に遺伝子発現の類似性や相関を反映しているため、予測能力に制限がある。ここでは,物理インフォームドニューラルネットワーク(PINN)を用いて,生物学的プロセスの機械的理解を提供する予測的,動的GRNのパラメータを推定する方法を示す。具体的には, 分岐挙動を示すGRNについて検討し, 細胞分化をモデル化する。パラメータ推論タスクにおいて、PINNは通常のフィードフォワードニューラルネットワークよりも優れており、関連する2つの実験シナリオを分析する。 1 遺伝子発現経路が利用可能な細胞通信システム及び 2. 細胞間通信が欠如している細胞集団のスナップショット測定。我々の分析は、PINNで分析される将来の実験の設計を知らせ、この強力なニューラルネットワークモデルをさらに探求するための出発点を提供する。

One of the main goals of developmental biology is to reveal the gene regulatory networks (GRNs) underlying the robust differentiation of multipotent progenitors into precisely specified cell types. Most existing methods to infer GRNs from experimental data have limited predictive power as the inferred GRNs merely reflect gene expression similarity or correlation. Here, we demonstrate, how physics-informed neural networks (PINNs) can be used to infer the parameters of predictive, dynamical GRNs that provide mechanistic understanding of biological processes. Specifically we study GRNs that exhibit bifurcation behavior and can therefore model cell differentiation. We show that PINNs outperform regular feed-forward neural networks on the parameter inference task and analyze two relevant experimental scenarios: 1. a system with cell communication for which gene expression trajectories are available and 2. snapshot measurements of a cell population in which cell communication is absent. Our analysis will inform the design of future experiments to be analyzed with PINNs and provides a starting point to explore this powerful class of neural network models further.

翻訳日:2024-01-17 18:32:27 公開日:2024-01-14

# 最近傍探索に基づく地球モーバー距離の効率的な近似

Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search ( http://arxiv.org/abs/2401.07378v1 )

ライセンス: Link先を確認

Guangyu Meng, Ruyu Zhou, Liu Liu, Peixian Liang, Fang Liu, Danny Chen, Michael Niemier, X.Sharon Hu

(参考訳) Earth Mover's Distance (EMD) は、2つの分布間の重要な類似度尺度であり、コンピュータビジョンやその他の多くのアプリケーションドメインで使用される。しかし、その正確な計算は計算量とメモリ集約性であり、大規模問題に対するスケーラビリティと適用性を妨げる。計算コストを削減するために様々な近似EMDアルゴリズムが提案されているが、精度が低下し、追加のメモリ使用量や手動パラメータチューニングが必要になる可能性がある。本稿では,NNS-EMDという新しい手法を用いて,近縁探索(NNS)を用いてEMDを近似し,高い精度,低時間複雑度,高メモリ効率を実現する。 NNS操作は、NNSイテレーション毎のデータポイント数を削減し、並列処理の機会を提供する。我々はさらに、大規模なデータセットに特に有益であるGPU上のベクトル化により、NS-EMDを加速する。我々は,NNS-EMDを画像分類および検索タスクにおける正確なEMDアルゴリズムと最先端の近似EMDアルゴリズムを比較した。また、NNS-EMDを用いてトランスポートマッピングを計算し、画像間の色移動を実現する。 NNS-EMDは、正確なEMD実装よりも44倍から135倍高速で、既存の近似EMD法よりも精度、スピードアップ、メモリ効率が優れている。

Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer lower accuracy and may require additional memory usage or manual parameter tuning. In this paper, we present a novel approach, NNS-EMD, to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve high accuracy, low time complexity, and high memory efficiency. The NNS operation reduces the number of data points compared in each NNS iteration and offers opportunities for parallel processing. We further accelerate NNS-EMD via vectorization on GPU, which is especially beneficial for large datasets. We compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD algorithms on image classification and retrieval tasks. We also apply NNS-EMD to calculate transport mapping and realize color transfer between images. NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and achieves superior accuracy, speedup, and memory efficiency over existing approximate EMD methods.

翻訳日:2024-01-17 18:32:12 公開日:2024-01-14

# 道路網のトポロジ的クレデンシャルに基づく指向性構成のためのデータ駆動型レジリエンスフレームワーク

A Data-driven Resilience Framework of Directionality Configuration based on Topological Credentials in Road Networks ( http://arxiv.org/abs/2401.07371v1 )

ライセンス: Link先を確認

H M Imran Kays, Khondhaker Al Momin, K.K. "Muralee" Muraleetharan, Arif Mohaimin Sadri

(参考訳) 道路再設定は交通流の向上,渋滞の低減,既存のインフラや資源による道路網全体の性能向上を目的とした交通計画の重要な側面である。本稿では,最適化に基づくBrute Force検索手法と意思決定支援フレームワークを統合して,道路構成のランク付けを行い,性能向上を図る。提案フレームワークは、最適化プロセス中に生成されたシナリオからの入力を組み合わせたマルチ基準決定分析(MCDA)アプローチを取り入れている。最適化からのデータを利用することで,システム走行時間(stt)と全リンクトラヒックフロー(tltf)を最も影響力のある決定変数として識別する。開発したフレームワークはグラフ理論を利用して交通ネットワークのトポロジをモデル化し,ネットワーク科学のメトリクスと確率的ユーザ均衡トラフィック割り当てを適用し,各道路構成が全体のネットワーク性能に与える影響を評価する。道路構成のランク付けには、リッジ回帰などの機械学習アルゴリズムを使用し、各基準(TBC、STT、TLTF)の最適な重みを決定する。さらに、ネットワークベース分析により、選択された構成が個々の道路セグメントを最適化するだけでなく、システムレベルの効率を向上させることが保証される。マルチ基準の意思決定分析、機械学習、ネットワークサイエンスメトリクスを統合することで、提案されたフレームワークは、交通計画者が情報とデータ駆動による意思決定を可能にし、より持続可能な、効率的でレジリエントな道路構成を実現する。

Roadway reconfiguration is a crucial aspect of transportation planning, aiming to enhance traffic flow, reduce congestion, and improve overall road network performance with existing infrastructure and resources. This paper presents a novel roadway reconfiguration technique by integrating optimization based Brute Force search approach and decision support framework to rank various roadway configurations for better performance. The proposed framework incorporates a multi-criteria decision analysis (MCDA) approach, combining input from generated scenarios during the optimization process. By utilizing data from optimization, the model identifies total betweenness centrality (TBC), system travel time (STT), and total link traffic flow (TLTF) as the most influential decision variables. The developed framework leverages graph theory to model the transportation network topology and apply network science metrics as well as stochastic user equilibrium traffic assignment to assess the impact of each roadway configuration on the overall network performance. To rank the roadway configurations, the framework employs machine learning algorithms, such as ridge regression, to determine the optimal weights for each criterion (i.e., TBC, STT, TLTF). Moreover, the network-based analysis ensures that the selected configurations not only optimize individual roadway segments but also enhance system-level efficiency, which is particularly helpful as the increasing frequency and intensity of natural disasters and other disruptive events underscore the critical need for resilient transportation networks. By integrating multi-criteria decision analysis, machine learning, and network science metrics, the proposed framework would enable transportation planners to make informed and data-driven decisions, leading to more sustainable, efficient, and resilient roadway configurations.

翻訳日:2024-01-17 18:31:50 公開日:2024-01-14

# GAN系列を用いた歩行者検出のための合成画像の生成

Generation of Synthetic Images for Pedestrian Detection Using a Sequence of GANs ( http://arxiv.org/abs/2401.07370v1 )

ライセンス: Link先を確認

Viktor Seib and Malte Roosen and Ida Germann and Stefan Wirtz and Dietrich Paulus

(参考訳) 注釈付きデータセットを作成するには、かなりの量の手作業が必要です。本稿では,概念実証において,新しい画像生成パイプラインを提案することでこの問題に対処する。パイプラインは3つの異なる生成的敵ネットワーク(以前は公開されていた)で構成されており、歩行者検出のためのデータセットを増強する新しい方法で結合されている。生成した画像が必ずしも人間の目にとって視覚的に快適であるとは限らないにもかかわらず、我々の検出ベンチマークは結果がベースラインをはるかに上回っていることを明らかにしている。提案された概念実証作業は2020年に行われ、現在は3年間の維持期間を経て技術報告として公開されている。

Creating annotated datasets demands a substantial amount of manual effort. In this proof-of-concept work, we address this issue by proposing a novel image generation pipeline. The pipeline consists of three distinct generative adversarial networks (previously published), combined in a novel way to augment a dataset for pedestrian detection. Despite the fact that the generated images are not always visually pleasant to the human eye, our detection benchmark reveals that the results substantially surpass the baseline. The presented proof-of-concept work was done in 2020 and is now published as a technical report after a three years retention period.

翻訳日:2024-01-17 18:31:20 公開日:2024-01-14

# CoVO-MPC:サンプリングベースMPCの理論解析と最適共分散設計

CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design ( http://arxiv.org/abs/2401.07369v1 )

ライセンス: Link先を確認

Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi

(参考訳) サンプリングベースのモデル予測制御(MPC)は、その柔軟性と並列化性により、モデルベースの強化学習など、多くの領域において実用的で効果的なアプローチである。その魅力的な経験的性能にもかかわらず、特に収束解析とハイパーパラメータチューニングの観点からの理論的理解はいまだ欠落している。本稿では,広く使用されているサンプリングベースMPC法であるモデル予測パス積分制御(MPPI)の収束特性を特徴付ける。時間変動LQRシステムをカバーする2次最適化では,MPPIは少なくとも線形収束率を満足することを示す。さらに、より一般的な非線形システムにも拡張します。我々の理論解析は, サンプリングに基づく新しいMPCアルゴリズム, CoVo-MPC (CoVariance-Optimal MPC) に直接導出し, サンプリング共分散を最適にスケジュールし, 収束率を最適化する。実証的には、CoVo-MPCは標準的なMPPIよりも43～54%優れています。ビデオと付録は \url{https://lecar-lab.github.io/covo-mpc/} で入手できる。

Sampling-based Model Predictive Control (MPC) has been a practical and effective approach in many domains, notably model-based reinforcement learning, thanks to its flexibility and parallelizability. Despite its appealing empirical performance, the theoretical understanding, particularly in terms of convergence analysis and hyperparameter tuning, remains absent. In this paper, we characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems. We then extend to more general nonlinear systems. Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVo-MPC) that optimally schedules the sampling covariance to optimize the convergence rate. Empirically, CoVo-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks. Videos and Appendices are available at \url{https://lecar-lab.github.io/CoVO-MPC/}.

翻訳日:2024-01-17 18:31:11 公開日:2024-01-14

# 大規模言語モデルを用いたnlpのアクティブラーニング

Active Learning for NLP with Large Language Models ( http://arxiv.org/abs/2401.07367v1 )

ライセンス: Link先を確認

Xuesong Wang

(参考訳) トレーニングサンプルの人間のアノテーションは高価で、退屈で、特に自然言語処理(NLP)タスクでは困難である。ラベリングコストを削減し、サンプル効率を高めるために、アクティブラーニング(AL)技術は、できるだけ少数のサンプルをラベル付けして、合理的または同様の結果に達することができる。さらにコストを削減し、LLM(Large Language Models)の大幅な進歩により、LLMはサンプルを注釈付けするのによい候補となる。 llms (gpt-3.5 と gpt-4) を用いて3つの異なるデータセットにサンプルをラベル付けする精度とコストについて検討した。 AL設定のサンプルに人為的アノテーションを使用できるように,不正確なラベル付きサンプルを選択するための一貫性ベースの戦略を提案し,これを混合アノテーション戦略と呼ぶ。次に,(1)ヒューマンアノテーションのみを使用する,(2)提案する混合アノテーション戦略を使用する、という2つの異なる設定でalの性能をテストする。 3つのALクエリ戦略の下でのALモデルの精度は、3つのテキスト分類データセット、すなわちAGのニュース、TREC-6、Rotten Tomatoesで報告される。 AGのNewsとRotten Tomatoesでは、混合アノテーション戦略でトレーニングされたモデルは、人間のアノテーションと同様またはより良い結果が得られる。この手法は、アクティブな学習環境における精度とコスト効率の観点から、アノテータとしてのLLMの大きな可能性を明らかにする。

Human annotation of training samples is expensive, laborious, and sometimes challenging, especially for Natural Language Processing (NLP) tasks. To reduce the labeling cost and enhance the sample efficiency, Active Learning (AL) technique can be used to label as few samples as possible to reach a reasonable or similar results. To reduce even more costs and with the significant advances of Large Language Models (LLMs), LLMs can be a good candidate to annotate samples. This work investigates the accuracy and cost of using LLMs (GPT-3.5 and GPT-4) to label samples on 3 different datasets. A consistency-based strategy is proposed to select samples that are potentially incorrectly labeled so that human annotations can be used for those samples in AL settings, and we call it mixed annotation strategy. Then we test performance of AL under two different settings: (1) using human annotations only; (2) using the proposed mixed annotation strategy. The accuracy of AL models under 3 AL query strategies are reported on 3 text classification datasets, i.e., AG's News, TREC-6, and Rotten Tomatoes. On AG's News and Rotten Tomatoes, the models trained with the mixed annotation strategy achieves similar or better results compared to that with human annotations. The method reveals great potentials of LLMs as annotators in terms of accuracy and cost efficiency in active learning settings.

翻訳日:2024-01-17 18:30:50 公開日:2024-01-14

# インコンテキスト演算子のPDE一般化:1次元スカラー非線形保存則に関する研究

PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation Laws ( http://arxiv.org/abs/2401.07364v1 )

ライセンス: Link先を確認

Liu Yang, Stanley J. Osher

(参考訳) 幅広いPDE関連科学学習タスクのための単一大規模モデルを構築することができるか? このモデルは、微調整なしで新しい形式であっても新しいPDEに一般化できるだろうか? In-context operator learningとそれに対応するモデル In-Context Operator Networks (ICON) [1] はこれらの質問の最初の探索を表す。最初の質問に関するICONの能力は[1]で実証されている。本稿では,時間発展を持つpdesの族であるイコンの保存則に対する一般化能力について検討し,第2の質問について考察する。第二の質問に対する正の答え、すなわち ICON は、微調整なしで新しい形式を持つ PDE に対してうまく一般化できることを示す。また,関数や方程式をICONの機能範囲に変換することで,ICONが対処できる問題の範囲を広げる方法について述べる。本論文の進展は,PDE関連タスクの基礎モデルを,コンテキスト内演算子学習フレームワークの下で学習するための重要なステップであると考えている。

Can we build a single large model for a wide range of PDE-related scientific learning tasks? Can this model generalize to new PDEs, even of new forms, without any fine-tuning? In-context operator learning and the corresponding model In-Context Operator Networks (ICON) [1] represent an initial exploration of these questions. The capability of ICON regarding the first question has been demonstrated in [1]. In this paper, we explore the second question by investigating the generalization capabilities of ICON for conservation laws, a family of PDEs with temporal evolution. We show the positive answer to the second question, i.e., ICON can generalize well to some PDEs with new forms without any fine-tuning. We also show how to broaden the range of problems that ICON can address, by transforming functions and equations to ICON's capability scope. We believe that the progress in this paper is a significant step towards the goal of training a foundation model for PDE-related tasks under the in-context operator learning framework.

翻訳日:2024-01-17 18:30:27 公開日:2024-01-14

# PersonalityChat: Facts and Traitsを用いたパーソナライズダイアログモデリングのための会話蒸留

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits ( http://arxiv.org/abs/2401.07363v1 )

ライセンス: Link先を確認

Ehsan Lotfi, Maxime De Bruyn, Jeska Buhmann, Walter Daelemans

(参考訳) 新しいLarge Language Models(LLM)は、大きな会話データセットをキュレートする効率的なツールを提供する。これまでの研究は主にタスク指向またはジェネリックなopen-domainダイアログにフォーカスしており、複雑なプロンプトに従うllmの機能を完全には検討していない。本研究では,パーソナライゼーションに重点を置き,クラウドソースにとって困難かつコストのかかるデータセットのキュレーションにllmを用いる。パーソナラチャットは,一般的なペルソナチャットデータセットに基づく合成会話データセットだが,ペルソナと(ビッグ5)パーソナリティ特性の両方を条件とする。このデータセットに基づいて微調整されたモデルを評価することで、パーソナリティ特性ラベルが生成対話モデルの特性に基づくパーソナライズに利用できることを示す。また,パーソナリティチャットとペルソナチャットを頭対頭で比較し,蒸留データセットのトレーニングにより,小モデル環境においてより流動的でコヒーレントな対話エージェントが得られることを示す。

The new wave of Large Language Models (LLM) has offered an efficient tool to curate sizeable conversational datasets. So far studies have mainly focused on task-oriented or generic open-domain dialogs, and have not fully explored the ability of LLMs in following complicated prompts. In this work, we focus on personalization, and employ LLMs to curate a dataset which is difficult and costly to crowd-source: PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset, but conditioned on both personas and (Big-5) personality traits. Evaluating models fine-tuned on this dataset, we show that the personality trait labels can be used for trait-based personalization of generative dialogue models. We also perform a head-to-head comparison between PersonalityChat and PersonaChat, and show that training on the distilled dataset results in more fluent and coherent dialog agents in the small-model regime.

翻訳日:2024-01-17 18:30:12 公開日:2024-01-14

# 液晶からの絡み合った光子:波長可変量子光源の新しいパラダイム

Entangled photons from liquid crystals: a new paradigm of tunable quantum light sources ( http://arxiv.org/abs/2401.07362v1 )

ライセンス: Link先を確認

Vitaliy Sultanov, Alja\v{z} Kav\v{c}i\v{c}, Manolis Kokkinakis, Nerea Sebasti\'an, Natan Osterman, Maria V. Chekhova, and Matja\v{z} Humar

(参考訳) 液晶が複雑な構造に自己集合する能力、電界に対する強い応答、複雑な光学系への積分性、そして最近は相当な2階の光学非線形性により、様々な線形および非線形光学デバイスの基礎となっている。しかし、光の量子状態の源としての利用は、これまで研究されていない。本稿では、強誘電性ネマティック液晶における自発的パラメトリックダウンコンバージョンに基づく、絡み合った光子の効率的な電場可変広帯域源を示す。光子対の放出速度と偏光状態は、サンプルに沿って数ボルトを印加するか分子配向をねじり、ほぼどんな偏光状態も発生させることで劇的に変化させることができる。ここで開発された概念は、複雑な位相構造や量子光を生成するマルチピクセルデバイスにまで拡張することができる。

Due to the ability of liquid crystals to self-assemble into complex structures, their strong response to the electric field, integrability into complex optical systems, and recently also considerable second-order optical nonlinearity, they are a base for various linear and nonlinear optical devices. However, their use as sources of quantum states of light has not been explored so far. Here, we demonstrate an efficient electric-field tunable broadband source of entangled photons based on spontaneous parametric down-conversion in a ferroelectric nematic liquid crystal. The emission rate and the polarization state of the photon pairs can be drastically altered by either applying a few volts or twisting the molecular orientation along the sample, enabling the generation of almost any polarization state. The concepts developed here could be extended to complex topological structures and multi-pixel devices generating quantum light.

翻訳日:2024-01-17 18:29:53 公開日:2024-01-14

PDF登録状況（公開日: 20240114）