Fugu-MT: Translations of arXiv Papers

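This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, or CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated (arXiv metadata is CC 0). The translations are licensed under CC BY-SA 4.0 and are produced with Fugu-Machine Translator.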

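The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from the use of the site (including all information and translations).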

The papers listed here have a publication date of 2023-12-22.

Title	Authors	Abstract	Publication date / translation date
# 相対論的力学におけるスピンの解釈 Une interpretation du spin en mecanique relativiste ( http://arxiv.org/abs/2406.15353v1 ) ライセンス: Link先を確認	Stefan Catheline,	(参考訳) 本論文は、スピンを改めて研究することを目的としている。出発点は、量子力学の枠組みにおいてのみ一貫した形で記述できるシュテルンとゲルラッハの実験結果である。これに対し本論文では、剛体回転に関する前回の論文に続いて、相対論的力学の視点を提案する。実際、この相対論的剛体回転の地平線を慎重に調べると、それはどの観測角度から見ても不変であるように見え、スピンの性質と完全に一致する。 This paper aims at studying the spin once again. The departure point is thus the Stern and Gerlach experimental results that can be described in a coherent way in the frame of quantum mechanics only. Instead, the relativistic mechanics point of view is proposed here following the work presented in a previous article about rigid body rotation. Indeed, a careful study of the horizon of this relativistic rigid body rotation appears to be invariant from any observation angle in full agreement with the spin property.	翻訳日:2024-07-01 07:21:03 公開日:2023-12-22
# メタヒューリスティックスに基づく人工ニューラルネットワークを用いた、炭素繊維強化ポリマーがコンクリート強度に及ぼす拘束効果の予測 Predicting Confinement Effect of Carbon Fiber Reinforced Polymers on Strength of Concrete using Metaheuristics-based Artificial Neural Networks ( http://arxiv.org/abs/2403.13809v1 ) ライセンス: Link先を確認	Sarmed Wahab, Mohamed Suleiman, Faisal Shabbir, Nasim Shakouri Mahmoudabadi, Sarmad Waqas, Nouman Herl, Afaq Ahmad,	(参考訳) 本稿では、メタヒューリスティックスに基づく人工ニューラルネットワークを用いて、炭素繊維強化ポリマー(CFRP)がコンクリートシリンダーの強度に及ぼす拘束効果を予測する研究を扱う。既発表の研究から708個のCFRP拘束コンクリートシリンダーの詳細なデータベースを構築した。このデータベースは、シリンダーの直径(d)と高さ(h)といった幾何学的パラメータ、コンクリートの非拘束圧縮強度(fco')、厚み(nt)、CFRPの弾性率(Ef)、非拘束コンクリートひずみ、拘束コンクリートひずみ、拘束コンクリートの終局圧縮強度(fcc')を含む8つのパラメータの情報を持つ。粒子群最適化(PSO)、グレーウルフ最適化(GWO)、バットアルゴリズム(BA)の3つのメタヒューリスティックモデルを実装した。これらのアルゴリズムは平均二乗誤差を目的関数としてデータで訓練され、その予測結果は実験研究と有限要素解析に対して検証される。本研究では、PSOのハイブリッドモデルがCFRP拘束コンクリートシリンダーの強度を最大99.13%の精度で予測し、GWOは98.17%の精度で予測したことを示す。軸圧縮強度の予測精度の高さは、これらの予測モデルが経験的手法に対する信頼性の高い解決策であることを示している。予測モデルは、時間のかかる実大規模の実験を回避するのに特に適しており、プロセスを迅速かつ経済的にする。 This article deals with the study of predicting the confinement effect of carbon fiber reinforced polymers (CFRPs) on concrete cylinder strength using metaheuristics-based artificial neural networks. A detailed database of 708 CFRP confined concrete cylinders is developed from previously published research with information on 8 parameters including geometrical parameters like the diameter (d) and height (h) of a cylinder, unconfined compressive strength of concrete (fco'), thickness (nt), the elastic modulus of CFRP (Ef), unconfined concrete strain, confined concrete strain, and the ultimate compressive strength of confined concrete (fcc'). Three metaheuristic models are implemented including particle swarm optimization (PSO), grey wolf optimizer (GWO), and bat algorithm (BA). These algorithms are trained on the data using an objective function of mean square error and their predicted results are validated against the experimental studies and finite element analysis. 
The study shows that the hybrid model of PSO predicted the strength of CFRP-confined concrete cylinders with maximum accuracy of 99.13% and GWO predicted the results with an accuracy of 98.17%. The high accuracy of axial compressive strength predictions demonstrated that these prediction models are a reliable solution to the empirical methods. The prediction models are especially suitable for avoiding full-scale time-consuming experimental tests that make the process quick and economical.	翻訳日:2024-03-25 07:17:26 公開日:2023-12-22
# Google Tag Manager: EUデータ保護法に基づくデータ漏洩とその潜在的な違反 Google Tag Manager: Hidden Data Leaks and its Potential Violations under EU Data Protection Law ( http://arxiv.org/abs/2312.08806v2 ) ライセンス: Link先を確認	Gilles Mertens, Nataliia Bielova, Vincent Roca, Cristiana Santos, Michael Toth,	(参考訳) タグ管理システムは、ウェブサイトのパブリッシャーが複数のサードパーティのJavaScriptスクリプト(タグ)をウェブサイトにインストールするのをサポートするために開発された。 2012年、GoogleはGoogle Tag Manager(GTM)という独自のTMSを開発した。 2020年、新しい"Server-side" GTMが導入され、パブリッシャはTagを直接サーバに組み込めるようになった。しかしながら、GTMのどちらのバージョンも学術研究コミュニティによって徹底的に評価されていない。本稿では,Google Tag Management (GTM) アーキテクチャの2つのバージョンである Client- and Server-side GTM について検討する。 78のクライアントサイドタグ,8つのサーバサイドタグ,2つのConsent Management Platform (CMP) を内部から分析することにより,複数の隠れデータリーク,GTMパーミッションシステムをパスしてスクリプトを注入するタグ,デフォルトで有効となる同意などを検出する。我々は法律の専門家とともに、GTMとそのアクターの詳細な法的分析を行い、潜在的な法的違反とその責任を特定する。我々は,法的コンプライアンスを容易にするため,GTMの勧告と多数の改善を提案する。 Tag Management Systems were developed in order to support website publishers in installing multiple third-party JavaScript scripts (Tags) on their websites. In 2012, Google developed its own TMS called "Google Tag Manager" (GTM) that is currently present on 28 million live websites. In 2020, a new "Server-side" GTM was introduced, allowing publishers to include Tags directly on the server. However, neither version of GTM has yet been thoroughly evaluated by the academic research community. In this work, we study, for the first time, the two versions of the Google Tag Management (GTM) architectures: Client- and Server-side GTM. By analyzing these systems with 78 Client-side Tags, 8 Server-side Tags and two Consent Management Platforms (CMPs) from the inside, we discover multiple hidden data leaks, Tags bypassing GTM permission system to inject scripts, and consent enabled by default. With a legal expert, we perform an in-depth legal analysis of GTM and its actors to identify potential legal violations and their liabilities. 
We provide recommendations and propose numerous improvements for GTM to facilitate legal compliance.	翻訳日:2024-03-18 12:17:07 公開日:2023-12-22
# 回収可能なERC-20RトークンのためのRプールと決済市場 R-Pool and Settlement Markets for Recoverable ERC-20R Tokens ( http://arxiv.org/abs/2312.14375v1 ) ライセンス: Link先を確認	Kaili Wang, Qinchen Wang, Calvin Cai, Dan Boneh,	(参考訳) ERC-20RはERC-20のラッパーであり、資産の移転後、限られた時間枠内での資産回収をサポートする。被害者が回収ウィンドウの間に盗まれた資産や失われた資産を取り戻せるようにすることで、ブロックチェーン上の盗難と損失を減らすよう設計されている。誠実な受信者がERC-20R資産を受け取った場合、それをベースのERC-20形式にアンラップできるようになるまで、回収ウィンドウ(例えば24時間)が経過するのを待たなければならない。未決済の回収可能な資産は通常の運用を妨げる可能性があるため、多くのDeFiサービスはその受け入れを拒否する可能性が高い、と我々は主張する。そのため、AliceはERC-20Rトークンを受け取ったとき、DeFiサービスで使えるようになるまで24時間待たなければならない。しかし、もしAliceが手数料を支払ってでも、ラップされたトークンを、すぐに使えるアンラップ済みのERC-20トークンと交換したいとしたらどうだろうか。本稿では、未決済のERC-20R資産を同じ資産のベースERC-20と交換するためのプールの設計方法を検討する。このようなプールの設計はいくつかの難しい問題を提起するが、我々はそれらに対する解決策を提示する。 ERC-20R is a wrapper around ERC-20 that supports asset recovery within a limited time window after an asset is transferred. It is designed to reduce theft and losses on the blockchain by allowing a victim to recover their stolen or lost assets during the recovery window. When an honest recipient receives an ERC-20R asset, they must wait until the recovery window elapses (say, 24 hours), before they can unwrap the asset back to its base ERC-20 form. We argue that many DeFi services will likely refuse to accept unsettled recoverable assets because they can interfere with their normal operations. Consequently, when Alice receives an ERC-20R token, she must wait 24 hours before she can use it with a DeFi service. But what if Alice is willing to pay a fee to exchange the wrapped token for an unwrapped ERC-20 token that can be used right away? In this paper we explore how to design a pool to exchange an unsettled ERC-20R asset for a base ERC-20 of the same asset. Designing such a pool raises several challenging questions and we present our solutions.	翻訳日:2024-03-18 11:28:19 公開日:2023-12-22
# 検索可能な暗号化機能の検討と同型暗号化の評価 A Review on Searchable Encryption Functionality and the Evaluation of Homomorphic Encryption ( http://arxiv.org/abs/2312.14434v1 ) ライセンス: Link先を確認	Brian Kishiyama, Izzat Alsmadi,	(参考訳) Google Cloud Platform、Microsoft Azure、Amazon Web Servicesなどのクラウドサービスプロバイダは、継続的に進化するクラウドサービスを提供する。それは成長する産業です。 NetflixやPayPalのような企業は、データストレージ、コンピューティングパワー、その他のサービスにCloudを頼っている。企業にとって、クラウドはコストを削減し、柔軟性を提供し、成長を可能にする。しかし、クラウドにはセキュリティとプライバシに関する懸念がある。クラウドサービスはインターネットを通じてアクセスされるので、ハッカーや攻撃者はどこからでもサーバーにアクセスすることができる。クラウド内のデータを保護するためには、アップロード前に暗号化されるべきであり、ストレージやトランジットでも保護されるべきである。一方、データ所有者は暗号化されたデータにアクセスする必要があるかもしれない。また、変更、更新、削除、読み込み、検索、共有も必要になる。データがクラウドで復号化されると、機密データが露出し、公開され、誤使用される可能性がある。 1つの解決策は、データを暗号化形式で残し、暗号化されたデータを操作する検索可能暗号化(SE)を使用することである。 SEの機能は、開始以来改善され、研究は、SEを改善する方法を模索し続けている。本稿は、2019年から2023年までのクラウドサービスに関連するサーチブル暗号化の機能についてレビューし、そのスキームの1つであるFully Homomorphic Encryptionを評価する。全体としては、複数の機能が集約され、テストされるにつれて、SE効率が向上する段階にあるように思われる。 Cloud Service Providers, such as Google Cloud Platform, Microsoft Azure, or Amazon Web Services, offer continuously evolving cloud services. It is a growing industry. Businesses, such as Netflix and PayPal, rely on the Cloud for data storage, computing power, and other services. For businesses, the cloud reduces costs, provides flexibility, and allows for growth. However, there are security and privacy concerns regarding the Cloud. Because Cloud services are accessed through the internet, hackers and attackers could possibly access the servers from anywhere. To protect data in the Cloud, it should be encrypted before it is uploaded, it should be protected in storage and also in transit. On the other hand, data owners may need to access their encrypted data. It may also need to be altered, updated, deleted, read, searched, or shared with others. If data is decrypted in the Cloud, sensitive data is exposed and could be exposed and misused. 
One solution is to leave the data in its encrypted form and use Searchable Encryption (SE) which operates on encrypted data. The functionality of SE has improved since its inception and research continues to explore ways to improve SE. This paper reviews the functionality of Searchable Encryption, mostly related to Cloud services, in the years 2019 to 2023, and evaluates one of its schemes, Fully Homomorphic Encryption. Overall, it seems that research is at the point where SE efficiency is increased as multiple functionalities are aggregated and tested.	翻訳日:2024-03-18 11:28:19 公開日:2023-12-22
# 並行性の全体像を探る: 競合状態脆弱性検出器のサーベイ Navigating the Concurrency Landscape: A Survey of Race Condition Vulnerability Detectors ( http://arxiv.org/abs/2312.14479v1 ) ライセンス: Link先を確認	Aishwarya Upadhyay, Vijay Laxmi, Smita Naval,	(参考訳) 技術が進歩を続け、インダストリー5.0の時代を迎える中で、オペレーティングシステム、ファイルシステム、Web、ネットワークアプリケーションには大きなパラダイムシフトが起きている。マルチプロセッシングとマルチコアシステムが一般的に利用されるようになり、並行プログラミングはますます広く普及している。しかし、この変化は並行性バグとして知られる新たな一連の問題をもたらし、それらは並行プログラムに広く存在するために、重大な障害や潜在的なセキュリティエクスプロイトを引き起こしてきた。過去20年間、多くの研究者がこれらのバグの解明、検出、緩和、予防に力を注いでおり、特に直近の10年間はこの領域の研究が急増している。並行性バグの中でも、データレースすなわち競合状態の脆弱性は最も多く、すべての並行性バグの実に80%を占める。本サーベイは、競合状態バグ検出器に焦点を当てる。我々はこれらの検出器を、それらが採用する多様な手法に基づいて体系的に分類する。さらに、レース検出に関連する技術やアルゴリズムを掘り下げ、この分野の進化を時系列で追跡する。また、競合状態の脆弱性の検出におけるファジング技術の応用にも光を当てる。これらの検出器とその静的解析をレビューすることにより、結論を導くとともに、競合状態の脆弱性検出における精度、性能、適用性、包括性の向上など、今後の研究の方向性を概説する。 As technology continues to advance and we usher in the era of Industry 5.0, there has been a profound paradigm shift in operating systems, file systems, web, and network applications. The conventional utilization of multiprocessing and multicore systems has made concurrent programming increasingly pervasive. However, this transformation has brought about a new set of issues known as concurrency bugs, which, due to their wide prevalence in concurrent programs, have led to severe failures and potential security exploits. Over the past two decades, numerous researchers have dedicated their efforts to unveiling, detecting, mitigating, and preventing these bugs, with the last decade witnessing a surge in research within this domain. Among the spectrum of concurrency bugs, data races or race condition vulnerabilities stand out as the most prevalent, accounting for a staggering 80% of all concurrency bugs. This survey paper is focused on the realm of race condition bug detectors. We systematically categorize these detectors based on the diverse methodologies they employ. 
Additionally, we delve into the techniques and algorithms associated with race detection, tracing the evolution of this field over time. Furthermore, we shed light on the application of fuzzing techniques in the detection of race condition vulnerabilities. By reviewing these detectors and their static analyses, we draw conclusions and outline potential future research directions, including enhancing accuracy, performance, applicability, and comprehensiveness in race condition vulnerability detection.	翻訳日:2024-03-18 11:28:18 公開日:2023-12-22
# 期待定数ラウンドでの並行非同期ビザンチン合意の再検討 Concurrent Asynchronous Byzantine Agreement in Expected-Constant Rounds, Revisited ( http://arxiv.org/abs/2312.14506v1 ) ライセンス: Link先を確認	Ran Cohen, Pouyan Forghani, Juan Garay, Rutvik Patel, Vassilis Zikas,	(参考訳) ランダム化なしでは、ビザンチン合意(Byzantine agreement, BA)は同期設定において線形のラウンド数を必要とし、非同期設定ではそもそも不可能であることがよく知られている。この制限を回避できるプリミティブは、oblivious common coin (OCC) として知られている。OCCにより、パーティはランダムなコインについて一定の確率で合意できるが、この合意はoblivious、すなわちプレイヤーは合意が達成されたかどうかを知らない。我々の研究の出発点は、(最終的なメッセージ配送を仮定する)非同期設定において最適な耐性を持つ情報理論的な多値OCCについて、既知のプロトコルが存在しないという観察である。多値OCCはいくつかの構成で暗黙的または明示的に使用されているため、文献におけるこの明らかな穴は特に問題である。本稿では、非同期設定において最適な耐性、すなわち$t < n/3$の腐敗を許容する、最初の情報理論的な多値OCCプロトコルを提示し、この重要なギャップを埋める。さらに、本プロトコルは指数サイズのドメインを持つOCCを効率的に実装するが、これはより単純な同期設定における既知の構成でさえ達成していない性質である。次に、非同期BAのラウンド数を保存する並列合成の問題に目を向ける。このタスクのためのプロトコルはBen-OrとEl-Yaniv [Distributed Computing '03] によって提案されたが、その構成にはいくつかの点で欠陥がある。そこで、第2の貢献として、このタスクに対するよりシンプルでモジュール性の高いプロトコルを提供する。最後に、独立した関心に値する貢献として、CanettiのUniversal Composabilityフレームワークにおける証明を与える。BAはセキュアなマルチパーティ計算プロトコルの中核的な構成要素であるため合成可能性の保証は重要であり、本研究はその保証を提供する最初のものとなる。 It is well known that without randomization, Byzantine agreement (BA) requires a linear number of rounds in the synchronous setting, while it is flat out impossible in the asynchronous setting. The primitive that allows bypassing the above limitation is known as oblivious common coin (OCC). It allows parties to agree with constant probability on a random coin, where agreement is oblivious, i.e., players are not aware whether or not agreement has been achieved. The starting point of our work is the observation that no known protocol exists for information-theoretic multi-valued OCC with optimal resiliency in the asynchronous setting (with eventual message delivery). This apparent hole in the literature is particularly problematic, as multi-valued OCC is implicitly or explicitly used in several constructions. 
In this paper, we present the first information-theoretic multi-valued OCC protocol in the asynchronous setting with optimal resiliency, i.e., tolerating $t < n/3$ corruptions, thereby filling this important gap. Further, our protocol efficiently implements OCC with an exponential-size domain, a property which is not even achieved by known constructions in the simpler, synchronous setting. We then turn to the problem of round-preserving parallel composition of asynchronous BA. A protocol for this task was proposed by Ben-Or and El-Yaniv [Distributed Computing '03]. Their construction, however, is flawed in several ways. Thus, as a second contribution, we provide a simpler, more modular protocol for the above task. Finally, and as a contribution of independent interest, we provide proofs in Canetti's Universal Composability framework; this makes our work the first one offering composability guarantees, which are important as BA is a core building block of secure multi-party computation protocols.	翻訳日:2024-03-18 11:28:18 公開日:2023-12-22
# 移動中のサイバーセキュリティ : CAVの今後の試験施設の課題と要件 Cybersecurity in Motion: A Survey of Challenges and Requirements for Future Test Facilities of CAVs ( http://arxiv.org/abs/2312.14687v1 ) ライセンス: Link先を確認	Ioannis Mavromatis, Theodoros Spyridopoulos, Pietro Carnelli, Woon Hau Chin, Ahmed Khalil, Jennifer Chakravarty, Lucia Cipolina Kun, Robert J. Piechocki, Colin Robbins, Daniel Cunnington, Leigh Chase, Lamogha Chiazor, Chris Preston, Rahul, Aftab Khan,	(参考訳) 旅行のやり方は急速に変化しており、C-ITS(Cooperative Intelligent Transportation Systems)がこの進化の最前線にいる。しかし、C-ITSの採用は新たなリスクと課題をもたらし、サイバーセキュリティを安全性と信頼性を確保するための最優先事項にしている。この前提に基づいて,C-ITSのサイバーセキュリティの研究,試験,評価を促進するために設計されたCSCE(Cybersecurity Centre of Excellence)を提案する。我々は,CSCEの試験施設の設計,機能,課題について検討し,技術,セキュリティ,社会的要求の概要を述べる。本研究は, 今後のC-ITSに適応する柔軟性を強調し, 潜在的な脅威の検出・緩和におけるこれらのシステムの有効性について, 徹底的な調査・分析を通じて評価する。最後に、C-ITSのサイバーセキュリティに関するさらなる研究を動機付けることを目的として、様々なC-ITSドメインにおける現在の未解決課題を特定した。 The way we travel is changing rapidly, and Cooperative Intelligent Transportation Systems (C-ITSs) are at the forefront of this evolution. However, the adoption of C-ITSs introduces new risks and challenges, making cybersecurity a top priority for ensuring safety and reliability. Building on this premise, this paper presents an envisaged Cybersecurity Centre of Excellence (CSCE) designed to bolster research, testing, and evaluation of the cybersecurity of C-ITSs. We explore the design, functionality, and challenges of CSCE's testing facilities, outlining the technological, security, and societal requirements. Through a thorough survey and analysis, we assess the effectiveness of these systems in detecting and mitigating potential threats, highlighting their flexibility to adapt to future C-ITSs. Finally, we identify current unresolved challenges in various C-ITS domains, with the aim of motivating further research into the cybersecurity of C-ITSs.	翻訳日:2024-03-18 11:28:18 公開日:2023-12-22
# コンピュータサイエンスコースにおけるChatGPTの統合 : 学生の知覚と示唆 Integrating ChatGPT in a Computer Science Course: Students Perceptions and Suggestions ( http://arxiv.org/abs/2402.01640v1 ) ライセンス: Link先を確認	Kehinde Aruleba, Ismaila Temitayo Sanusi, George Obaido and Blessing Ogbuokiri	(参考訳) 近年,ChatGPTなどの人工知能ツールの教育システムへの統合が注目されている。本経験報告では,ChatGPTをコンピュータサイエンス科目に統合するための学生の認識と提案について考察する。コード補完と分析を含むChatGPT活動に続いて、7人の学生が詳細なインタビューに参加した。書き起こされたインタビューの結果から、chatgptはプログラミングを含む学習体験を向上させる可能性を示唆している。彼らは、クエリに即座に応答し、パーソナライズされた学習をサポートするツールの能力を強調した。しかし、ChatGPTへの依存度が学生の批判的思考や問題解決スキルに悪影響を及ぼす恐れがある。これらの結果は,コンピュータ科学コースにおけるChatGPTを用いたバランスをとることの重要性を示している。この研究の成果は、AIツールを教育の文脈に組み込むことを探求する教育者、カリキュラムデザイナー、政策立案者に大きな影響を与える。 The integration of artificial intelligence tools such as ChatGPT in the education system has gained attention in recent years. This experience report explores students' perceptions and suggestions for integrating ChatGPT in a computer science course. Following a ChatGPT activity which includes code completion and analysis, seven students participated in in-depth interviews. Findings from the transcribed interviews suggest that ChatGPT has the potential to enhance learning experience including programming. They highlighted the tool's ability to respond immediately to queries and supporting personalised learning. However, they raise concerns that heavy reliance on ChatGPT may adversely affect students' critical thinking and problem-solving skills. These findings show the importance of carefully balancing using ChatGPT in computer science courses. The findings of this research have significant implications for educators, curriculum designers and policymakers as they explore integrating AI tools into educational contexts.	翻訳日:2024-02-11 17:14:27 公開日:2023-12-22
# AI-Artificial Intelligenceのグローバルな影響:最近の進歩と今後の方向性,レビュー The Global Impact of AI-Artificial Intelligence: Recent Advances and Future Directions, A Review ( http://arxiv.org/abs/2401.12223v1 ) ライセンス: Link先を確認	Chandregowda Pachegowda	(参考訳) 人工知能(AI)は、経済、医療、交通など社会の多くの側面を変革する可能性を持つ新興技術である。この記事では、AIのグローバルな影響に関する最近の研究論文を合成し、その潜在的なメリットとリスクを探る。この記事では、経済的、倫理的、社会的、セキュリティとプライバシ、仕事のずれといった、AIの影響を強調している。偏見、セキュリティ、プライバシー侵害などの問題を含む、AI開発に関する倫理的懸念について論じている。 AIの責任ある開発と展開を保証するためには、政府、産業、学界の協力が不可欠である。この記事は、社会全体にAIが及ぼす影響の認識と理解を促進するために、公的なエンゲージメントと教育の重要性を強調して締めくくっている。 Artificial intelligence (AI) is an emerging technology that has the potential to transform many aspects of society, including the economy, healthcare, and transportation. This article synthesizes recent research literature on the global impact of AI, exploring its potential benefits and risks. The article highlights the implications of AI, including its impact on economic, ethical, social, security & privacy, and job displacement aspects. It discusses the ethical concerns surrounding AI development, including issues of bias, security, and privacy violations. To ensure the responsible development and deployment of AI, collaboration between government, industry, and academia is essential. The article concludes by emphasizing the importance of public engagement and education to promote awareness and understanding of AI's impact on society at large.	翻訳日:2024-01-28 15:40:50 公開日:2023-12-22
# 場所別アルゴリズムによるパトロール管理のためのデバイアス手法 A debiasing technique for place-based algorithmic patrol management ( http://arxiv.org/abs/2401.06162v1 ) ライセンス: Link先を確認	Alexander Einarsson (1), Simen Oestmo (2), Lester Wollman (2), Duncan Purves (3), Ryan Jenkins (4) ((1) Northwestern University (2) SoundThinking Inc. (3) University of Florida (4) California Polytechnic State University)	(参考訳) 近年、データ駆動型警察に革命が起こった。これにより、履歴データのバイアスがアルゴリズムによる意思決定にどのように影響するかが調査されるようになった。本稿では,位置対応型アルゴリズムパトロール管理システムのデバイアス化手法を提案する。本手法は, モデルに高い精度を保ちながら, 人種的に偏りのある特徴を効率的に除去することを示す。最後に、この研究が発見した公正性とデータ駆動ポリシングの領域における将来の潜在的な研究の長いリストを提供する。 In recent years, there has been a revolution in data-driven policing. With that has come scrutiny on how bias in historical data affects algorithmic decision making. In this exploratory work, we introduce a debiasing technique for place-based algorithmic patrol management systems. We show that the technique efficiently eliminates racially biased features while retaining high accuracy in the models. Finally, we provide a lengthy list of potential future research in the realm of fairness and data-driven policing which this work uncovered.	翻訳日:2024-01-22 12:51:09 公開日:2023-12-22
# 信頼できる人間中心型自動意思決定システム Trustworthy human-centric based Automated Decision-Making Systems ( http://arxiv.org/abs/2401.06161v1 ) ライセンス: Link先を確認	Marcelino Cabrera and Carlos Cruz and Pavel Novoa-Hernández and David A. Pelta and José Luis Verdegay	(参考訳) 自動意思決定システム(ADS: Automated Decision-Making Systems)は、性能向上を目的として、様々な分野、活動、職業に広く普及している。しかし、この普及はADSの誤用を含む潜在的なリスクをもたらす。このような誤用は、ADSが不要な状況で用いられる場合や、必須の要件、条件、規約が見過ごされる場合に現れ、意図しない結果を招く。本研究では、デジタル化、デジタルトランスフォーメーション、および現代社会と将来の文脈におけるADSの活用に関連する含意、相違点、倫理的考慮事項を徹底的に検討する。ADSの展開においては、規制、透明性、倫理的な行動が不可欠であることに重点を置く。 Automated Decision-Making Systems (ADS) have become pervasive across various fields, activities, and occupations, to enhance performance. However, this widespread adoption introduces potential risks, including the misuse of ADS. Such misuse may manifest when ADS is employed in situations where it is unnecessary or when essential requirements, conditions, and terms are overlooked, leading to unintended consequences. This research paper presents a thorough examination of the implications, distinctions, and ethical considerations associated with digitalization, digital transformation, and the utilization of ADS in contemporary society and future contexts. Emphasis is placed on the imperative need for regulation, transparency, and ethical conduct in the deployment of ADS.	翻訳日:2024-01-22 12:51:03 公開日:2023-12-22
# 将来を見据えた教育: 大規模言語モデルを用いた口頭試験シミュレーションのためのプロトタイプ Future-proofing Education: A Prototype for Simulating Oral Examinations Using Large Language Models ( http://arxiv.org/abs/2401.06160v1 ) ライセンス: Link先を確認	André Nitze	(参考訳) 本研究は、高等教育における大規模言語モデル(LLM)の影響を、プロトタイプを用いた自動口頭試験シミュレーションに焦点を当てて検討する。プロトタイプの設計上の考慮事項を述べ、選ばれた教育者と学生のグループでシステムを評価する。技術的および教育的な観察結果について議論する。プロトタイプは、口頭試験のシミュレーション、パーソナライズされたフィードバックの提供、教育者の作業負荷の軽減に有効であることがわかった。このプロトタイプの有望な結果は、教育の民主化、多様な学生層の包摂、教育の質と効率の向上におけるLLMの可能性を示している。 This study explores the impact of Large Language Models (LLMs) in higher education, focusing on an automated oral examination simulation using a prototype. The design considerations of the prototype are described, and the system is evaluated with a select group of educators and students. Technical and pedagogical observations are discussed. The prototype proved to be effective in simulating oral exams, providing personalized feedback, and streamlining educators' workloads. The promising results of the prototype show the potential for LLMs in democratizing education, inclusion of diverse student populations, and improvement of teaching quality and efficiency.	翻訳日:2024-01-22 12:50:51 公開日:2023-12-22
# FRED: 空中画像オブジェクト検出における全回転等価性を目指して FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection ( http://arxiv.org/abs/2401.06159v1 ) ライセンス: Link先を確認	Chanho Lee, Jinsu Son, Hyounguk Shon, Yunho Jeon, Junmo Kim	(参考訳) 回転同分散は、指向オブジェクト検出において必須だが挑戦的な性質である。一般物体検出器は、従来のCNNの翻訳等価性による空間シフトに対するロバストネスを自然に活用するが、回転等価性を達成することは、依然として解明の目標である。現在の検出器は回転不変の特徴を引き出すために様々なアライメント技術を展開しているが、それでも高容量モデルと重データ拡張に頼っている。本稿では,画像から境界ボックス予測までのプロセス全体が厳密な同値である完全回転同値指向物体検出器(fred)を提案する。具体的には、不変タスク(オブジェクト分類)と同変タスク(オブジェクトローカライゼーション)を分離して、エンドツーエンドの等価性を達成する。境界ボックスを回転同変ベクトルの集合として表現し、回転同変局在化を実装する。さらに,これらの回転同変ベクトルを変形可能な畳み込みのオフセットとして利用し,既存の空間適応の利点を高めた。完全な回転同分散を活用し,既存手法と比較して画像レベルの回転に対して高いロバスト性を示す。さらに,fredは,実験を通じて非軸協調学習に一歩近づいたことを示す。最新の手法と比較して,提案手法はDOTA-v1.0で同等の性能を示し,DOTA-v1.5では1.5mAPで性能が向上し,モデルパラメータは16%まで大幅に減少する。 Rotation-equivariance is an essential yet challenging property in oriented object detection. While general object detectors naturally leverage robustness to spatial shifts due to the translation-equivariance of the conventional CNNs, achieving rotation-equivariance remains an elusive goal. Current detectors deploy various alignment techniques to derive rotation-invariant features, but still rely on high capacity models and heavy data augmentation with all possible rotations. In this paper, we introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED), whose entire process from the image to the bounding box prediction is strictly equivariant. Specifically, we decouple the invariant task (object classification) and the equivariant task (object localization) to achieve end-to-end equivariance. We represent the bounding box as a set of rotation-equivariant vectors to implement rotation-equivariant localization. Moreover, we utilized these rotation-equivariant vectors as offsets in the deformable convolution, thereby enhancing the existing advantages of spatial adaptation. 
Leveraging full rotation-equivariance, our FRED demonstrates higher robustness to image-level rotation compared to existing methods. Furthermore, we show that FRED is one step closer to non-axis aligned learning through our experiments. Compared to state-of-the-art methods, our proposed method delivers comparable performance on DOTA-v1.0 and outperforms by 1.5 mAP on DOTA-v1.5, all while significantly reducing the model parameters to 16%.	翻訳日:2024-01-22 12:50:41 公開日:2023-12-22
# Voila-A: ユーザの視線を意識した視覚言語モデル Voila-A: Aligning Vision-Language Models with User's Gaze Attention ( http://arxiv.org/abs/2401.09454v1 ) ライセンス: Link先を確認	Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma	(参考訳) 近年、視覚と言語理解の統合は、人工知能、特にビジョン・ランゲージ・モデル(VLM)を通じて、大きな進歩をもたらした。しかし、既存のvlmは複雑なシーンや複数のオブジェクトで現実世界のアプリケーションを扱うことや、その焦点を人間の様々な注意パターンに合わせることが困難に直面している。本稿では,ar や vr デバイスで収集可能な視線情報について,vlm の人間的注意の指標として紹介するとともに,これらのモデルの現実の応用における解釈性と有効性を高めるために,視線アライメントのための新しいアプローチ voila-a を提案する。まず、数百分間の視線データを収集し、局所的な物語を用いて人間の視線モダリティを模倣できることを実証する。そして、GPT-4を利用して自動データアノテーションパイプラインを設計し、VOILA-COCOデータセットを生成する。さらに,Voila Perceiverモジュールを改良し,事前学習した知識を保ちながら視線情報をVLMに統合する。我々は,視線追跡装置を用いて実生活シナリオをキャプチャするVOILA-GAZEテストセットとホールドアウト検証セットを用いて,Voila-Aを評価する。実験の結果,voila-aはいくつかのベースラインモデルを大きく上回っている。モデルの注意を人間の視線パターンに合わせることで、Voila-Aはより直感的でユーザ中心のVLMを実現すると同時に、幅広いアプリケーションにわたる人間とAIのインタラクションを促進する。 In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we innovate the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. 
We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE Testset, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.	翻訳日:2024-01-22 09:29:15 公開日:2023-12-22
# 航空機翼の圧力分布の学習係数に対するリーマン幾何学的特徴の統合 Incorporating Riemannian Geometric Features for Learning Coefficient of Pressure Distributions on Airplane Wings ( http://arxiv.org/abs/2401.09452v1 ) ライセンス: Link先を確認	Liwei Hu, Wenyong Wang, Yu Xiang, Stefan Sommer	(参考訳) 航空機の空力係数は、特に攻撃角度(AoA)が大きい場合、その幾何学によって著しく影響を受ける。空気力学の分野では、伝統的な多項式ベースのパラメータ化は、翼の幾何学を記述するためにできるだけ少数のパラメータを使用する。しかし、翼の3次元幾何学は2次元翼よりも複雑であるため、多項式ベースのパラメータ化は翼全体の形状を正確に表現することが困難である。既存のディープラーニングベースの手法では、2D翼や2D翼の形状に関する巨大な潜在神経表現を抽出することができる。最近の研究では、幾何学的特徴を直接ニューラルネットワークへの入力として取り込むことで、予測された空力係数の精度を向上させることができる。幾何学理論により, 翼面上の圧力係数(CP)分布の学習にリーマン幾何学的特徴を取り入れることを提案する。提案手法は,幾何学的特徴(リーマン計量,接続,曲率)を計算し,さらに幾何学的特徴,座標,飛行条件を深層学習モデルに入力し,CP分布を予測する。実験の結果,最先端のディープ・アテンション・ネットワーク (dan) と比較して, dlr-f11 航空機テストセットの予測平均二乗誤差 (mse) を平均8.41%削減できた。 The aerodynamic coefficients of aircrafts are significantly impacted by its geometry, especially when the angle of attack (AoA) is large. In the field of aerodynamics, traditional polynomial-based parameterization uses as few parameters as possible to describe the geometry of an airfoil. However, because the 3D geometry of a wing is more complicated than the 2D airfoil, polynomial-based parameterizations have difficulty in accurately representing the entire shape of a wing in 3D space. Existing deep learning-based methods can extract massive latent neural representations for the shape of 2D airfoils or 2D slices of wings. Recent studies highlight that directly taking geometric features as inputs to the neural networks can improve the accuracy of predicted aerodynamic coefficients. Motivated by geometry theory, we propose to incorporate Riemannian geometric features for learning Coefficient of Pressure (CP) distributions on wing surfaces. Our method calculates geometric features (Riemannian metric, connection, and curvature) and further inputs the geometric features, coordinates and flight conditions into a deep learning model to predict the CP distribution. 
Experimental results show that our method, compared to state-of-the-art Deep Attention Network (DAN), reduces the predicted mean square error (MSE) of CP by an average of 8.41% for the DLR-F11 aircraft test set.	翻訳日:2024-01-22 09:28:50 公開日:2023-12-22
# 分子コンフォメーション予測のための拡散駆動生成枠組み Diffusion-Driven Generative Framework for Molecular Conformation Prediction ( http://arxiv.org/abs/2401.09451v1 ) ライセンス: Link先を確認	Bobin Yang, Zhenghan Chen	(参考訳) 二次元グラフ表現から3次元分子配置を推測するタスクは、計算化学の領域と医薬品の開発において重要な意味を持つ。これは分子機構と相互作用の理解に根本的に寄与する。機械学習の急速な進化、特に深層生成ネットワークの領域では、そのような予測モデリングの精度が飛躍的に向上した。従来の方法論では、最初は原子間距離を推定した後、距離幾何学の問題を解くことによって空間分子構造を彫刻する。しかし、この逐次的アプローチは時折、局所原子配列の複雑さを正確に捉えることに失敗し、結果として生じる構造モデルの完全性を損なう。これらの欠陥に対処するため、この研究は古典的非平衡熱力学で見られる拡散原理に基づくアバンギャルド生成フレームワークである 'method{} を導入する。 \method{} は原子を離散的な実体として定義し、マルコフ鎖に似た過程を通じて確率的ノイズの分布をコヒーレントな分子形式に戻す拡散の反転を導く。この変換は、抽象潜在空間における分子グラフの初期表現から始まり、タスクの特定の要求を尊重するように調整された精巧な双レベル最適化スキームを通じて3次元形式の実現へと進む。 The task of inferring three-dimensional molecular configurations from their two-dimensional graph representations is of critical significance in the domains of computational chemistry and the development of pharmaceuticals. It contributes fundamentally to our grasp of molecular mechanisms and interactions. The rapid evolution of machine learning, especially in the realm of deep generative networks, has catalyzed breakthroughs in the precision of such predictive modeling. Traditional methodologies typically employ a bifurcated strategy: initially estimating interatomic distances followed by sculpting the spatial molecular structure via solving a distance geometry problem. This sequential approach, however, occasionally fails to capture the intricacies of local atomic arrangements accurately, thus compromising the integrity of the resultant structural models. Addressing these deficiencies, this work introduces an avant-garde generative framework: \method{}, which is predicated on the diffusion principles found in classical non-equilibrium thermodynamics. \method{} envisages atoms as discrete entities and is adept at guiding the reversal of diffusion morphing a distribution of stochastic noise back into coherent molecular forms through a process akin to a Markov chain. 
This transformation begins with the initial representation of a molecular graph in an abstract latent space, progressing to the realization of the three-dimensional forms via an elaborate bilevel optimization scheme, tailored to respect the task's specific requirements.	翻訳日:2024-01-22 09:28:26 公開日:2023-12-22
# AI支援による病理診断への協力-EMPAIAイニシアチブ Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA Initiative ( http://arxiv.org/abs/2401.09450v1 ) ライセンス: Link先を確認	Norman Zerbe, Lars Ole Schwen, Christian Geißler, Katja Wiesemann, Tom Bisson, Peter Boor, Rita Carvalho, Michael Franz, Christoph Jansen, Tim-Rasmus Kiehl, Björn Lindequist, Nora Charlotte Pohlan, Sarah Schmell, Klaus Strohmenger, Falk Zakrzewski, Markus Plass, Michael Takla, Tobias Küster, André Homeyer, Peter Hufnagl	(参考訳) 過去10年間で、病理学における人工知能(AI)の手法は大幅に進歩した。しかし, 臨床診断製品への研究成果の翻訳における技術的, 規制的ハードルや, 標準化されたインターフェースの欠如など, 日常的な臨床実践への統合は遅れている。オープンでベンダ中立のEMPAIAイニシアチブは、これらの課題に対処する。本稿では,EMPAIAの成果と教訓について概説する。 EMPAIAは病理AIエコシステムの様々なステークホルダー、すなわち病理学者、コンピュータ科学者、産業を統合する。緊密なコラボレーションでは、技術的相互運用性標準、AIテストと製品開発のための推奨、説明可能性メソッドを開発しました。モジュール化されたオープンソースのEMPAIAプラットフォームを実装し、6つの異なるベンダーから11のAIベースの画像分析アプリを統合することに成功した。ヨーロッパとアジアで14種類の病理実験室で, 臨床現場におけるAIの活用を優先して検討した。技術開発に加えて、すべてのステークホルダーがデジタル病理とAIに関する情報と経験を共有するためのフォーラムを作りました。商業的、臨床的、学術的なステークホルダーはEMPAIAの共通のオープンソースインターフェースを採用することができ、大規模な標準化とプロセスの合理化にユニークな機会を提供する。日常的な実験室でのAI支援を効果的かつ広く確立するためには、さらなる努力が必要である。この目的のために、持続可能なインフラである非営利団体EMPAIA Internationalが、標準化を継続し、AI支援デジタル病理の未来に対する幅広い実装と擁護を支援するために設立された。 Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses these challenges. Here, we provide an overview of EMPAIA's achievements and lessons learned. EMPAIA integrates various stakeholders of the pathology AI ecosystem, i.e., pathologists, computer scientists, and industry. In close collaboration, we developed technical interoperability standards, recommendations for AI testing and product development, and explainability methods. 
We implemented the modular and open-source EMPAIA platform and successfully integrated 11 AI-based image analysis apps from 6 different vendors, demonstrating how different apps can use a single standardized interface. We prioritized requirements and evaluated the use of AI in real clinical settings with 14 different pathology laboratories in Europe and Asia. In addition to technical developments, we created a forum for all stakeholders to share information and experiences on digital pathology and AI. Commercial, clinical, and academic stakeholders can now adopt EMPAIA's common open-source interfaces, providing a unique opportunity for large-scale standardization and streamlining of processes. Further efforts are needed to effectively and broadly establish AI assistance in routine laboratory use. To this end, a sustainable infrastructure, the non-profit association EMPAIA International, has been established to continue standardization and support broad implementation and advocacy for an AI-assisted digital pathology future.	翻訳日:2024-01-22 09:28:05 公開日:2023-12-22
# Tumbug: 絵画的,普遍的な知識表現方法 Tumbug: A pictorial, universal knowledge representation method ( http://arxiv.org/abs/2401.09448v1 ) ライセンス: Link先を確認	Mark A. Atkins	(参考訳) 人工知能(AGI)の鍵は、一般的にコモンセンス推論(CSR)や、ほぼ同等に、特にCSRに適した知識表現法(KRM)の発見であると考えられており、著者らはCSR用のカスタムKRMを開発した。タムバグと呼ばれるこのKRMは、人間の脳がある種のKRMを使用しているという証拠が増えているため、自然界での写真として設計された。 tumbugは、roger schankのconceptual dependency (cd) theoryに似ているが、tumbugは、主に人間指向のアクティビティに基づいて約17のコンポーネント(=6つの原始的な概念カテゴリと11の原始的な行為)を使用しているcd理論とは対照的に、科学と人間の生活の基本的な概念に基づいた約30のコンポーネントを使用している。 Tumbugのビルディングブロックはすべて、従来のObject-Attribute-Value表現の3つのコンポーネント {O, A, V} と、Change and Systemである2つの新しいコンポーネント {C, S} に対応する5つのベーシックビルディングブロックに一般化することが判明した。 SCOVA」と呼ばれる5つの構成要素からなるこの集合は、すべての知識表現の普遍的な基盤であると考えられる。 Since the key to artificial general intelligence (AGI) is commonly believed to be commonsense reasoning (CSR) or, roughly equivalently, discovery of a knowledge representation method (KRM) that is particularly suitable for CSR, the author developed a custom KRM for CSR. This novel KRM called Tumbug was designed to be pictorial in nature because there exists increasing evidence that the human brain uses some pictorial type of KRM, and no well-known prior research in AGI has researched this KRM possibility. Tumbug is somewhat similar to Roger Schank's Conceptual Dependency (CD) theory, but Tumbug is pictorial and uses about 30 components based on fundamental concepts from the sciences and human life, in contrast to CD theory, which is textual and uses about 17 components (= 6 Primitive Conceptual Categories + 11 Primitive Acts) based mainly on human-oriented activities. All the Building Blocks of Tumbug were found to generalize to only five Basic Building Blocks that exactly correspond to the three components {O, A, V} of traditional Object-Attribute-Value representation plus two new components {C, S}, which are Change and System. Collectively this set of five components, called "SCOVA," seems to be a universal foundation for all knowledge representation.	
翻訳日:2024-01-22 09:27:38 公開日:2023-12-22
# シミュレーションに基づく推定による孤立パルサー集団合成 Isolated pulsar population synthesis with simulation-based inference ( http://arxiv.org/abs/2312.14848v1 ) ライセンス: Link先を確認	Vanessa Graber, Michele Ronchi, Celsa Pardo-Araujo, Nanda Rea	(参考訳) 我々は、パルサー集団合成とシミュレーションに基づく推論を組み合わせることで、孤立した銀河系電波パルサーの磁気回転特性を制約する。まず、中性子星の誕生特性と進化をモデル化するための柔軟な枠組みを開発し、その動的、回転的、磁気的特性に焦点を当てた。特に、対数正規分布から初期磁場強度$B$とスピン周期$P$をサンプリングし、べき乗則で後期の磁場減衰を捉える。各対数正規分布は平均$\mu_{\log B}, \mu_{\log P}$および標準偏差$\sigma_{\log B}, \sigma_{\log P}$で記述され、べき乗則は指数$a_{\rm late}$で特徴づけられるため、自由パラメータは5つとなる。その後、恒星の電波放射と観測バイアスをモデル化し、3つの電波サーベイでの検出を模倣し、入力パラメータを変化させて合成$P$-$\dot{P}$ダイアグラムの大規模なデータベースを作成する。次に、ニューラル事後推定に焦点を当てたシミュレーションに基づく推論アプローチに従い、このデータベースを用いて、5つのモデルパラメータの事後分布を直接推定する深層ニューラルネットワークを訓練する。シミュレーションデータ上でこれらの個々のニューラル密度推定器の検証に成功した後、ネットワークのアンサンブルを用いて、観測されたパルサー集団に対する事後分布を推定する。我々は、$95\%$信用区間において、対数正規分布に対して$\mu_{\log B} = 13.10^{+0.08}_{-0.10}$、$\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$、$\mu_{\log P} = -1.00^{+0.26}_{-0.21}$、$\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$を、べき乗則に対して$a_{\rm late} = -1.80^{+0.65}_{-0.61}$を得る。このアプローチは、複雑な集団合成フレームワークに対するロバストな統計推論への重要なステップであり、銀河パルサーの将来の多波長解析の基礎を形成する。 We combine pulsar population synthesis with simulation-based inference to constrain the magneto-rotational properties of isolated Galactic radio pulsars. We first develop a flexible framework to model neutron-star birth properties and evolution, focusing on their dynamical, rotational and magnetic characteristics. In particular, we sample initial magnetic-field strengths, $B$, and spin periods, $P$, from log-normal distributions and capture the late-time magnetic-field decay with a power law. Each log-normal is described by a mean, $\mu_{\log B}, \mu_{\log P}$, and standard deviation, $\sigma_{\log B}, \sigma_{\log P}$, while the power law is characterized by the index, $a_{\rm late}$, resulting in five free parameters. 
We subsequently model the stars' radio emission and observational biases to mimic detections with three radio surveys, and produce a large database of synthetic $P$-$\dot{P}$ diagrams by varying our input parameters. We then follow a simulation-based inference approach that focuses on neural posterior estimation and employ this database to train deep neural networks to directly infer the posterior distributions of the five model parameters. After successfully validating these individual neural density estimators on simulated data, we use an ensemble of networks to infer the posterior distributions for the observed pulsar population. We obtain $\mu_{\log B} = 13.10^{+0.08}_{-0.10}$, $\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$ and $\mu_{\log P} = -1.00^{+0.26}_{-0.21}$, $\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$ for the log-normal distributions, and $a_{\rm late} = -1.80^{+0.65}_{-0.61}$ for the power law at $95\%$ credible interval. Our approach represents a crucial step towards robust statistical inference for complex population-synthesis frameworks and forms the basis for future multi-wavelength analyses of Galactic pulsars.	翻訳日:2024-01-15 13:15:57 公開日:2023-12-22
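The sampling step described in the abstract above can be illustrated with a minimal sketch. It draws initial field strengths and spin periods from two independent log-normal distributions, using the posterior central values quoted in the abstract as defaults. The base-10 logarithm, the independence of the two draws, and the function name `sample_birth_properties` are assumptions of this sketch, not details confirmed by the abstract. Physical units, the power-law field decay, and the survey selection effects are not modeled here.

```python
import math
import random

# Central posterior values quoted in the abstract; treating them as
# log10 quantities is an assumption of this sketch.
MU_LOG_B, SIGMA_LOG_B = 13.10, 0.45   # initial magnetic-field strength
MU_LOG_P, SIGMA_LOG_P = -1.00, 0.38   # initial spin period


def sample_birth_properties(n, seed=0):
    """Draw n (B, P) pairs from two independent log-normal distributions.

    Hypothetical helper: a normal variate is drawn in log10 space and then
    exponentiated, which is one standard way to sample a log-normal.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        log_b = rng.gauss(MU_LOG_B, SIGMA_LOG_B)
        log_p = rng.gauss(MU_LOG_P, SIGMA_LOG_P)
        samples.append((10.0 ** log_b, 10.0 ** log_p))
    return samples


if __name__ == "__main__":
    population = sample_birth_properties(5)
    for b, p in population:
        print(f"log10(B) = {math.log10(b):.2f}, log10(P) = {math.log10(p):.2f}")
```

In a full population-synthesis run, draws like these would only be the starting point: each synthetic star would then be evolved and filtered through a detection model before being compared with the observed population.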
# 分離データアソシエーションとスムージングを用いたトランスベースマルチオブジェクトスムージング Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing ( http://arxiv.org/abs/2312.17261v1 ) ライセンス: Link先を確認	Juliano Pinto, Georg Hess, Yuxuan Xia, Henk Wymeersch, Lennart Svensson	(参考訳) マルチオブジェクト追跡(Multi-object Tracking、MOT)は、ある時間ウィンドウ上で、未知および時間変化のオブジェクトの状態軌跡を推定するタスクである。オブジェクト検出を時間ウィンドウ内のすべての測定値に条件付けできるマルチオブジェクト平滑化タスクに取り組むために,いくつかのアルゴリズムが提案されている。しかし、最適性能の手法は難解な計算複雑性に悩まされ、近似が必要であり、複雑な環境では準最適に実行する。深層学習に基づくアルゴリズムはこの問題に対処する可能性があるが、正確なマルチオブジェクトモデルが利用可能であり、測定が低次元であるような環境では広く適用されていない。本稿では,データ関連タスクをスムーズなタスクから切り離すような,この設定に適した新しいDLアーキテクチャを提案する。本研究では,従来のベイズトラッカーとDLトラッカーのスムーズ化問題設定における最初の比較として,従来のベイズトラッカーとDLトラッカーとの比較を行った。 Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning based algorithms are a possible venue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state-of-the-art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.	翻訳日:2024-01-15 12:50:25 公開日:2023-12-22
# TimePillars: テンポラリリカレントな3D LiDARオブジェクト検出 TimePillars: Temporally-Recurrent 3D LiDAR Object Detection ( http://arxiv.org/abs/2312.17260v1 ) ライセンス: Link先を確認	Ernesto Lozano Calvo, Bernardo Taveira, Fredrik Kahl, Niklas Gustafsson, Jonathan Larsson, Adam Tonderski	(参考訳) LiDARポイントクラウドに適用される物体検出は、ロボット工学、特に自律運転において重要なタスクである。フィールドで主に使用される単一のフレームメソッドは、個々のセンサースキャンから情報を活用する。最近の手法は比較的低い推論時間で優れた性能を達成する。しかし、LiDARデータに固有の疎度を考えると、これらの手法は、安全な自動化を実現する上で欠かせない長距離検出(例えば200m)に苦慮している。複数のスキャンを集約することは、より密度の高いクラウド表現につながるだけでなく、システムにタイムアウェアネスをもたらし、環境の変化に関する情報を提供する。しかし、この種のソリューションは、しばしば非常に問題固有のものであり、慎重にデータ処理を必要とし、実行時要求を満たさない傾向があります。この文脈では,lidarデータのピラー表現を時間にわたって活用し,ハードウェア統合効率の制約を尊重し,新たなzenseact open dataset (zod) の多様性と長距離情報を活用する時間的リカレントオブジェクト検出パイプラインであるtimepillarsを提案する。実験を通じて、繰り返しの利点を証明し、基礎的なビルディングブロックがいかに堅牢で効率的な結果が得られるかを示す。 Object detection applied to LiDAR point clouds is a relevant task in robotics, and particularly in autonomous driving. Single frame methods, predominant in the field, exploit information from individual sensor scans. Recent approaches achieve good performance, at relatively low inference time. Nevertheless, given the inherent high sparsity of LiDAR data, these methods struggle in long-range detection (e.g. 200m) which we deem to be critical in achieving safe automation. Aggregating multiple scans not only leads to a denser point cloud representation, but it also brings time-awareness to the system, and provides information about how the environment is changing. Solutions of this kind, however, are often highly problem-specific, demand careful data processing, and tend not to fulfil runtime requirements. In this context we propose TimePillars, a temporally-recurrent object detection pipeline which leverages the pillar representation of LiDAR data across time, respecting hardware integration efficiency constraints, and exploiting the diversity and long-range information of the novel Zenseact Open Dataset (ZOD). 
Through experimentation, we prove the benefits of having recurrency, and show how basic building blocks are enough to achieve robust and efficient results.	翻訳日:2024-01-15 12:50:03 公開日:2023-12-22
# 大規模言語モデルエージェントのためのワーキングメモリの強化 Empowering Working Memory for Large Language Model Agents ( http://arxiv.org/abs/2312.17259v1 ) ライセンス: Link先を確認	Jing Guo, Nan Li, Jianchuan Qi, Hang Yang, Ruiqiao Li, Yuzhen Feng, Si Zhang, Ming Xu	(参考訳) 大きな言語モデル(LLM)は印象的な言語機能を実現している。しかし、鍵となる制限は人間のような記憶能力の欠如である。 LLMは連続的な相互作用に制約のあるメモリ保持を示し、複雑な推論を妨げる。本稿では,認知心理学のワーキングメモリフレームワークを適用し,LLMアーキテクチャを向上する可能性について考察する。従来のLLMメモリ設計の限界は、異なるダイアログエピソードの分離や永続的なメモリリンクの欠如など、分析される。これに対処するため、集中型ワーキングメモリハブとエピソディックバッファアクセスを組み込んだ革新的なモデルが提案されている。このアーキテクチャは、複雑なタスクや協調的なシナリオにおいて、微妙な文脈推論の継続性を高めることを目的としている。有望ではあるが、エピソードメモリエンコーディング、ストレージ、優先順位付け、検索、セキュリティの最適化にはさらなる研究が必要である。本稿では,より高度で人間らしい記憶能力を持つLLMエージェントを開発するための戦略的青写真を提供し,汎用人工知能における重要なフロンティアとしてメモリ機構を強調した。 Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed incorporating a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.	翻訳日:2024-01-15 12:49:42 公開日:2023-12-22
# MLによるフライング -- Affine 変換の CNN インバージョン Flying By ML -- CNN Inversion of Affine Transforms ( http://arxiv.org/abs/2312.17258v1 ) ライセンス: Link先を確認	L. Van Warren	(参考訳) 本稿では,cnnを用いてアフィン変換を反転させ,計器画像から航空機の状態を推定し,コックピットゲージの読解を自動化する機械学習手法について述べる。本研究は,ターン・アンド・バンクインジケータの合成画像を用いて検証し,単一画像からのデータセット生成,ノイズフリートレーニングのための「クリーントレーニング原理」,カテゴリデータからの連続値予測のためのcnn補間といった手法を導入する。ハイパーパラメータ最適化やMLシステムエンジニアリングに関する洞察も提供する。 This paper describes a machine learning method to automate reading of cockpit gauges, using a CNN to invert affine transformations and deduce aircraft states from instrument images. Validated with synthetic images of a turn-and-bank indicator, this research introduces methods such as generating datasets from a single image, the 'Clean Training Principle' for optimal noise-free training, and CNN interpolation for continuous value predictions from categorical data. It also offers insights into hyperparameter optimization and ML system software engineering.	翻訳日:2024-01-15 12:49:26 公開日:2023-12-22
# 長期条件記憶を持つ大規模言語モデルアシスタントの進化 Evolving Large Language Model Assistant with Long-Term Conditional Memory ( http://arxiv.org/abs/2312.17257v1 ) ライセンス: Link先を確認	Ruifeng Yuan, Shichao Sun, Zili Wang, Ziqiang Cao, Wenjie Li	(参考訳) 大規模言語モデルの急速な発展に伴い、ChatGPTのようなAIアシスタントは人々の作品や生活に広く浸透してきた。本稿では,言語長期記憶を利用した大規模言語モデルアシスタントについて述べる。ユーザーとaiアシスタントの間の履歴対話から知識と経験を保存し、より良い反応を生み出すための将来の対話に適用することに焦点を当てている。モデルは、完了した対話ごとに一連のレコードを生成し、それらをメモリに格納する。後の使用例では、新しいユーザ入力が与えられ、モデルがそれを使って関連するメモリを取得し、応答の質を改善する。メモリの最良の形態を見つけるために,メモリ構築のさまざまな方法を探り,条件記憶と呼ばれる新しい記憶機構を提案し,従来の手法の問題を解決する。また,生成過程におけるメモリの検索と利用について検討する。アシスタントはGPT-4をバックボーンとして使用し、長期記憶を持つAIアシスタントが必要とするさまざまな能力に着目した3つの構築されたテストデータセットで評価する。 With the rapid development of large language models, AI assistants like ChatGPT have widely entered people's works and lives. In this paper, we present an evolving large language model assistant that utilizes verbal long-term memory. It focuses on preserving the knowledge and experience from the history dialogue between the user and AI assistant, which can be applied to future dialogue for generating a better response. The model generates a set of records for each finished dialogue and stores them in the memory. In later usage, given a new user input, the model uses it to retrieve its related memory to improve the quality of the response. To find the best form of memory, we explore different ways of constructing the memory and propose a new memorizing mechanism called conditional memory to solve the problems in previous methods. We also investigate the retrieval and usage of memory in the generation process. The assistant uses GPT-4 as the backbone and we evaluate it on three constructed test datasets focusing on different abilities required by an AI assistant with long-term memory.	翻訳日:2024-01-15 12:49:17 公開日:2023-12-22
# Voronoi tessellation の自動微分法 A Method for Auto-Differentiation of the Voronoi Tessellation ( http://arxiv.org/abs/2312.16192v1 ) ライセンス: Link先を確認	Sergei Shumilin, Alexander Ryabov, Evgeny Burnaev, Vladimir Vanovskii	(参考訳) ボロノイテッセレーション(英: Voronoi tessellation)またはボロノイ図(英: Voronoi diagram)は、様々な科学分野に応用できる重要な計算幾何学技術である。これは、与えられた空間を点の集合への近さに基づいて領域に分割することである。自動微分は最適化タスクを解決する強力なツールである。自動微分は、バックプロパゲーションアルゴリズムを使って勾配を計算できる計算グラフを構築することを前提としている。しかし、しばしばボロノイテッセレーションはパイプラインの唯一の微分不可能な部分であり、エンドツーエンドの微分を妨げている。本稿では,2次元ボロノイテッセレーションの自動微分法を提案する。この方法により、ボロノイテッセレーションの構築と勾配の伝播が可能となり、構築をエンドツーエンドで微分できる。実装の詳細といくつかの重要な応用について述べる。私たちの知る限りでは、これはボロノイの幾何学的パラメータの完全な集合を微分可能な方法で提供する、ボロノイテッセレーションの最初の自動微分可能な実現である。 Voronoi tessellation, also known as Voronoi diagram, is an important computational geometry technique that has applications in various scientific disciplines. It involves dividing a given space into regions based on proximity to a set of points. Autodifferentiation is a powerful tool for solving optimization tasks. Autodifferentiation relies on constructing a computational graph that allows gradients to be computed with the backpropagation algorithm. However, often the Voronoi tessellation remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present a method for autodifferentiation of the 2D Voronoi tessellation. The method allows one to construct the Voronoi tessellation and pass gradients, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge, this is the first autodifferentiable realization of the Voronoi tessellation providing the full set of Voronoi geometrical parameters in a differentiable way.	翻訳日:2024-01-15 12:47:34 公開日:2023-12-22
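As a small illustration of the geometry in the abstract above, the sketch below shows two textbook facts about a 2D Voronoi tessellation. A query point belongs to the cell of its nearest site, and a Voronoi vertex is the circumcenter of three neighboring sites. The circumcenter is a closed-form, smooth function of the site coordinates wherever the sites are not collinear, which hints at why gradients can flow through such quantities. This is not the authors' method, and the function names `nearest_site` and `circumcenter` are hypothetical.

```python
import math


def nearest_site(point, sites):
    """Return the index of the site whose Voronoi cell contains `point`."""
    px, py = point
    distances = [math.hypot(px - sx, py - sy) for sx, sy in sites]
    return distances.index(min(distances))


def circumcenter(a, b, c):
    """Return the point equidistant from sites a, b and c.

    For three mutually neighboring sites this is a Voronoi vertex.
    Raises ValueError when the sites are collinear (no finite vertex).
    """
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        raise ValueError("collinear sites have no circumcenter")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


if __name__ == "__main__":
    sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
    print(nearest_site((0.4, 0.1), sites))
    print(circumcenter(*sites))
```

The discrete part of the construction (which sites are neighbors) is what usually blocks end-to-end differentiation. The sketch sidesteps it by taking the three neighboring sites as given.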
# 乳癌におけるリソース制限自動Ki67指数の推定 Resource-Limited Automated Ki67 Index Estimation in Breast Cancer ( http://arxiv.org/abs/2401.00014v1 ) ライセンス: Link先を確認	J. Gliozzo, G. Marinò, A. Bonometti, M. Frasca and D. Malchiodi	(参考訳) 腫瘍進展と化学療法反応の予測は、最近、腫瘍浸潤性リンパ球(TIL)と核タンパク質Ki67を予後因子として用いて取り組まれている。近年,深層ニューラルネットワーク (DNN) が乳癌細胞においてKi67の発現を推定し,腫瘍内TILスコアを同時決定する上で最高水準の結果を達成することが示されている。しかし、この10年間で、深層モデルがもたらした目覚ましい進歩は、少なくともそれと同程度の資源需要の増大を伴ってきた。深層モデルのクエリ(場合によっては保存)に必要な膨大な計算コストは、医療従事者を支援するIoTベースのアプリケーションのようなリソース制約のある状況において強い制限となる。そこで本研究では,乳がん検診においてKi67陽性細胞の割合を効果的に推定するための資源消費対応DNNを提案する。提案手法は、メモリ使用量を最大75%、ディスク使用量を最大89%、エネルギー消費量を最大1.5倍削減しつつ、ベンチマークとした最先端手法の全体的な精度を維持または向上させた。このようなポジティブな結果に刺激されて,我々は,その汎用利用を可能にするために採用したフレームワークと,その利用をサポートするパブリックソフトウェアリポジトリを開発,構成した。 The prediction of tumor progression and chemotherapy response has recently been tackled by exploiting Tumor Infiltrating Lymphocytes (TILs) and the nuclear protein Ki67 as prognostic factors. Recently, deep neural networks (DNNs) have been shown to achieve top results in estimating Ki67 expression and simultaneous determination of intratumoral TILs score in breast cancer cells. However, over the last ten years, the extraordinary progress brought by deep models has been accompanied by an at least equal growth in their resource demands. The exorbitant computational costs required to query (and in some cases also to store) a deep model represent a strong limitation in resource-limited contexts, like that of IoT-based applications to support healthcare personnel. To this end, we propose a resource consumption-aware DNN for the effective estimate of the percentage of Ki67-positive cells in breast cancer screenings. Our approach reduced memory usage by up to 75%, disk space by up to 89%, and energy consumption by up to 1.5x, while preserving or improving the overall accuracy of a benchmark state-of-the-art solution. 
Encouraged by such positive results, we developed and structured the adopted framework so as to allow its general purpose usage, along with a public software repository to support its usage.	翻訳日:2024-01-15 12:26:22 公開日:2023-12-22
# 継承表現を訓練したニューラルネットワークに基づくマルチモーダル認知マップ Multi-Modal Cognitive Maps based on Neural Networks trained on Successor Representations ( http://arxiv.org/abs/2401.01364v1 ) ライセンス: Link先を確認	Paul Stoewer, Achim Schilling, Andreas Maier and Patrick Krauss	(参考訳) 認知地図(Cognitive map)は、脳が記憶を効率的に整理し、そこからコンテキストを取り出す方法に関する概念である。 Entorhinal-hippocampal complexは、エピソードやリレーショナルメモリ処理、空間ナビゲーションに深く関わっており、場所や格子細胞を介して認知地図を構築すると考えられている。認知地図の有望な特性を利用するため,我々は,細胞動態と認知地図表現をモデル化可能な後継表現を用いたマルチモーダルニューラルネットワークを構築した。ここでは、画像と単語埋め込みからなるマルチモーダル入力を用いる。ネットワークは、新規入力とトレーニングデータベースとの類似性を学習し、認知地図の表現を成功させる。その後、ネットワークの予測は、1つのモダリティから別のモダリティへの推測に90\%以上の精度で使用できる。したがって、提案手法は、現在のAIシステムを改善するためのビルディングブロックであり、オブジェクトが現れる環境と異なるモダリティをよりよく理解することができる。したがって、特定のモダリティと特定の遭遇との関連性は、類似した情報が少なく、学習された認知地図から追加情報が推測されるような、新たな状況における文脈認識につながる可能性がある。脳のentorhinal-hippocampal complex(entorhinal-hippocampal complex)で表される認知地図は、記憶からコンテキストを整理し取り出すもので、chatgptのような大規模言語モデル(llm)が類似したアーキテクチャを利用して高レベルの処理中心として機能することを示唆している。最後に、マルチモーダル入力を利用することで、LLMは、さまざまな形式のデータ(画像や単語など)間のギャップを埋め、学習された関連を通じてコンテキスト認識と抽象概念の基盤を築き、AIの基盤問題に対処することができる。 Cognitive maps are a proposed concept on how the brain efficiently organizes memories and retrieves context out of them. The entorhinal-hippocampal complex is heavily involved in episodic and relational memory processing, as well as spatial navigation, and is thought to build cognitive maps via place and grid cells. To make use of the promising properties of cognitive maps, we set up a multi-modal neural network using successor representations which is able to model place cell dynamics and cognitive map representations. Here, we use multi-modal inputs consisting of images and word embeddings. The network learns the similarities between novel inputs and the training database and therefore the representation of the cognitive map successfully. Subsequently, the prediction of the network can be used to infer from one modality to another with over $90\%$ accuracy. 
The proposed method could therefore be a building block to improve current AI systems for better understanding of the environment and the different modalities in which objects appear. The association of specific modalities with certain encounters can therefore lead to context awareness in novel situations when similar encounters with less information occur and additional information can be inferred from the learned cognitive map. Cognitive maps, as represented by the entorhinal-hippocampal complex in the brain, organize and retrieve context from memories, suggesting that large language models (LLMs) like ChatGPT could harness similar architectures to function as a high-level processing center, akin to how the hippocampus operates within the cortex hierarchy. Finally, by utilizing multi-modal inputs, LLMs can potentially bridge the gap between different forms of data (like images and words), paving the way for context-awareness and grounding of abstract concepts through learned associations, addressing the grounding problem in AI.	翻訳日:2024-01-15 10:09:50 公開日:2023-12-22
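For readers unfamiliar with the successor representation named in the abstract above, the sketch below computes it for a tiny Markov chain using the standard textbook definition $M = \sum_{t \ge 0} \gamma^t T^t$, where $T$ is a state-transition matrix and $\gamma$ a discount factor. This is a generic illustration, not the neural network trained in the paper. The three-state cycle, the discount value, and the function name `successor_representation` are assumptions chosen for the example.

```python
def successor_representation(T, gamma, iterations=200):
    """Approximate M = sum_t gamma^t T^t by iterating M <- I + gamma * T @ M.

    T is a row-stochastic transition matrix given as a list of lists;
    gamma must lie in [0, 1) for the series to converge.
    """
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        # Plain matrix product T @ M, written out to stay dependency-free.
        TM = [[sum(T[i][k] * M[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        M = [[identity[i][j] + gamma * TM[i][j] for j in range(n)]
             for i in range(n)]
    return M


if __name__ == "__main__":
    # Hypothetical three-state cycle: 0 -> 1 -> 2 -> 0.
    cycle = [[0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0],
             [1.0, 0.0, 0.0]]
    for row in successor_representation(cycle, gamma=0.5):
        print([round(value, 4) for value in row])
```

Each row of the result gives the discounted expected future occupancy of every state from a given start state, so states that tend to follow one another end up with similar rows. That similarity structure is what makes the representation useful as a map.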
# SoK: 三角形のモデリング - 機械学習における公正さ、解釈可能性、プライバシの相互作用について SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning ( http://arxiv.org/abs/2312.16191v1 ) ライセンス: Link先を確認	Julien Ferry (LAAS-ROC), Ulrich Aïvodji (ETS), Sébastien Gambs (UQAM), Marie-José Huguet (LAAS-ROC), Mohamed Siala (LAAS-ROC)	(参考訳) 機械学習技術は、大学入学、ローンの帰属、再分配予測などの高い意思決定にますます使われている。したがって、学習したモデルが人間によって監査または理解され、差別や偏見を発生または再現せず、トレーニングデータに関する機密情報を漏洩しないようにすることが重要である。実際、解釈可能性、公正性、プライバシは、責任ある機械学習を開発する上で重要な要件であり、これら3つ全てが過去10年間に広く研究されてきた。しかし、それらは主に孤立していると考えられ、実際には肯定的にも否定的にも互いに相互作用する。本稿ではsok(systematization of knowledge)論文において,これら3つのデシデラタ間の相互作用に関する文献について検討した。より正確には、それぞれの相互作用について、同定されたシナジーと緊張を要約する。これらの知見は、いくつかの基本的な理論的および経験的対立を浮き彫りにしつつ、高レベルの実用性を維持することを目的とした場合、これらの異なる要件を共同で検討することは困難であることを示す。この問題を解決するために, 注意深い設計がこれらの異なる関心事を実際にうまく処理できることを示すため, 融和機構の可能性についても論じる。 Machine learning techniques are increasingly used for high-stakes decision-making, such as college admissions, loan attribution or recidivism prediction. Thus, it is crucial to ensure that the models learnt can be audited or understood by human users, do not create or reproduce discrimination or bias, and do not leak sensitive information regarding their training data. Indeed, interpretability, fairness and privacy are key requirements for the development of responsible machine learning, and all three have been studied extensively during the last decade. However, they were mainly considered in isolation, while in practice they interplay with each other, either positively or negatively. In this Systematization of Knowledge (SoK) paper, we survey the literature on the interactions between these three desiderata. More precisely, for each pairwise interaction, we summarize the identified synergies and tensions. 
These findings highlight several fundamental theoretical and empirical conflicts, while also demonstrating that jointly considering these different requirements is challenging when one aims to preserve a high level of utility. To solve this issue, we also discuss possible conciliation mechanisms, showing that careful design makes it possible to handle these different concerns successfully in practice.	翻訳日:2023-12-31 03:01:10 公開日:2023-12-22
# Hessian-based generalization Guaranteesを用いたディープニューラルネットワークのロバスト微調整 Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees ( http://arxiv.org/abs/2206.02659v6 ) ライセンス: Link先を確認	Haotian Ju, Dongyue Li, Hongyang R. Zhang	(参考訳) 対象タスクにおける事前訓練されたディープニューラルネットワークの微調整を検討する。我々は、しばしば観測される過剰フィッティングの問題(例えば、ターゲットデータセットが小さい場合や、トレーニングラベルが騒がしい場合など)を理解するために、微調整の一般化特性について検討する。深層ネットワークに対する既存の一般化手法は、微調整モデルの初期化(即ち事前訓練されたネットワーク)からの距離や、深層ネットワークの雑音安定性などの概念に依存する。本稿では,PAC-Bayesian解析によるヘッセン系距離測定を同定し,微調整モデルの一般化ギャップとよく相関することを示した。理論的には、微調整モデルに対するヘッセン距離に基づく一般化境界を証明できる。また,オーバーフィッティングが重要な問題であるラベルノイズに対する微調整に関する拡張研究についても述べる。本稿では,このアルゴリズムについて,クラス条件付き独立ノイズモデルに基づくアルゴリズムと一般化誤差保証を提案する。経験的に、ヘッセン距離測度は、実際に微調整されたモデルの観測された一般化ギャップのスケールと一致する。また,ノイズの多いトレーニングラベルを用いた画像分類タスクでもアルゴリズムをテストし,先行手法の利得と微調整モデルのヘッセン距離測定値の低下を示した。 We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., the pretrained network) of the fine-tuned model and noise stability properties of deep networks. This paper identifies a Hessian-based distance measure through PAC-Bayesian analysis, which is shown to correlate well with observed generalization gaps of fine-tuned models. Theoretically, we prove Hessian distance-based generalization bounds for fine-tuned models. We also describe an extended study of fine-tuning against label noise, where overfitting remains a critical problem. We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model. Empirically, we observe that the Hessian-based distance measure can match the scale of the observed generalization gap of fine-tuned models in practice. 
We also test our algorithm on several image classification tasks with noisy training labels, showing gains over prior methods and decreases in the Hessian distance measure of the fine-tuned model.	翻訳日:2023-12-27 23:29:31 公開日:2023-12-22
# DynGFN:GFlowNetを用いた遺伝子制御ネットワークのベイズ推定に向けて DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets ( http://arxiv.org/abs/2302.04178v4 ) ライセンス: Link先を確認	Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford	(参考訳) 細胞生物学における大きな課題の1つは、遺伝子発現と細胞機能を制御する遺伝子とその産物間の相互作用を記述する遺伝子制御ネットワーク(GRN)を推論することである。 1) 規制ネットワークは本質的に循環的であるため、grnを有向非循環グラフ(dag)としてモデル化すべきではなく、2) 観測は重要な測定ノイズを持つので、典型的なサンプルサイズでは、データが与えられた可能性のあるグラフの大きな同値クラスが常に存在し、この不確かさを捉える方法を求めている。既存の方法は、チャレンジ(1)、ダイナミックスから循環構造を識別すること、あるいはチャレンジ(2)、DAGよりも複雑なベイズ後部を学習することに焦点を当てるが、両方ではない。本稿では、RNAベロシティ技術を用いて遺伝子発現の「速度」を推定できるという事実を活用し、両方の課題に対処するアプローチを開発する。速度情報へのアクセスがあるので,ベイズ構造学習問題を動的系のスパース同定問題として扱うことができ,循環フィードバックループを時間を通じて捉えることができる。本研究の目的は, 離散構造上の不確実性をモデル化することであり, 生成フローネットワーク(GFlowNets)を用いて, 結合空間の後方分布を推定することである。提案手法は, 従来のベイズ構造学習法と比較して, 循環構造の分布をよりよくカプセル化した後部学習法であることが示唆された。 One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. 
Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.	翻訳日:2023-12-27 23:21:14 公開日:2023-12-22
# PAC-Optimal Hyper-PosteriorによるスケーラブルなPAC-Bayesianメタラーニング:理論から実践へ Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice ( http://arxiv.org/abs/2211.07206v3 ) ライセンス: Link先を確認	Jonas Rothfuss, Martin Josifoski, Vincent Fortuin, Andreas Krause	(参考訳) Meta-Learningは、関連する学習タスクのデータセットから有用な帰納バイアスを取得することで、新しいタスクの学習プロセスを高速化することを目的としている。実際には、利用可能な関連するタスクの数は少ないことが多いが、既存のアプローチのほとんどは、多くのタスクを前提としており、非現実的で過度に適合する傾向がある。メタラーニング文学における中心的な疑問は、未発見のタスクへの一般化を確実にするための規則化の方法である。本研究では,pac-ベイズ理論を用いた理論的解析を行い,rothfuss et al. (2021a) によって初めて導かれたメタラーニングの一般化を提案する。重要なことに、この境界はPACOHと呼ばれる最適超後光の閉形式を導出することができ、最高の性能保証をもたらす。 PAC-Bayesian per-task 学習境界におけるメタラーニングの条件と程度について,理論的解析および実証事例研究を行った。閉形式PACOHは、二段階最適化に依存しない実践的なメタラーニングアプローチを刺激し、うまくスケールする標準的な変分法に対処可能な確率的最適化問題を引き起こす。実験の結果,PACOHをガウス過程とベイジアンニューラルネットワークモデルでインスタンス化する場合,提案手法はよりスケーラブルで,予測精度と不確実性評価の両面において最先端性能が得られることがわかった。 Meta-Learning aims to speed up the learning process on new tasks by acquiring useful inductive biases from datasets of related learning tasks. While, in practice, the number of related tasks available is often small, most of the existing approaches assume an abundance of tasks; making them unrealistic and prone to overfitting. A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. In this work, we provide a theoretical analysis using the PAC-Bayesian theory and present a generalization bound for meta-learning, which was first derived by Rothfuss et al. (2021a). Crucially, the bound allows us to derive the closed form of the optimal hyper-posterior, referred to as PACOH, which leads to the best performance guarantees. We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds. 
The closed-form PACOH inspires a practical meta-learning approach that avoids the reliance on bi-level optimization, giving rise to a stochastic optimization problem that is amenable to standard variational methods that scale well. Our experiments show that, when instantiating the PACOH with Gaussian processes and Bayesian Neural Networks models, the resulting methods are more scalable, and yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates.	翻訳日:2023-12-27 23:16:39 公開日:2023-12-22
# 説明制約による学習 Learning with Explanation Constraints ( http://arxiv.org/abs/2303.14496v3 ) ライセンス: Link先を確認	Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar	(参考訳) 大規模なディープラーニングモデルは解釈が難しいため、最近はブラックボックスモデルの説明に焦点が当てられている。対照的に、モデルがどのように振る舞うべきかという apriori の説明があるかもしれない。本稿では,説明制約からの学習としてこの概念を定式化し,その説明がモデル学習をいかに改善できるかを分析するための学習論的枠組みを提案する。これらの説明はいつ役に立つのか? 私たちの最初の重要な貢献は、新しいデータに対する期待でこれらの説明制約を満たす一連のモデルを通じてこの問題に対処します。線形モデルと2層ニューラルネットワークの両方の設定における勾配情報から得られる説明の標準クラスに対して、これらのモデルの利点(Rademacher複雑性の低減の観点から)を特徴づける。さらに,より単純な拡張ラグランジアン法と比較して,より優れた性能を実現し,より頻繁な制約を満たす変分近似によって,我々のフレームワークのアルゴリズム的解を提供する。我々は,大規模な合成および実世界の実験に対するアプローチの利点を実証する。 As larger deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have apriori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning theoretic framework to analyze how such explanations can improve the learning of our models. One may naturally ask, "When would these explanations be helpful?" Our first key contribution addresses this question via a class of models that satisfies these explanation constraints in expectation over new data. We provide a characterization of the benefits of these models (in terms of the reduction of their Rademacher complexities) for a canonical class of explanations given by gradient information in the settings of both linear models and two layer neural networks. In addition, we provide an algorithmic solution for our framework, via a variational approximation that achieves better performance and satisfies these constraints more frequently, when compared to simpler augmented Lagrangian methods to incorporate these explanations. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.	翻訳日:2023-12-27 23:08:19 公開日:2023-12-22
# DeblurSR:スパイク表現の下のイベントベースの動き DeblurSR: Event-Based Motion Deblurring Under the Spiking Representation ( http://arxiv.org/abs/2303.08977v3 ) ライセンス: Link先を確認	Chen Song, Chandrajit Bajaj, Qixing Huang	(参考訳) 本稿では,ぼやけた映像をシャープな映像に変換する新しい動きデブラリング手法であるdeblursrを提案する。 DeblurSRはイベントデータを利用して動きのあいまいさを補償し、スパイキング表現を利用してシャープな出力ビデオを時間から強度へのマッピングとしてパラメータ化する。私たちの重要な貢献であるスパイキング表現(SR)は、生物において生物学的ニューロンがどのように相互に通信するかを決定する神経型原理にインスパイアされています。スパイクが鋭いエッジを表現できる理由と、スパイクパラメータがニューロモルフィックな視点からどのように解釈されるかについて議論する。 DeblurSRは出力品質が高く、最先端のイベントベースのモーションデブロア法よりも少ない計算資源を必要とする。さらに,我々のアプローチは,暗黙的神経表現の最近の進歩と相まって,ビデオの超解像まで容易に拡張できることを示した。 DeblurSRの実装と視覚化はhttps://github.com/chensong1995/DeblurSRで公開されている。 We present DeblurSR, a novel motion deblurring approach that converts a blurry image into a sharp video. DeblurSR utilizes event data to compensate for motion ambiguities and exploits the spiking representation to parameterize the sharp output video as a mapping from time to intensity. Our key contribution, the Spiking Representation (SR), is inspired by the neuromorphic principles determining how biological neurons communicate with each other in living organisms. We discuss why the spikes can represent sharp edges and how the spiking parameters are interpreted from the neuromorphic perspective. DeblurSR has higher output quality and requires fewer computing resources than state-of-the-art event-based motion deblurring methods. We additionally show that our approach easily extends to video super-resolution when combined with recent advances in implicit neural representation. The implementation and animated visualization of DeblurSR are available at https://github.com/chensong1995/DeblurSR.	翻訳日:2023-12-27 23:06:51 公開日:2023-12-22
# 次世代外科ナビゲーション : マーカーレス・マルチビューによる手術器具の6DoF姿勢推定 Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments ( http://arxiv.org/abs/2305.03535v2 ) ライセンス: Link先を確認	Jonas Hein, Nicola Cavalcanti, Daniel Suter, Lukas Zingg, Fabio Carrillo, Lilian Calvet, Mazda Farshad, Marc Pollefeys, Nassir Navab, Philipp Fürnstahl	(参考訳) 従来のコンピュータビジョンの最先端の研究は、外科領域でますます活用されている。コンピュータ支援手術において特に注目されるのは、器具位置決めのためのマーカーベースのトラッキングシステムを、深層学習を用いた純粋な画像ベースの6DoFポーズ推定に置き換えることである。しかし、最先端の単一視点ポーズ推定法はまだ手術ナビゲーションに必要な精度を満たさない。そこで本研究では,手術器具の高精度かつ遮蔽に頑健な6DoFポーズ推定のためのマルチビュー設定の利点を考察し,手術室の課題に対処する理想的なカメラシステムに関する推奨事項を導出する。本研究の貢献は3つある。まず,スタティックカメラとヘッドマウントカメラからなるマルチカメラキャプチャセットアップを提案し,様々なカメラ構成におけるポーズ推定手法の性能について検討する。第2に,外科用ウェットラボと実際の手術室で撮影したex-vivo脊椎手術の多視点RGB-Dビデオデータセットを公開する。このデータセットは外科医,器具,患者解剖の豊富なアノテーションを含む。第3に,手術器具の6DoFポーズ推定タスクにおいて3つの最先端シングルビューおよびマルチビュー法を評価し,カメラ構成,トレーニングデータ,遮蔽が姿勢精度および一般化能力に及ぼす影響を分析した。最良の方法は5台のカメラを多視点ポーズ最適化に利用し、最適な条件下で、手術用ドリルでは1.01mmと0.89°、スクリュードライバーでは2.79mmと3.33°の平均位置誤差と方位誤差を達成する。手術器具のマーカーレストラッキングが、既存のマーカーベースシステムに代わる実現可能な選択肢になりつつあることを示す。 State-of-the-art research of traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with pure image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. The contributions of this work are threefold. First, we present a multi-camera capture setup consisting of static and head-mounted cameras, which allows us to study the performance of pose estimation methods under various camera configurations. 
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre and including rich annotations for surgeon, instrument, and patient anatomy. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments and analyze the influence of camera configurations, training data, and occlusions on the pose accuracy and generalization ability. The best method utilizes five cameras in a multi-view pose optimization and achieves an average position and orientation error of 1.01 mm and 0.89° for a surgical drill as well as 2.79 mm and 3.33° for a screwdriver under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.	翻訳日:2023-12-27 22:57:31 公開日:2023-12-22
# CAMEL: デバイス上での効率的な学習のためのAIモデルと組み込みDRAMの共同設計 CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning ( http://arxiv.org/abs/2305.03148v3 ) ライセンス: Link先を確認	Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks	(参考訳) オンデバイス学習は、AIモデルがユーザデータに適応できるようにし、エッジプラットフォームにおけるサービス品質を向上させる。しかし、リソース制限されたデバイスでのAIのトレーニングは、コンピューティングワークロードの要求と、ディープニューラルネットワーク(DNN)が必要とするメモリ消費とデータアクセスが大きな課題となっている。そこで本研究では,過渡訓練データの主要記憶媒体として組込み動的ランダムアクセスメモリ(eDRAM)の利用を提案する。静的ランダムアクセスメモリ(SRAM)と比較して、eDRAMはより高いストレージ密度と低いリーク電力を提供し、アクセスコストと電力リークを低減させる。それでも、保存されたデータの整合性を維持するために、周期的なパワーハングリーリフレッシュ操作はシステム性能を低下させる可能性がある。高価なeDRAMリフレッシュ操作の発生を最小限に抑えるため、トレーニングプロセス中に保存されたデータの寿命を短縮することが有用である。これを実現するために、我々はアルゴリズムとハードウェアの共同設計の原則を採用し、トレーニングを通してデータ寿命とストレージコストを効果的に削減する可逆的なDNNアーキテクチャのファミリーを導入した。さらに,eDRAMをプライマリオンチップメモリとして活用した,高効率なオンデバイストレーニングエンジン「CAMEL」を提案する。このエンジンは、高いトレーニング精度を維持しつつ、メモリ使用量とチップ外DRAMトラフィックを大幅に削減したデバイス上での効率的なトレーニングを可能にする。我々は、異なるデータセットを持つ複数のDNN上でCAMELシステムを評価し、他のベースラインハードウェアプラットフォームと比較して、トレーニングプロセスの$2.5\times$の高速化と$2.8\times$のトレーニングエネルギー削減を実証した。 On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. In comparison to static random-access memory (SRAM), eDRAM provides higher storage density and lower leakage power, resulting in reduced access cost and power leakage. Nevertheless, to maintain the integrity of the stored data, periodic power-hungry refresh operations could potentially degrade system performance. 
To minimize the occurrence of expensive eDRAM refresh operations, it is beneficial to shorten the lifetime of stored data during the training process. To achieve this, we adopt the principles of algorithm and hardware co-design, introducing a family of reversible DNN architectures that effectively decrease data lifetime and storage costs throughout training. Additionally, we present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory. This engine enables efficient on-device training with significantly reduced memory usage and off-chip DRAM traffic while maintaining superior training accuracy. We evaluate our CAMEL system on multiple DNNs with different datasets, demonstrating a $2.5\times$ speedup of the training process and $2.8\times$ training energy savings compared with the other baseline hardware platforms.	翻訳日:2023-12-27 22:57:00 公開日:2023-12-22
# SPIRES(Structured prompt interrogation and Recursive extract of semantics: SPIRES):ゼロショット学習を用いた知識ベース獲得手法 Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning ( http://arxiv.org/abs/2304.02711v2 ) ライセンス: Link先を確認	J. Harry Caufield, Harshad Hegde, Vincent Emonet, Nomi L. Harris, Marcin P. Joachimiak, Nicolas Matentzoglu, HyeongSik Kim, Sierra A.T. Moxon, Justin T. Reese, Melissa A. Haendel, Peter N. Robinson, and Christopher J. Mungall	(参考訳) 知識ベースとオントロジーの作成は、手動のキュレーションに依存する時間のかかる作業である。 ai/nlpアプローチは、これらの知識ベースを投入する専門家キュレーターを支援するが、現在のアプローチは広範なトレーニングデータに依存しており、任意の複雑なネストされた知識スキーマを投入できない。本稿では,SPIRES(Structured Prompt Interrogation and Recursive extract of Semantics)を提案する。Large Language Models (LLMs) によるゼロショット学習(ZSL) と,フレキシブルプロンプトからの汎用クエリ応答を,特定のスキーマに準拠した情報から行うことによる知識抽出手法である。詳細なユーザ定義の知識スキーマと入力テキストが与えられた場合、SPIRESはGPT-3+に対して即時尋問を行い、提供されたスキーマと一致する応答の集合を得る。 SPIRESは既存のオントロジーと語彙を使って、一致するすべての要素の識別子を提供する。本稿では,食品レシピの抽出,多種の細胞シグナル伝達経路,疾患治療,多段階薬物機構,化学・疾患因果グラフなど,さまざまな領域におけるSPIRESの使用例を紹介する。現在のSPIRES精度は、既存のリレーショナル抽出(RE)メソッドの中間範囲に匹敵するが、簡単にカスタマイズでき、柔軟性があり、重要なことに、トレーニングデータがない場合に新しいタスクを実行する能力がある。本手法は,LLMの言語解釈機能を活用して知識ベースを組み立て,手作業による知識のキュレーションと取得を支援するとともに,LLM以外のデータベースやオントロジーによる検証を支援する一般的な戦略を支援する。 SPIRESはオープンソースのOntoGPTパッケージの一部として利用可能である。 Creating knowledge bases and ontologies is a time consuming task that relies on a manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. 
Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.	翻訳日:2023-12-27 22:54:08 公開日:2023-12-22
# ポリタプレット損失を考慮した理解・論理推論タスクの深層マニフォールド学習 Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss ( http://arxiv.org/abs/2304.01046v4 ) ライセンス: Link先を確認	Jeffrey Lu, Ivan Rodriguez	(参考訳) 理解と論理的推論タスクを読む機械学習モデルの開発における現在のトレンドは、論理的ルールを理解し活用するモデルの能力を改善することに焦点を当てている。本研究は、人間が理解や論理的推論タスクを与えられたときに使用する共通の戦略を表現することにより、他のモデルよりも解釈可能なコンポーネントを持つ、新しい損失関数と付随するモデルアーキテクチャを提供することに焦点を当てている。我々の戦略は、絶対的精度よりも相対的精度を強調し、理論的には不完全な知識で正しい答えを生成できる。本稿では,この戦略の有効性を考察し,読解の理解と論理的推論の問題を解き明かす。モデルは、難読性理解と論理的推論ベンチマークであるreclorデータセットで評価された。本稿では,各選択の真の精度を学習するよりも,回答選択の相対的正しさを優先的に学習するポリタップレット損失関数を提案する。以上の結果から,ポリtuplet損失モデルが既存のベースラインモデルよりも優れていることが示唆されたが,その効果を定量化するためにはさらなる研究が必要である。 The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules. This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models by representing a common strategy employed by humans when given reading comprehension and logical reasoning tasks. Our strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer with incomplete knowledge. We examine the effectiveness of this strategy to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. We propose the polytuplet loss function, which forces prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models, though further research is required to quantify the benefits it may present.	翻訳日:2023-12-27 22:53:24 公開日:2023-12-22
# ソーシャルメディアにおけるエンゲージメント,ユーザ満足度,分断コンテンツの増幅 Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media ( http://arxiv.org/abs/2305.16941v5 ) ライセンス: Link先を確認	Smitha Milli, Micah Carroll, Yike Wang, Sashrika Pandey, Sebastian Zhao, Anca D. Dragan	(参考訳) 事前登録されたランダム化実験で、twitterのエンゲージメントベースのランキングアルゴリズムは、感情的にチャージされ、グループ外で敵対的なコンテンツを増幅し、ユーザーが自分の政治的アウトグループについてより悪くなると感じていることがわかった。さらに,ユーザが選択した政治的つぶやきを好まないことを見出し,エンゲージメントに基づくアルゴリズムがユーザの好みを満たさないことを示唆する。最後に,ユーザの指定した嗜好に基づいてコンテンツのランク付けを行い,怒りやパルチザン,グループ外の敵対的コンテンツの削減に加えて,エコーチェンバーの強化の可能性も探究する。この証拠は、エンゲージメント、ユーザの選好、社会政治的な結果のバランスをとる、より微妙なコンテンツランキングアプローチの必要性を強調している。 In a pre-registered randomized experiment, we found that, relative to a reverse-chronological baseline, Twitter's engagement-based ranking algorithm amplifies emotionally charged, out-group hostile content that users say makes them feel worse about their political out-group. Furthermore, we find that users do not prefer the political tweets selected by the algorithm, suggesting that the engagement-based algorithm underperforms in satisfying users' stated preferences. Finally, we explore the implications of an alternative approach that ranks content based on users' stated preferences and find a reduction in angry, partisan, and out-group hostile content but also a potential reinforcement of echo chambers. The evidence underscores the necessity for a more nuanced approach to content ranking that balances engagement, users' stated preferences, and sociopolitical outcomes.	翻訳日:2023-12-27 22:43:08 公開日:2023-12-22
# DeltaNN:画像認識モデルの性能に及ぼす計算環境パラメータの影響の評価 DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models ( http://arxiv.org/abs/2306.06208v4 ) ライセンス: Link先を確認	Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan	(参考訳) 画像認識タスクは一般的にディープラーニングを使用し、膨大な処理能力を必要とするため、高速でタイムリーな処理にはGPUやTPUなどのハードウェアアクセラレータに依存する。リアルタイム画像認識タスクの失敗は、モデル展開中にハードウェアアクセラレーターのサブ最適マッピングによって起こり、タイミングの不確実性と誤動作を引き起こす可能性がある。ハードウェアアクセラレータのマッピングは、ディープラーニングフレームワークやコンパイラ、デバイスライブラリなど、複数のソフトウェアコンポーネントを使用して行われます。自律運転や医用画像などの安全クリティカルなアプリケーションにおける画像認識タスクの利用の増加により、ディープラーニングフレームワークやコンパイラ最適化、ハードウェアデバイスなどのパラメータがモデル性能や正確性に与える影響が十分に理解されていないため、計算環境の変化に対する彼らの堅牢性を評価することが不可欠である。本稿では,差分テストフレームワーク DeltaNN を提案する。これによって,異なる計算環境パラメータが,展開中の画像認識モデルの性能,ポストトレーニングに与える影響を評価することができる。 DeltaNNは、ディープラーニングフレームワーク、コンパイラ最適化、ハードウェアデバイスなど、環境パラメータの変化に対する所定の画像認識モデルの異なる実装を生成し、結果としてモデルパフォーマンスの違いを分析する。 DeltaNNを用いて,ImageNetデータセットを用いた3つの人気のある画像認識モデルのロバスト性解析を行う。異なる設定における誤分類や推論時間の違いによる影響を報告する。合計で、ディープラーニングフレームワーク全体で最大72%のアウトプットラベルの差異を観測し、コンパイラの最適化を適用する場合、推論時間に関して予想外のパフォーマンス低下を最大81%観察した。 Image recognition tasks typically use deep learning and require enormous processing power, thus relying on hardware accelerators like GPUs and TPUs for fast, timely processing. Failure in real-time image recognition tasks can occur due to sub-optimal mapping on hardware accelerators during model deployment, which may lead to timing uncertainty and erroneous behavior. Mapping on hardware accelerators is done using multiple software components like deep learning frameworks, compilers, and device libraries, that we refer to as the computational environment. 
Owing to the increased use of image recognition tasks in safety-critical applications like autonomous driving and medical imaging, it is imperative to assess their robustness to changes in the computational environment, as the impact of parameters like deep learning frameworks, compiler optimizations, and hardware devices on model performance and correctness is not yet well understood. In this paper we present a differential testing framework, DeltaNN, that allows us to assess the impact of different computational environment parameters on the performance of image recognition models during deployment, post training. DeltaNN generates different implementations of a given image recognition model for variations in environment parameters, namely, deep learning frameworks, compiler optimizations and hardware devices and analyzes differences in model performance as a result. Using DeltaNN, we conduct an empirical study of robustness analysis of three popular image recognition models using the ImageNet dataset. We report the impact in terms of misclassifications and inference time differences across different settings. In total, we observed up to 72% output label differences across deep learning frameworks, and up to 81% unexpected performance degradation in terms of inference time, when applying compiler optimizations.	翻訳日:2023-12-27 22:31:39 公開日:2023-12-22
# 画像認識におけるBuggy Deep Learning Framework変換のためのフォールトローカライゼーション Fault Localization for Buggy Deep Learning Framework Conversions in Image Recognition ( http://arxiv.org/abs/2306.06157v4 ) ライセンス: Link先を確認	Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan	(参考訳) ディープニューラルネットワーク(DNN)をデプロイする場合、開発者はモデルをディープラーニングフレームワークから別のもの(TensorFlowからPyTorchなど)に変換することが多い。しかし、このプロセスはエラーを起こしやすく、ターゲットモデルの精度に影響を及ぼす可能性がある。画像認識に広く用いられている3つのDNN(MobileNetV2,ResNet101,InceptionV3)に対して,その影響の程度を明らかにするために,よく知られた4つのディープラーニングフレームワーク(PyTorch,Keras,TensorFlow(TF),TFLite)に変換された差分解析を行い,多数のモデルクラッシュと最大72%の出力ラベルの差異を明らかにした。このような誤りを軽減するため,本研究では,事前学習された画像認識モデルに着目した,バギー深層学習フレームワーク変換のフォールトローカライズと修復への新しいアプローチを提案する。我々の手法は4段階の分析から成り立っている: 1)変換ツール、 2)モデルパラメータ、 3)モデルハイパーパラメータ、及び 4)グラフ表現。さらに,検出された障害の障害修復に関する様々な戦略を提案する。我々は,Apache TVMディープラーニングコンパイラ上に本手法を実装し,InceptionV3のTFからTFLiteへの変換に対する予備的なフォールトローカライズ解析によって検証した。提案手法は,重みの精度誤差を導入し,モデルの精度を低下させる共通DNNコンバータツールの欠陥を検出する。障害ローカライズ後、私たちは問題を修復し、コンバージョンエラーをゼロにしました。 When deploying Deep Neural Networks (DNNs), developers often convert models from one deep learning framework to another (e.g., TensorFlow to PyTorch). However, this process is error-prone and can impact target model accuracy. To identify the extent of such impact, we perform and briefly present a differential analysis against three DNNs widely used for image recognition (MobileNetV2, ResNet101, and InceptionV3) converted across four well-known deep learning frameworks (PyTorch, Keras, TensorFlow (TF), and TFLite), which revealed numerous model crashes and output label discrepancies of up to 72%. To mitigate such errors, we present a novel approach towards fault localization and repair of buggy deep learning framework conversions, focusing on pre-trained image recognition models. Our technique consists of four stages of analysis: 1) conversion tools, 2) model parameters, 3) model hyperparameters, and 4) graph representation. 
In addition, we propose various strategies for repairing the detected faults. We implement our technique on top of the Apache TVM deep learning compiler, and we test it by conducting a preliminary fault localization analysis for the conversion of InceptionV3 from TF to TFLite. Our approach detected a fault in a common DNN converter tool, which introduced precision errors in weights, reducing model accuracy. After our fault localization, we repaired the issue, reducing our conversion error to zero.	翻訳日:2023-12-27 22:31:11 公開日:2023-12-22
# TransformerG2G: Transformerを用いた時間グラフ埋め込み学習のための適応時間ステップ TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers ( http://arxiv.org/abs/2307.02588v2 ) ライセンス: Link先を確認	Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis	(参考訳) 動的グラフ埋め込みは、様々なアプリケーションにおける多様な時間グラフ解析タスク(リンク予測、ノード分類、レコメンダシステム、異常検出、グラフ生成など)に対処するための非常に効果的な手法として登場した。このような時間グラフは異質な過渡的ダイナミクス、時間間隔の変化、その進化を通して高度に進化するノードの特徴を示す。したがって、歴史的グラフコンテキストからの長距離依存関係を組み込むことは、時間的ダイナミクスを正確に学習する上で重要な役割を果たす。本稿では,不確かさを定量化したグラフ埋め込みモデルTransformerG2Gを開発した。これは,先進的なトランスフォーマーエンコーダを利用して,現在の状態 ($t$) と以前のコンテキスト (タイムスタンプ [$t-1, t-l$]、$l$はコンテキストの長さ) から中間ノード表現を学習する。さらに、2つの射影層を用いて低次元多変量ガウス分布を生成し、タイムスタンプ$t$における各ノードの潜在埋め込みとする。我々は,TEA (Temporal Edge Appearance) プロットによって測定された,「novelty」のレベルが異なる多様なベンチマークを検討する。提案したTransformerG2Gモデルは, リンク予測精度と計算効率の両面から, 従来の多段階法と先行研究(DynG2G)より優れていることを示す。さらに、複数のグラフスナップショットにまたがる学習時間依存の注意重みは、変換器によって実現された自動適応時間ステップの開発を明らかにする。注意重みを調べることで、時間的依存関係を解明し、影響力のある要素を特定し、グラフ構造内の複雑な相互作用についての洞察を得ることができる。例えば,グラフトポロジー進化の様々な段階において,注意重みとノード次数との間に強い相関関係を見出した。 Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state ($t$) and previous context (over timestamps [$t-1, t-l$], $l$ is the length of context). 
Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA (Temporal Edge Appearance) plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.	翻訳日:2023-12-27 22:19:15 公開日:2023-12-22
# ピック・プレイスにおける対称性の活用 Leveraging Symmetries in Pick and Place ( http://arxiv.org/abs/2308.07948v2 ) ライセンス: Link先を確認	Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt	(参考訳) ロボットピックと配置タスクは、選択対象と所望の場所ポーズの両方の翻訳と回転の下で対称である。例えば、ピックオブジェクトが回転または変換された場合、最適なピックアクションも回転または変換されるべきである。同じことが、場所のポーズにも当てはまります。所望の場所のポーズが変わった場合、所望の場所のアクションもそれに応じて変化するべきです。 transporter netとして知られる最近提案されたpick and placeフレームワークは、これらの対称性の一部をキャプチャするが、すべてではない。本稿では,平面式ロボットピック・アンド・プレイスに存在する対称性を解析的に研究し,すべての対称性を捉える方法でトランスポーターネットに同変ニューラルモデルを組み込む方法を提案する。 Equivariant Transporter Net と呼ばれる新しいモデルは、ピック・アンド・プレイス・対称性に同値であり、ピック・アンド・プレイス・ポーズに即座に知識を一般化することができる。実験結果から,非対称型モデルよりもサンプル効率が良好であることを示し,様々な模倣学習タスクにおいて,人間によるごく少数のデモンストレーションを用いて,実演されたピック・アンド・プレース動作を模倣できるシステムを開発した。 Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.	翻訳日:2023-12-27 22:07:38 公開日:2023-12-22
# 計画による計画のための反復的オプション発見 Iterative Option Discovery for Planning, by Planning ( http://arxiv.org/abs/2310.01569v2 ) ライセンス: Link先を確認	Kenny Young, Richard S. Sutton	(参考訳) オプションという形で有用な時間的抽象化を見つけることは、ますます複雑なドメインに強化学習と計画を適用する上で鍵となると広く考えられている。 alphazeroで使用されるポリシ学習に対するエキスパートイテレーションアプローチの実証的成功に基づいて,オプション発見の類似的なアプローチであるoption iterationを提案する。任意の場所で検索結果にマッチするように訓練された単一の強力なポリシーを学ぶのではなく、オプションイテレーションは、各状態が遭遇するたびに、セット内の少なくとも1つのポリシーが、将来に向けて検索結果にマッチするように訓練された一連のオプションポリシーを学ぶ。直感的には、現在の状態の詳細に複雑な依存関係を持つ単一のグローバルな強いポリシーを学ぶよりも、アルゴリズムが賭けをヘッジできるため、これはかなり簡単かもしれない。このようなローカルな強力なポリシーの集合を学習することで、より優れた選択肢がより良い検索結果に結びつき、より良い選択肢のトレーニングを可能にする、希少なサイクルをもたらす検索アルゴリズムをガイドすることができる。実験により,オプションイテレーションで学習したオプションを用いたプランニングは,プリミティブアクションの空間で動作する類似の計画アルゴリズムと,エキスパートイテレーションによる単一ロールアウトポリシーの学習と比較して,計画環境に挑戦する上で大きなメリットをもたらすことが示された。 Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results which allows for training of better options. 
We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration.	翻訳日:2023-12-27 21:58:09 公開日:2023-12-22
# See-Through Visuotactile Sensorを用いたマルチモーダルおよびフォースマッチ型模倣学習 Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor ( http://arxiv.org/abs/2311.01248v2 ) ライセンス: Link先を確認	Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek	(参考訳) Kinesthetic Teachingは、模倣学習(IL)のための接触豊富なタスクの専門的なロボットデモを集めるための一般的なアプローチであるが、通常、ロボットによって環境に置かれる力を無視して、動きを計測するだけである。さらに、接触に富んだタスクは、接触と接触の両方を正確に検知する必要があるため、従来の感覚モダリティの提供は困難である。両センサを用いたSee-Through-Your-Skin (STS) Visuotactile Sensorを用いてこれらの課題に対処する。 (i)審美的指導を改善するための測定ツール、及び (ii)接触式ドア操作タスクにおけるポリシー入力として。 stsセンサは、半透明な表面と制御可能な照明を利用して、視覚モードと触覚モードを切り替えることができ、単一のセンサで、接触前の視覚センシングと接触時の触覚センシングの両方を可能にする。まず,触覚信号を用いた審美的指導の際,ロボットが読み取る力とマッチングできる触覚力マッチング手法を提案する。第2に、STSモードスイッチングを制御するポリシーを開発し、STSを視覚から触覚モードに切り替えるための適切なタイミングを学習できるようにする。最後に,手首装着眼球カメラの視覚データとSTSの視覚的・触覚的データの価値を比較し比較するため,複数の観察構成について検討した。実世界の実験実験から3000回以上のテストエピソードが得られた結果、力のマッチングは平均的な政策成功率を62.5%、STSモードの切り替えは30.3%、STSデータは42.5%向上することが判明した。この結果から, IL のルックスルー触覚センシング, 力のマッチングを可能にするデータ収集, 正確なタスクフィードバックを可能にするポリシー実行の両面での有用性を強調した。 Kinesthetic Teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), but it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a See-Through-your-Skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks. An STS sensor can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. 
First, we propose tactile force matching, a methodology that enables a robot to match forces read during kinesthetic teaching using tactile signals. Second, we develop a policy that controls STS mode switching, allowing a policy to learn the appropriate moment to switch an STS from its visual to its tactile mode. Finally, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera. With over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.	翻訳日:2023-12-27 21:30:05 公開日:2023-12-22
# インテリジェントな製造アプリケーションのための大規模基盤モデル:調査 Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey ( http://arxiv.org/abs/2312.06718v3 ) ライセンス: Link先を確認	Haotian Zhang, Semujju Stuart Dereck, Zhicheng Wang, Xianwei Lv, Kang Xu, Liang Wu, Ye Jia, Jing Wu, Zhuo Long, Wensheng Liang, X.G. Ma, and Ruiyan Zhuang	(参考訳) 人工知能の応用、特に深層学習は知的製造の様々な側面を大幅に改善したが、一般化能力の貧弱さ、高品質なトレーニングデータセットの確立の困難、ディープラーニング手法の不満足な性能など、幅広い雇用の課題に直面した。大規模な基礎モデル(LSFM)の出現は、人工知能の分野で波を巻き起こし、ディープラーニングモデルをシングルタスク、シングルモーダル、限定データパターンから、多様なタスクを含むパラダイム、マルチモーダル、大規模データセットの事前トレーニングへとシフトさせた。 LSFMは、強力な一般化能力、自動高品質のトレーニングデータセット生成、様々な領域での優れた性能を示したが、LSFMの知能製造への応用はまだ初期段階にあった。このトピックの体系的な概要は欠如しており、特に深層学習の課題がLSFMによってどのように対処され、これらの課題が体系的に取り組まれるかについてである。このギャップを埋めるため,本稿では,現在のlsfm像とその知的製造における利点を体系的に提示した。そして、さまざまなインテリジェントな製造アプリケーションにおいて、現在のディープラーニングモデルが直面する課題と包括的に比較する。 LSFMを利用してこれらの課題に対処するためのロードマップも概説した。最後に、LSFMを実世界のインテリジェントな製造シナリオに適用する事例研究を行い、LSFMが産業にどのように貢献し、その効率を向上するかを示した。 Although the applications of artificial intelligence especially deep learning had greatly improved various aspects of intelligent manufacturing, they still face challenges for wide employment due to the poor generalization ability, difficulties to establish high-quality training datasets, and unsatisfactory performance of deep learning methods. The emergence of large scale foundational models(LSFMs) had triggered a wave in the field of artificial intelligence, shifting deep learning models from single-task, single-modal, limited data patterns to a paradigm encompassing diverse tasks, multimodal, and pre-training on massive datasets. Although LSFMs had demonstrated powerful generalization capabilities, automatic high-quality training dataset generation and superior performance across various domains, applications of LSFMs on intelligent manufacturing were still in their nascent stage. 
A systematic overview of this topic was lacking, especially regarding which challenges of deep learning can be addressed by LSFMs and how these challenges can be systematically tackled. To fill this gap, this paper systematically expounded the current status of LSFMs and their advantages in the context of intelligent manufacturing, and compared them comprehensively with the challenges faced by current deep learning models in various intelligent manufacturing applications. We also outlined the roadmaps for utilizing LSFMs to address these challenges. Finally, case studies of applications of LSFMs in real-world intelligent manufacturing scenarios were presented to illustrate how LSFMs could help industries improve their efficiency.	翻訳日:2023-12-27 21:11:28 公開日:2023-12-22
# エージェント注意:ソフトマックスと線形注意の統合について Agent Attention: On the Integration of Softmax and Linear Attention ( http://arxiv.org/abs/2312.08874v2 ) ライセンス: Link先を確認	Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang	(参考訳) attentionモジュールはTransformersの重要なコンポーネントである。グローバルアテンションメカニズムは高い表現性を提供するが、その過剰な計算コストは様々なシナリオで適用性を制限する。本稿では,計算効率と表現力のバランスをとるために,新しい注意パラダイムであるエージェント注意(Agent Attention)を提案する。具体的には、エージェントアテンションは4倍の$(Q, A, K, V)$と表現され、従来のアテンションモジュールに追加のエージェントトークンセット$A$を導入する。エージェントトークンは最初、クエリトークンのエージェントとして機能し、$k$と$v$から情報を集約し、その後、情報を$q$にブロードキャストする。エージェントトークンの数をクエリトークンの数よりもはるかに小さく設計できるため、グローバルコンテキストモデリング能力を維持しつつ、広く採用されているsoftmaxの注意よりもエージェントの注意ははるかに効率的である。興味深いことに,提案するエージェントアテンションは線形アテンションの一般化形式と等価である。したがって,エージェント・アテンションはソフトマックス・アテンションと高効率線形アテンションをシームレスに統合する。広範な実験により、様々な視覚トランスフォーマーや、画像分類、物体検出、意味セグメンテーション、画像生成など、様々な視覚タスクにおけるエージェントの注意の有効性が実証された。特に、エージェントの注意は高解像度シナリオにおいて顕著な性能を示しており、その線形の注意の性質に依拠している。例えば、安定拡散に適用した場合、エージェントアテンションは生成を加速し、追加のトレーニングなしで画像生成品質を大幅に向上させる。コードはhttps://github.com/LeapLabTHU/Agent-Attentionで入手できる。 The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as the agent for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given the number of agent tokens can be designed to be much smaller than the number of query tokens, the agent attention is significantly more efficient than the widely adopted Softmax attention, while preserving global context modelling capability. 
Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.	翻訳日:2023-12-27 20:58:14 公開日:2023-12-22
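The two-step computation described in the Agent Attention abstract (agents aggregate from $K$ and $V$, then broadcast back to $Q$) can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names, the single-head layout, and the choice to obtain agent tokens by average-pooling the queries are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, A, K, V):
    """Single-head sketch of attention over the quadruple (Q, A, K, V)."""
    d = Q.shape[-1]
    # Step 1: the agent tokens act as queries and aggregate information
    # from K and V. Shape: (n_agents, d_v).
    agent_values = softmax(A @ K.T / np.sqrt(d)) @ V
    # Step 2: the aggregated information is broadcast back to the query
    # tokens, with the agents now acting as keys. Shape: (n_queries, d_v).
    return softmax(Q @ A.T / np.sqrt(d)) @ agent_values

rng = np.random.default_rng(0)
n_tokens, n_agents, dim = 64, 4, 8
Q = rng.normal(size=(n_tokens, dim))
K = rng.normal(size=(n_tokens, dim))
V = rng.normal(size=(n_tokens, dim))
# Assumption for this sketch: agents are average-pooled groups of queries.
A = Q.reshape(n_agents, n_tokens // n_agents, dim).mean(axis=1)

out = agent_attention(Q, A, K, V)
print(out.shape)
```

With `n_agents` much smaller than `n_tokens`, both matrix products cost on the order of `n_tokens * n_agents * dim` operations, instead of the `n_tokens * n_tokens * dim` needed by full Softmax attention.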
# ICD-LM:言語モデリングによる視覚言語インテクスト記述の構成 ICD-LM: Configuring Vision-Language In-Context Demonstrations by Language Modeling ( http://arxiv.org/abs/2312.10104v2 ) ライセンス: Link先を確認	Yingzhe Peng, Xu Yang, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang	(参考訳) 本稿では,LVLM(Large Vision-Language Model)のための強力なIn-Context Demonstration (ICD) シーケンスをどのように構成し,In-Context Learning (ICL) による視覚-Languageタスクを解決するかを検討する。 icdシーケンスの構成は、文を構成するミラープロセスである、すなわち、言語モデルを介して文を単語単位で構成できるように観察した後、icdシーケンスを1つずつ構成することもできる。その結果、有効なICDシーケンスを生成するために設計されたICD言語モデル(ICD-LM)を導入する。これには、さまざまなクエリサンプルのために手作りのICDシーケンスのデータセットを作成し、それをICD-LMのトレーニングに使用することが含まれる。提案手法は,ICDを別々に選択・注文する従来の方法と異なり,同時にICDを選択・注文する方法を学習し,シーケンスの効果を高める。さらに、データ構築中に、ICL実装を意図したLVLMを使用して、各ICDシーケンスの強度を検証することにより、モデル固有のデータセットと、このデータセットによってトレーニングされたICD-LMもモデル固有である。 ICD設定のための言語モデルを用いて,視覚的質問応答と画像キャプションの実験により,我々の方法論を検証した。本研究は,各種データセット構築およびICD-LM開発環境が結果に及ぼす影響について検討する。コードはhttps://github.com/ForJadeForest/ICD-LMで公開されている。 This paper studies how to configure powerful In-Context Demonstration (ICD) sequences for a Large Vision-Language Model (LVLM) to solve Vision-Language tasks through In-Context Learning (ICL). After observing that configuring an ICD sequence is a mirror process of composing a sentence, i.e., just as a sentence can be composed word by word via a Language Model, an ICD sequence can also be configured one by one. Consequently, we introduce an ICD Language Model (ICD-LM) specifically designed to generate effective ICD sequences. This involves creating a dataset of hand-crafted ICD sequences for various query samples and using it to train the ICD-LM. Our approach, diverging from traditional methods in NLP that select and order ICDs separately, enables to simultaneously learn how to select and order ICDs, enhancing the effect of the sequences. 
Moreover, during data construction, we use the LVLM intended for ICL implementation to validate the strength of each ICD sequence, resulting in a model-specific dataset; the ICD-LM trained on this dataset is therefore also model-specific. We validate our methodology through experiments in Visual Question Answering and Image Captioning, confirming the viability of using a Language Model for ICD configuration. Our comprehensive ablation studies further explore the impact of various dataset construction and ICD-LM development settings on the outcomes. The code is available at https://github.com/ForJadeForest/ICD-LM.	翻訳日:2023-12-27 20:44:28 公開日:2023-12-22
# 教師付き自己組み立て型インコンテキスト学習によるタスク性能とモデル校正について On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning ( http://arxiv.org/abs/2312.13772v2 ) ライセンス: Link先を確認	Chengzu Li, Han Zhou, Goran Glava\v{s}, Anna Korhonen, Ivan Vuli\'c	(参考訳) 標準教師付き微調整(SFT)パラダイムに従って、インコンテキスト学習(ICL)は、最近の大規模言語モデル(LLM)の進歩によって推進される効率的なアプローチとなり、数発のデータセットで様々なタスクにわたって有望なパフォーマンスが得られる。しかし、両方のパラダイムは、特にそのような限られたデータ設定において、過信(すなわち誤校正)の致命的な問題に悩まされがちである。本研究では,学習方法の異なる選択に対して,パフォーマンスとキャリブレーションと相互作用の両方の観点から,行動の詳細な分析を行う。広範に制御された実験により,タスク性能とキャリブレーションの同時獲得は困難であり,低リソースシナリオにおけるすべての学習手法に誤校正の問題が存在することがわかった。この性能とキャリブレーションの難しいトレードオフに対処するために、異なるモデリング段階(例えば、インコンテキストの例のバリエーションやプロンプトのバリエーション、異なるアンサンブル戦略など)で適用される自己認識技術の可能性を検討する。 ICLに加えて、SFT上での自己理解の可能性も正当化し、予測を校正し、比較や性能の向上を図る。我々の研究は、選択する学習パラダイムと、タスクパフォーマンスとllmのキャリブレーションの両方を強化する方法に光を当てている。 Following the standard supervised fine-tuning (SFT) paradigm, in-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs), yielding promising performance across various tasks in few-shot data setups. However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration), especially in such limited data setups. In this work, we deliver an in-depth analysis of the behavior across different choices of learning methods from the perspective of both performance and calibration, as well as their interplay. Through extensive controlled experiments, we find that simultaneous gains for both task performance and calibration are difficult to achieve, and the problem of miscalibration exists across all learning methods in low-resource scenarios. To address this challenging trade-off between performance and calibration, we then investigate the potential of self-ensembling techniques applied at different modeling stages (e.g., variations of in-context examples or variations in prompts or different ensembling strategies). 
We justify the feasibility of self-ensembling on SFT in addition to ICL, to make the predictions more calibrated and have comparable or even better performance. Our work sheds light on which learning paradigm to choose and how to enhance both task performance and calibration of LLMs.	翻訳日:2023-12-27 20:36:46 公開日:2023-12-22
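To make the notions of calibration and self-ensembling in the abstract above concrete, the sketch below averages class probabilities over several prompt variations and measures miscalibration with the expected calibration error (ECE). ECE is one common calibration metric and is an assumption here, since the abstract does not name the metric used. The function names and the toy probabilities are also hypothetical.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin-size-weighted gap between accuracy and mean confidence."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)                  # confidence of the top class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    # Equal-width bins [b/n, (b+1)/n); a confidence of 1.0 goes to the last bin.
    bin_ids = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def self_ensemble(prob_runs):
    """Average predicted probabilities over runs (e.g. prompt variations)."""
    return np.mean(np.asarray(prob_runs, dtype=float), axis=0)

# Toy data: 3 prompt variations x 3 examples x 2 classes (made up).
runs = [
    [[0.95, 0.05], [0.90, 0.10], [0.20, 0.80]],
    [[0.85, 0.15], [0.40, 0.60], [0.30, 0.70]],
    [[0.90, 0.10], [0.55, 0.45], [0.10, 0.90]],
]
labels = [0, 1, 1]
ensembled = self_ensemble(runs)
print("single run ECE:", expected_calibration_error(runs[0], labels))
print("ensembled ECE: ", expected_calibration_error(ensembled, labels))
```

Averaging probabilities is only one possible ensembling strategy. The abstract mentions several, and this sketch makes no claim about which one the authors found most effective.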
# RGB-only NeRF-SLAMのための3次元型オパシティとハイブリッドオドメトリー Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM ( http://arxiv.org/abs/2312.13332v2 ) ライセンス: Link先を確認	Junru Lin, Asen Nachkov, Songyou Peng, Luc Van Gool, Danda Pani Paudel	(参考訳) 不透明な表面を持つ立体的な3dシーンの不透明性はバイナリタイプであると考えられている。しかし,この特性は既存のRGBのみのNeRF-SLAMに従わないことがわかった。そのため,RGBのみのNeRF-SLAMパイプラインに導入する動機がある。残念なことに、ボリュームトリップレンダリング機能による最適化は、望ましい事前の統合を容易化しない。その代わり, 3次型 (TT) の不透明度は良好に支持されている。本研究では,三元型不透明性が手作業に適している理由について検討する。特に、ボリュームレンダリングプロセスを通じて放射率と不透明度を共同最適化する過程に関する理論的知見を提供する。ベンチマークデータセットに関する徹底的な実験を通じて、我々の主張を検証し、最適化プロセスに関する洞察を提供する。そこで本研究では,ボリュームとワーピングを併用した画像レンダリングを併用した,シンプルながら斬新なビジュアルオドメトリー手法を提案する。より具体的には、提案されたハイブリッドオドメトリ(ho)は、イメージウォーピングベースの粗オドメトリも使用し、最終的なスピードアップを桁違いに導く。さらに,提案するttとhoが相互に補完し,速度と精度の両面でベンチマークデータセットに最先端の結果を提供することを示した。 The opacity of rigid 3D scenes with opaque surfaces is considered to be of a binary type. However, we observed that this property is not followed by the existing RGB-only NeRF-SLAM. Therefore, we are motivated to introduce this prior into the RGB-only NeRF-SLAM pipeline. Unfortunately, the optimization through the volumetric rendering function does not facilitate easy integration of the desired prior. Instead, we observed that the opacity of ternary-type (TT) is well supported. In this work, we study why ternary-type opacity is well-suited and desired for the task at hand. In particular, we provide theoretical insights into the process of jointly optimizing radiance and opacity through the volumetric rendering process. Through exhaustive experiments on benchmark datasets, we validate our claim and provide insights into the optimization process, which we believe will unleash the potential of RGB-only NeRF-SLAM. To foster this line of research, we also propose a simple yet novel visual odometry scheme that uses a hybrid combination of volumetric and warping-based image renderings. 
More specifically, the proposed hybrid odometry (HO) additionally uses image warping-based coarse odometry, leading to a final speed-up of up to an order of magnitude. Furthermore, we show that the proposed TT and HO complement each other well, offering state-of-the-art results on benchmark datasets in terms of both speed and accuracy.	翻訳日:2023-12-27 20:35:11 公開日:2023-12-22
# 説明可能性保証付きアンサンブルの学習性能最大化 Learning Performance Maximizing Ensembles with Explainability Guarantees ( http://arxiv.org/abs/2312.12715v2 ) ライセンス: Link先を確認	Vincent Pisztora, Jia Li	(参考訳) 本稿では,本質的な説明可能なガラス箱モデルとブラックボックスモデルとの観測を最適に割り当てる手法を提案する。任意の説明可能性レベル(すなわち、説明可能なモデルが予測関数である観察の割合)に対して最適な割り当てが定義され、基礎となるタスク上でのアンサンブルの性能を最大化し、最大アンサンブル性能条件の下で割り当てられた観測に対する説明可能なモデルの性能を最大化する。提案手法は,様々な説明可能およびブラックボックスモデルタイプにわたる表型データセットのベンチマークスイート上で,説明可能性の最適割当を生成する。これらの学習された割り当ては、非常に高い説明可能性レベルでアンサンブルのパフォーマンスを一貫して維持することが判明し(平均で74\%の観察値を示す)、説明可能性を改善しながら、コンポーネント説明可能モデルとブラックボックスモデルの両方を上回ることさえある。 In this paper we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation being defined as one which, for any given explainability level (i.e. the proportion of observations for which the explainable model is the prediction function), maximizes the performance of the ensemble on the underlying task, and maximizes performance of the explainable model on the observations allocated to it, subject to the maximal ensemble performance condition. The proposed method is shown to produce such explainability optimal allocations on a benchmark suite of tabular datasets across a variety of explainable and black box model types. These learned allocations are found to consistently maintain ensemble performance at very high explainability levels (explaining $74\%$ of observations on average), and in some cases even outperforming both the component explainable and black box models while improving explainability.	翻訳日:2023-12-27 20:34:29 公開日:2023-12-22
# 完全および部分入力依存対称性の自己監視検出 Self-Supervised Detection of Perfect and Partial Input-Dependent Symmetries ( http://arxiv.org/abs/2312.12223v2 ) ライセンス: Link先を確認	Alonso Urbano, David W. Romero	(参考訳) 群同分散は入力の群変換に対する一貫した応答を保証し、より堅牢なモデルと拡張された一般化能力をもたらす。しかし、この性質は、群で見なされる対称性がデータで観察されたものと異なる場合、過度に制約されたモデルをもたらす可能性がある。一般的な手法では、データセットレベルで適切な対称性のレベルを決定することでこの問題に対処するが、同じデータセットに複数の対称性が共存するシナリオは、教師付き設定と無視に限られる。例えば、車と飛行機の写真は異なるレベルの回転を示すが、どちらもCIFAR-10データセットに含まれている。本稿では,ラベルを使わずに各入力の対称性のレベルを検出する手法を提案する。この目的のために、データ内の対称性の分布を学ぶのに十分かつ必要な条件を導出する。学習した分布を用いて擬似ラベルを生成し,各入力の対称性のレベルを自己教師ありで学習する。本研究では, クラスごとに異なる対称性を持つ合成データセット, 例えば mnistmultiple に対して, 数値がクラスに依存して一様回転する手法の有効性を検証する。本手法は,対称性が存在しない標準データセットの生成や,推論中の分布外対称性の検出など,実用的な用途に応用できることを実証する。これにより、非同変モデルの一般化と堅牢性の両方を改善することができる。私たちのコードはhttps://github.com/aurban0/ssl-symで公開されています。 Group equivariance ensures consistent responses to group transformations of the input, leading to more robust models and enhanced generalization capabilities. However, this property can lead to overly constrained models if the symmetries considered in the group differ from those observed in data. While common methods address this by determining the appropriate level of symmetry at the dataset level, they are limited to supervised settings and ignore scenarios in which multiple levels of symmetry co-exist in the same dataset. For instance, pictures of cars and planes exhibit different levels of rotation, yet both are included in the CIFAR-10 dataset. In this paper, we propose a method able to detect the level of symmetry of each input without the need for labels. To this end, we derive a sufficient and necessary condition to learn the distribution of symmetries in the data. Using the learned distribution, we generate pseudo-labels that allow us to learn the levels of symmetry of each input in a self-supervised manner. We validate the effectiveness of our approach on synthetic datasets with different per-class levels of symmetries e.g. 
MNISTMultiple, in which digits are uniformly rotated within a class-dependent interval. We demonstrate that our method can be used for practical applications such as the generation of standardized datasets in which the symmetries are not present, as well as the detection of out-of-distribution symmetries during inference. By doing so, both the generalization and robustness of non-equivariant models can be improved. Our code is publicly available at https://github.com/aurban0/ssl-sym.	翻訳日:2023-12-27 20:33:54 公開日:2023-12-22
# C2FAR: 高精度確率予測のための粗大な自己回帰ネットワーク C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting ( http://arxiv.org/abs/2312.15002v1 ) ライセンス: Link先を確認	Shane Bergsma, Timothy Zeyl, Javad Rahimipour Anaraki, Lei Guo	(参考訳) 本稿では,不定値の数値確率変数の確率分布をモデル化する手法であるc2farを提案する。 c2farは、各分布が予め生成された粗い間隔で条件づけされた複数のバイナリ分布から、段階的により細かい支持間隔を生成する。以前の(平坦な)双対分布とは異なり、C2FARは複雑性の線形増加のため、指数的に高い精度で値を表現することができる。我々はC2FARを用いて、繰り返しニューラルネットワークによる確率予測を行い、空間と時間の両方で時系列を自動回帰的にモデル化する。 C2FARは任意のスケールと分布形状の離散連続列を同時に扱う最初の方法である。この柔軟性は、異常検出、補間、圧縮など、さまざまな時系列ユースケースを可能にする。 C2FARは、いくつかのベンチマーク予測データセットの最先端よりも改善されている。 We present coarse-to-fine autoregressive networks (C2FAR), a method for modeling the probability distribution of univariate, numeric random variables. C2FAR generates a hierarchical, coarse-to-fine discretization of a variable autoregressively; progressively finer intervals of support are generated from a sequence of binned distributions, where each distribution is conditioned on previously-generated coarser intervals. Unlike prior (flat) binned distributions, C2FAR can represent values with exponentially higher precision, for only a linear increase in complexity. We use C2FAR for probabilistic forecasting via a recurrent neural network, thus modeling time series autoregressively in both space and time. C2FAR is the first method to simultaneously handle discrete and continuous series of arbitrary scale and distribution shape. This flexibility enables a variety of time series use cases, including anomaly detection, interpolation, and compression. C2FAR achieves improvements over the state-of-the-art on several benchmark forecasting datasets.	翻訳日:2023-12-27 20:26:21 公開日:2023-12-22
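The coarse-to-fine discretization in the C2FAR abstract can be illustrated with a small sketch. It covers only the hierarchical binning: a value is encoded as one bin index per level, each level subdividing the interval chosen at the previous level. The neural network that predicts each level's binned distribution conditioned on the coarser levels is not modeled, and the function names and the fixed `[lo, hi)` range are assumptions made for this example.

```python
def encode(x, lo, hi, n_bins, n_levels):
    """Return one bin index per level, coarsest first (assumes lo <= x < hi)."""
    indices = []
    for _ in range(n_levels):
        width = (hi - lo) / n_bins
        i = min(int((x - lo) / width), n_bins - 1)
        indices.append(i)
        # Zoom into the chosen bin; the next level subdivides it again.
        lo = lo + i * width
        hi = lo + width
    return indices

def decode(indices, lo, hi, n_bins):
    """Return the finest interval [lo, hi) identified by the bin indices."""
    for i in indices:
        width = (hi - lo) / n_bins
        lo = lo + i * width
        hi = lo + width
    return lo, hi

codes = encode(0.8125, 0.0, 1.0, n_bins=4, n_levels=2)
interval = decode(codes, 0.0, 1.0, n_bins=4)
print(codes, interval)  # [3, 1] (0.8125, 0.875)
```

With `n_levels` levels of `n_bins` bins each, the finest interval has width `(hi - lo) / n_bins ** n_levels`, while only `n_levels * n_bins` bin probabilities need to be produced. This is the exponential gain in precision for a linear increase in complexity that the abstract refers to.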
# 構成を一般化するモジュラー解の発見 Discovering modular solutions that generalize compositionally ( http://arxiv.org/abs/2312.15001v1 ) ライセンス: Link先を確認	Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wo{\l}czyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, Jo\~ao Sacramento, Angelika Steger	(参考訳) 多くの複雑なタスクや環境は、単純で独立した部分に分解できる。このような構成構造の発見は、適応を迅速化し、構成の一般化を可能にする可能性を秘めている。進歩にもかかわらず、我々の最も強力なシステムは柔軟に組み立てるのに苦労している。これらのシステムのほとんどはモノリシックだが、モジュール性によって多くのタスクの構成的性質をキャプチャできる。しかし、モジュラーシステムがこの隠れた構成構造を発見する状況は不明である。そこで,本研究では,地中真理モジュールの構成を完全に制御できるモジュール型教師を用いた教師学生設定について検討する。これにより、構成的一般化の問題と基盤となるモジュールの識別の問題とを関連付けることができる。実演から純粋に線形変換への同定は,指数関数的な加群の組み合わせを学習することなく,ハイパーネットで可能であることを示す。我々の理論は無限のデータ限界を前提としているが、有限データからのメタラーニングが、構成をモジュラーに一般化するがモノリシックなアーキテクチャではないモジュラーソリューションをいかに発見できるかを実証する。さらに,我々の洞察が教師の学習環境の外側に翻訳され,構成的選好と構成的目標を持つタスクにおいて,ハイパーネットワークが構成的に一般化するモジュラーポリシーを発見できることを実証する。 Many complex tasks and environments can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to expedite adaptation and enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. While most of these systems are monolithic, modularity promises to allow capturing the compositional nature of many tasks. However, it is unclear under which circumstances modular systems discover this hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. We show theoretically that identification up to linear transformation purely from demonstrations is possible in hypernetworks without having to learn an exponential number of module combinations. 
While our theory assumes the infinite data limit, in an extensive empirical study we demonstrate how meta-learning from finite data can discover modular solutions that generalize compositionally in modular but not monolithic architectures. We further show that our insights translate outside the teacher-student setting and demonstrate that in tasks with compositional preferences and tasks with compositional goals hypernetworks can discover modular policies that compositionally generalize.	翻訳日:2023-12-27 20:26:03 公開日:2023-12-22
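A minimal sketch of how a hypernetwork can compose modules is given below. It assumes a linear hypernetwork in which a task embedding `z` mixes a small set of module weight matrices. That form is an assumption chosen for illustration, as are all names and sizes, and the abstract does not specify the architecture at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modules, d_in, d_out = 3, 5, 2
# Hypothetical module parameters: one weight matrix per module.
modules = rng.normal(size=(n_modules, d_out, d_in))

def generate_weights(z, modules):
    """Linear hypernetwork: W(z) = sum_m z[m] * modules[m]."""
    return np.tensordot(z, modules, axes=1)

def forward(x, z, modules):
    """Apply the task-specific weights generated from embedding z."""
    return generate_weights(z, modules) @ x

x = rng.normal(size=d_in)
z_task = np.array([1.0, 0.0, 1.0])   # a task composed of modules 0 and 2
y = forward(x, z_task, modules)
print(y)
```

The number of possible compositions grows exponentially with `n_modules`, but the learner only has to recover the `n_modules` matrices themselves (up to a linear transformation, per the abstract) to handle compositions it has never seen.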
# デジタルフットプリントのクローズがユーザのプライバシーとパーソナライゼーションに及ぼす影響 The Impact of Cloaking Digital Footprints on User Privacy and Personalization ( http://arxiv.org/abs/2312.15000v1 ) ライセンス: Link先を確認	Sofie Goethals, Sandra Matz, Foster Provost, Yanou Ramon, David Martens	(参考訳) 私たちのオンライン生活は、技術プラットフォームによって蓄積され活用される、豊富な行動記録('デジタルフットプリント')を生み出します。このデータは、サービスをパーソナライズすることで、ユーザにとっての価値を生み出すために使用できる。しかし同時に、個人の特性(例えば、彼らの個性、政治的イデオロギー、性的指向)に非常に親密な窓を提供することで、人々のプライバシーを脅かす。以前の研究は、ユーザのフットプリントのクローキングという潜在的な修正を提案している。つまり、ユーザーは予測アルゴリズムからデジタルフットプリントの一部を隠して、望ましくない推論を避けることができる。このようなアプローチは、現時点ではプライバシー保護を提供することが示されているが、2つのオープンな疑問がある。第一に、クローキングが時間とともにどれだけうまく機能するかは不明だ。人々が常に新しいデジタルフットプリントを離れるにつれて、アルゴリズムは以前のクロークされた特性を予測する能力を取り戻すかもしれない。第二に、望ましくない推論を避けるためにデジタルフットプリントをクローズすることは、他の望ましい推論(例えば、望ましいパーソナライズされたコンテンツを駆動しているもの)に対するモデルの性能を低下させる可能性がある。これらの研究ギャップに照らして、私たちの貢献は2つあります。 1)メタフィーチャー(自動生成高レベルカテゴリ)を隠蔽する新しいクローキング戦略を提案し,その効果を既存のクローキングアプローチと比較する。 2) 一つの形質が他の形質に対する推論の正確性に及ぼす影響を検証した。重要な発見は、クローキングの有効性は時間とともに低下するが、その低下率は、個々のフットプリントよりもメタフィーチャーをクロークする場合にかなり小さいことである。さらに、われわれの発見はプライバシーとパーソナライゼーションのトレードオフが期待されていることを明らかにしている: 望ましくない特徴を隠すことも、他の望ましい特徴を部分的に隠している。 Our online lives generate a wealth of behavioral records -'digital footprints'- which are stored and leveraged by technology platforms. This data can be used to create value for users by personalizing services. At the same time, however, it also poses a threat to people's privacy by offering a highly intimate window into their private traits (e.g., their personality, political ideology, sexual orientation). Prior work has proposed a potential remedy: The cloaking of users' footprints. That is, platforms could allow users to hide portions of their digital footprints from predictive algorithms to avoid undesired inferences. While such an approach has been shown to offer privacy protection in the moment, there are two open questions. First, it remains unclear how well cloaking performs over time. 
As people constantly leave new digital footprints, the algorithm might regain the ability to predict previously cloaked traits. Second, cloaking digital footprints to avoid one undesirable inference may degrade the performance of models for other, desirable inferences (e.g., those driving desired personalized content). In the light of these research gaps, our contributions are twofold: 1) We propose a novel cloaking strategy that conceals 'metafeatures' (automatically generated higher-level categories) and compare its effectiveness against existing cloaking approaches, and 2) we test the spill-over effects of cloaking one trait on the accuracy of inferences on other traits. A key finding is that the effectiveness of cloaking degrades over time, but the rate at which it degrades is significantly smaller when cloaking metafeatures rather than individual footprints. In addition, our findings reveal the expected trade-off between privacy and personalization: Cloaking an undesired trait also partially conceals other desirable traits.	翻訳日:2023-12-27 20:25:40 publication date:2023-12-22
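The idea of cloaking individual footprints can be sketched as follows. The example assumes a linear scoring model over binary footprints and a greedy rule that hides the most incriminating footprint first. Both are simplifying assumptions for illustration, and the page names and weights are made up.

```python
def cloak(active, weights, bias=0.0, threshold=0.0):
    """Greedily hide footprints until the trait score is at or below threshold.

    active:  iterable of footprint names the user has (e.g. liked pages)
    weights: dict mapping each footprint to its weight in a linear model
    Returns (hidden, remaining).
    """
    remaining = set(active)
    hidden = []

    def score():
        return bias + sum(weights[f] for f in remaining)

    while remaining and score() > threshold:
        top = max(remaining, key=lambda f: weights[f])
        if weights[top] <= 0:
            break  # hiding more footprints can no longer lower the score
        remaining.remove(top)
        hidden.append(top)
    return hidden, remaining

weights = {"page_a": 2.0, "page_b": 0.5, "page_c": -1.0, "page_d": 1.5}
hidden, remaining = cloak(weights.keys(), weights)
print(hidden, sorted(remaining))  # ['page_a', 'page_d'] ['page_b', 'page_c']
```

Cloaking a metafeature would, under the same assumptions, hide every footprint belonging to a higher-level category at once rather than one footprint at a time.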
# きめ細かい鳥の識別のためのハビタット情報の活用 Leveraging Habitat Information for Fine-grained Bird Identification ( http://arxiv.org/abs/2312.14999v1 ) ライセンス: Link先を確認	Tin Nguyen, Anh Nguyen	(参考訳) 従来の鳥分類器は、主に鳥の視覚特性に依存している。以前の作品の中には、背景に不変な分類器を訓練し、鳥類の生活環境を完全に破棄するものもある。その代わり、私たちは鳥類学者によって鳥類を識別する4つの主要な方法の1つである生息地情報を現代の鳥類分類器に統合する研究を初めて行った。 1)下流の鳥のデータセットに基づいて訓練されたCNNとViT,(2)オリジナルでマルチモーダルなCLIPである。 CNNとViTを生息地データでトレーニングすると、NABirdsとCUB-200で最大0.83点、+0.23点が改善される。同様に、CLIPのプロンプトに生息地記述子を追加すると、NABirdsとCUB-200で最大0.99と+1.1ポイントの精度が向上する。画像拡張プロセスと視覚言語CLIP分類器のテキスト記述子に環境特徴を統合することにより,一貫した精度の向上が得られた。コードは、https://anonymous.4open.science/r/reasoning-8B7E/で入手できる。 Traditional bird classifiers mostly rely on the visual characteristics of birds. Some prior works even train classifiers to be invariant to the background, completely discarding the living environment of birds. Instead, we are the first to explore integrating habitat information, one of the four major cues for identifying birds by ornithologists, into modern bird classifiers. We focus on two leading model types: (1) CNNs and ViTs trained on the downstream bird datasets; and (2) original, multi-modal CLIP. Training CNNs and ViTs with habitat-augmented data results in an improvement of up to +0.83 and +0.23 points on NABirds and CUB-200, respectively. Similarly, adding habitat descriptors to the prompts for CLIP yields a substantial accuracy boost of up to +0.99 and +1.1 points on NABirds and CUB-200, respectively. We find consistent accuracy improvement after integrating habitat features into the image augmentation process and into the textual descriptors of vision-language CLIP classifiers. Code is available at: https://anonymous.4open.science/r/reasoning-8B7E/.	翻訳日:2023-12-27 20:25:10 公開日:2023-12-22
# 合成画像は人造アート偽造者の認識を助ける Synthetic images aid the recognition of human-made art forgeries ( http://arxiv.org/abs/2312.14998v1 ) ライセンス: Link先を確認	Johann Ostmeyer, Ludovica Schaerf, Pavel Buividovich, Tessa Charles, Eric Postma, Carina Popovici	(参考訳) これまでの研究によると、人工知能は特定のアーティストによる本物の絵画と、驚くほどの精度で人造の偽造品を区別できるという。しかし, 既知偽造の数が限られているため, 偽造検出のための増補法が望まれる。本研究では, 合成アートワークをトレーニングデータセットに組み込むことにより, 偽造検出性能を向上させる可能性を検討する。我々はVincent van Gogh氏による絵画に焦点を当て、偽造検出に特化した最初のデータセットをリリースしました。結果を強化するため、Amedeo Modigliani と Raphael で同様の分析を行った。原画と偽物とを区別するために分類器を訓練する。このために、有名なアーティストのスタイルで人造の偽造品や模倣品を使用し、Stable DiffusionとStyleGANが生成した同様のスタイルのイメージでトレーニングセットを拡張する。追加の合成偽造物は、一貫して人造偽造物の検出を改善している。さらに, 従来の研究と並行して, トレーニングに合成偽造物を含めることで, 特に類似の発電機を用いて生成したAI生成偽造物の検出が可能となった。 Previous research has shown that Artificial Intelligence is capable of distinguishing between authentic paintings by a given artist and human-made forgeries with remarkable accuracy, provided sufficient training. However, with the limited amount of existing known forgeries, augmentation methods for forgery detection are highly desirable. In this work, we examine the potential of incorporating synthetic artworks into training datasets to enhance the performance of forgery detection. Our investigation focuses on paintings by Vincent van Gogh, for which we release the first dataset specialized for forgery detection. To reinforce our results, we conduct the same analyses on the artists Amedeo Modigliani and Raphael. We train a classifier to distinguish original artworks from forgeries. For this, we use human-made forgeries and imitations in the style of well-known artists and augment our training sets with images in a similar style generated by Stable Diffusion and StyleGAN. We find that the additional synthetic forgeries consistently improve the detection of human-made forgeries. 
In addition, we find that, in line with previous research, the inclusion of synthetic forgeries in the training also enables the detection of AI-generated forgeries, especially if created using a similar generator.	翻訳日:2023-12-27 20:24:49 公開日:2023-12-22
# ブリッジングAIと臨床実践: 自動睡眠スコアアルゴリズムと不確かさガイドの医師レビューの統合 Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review ( http://arxiv.org/abs/2312.14996v1 ) ライセンス: Link先を確認	Michal Bechny (1 and 2), Giuliana Monachino (1 and 2), Luigi Fiorillo (2), Julia van der Meer (3), Markus H. Schmidt (3 and 4), Claudio L. A. Bassetti (3), Athina Tzovara (1 and 5), Francesca D. Faraci (2) ((1) Institute of Computer Science, University of Bern, Bern, Switzerland (2) Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland, Lugano, Switzerland (3) Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland (4) Ohio Sleep Medicine Institute, Dublin, United States (5) Center for Experimental Neurology, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland)	(参考訳) 目的: 本研究の目的は, 予測催眠図のマニュアルレビューにおいて, 臨床医を効率的に支援するための不確実性推定手法を組み込むことにより, 自動睡眠コーリングアルゴリズムの臨床的利用を促進することである。本研究は,事前定義された合意レベルを達成するために必要なレビュー範囲を目標とし,ドメイン内データとドメイン外データの両方を調べ,対象者の診断を検討する。患者と方法:13のオープンアクセスデータベースから合計19578のPSGを使用して、最先端の睡眠スコアアルゴリズムであるU-Sleepをトレーニングした。我々は、年齢と睡眠障害の全スペクトルをカバーする8832psgの総合的な臨床データベースを利用して、u-sleepを洗練し、新しい信頼ネットワークを含む異なる不確実性定量化アプローチを評価する。 idデータは50名以上の医師が獲得したpsgからなり、2つのoodセットはそれぞれユニークな上級医師が記録した。結果: U-Sleepは堅牢な性能を示し、CohenのKappaはIDが76.2%、OODデータが73.8-78.8%だった。信頼ネットワークは不確実な予測の特定に優れており、AUROCはIDが85.7%、OODデータが82.5-85.6%だった。睡眠障害状態とは関係なく, 統計的評価では, 整合性と不協和性予測の信頼スコアの有意差がみられた。医師の介入で少なくとも90%のKを達成するためには、不確実なエポックの29.0%未満を検査し、医師の負担を大幅に減らし、ほぼ完全な合意を容易にした。 Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating an uncertainty estimation approach to efficiently assist clinicians in the manual review of predicted hypnograms, a necessity due to the notable inter-scorer variability inherent in polysomnography (PSG) databases. 
Our efforts target the extent of review required to achieve predefined agreement levels, examining both in-domain (ID) and out-of-domain (OOD) data, and considering subjects' diagnoses. Patients and methods: A total of 19578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm. We leveraged a comprehensive clinical database of an additional 8832 PSGs, covering a full spectrum of ages and sleep disorders, to refine U-Sleep and to evaluate different uncertainty-quantification approaches, including our novel confidence network. The ID data consisted of PSGs scored by over 50 physicians, and the two OOD sets comprised recordings each scored by a unique senior physician. Results: U-Sleep demonstrated robust performance, with Cohen's kappa (K) at 76.2% on ID and 73.8-78.8% on OOD data. The confidence network excelled at identifying uncertain predictions, achieving AUROC scores of 85.7% on ID and 82.5-85.6% on OOD data. Independently of sleep-disorder status, statistical evaluations revealed significant differences in confidence scores between concordant and discordant predictions, and significant correlations of confidence scores with classification performance metrics. To achieve K of at least 90% with physician intervention, examining less than 29.0% of uncertain epochs was required, substantially reducing physicians' workload and facilitating near-perfect agreement.	翻訳日:2023-12-27 20:24:29 公開日:2023-12-22
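The uncertainty-guided review described in the sleep-scoring abstract can be sketched as follows: compute Cohen's kappa between the algorithm and the physician, let the physician re-score only the least confident epochs, and recompute kappa. The function names, the integer stage codes, and the toy confidence values are assumptions for this example. They do not come from U-Sleep or from the authors' confidence network.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    a, b = np.asarray(a), np.asarray(b)
    p_observed = np.mean(a == b)
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)

def review_least_confident(pred, physician, confidence, fraction):
    """Replace the least confident `fraction` of predictions with physician labels."""
    pred = np.asarray(pred).copy()
    physician = np.asarray(physician)
    n_review = int(round(fraction * len(pred)))
    idx = np.argsort(confidence)[:n_review]
    pred[idx] = physician[idx]
    return pred

# Toy hypnogram: 8 epochs, stages coded as integers (made-up values).
physician = [0, 0, 1, 1, 2, 2, 0, 1]
predicted = [0, 1, 1, 1, 2, 0, 0, 1]
confidence = [0.9, 0.4, 0.8, 0.95, 0.7, 0.3, 0.85, 0.9]

before = cohen_kappa(predicted, physician)
reviewed = review_least_confident(predicted, physician, confidence, fraction=0.25)
after = cohen_kappa(reviewed, physician)
print(round(before, 3), round(after, 3))  # 0.61 1.0
```

In this toy case both errors fall among the least confident epochs, so reviewing a quarter of the epochs restores full agreement. Real data would behave this way only to the extent that the confidence scores separate concordant from discordant predictions.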
# 大規模マルチモーダルモデルを用いた多機能食品アシスタントFoodLMM FoodLMM: A Versatile Food Assistant using Large Multi-modal Model ( http://arxiv.org/abs/2312.14991v1 ) ライセンス: Link先を確認	Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo	(参考訳) 大規模マルチモーダルモデル(LMM)は多くの視覚言語タスクにおいて顕著な進歩を遂げている。しかし、特定の領域における一般LMMの性能は、まだ十分ではない。本稿では,食品認識,食材認識,レシピ生成,栄養推定,食品セグメンテーション,多ラウンド会話など,多機能なLMMに基づく多目的食品アシスタントであるFoodLMMを提案する。純粋なテキスト出力以外のタスクの処理を容易にするために,一連のタスク固有のトークンとヘッドを導入し,食品栄養値と複数のセグメンテーションマスクの予測を可能にした。 2段階のトレーニング戦略を採用しています。第1段階では,インストラクション・フォロー・パラダイムを活用し,マルチタスク学習に複数の公開食品ベンチマークを利用する。第2段階では,マルチラウンド会話と推論セグメンテーションデータセットを構築し,モデルを微調整し,食事領域における複雑な推論に基づく専門的な対話やセグメンテーションマスクの生成を可能にする。微調整したFoodLMMは、いくつかの食品ベンチマークで最先端の結果が得られる。コード、モデル、データセットを一般公開します。 Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks. Nevertheless, the performance of general LMMs in specific domains is still far from satisfactory. This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities, including food recognition, ingredient recognition, recipe generation, nutrition estimation, food segmentation and multi-round conversation. To facilitate FoodLMM to deal with tasks beyond pure text output, we introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks. We adopt a two-stage training strategy. In the first stage, we utilize multiple public food benchmarks for multi-task learning by leveraging instruct-following paradigm. In the second stage, we construct a multi-round conversation and a reasoning segmentation datasets to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in food domain. Our fine-tuned FoodLMM achieves state-of-the-art results across several food benchmarks. We will make our code, models and datasets publicly available.	翻訳日:2023-12-27 20:23:52 公開日:2023-12-22
# オープンワールド連続学習のための知識伝達促進のための学習 Learning to Prompt Knowledge Transfer for Open-World Continual Learning ( http://arxiv.org/abs/2312.14990v1 ) ライセンス: Link先を確認	Yujie Li, Xin Yang, Hao Wang, Xiangkun Wang and Tianrui Li	(参考訳) 本稿では,Open-world Continual Learning (OwCL) と呼ばれるオープンワールドシナリオにおける連続学習の問題について述べる。 OwCLは注目が高まっているが、2つの点で非常に挑戦的である。(i)過去の知識を忘れることなく、一連のタスクを学習すること。(ii)将来の未知物(未知物又はクラス)を識別すること。既存のOwCLメソッドは、既知のものと未知の間のタスク認識境界の適応性に苦しみ、知識伝達のメカニズムを考慮しない。本稿では,OwCLの知識伝達モデルであるPro-KTを提案する。 Pro-KTは、(1)タスクジェネリックな知識とタスク固有の知識の両方をエンコードし、転送するプロンプトバンク、(2)タスクアウェアなオープンセット境界により、新しいタスクの未知を識別する。 2つの実世界のデータセットを用いた実験の結果、提案したPro-KTは未知の発見と既知の分類の両方において最先端の手法よりも優れていた。 This paper studies the problem of continual learning in an open-world scenario, referred to as Open-world Continual Learning (OwCL). OwCL is attracting increasing attention, yet it is highly challenging in two respects: i) learning a sequence of tasks without forgetting knowns in the past, and ii) identifying unknowns (novel objects/classes) in the future. Existing OwCL methods struggle with the adaptability of task-aware boundaries between knowns and unknowns, and do not consider the mechanism of knowledge transfer. In this work, we propose Pro-KT, a novel prompt-enhanced knowledge transfer model for OwCL. Pro-KT includes two key components: (1) a prompt bank to encode and transfer both task-generic and task-specific knowledge, and (2) a task-aware open-set boundary to identify unknowns in the new tasks. Experimental results using two real-world datasets demonstrate that the proposed Pro-KT markedly outperforms state-of-the-art counterparts in both the detection of unknowns and the classification of knowns.	翻訳日:2023-12-27 20:23:35 公開日:2023-12-22
# Emage:非自己回帰型テキスト画像生成 Emage: Non-Autoregressive Text-to-Image Generation ( http://arxiv.org/abs/2312.14988v1 ) ライセンス: Link先を確認	Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi	(参考訳) 自己回帰モデルと拡散モデルは、テキストから画像への生成における最近のブレークスルーを駆動する。自己回帰モデルは画像トークンを生成するために千回以上連続して実行され、拡散モデルはガウスノイズを数百のデノイジングステップで画像に変換する。本研究では,何百もの画像トークンを並列に効率的に生成する非自己回帰的テキスト・画像モデルについて検討する。学習戦略や推論戦略,初期化テキストエンコーダなど,さまざまなモデルバリエーションを開発しています。 1000回実行する必要がある自己回帰ベースラインと比較すると、私たちのモデルは16回しか動作せず、非常に低い推論レイテンシで競合品質のイメージを生成します。 346Mパラメータを持つ我々の非自己回帰モデルは、256$\times$256の画像を1つのV100 GPU上で約1秒で生成する。 Autoregressive and diffusion models drive the recent breakthroughs in text-to-image generation. Despite their huge success in generating highly realistic images, a common shortcoming of these models is their high inference latency: autoregressive models run more than a thousand times successively to produce image tokens, and diffusion models convert Gaussian noise into images with many hundreds of denoising steps. In this work, we explore non-autoregressive text-to-image models that efficiently generate hundreds of image tokens in parallel. We develop many model variations with different learning and inference strategies, initialized text encoders, etc. Compared with autoregressive baselines that need to run one thousand times, our model runs only 16 times to generate images of competitive quality with an order of magnitude lower inference latency. Our non-autoregressive model with 346M parameters generates a 256$\times$256 image in about one second on one V100 GPU.	翻訳日:2023-12-27 20:23:17 公開日:2023-12-22
# 確率的正則化生体力学平衡による変形性画像登録 Deformable Image Registration with Stochastically Regularized Biomechanical Equilibrium ( http://arxiv.org/abs/2312.14987v1 ) ライセンス: Link先を確認	Pablo Alvarez (MIMESIS), Stéphane Cotin (MIMESIS)	(参考訳) 変形可能な画像登録のための多数の正規化手法は、スムーズな変換を強制することを目的としているが、事前調整が困難であり、明確な物理的基盤が欠如している。物理的にインスピレーションを受けた戦略が出現し、健全な理論的基礎を提供するが、それでも複雑な離散化と解決のスキームを必要とする。本研究は, 医用画像登録の物理的動機付けによる正規化のメリットを維持しつつ, 離散化を必要としない正規化戦略を導入し, 現行の登録フレームワークと互換性を持たせた。提案手法は合成データと実データの両方において好適に動作し,現在の最先端手法に匹敵する精度を示す。 Numerous regularization methods for deformable image registration aim at enforcing smooth transformations, but are difficult to tune a priori and lack a clear physical basis. Physically inspired strategies have emerged, offering a sound theoretical basis, but still necessitating complex discretization and resolution schemes. This study introduces a regularization strategy that does not require discretization, making it compatible with current registration frameworks, while retaining the benefits of physically motivated regularization for medical image registration. The proposed method performs favorably in both synthetic and real datasets, exhibiting an accuracy comparable to current state-of-the-art methods.	翻訳日:2023-12-27 20:23:06 公開日:2023-12-22
# UniHuman:野生で人間の画像を編集するための統一モデル UniHuman: A Unified Model for Editing Human Images in the Wild ( http://arxiv.org/abs/2312.14985v1 ) ライセンス: Link先を確認	Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin	(参考訳) 人間の画像編集には、人のポーズや服装を変えたり、テキストのプロンプトに従って画像を編集したりするタスクが含まれる。しかし、先行研究はしばしばこれらの課題に別々に取り組み、共同学習による相互強化の利益を見落としている。本論文では,実際の環境下での人間の画像編集の複数の側面を扱う統一モデルUniHumanを提案する。モデルの生成品質と一般化能力を高めるために、人間の視覚エンコーダからのガイダンスを活用して、異なるポーズ表現を活用できる軽量なポーズウォーピングモジュールを導入し、目に見えないテクスチャやパターンに適応する。さらに,既存の人体編集ベンチマークと実世界のデータとの格差を埋めるために,400Kの高品質な人体画像テキストペアをトレーニングし,ドメイン外テストのために2Kの人体画像を収集した。ドメイン内テストセットとドメイン外テストセットの両方の実験では、UniHumanがタスク固有のモデルよりも大きなマージンで優れていることが示されている。ユーザスタディでは、UniHumanは平均して77%のケースでユーザに好まれる。 Human image editing includes tasks like changing a person's pose or clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders and introduce a lightweight pose-warping module that can exploit different pose representations, accommodating unseen textures and patterns. Furthermore, to bridge the disparity between existing human editing benchmarks and real-world data, we curated 400K high-quality human image-text pairs for training and collected 2K human images for out-of-domain testing, both encompassing diverse clothing styles, backgrounds, and age groups. Experiments on both in-domain and out-of-domain test sets demonstrate that UniHuman outperforms task-specific models by a significant margin. In user studies, UniHuman is preferred by users in an average of 77% of cases.	翻訳日:2023-12-27 20:22:52 公開日:2023-12-22
# TPTNet:乱流温位に基づくデータ駆動温度予測モデル TPTNet: A Data-Driven Temperature Prediction Model Based on Turbulent Potential Temperature ( http://arxiv.org/abs/2312.14980v1 ) ライセンス: Link先を確認	Jun Park and Changhoon Lee	(参考訳) 数値気象予測(NWP)の計算負担を軽減するため,ニューラルネットワークを用いた表面温度予測のためのデータ駆動モデルを提案した。 TPTNetと命名された我々のモデルは, 気象観測所で観測された2mの温度のみを用いて, 限られた予報時間における局部温度の予測を行う。年間および毎日の変動を考慮した気候成分を分離し, 観測値から温度の乱流変動成分を抽出した。ステーション高度の影響は、温位を導入することで補償された。その結果得られた不規則分布局の乱流温位データは、畳み込みニューラルネットワーク(CNN)、Swin Transformer、グラフニューラルネットワーク(GNN)に基づいて、3つの訓練されたネットワークを通して予測時間における乱流温位を予測する入力として用いられた。ネットワークの予測性能はpersistenceとNWPと比較され、モデルが最大12時間NWPを上回ったことを確認した。 A data-driven model for predicting the surface temperature using neural networks was proposed to alleviate the computational burden of numerical weather prediction (NWP). Our model, named TPTNet, uses only the 2m temperature measured at the weather stations of the South Korean Peninsula as input to predict the local temperature at finite forecast hours. The turbulent fluctuation component of the temperature was extracted from the station measurements by separating the climatology component accounting for the yearly and daily variations. The effect of station altitude was then compensated by introducing a potential temperature. The resulting turbulent potential temperature data at irregularly distributed stations were used as input for predicting the turbulent potential temperature at forecast hours through three trained networks based on a convolutional neural network (CNN), a Swin Transformer, and a graph neural network (GNN). The prediction performance of our network was compared with that of persistence and NWP, confirming that our model outperformed NWP for up to 12 forecast hours.	翻訳日:2023-12-27 20:22:32 公開日:2023-12-22
# 予測自由エネルギー最小化による情報探索多項式narxモデル予測制御 Information-seeking polynomial NARX model-predictive control through expected free energy minimization ( http://arxiv.org/abs/2312.15046v1 ) ライセンス: Link先を確認	Wouter M. Kouw	(参考訳) 本稿では,システムの目標状態への運転と,非線形自己回帰的外因性モデルのパラメータに関する情報的システム観測を求める適応型モデル予測制御器を提案する。コントローラの目的関数は期待される自由エネルギー関数から派生し、モデルパラメータや出力予測に対する不確実性を表す情報理論用語を含む。パラメータの不確かさが制御対象にどのように影響するかを実験で示し、振り子スイングアップタスクのための提案したコントローラを評価する。 We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model parameters and output predictions. Experiments illustrate how parameter uncertainty affects the control objective and evaluate the proposed controller for a pendulum swing-up task.	翻訳日:2023-12-27 20:15:53 公開日:2023-12-22
# 連続時間における集合列の確率的モデリング Probabilistic Modeling for Sequences of Sets in Continuous-Time ( http://arxiv.org/abs/2312.15045v1 ) ライセンス: Link先を確認	Yuxin Chang, Alex Boyd, Padhraic Smyth	(参考訳) ニューラルマーク付き時間的ポイントプロセスは、連続時間イベントデータのための統計パラメトリックモデルの既存のツールボックスに価値ある追加である。これらのモデルは、各イベントが1つのアイテム(単一のイベントタイプまたは"マーク")に関連付けられるシーケンスに役立ちますが、これらのモデルは、各イベントが一連のアイテムに関連付けられる実用的な状況には適していません。本研究では,インテンシティに基づくリカレントニューラルポイントプロセスモデルと互換性のある,連続時間にセット値データをモデリングするための汎用フレームワークを開発した。さらに,このようなモデルを用いて,シーケンス履歴を条件とした「アイテム $b$ 前に観測されるアイテム $a$ の確率」のような確率的クエリに答える推論手法を開発した。このようなクエリの正確な答えの計算は、問題設定の連続時間の性質と、各イベントの潜在的な結果の組合せ的に大きな空間の両方によって、神経モデルでは一般的には役に立たない。そこで,本研究では,実世界の4つのデータセットを用いた体系的な実験を通して,直接サンプリングよりも桁違いに効率が向上することを示す。また、このフレームワークを用いて1段階の予測を伴わない確率を用いてモデル選択を行う方法について説明する。 Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. 
We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.	翻訳日:2023-12-27 20:15:45 公開日:2023-12-22
# GroundVLP:視覚言語事前学習とオープン語彙オブジェクト検出によるゼロショット視覚グラウンドのハーネス化 GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection ( http://arxiv.org/abs/2312.15043v1 ) ライセンス: Link先を確認	Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin	(参考訳) ビジュアルグラウンド(Visual Grounding)は、クエリ表現に基づく視覚的コンテキストの理解を含む重要な視覚言語タスクであり、オブジェクト間の相互作用をキャプチャするモデルと、様々な空間的および属性情報を必要とする。しかし、視覚的接地作業のアノテーションデータは、その時間と労働集約的なアノテーションプロセスによって制限され、訓練されたモデルは、その能力をより広い領域に一般化することから制約される。この課題に対処するために,画像テキストペアと純粋なオブジェクト検出データから学習した既存のモデルから視覚的接地能力を活用する,シンプルで効果的なゼロショット手法であるGroundVLPを提案する。 GroundVLPはGradCAMのヒートマップとオープン語彙検出器のオブジェクト提案を組み合わせた融合機構を提案する。提案手法は,RefCOCOとRefCOCO+のテスト分割において,従来のゼロショットの最先端手法を約28%上回り,RefCOCO/+/gデータセット上の他のゼロショット・メソッドを著しく上回ることを示す。さらに、GroundVLPはFlickr30kエンティティデータセット上のいくつかの非VLPベースの教師付きモデルと同等か、それ以上に機能する。私たちのコードはhttps://github.com/om-ai-lab/GroundVLPで利用可能です。 Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, requires the model to capture the interactions between objects, as well as various spatial and attribute information. However, the annotation data of the visual grounding task is limited due to its time-consuming and labor-intensive annotation process, so trained models are constrained from generalizing their capability to a broader domain. To address this challenge, we propose GroundVLP, a simple yet effective zero-shot method that harnesses visual grounding ability from the existing models trained from image-text pairs and pure object detection data, both of which are more conveniently obtainable and offer a broader domain compared to visual grounding annotation data. GroundVLP proposes a fusion mechanism that combines the heatmap from GradCAM and the object proposals of open-vocabulary detectors. 
We demonstrate that the proposed method significantly outperforms other zero-shot methods on RefCOCO/+/g datasets, surpassing prior zero-shot state-of-the-art by approximately 28% on the test split of RefCOCO and RefCOCO+. Furthermore, GroundVLP performs comparably to or even better than some non-VLP-based supervised models on the Flickr30k entities dataset. Our code is available at https://github.com/om-ai-lab/GroundVLP.	翻訳日:2023-12-27 20:15:05 公開日:2023-12-22
# 学習分析ダッシュボードはハイプに耐えただろうか? 学生の達成、モチベーション、参加、態度への影響に関する体系的考察 Have Learning Analytics Dashboards Lived Up to the Hype? A Systematic Review of Impact on Students' Achievement, Motivation, Participation and Attitude ( http://arxiv.org/abs/2312.15042v1 ) ライセンス: Link先を確認	Rogers Kaliisa, Kamila Misiejuk, Sonsoles López-Pernas, Mohammad Khalil, Mohammed Saqr	(参考訳) 学習分析ダッシュボード(LAD)はLA介入の最も一般的な形態であるが、学生の学習結果への影響については限定的な証拠がある。本研究は,学生の学習成果,達成,参加,モチベーション,態度にLADが与える影響を総合的に調査するために,38件の研究成果を総合するものである。現時点では、LADが学業成績を改善するという期待に応えてきたという結論を支持する証拠はない。ほとんどの研究は無視または小さな効果を報告し、十分に制御された実験の限られた証拠を報告した。多くの研究は、LADのユーザと非ユーザを比較し、ダッシュボード効果を学生のエンゲージメントレベルと混同している。同様に、LADがモチベーションや態度に与える影響は、わずかに例外的に顕著な効果を示した。これらの研究の小さなサンプルサイズは、これらの発見を検証するための大規模な調査の必要性を強調している。特に、LADは学生参加に比較的大きな影響を及ぼした。いくつかの研究は中～大きな効果の大きさを報告し、LADがオンライン学習環境におけるエンゲージメントと相互作用を促進することを示唆している。しかし, 従来の評価手法への依存, 自己選択バイアス, 利用に等しいという仮定, 標準化された評価ツールの欠如など, 方法論上の欠点が繰り返し発生する。 LADの研究ラインを前進させるために、研究者は厳密な評価手法を使い、学習構成を評価するための明確な基準を確立する必要がある。このような取り組みは、LADの可能性の理解を深め、学習成果を高め、教育者や研究者にも貴重な洞察を提供する。 While learning analytics dashboards (LADs) are the most common form of LA intervention, there is limited evidence regarding their impact on students' learning outcomes. This systematic review synthesizes the findings of 38 research studies to investigate the impact of LADs on students' learning outcomes, encompassing achievement, participation, motivation, and attitudes. As it currently stands, there is no evidence to support the conclusion that LADs have lived up to the promise of improving academic achievement. Most studies reported negligible or small effects, with limited evidence from well-powered controlled experiments. Many studies merely compared users and non-users of LADs, confounding the dashboard effect with student engagement levels. Similarly, the impact of LADs on motivation and attitudes appeared modest, with only a few exceptions demonstrating significant effects. 
Small sample sizes in these studies highlight the need for larger-scale investigations to validate these findings. Notably, LADs showed a relatively substantial impact on student participation. Several studies reported medium to large effect sizes, suggesting that LADs can promote engagement and interaction in online learning environments. However, methodological shortcomings, such as reliance on traditional evaluation methods, self-selection bias, the assumption that access equates to usage, and a lack of standardized assessment tools, emerged as recurring issues. To advance the research line for LADs, researchers should use rigorous assessment methods and establish clear standards for evaluating learning constructs. Such efforts will advance our understanding of the potential of LADs to enhance learning outcomes and provide valuable insights for educators and researchers alike.	翻訳日:2023-12-27 20:14:24 公開日:2023-12-22
# twitter上のバイアスド・メディカル・クレームのカスケード検出に向けて Towards Detecting Cascades of Biased Medical Claims on Twitter ( http://arxiv.org/abs/2312.15040v1 ) ライセンス: Link先を確認	Libby Tiderman, Juan Sanchez Mercedes, Fiona Romanoschi, Fabricio Murai	(参考訳) ソーシャルメディアは、社会的識別子と病気の間の誤解を招く相関関係を強調する医療的主張を広める可能性がある。われわれの研究は、Twitter上の偏りのある医療クレームを特定し、その拡散を測定することを目的としている。本稿では,医学的クレームを検出するRoBERTaとバイアスを分類するDistilBERTという2つのモデルを用いた機械学習フレームワークを提案する。偏りのある医療クレームを特定した後、リツイートカスケード分析を行い、個々のリーチと拡散率を計算した。偏りのあるクレームを含むツイートは、偏りのないクレームよりも速く、さらに拡散することが判明した。 Social media may disseminate medical claims that highlight misleading correlations between social identifiers and diseases due to not accounting for structural determinants of health. Our research aims to identify biased medical claims on Twitter and measure their spread. We propose a machine learning framework that uses two models in tandem: RoBERTa to detect medical claims and DistilBERT to classify bias. After identifying original biased medical claims, we conducted a retweet cascade analysis, computing their individual reach and rate of spread. Tweets containing biased claims were found to circulate faster and further than unbiased claims.	翻訳日:2023-12-27 20:13:37 公開日:2023-12-22
# Latents2Semantics: 顔画像の局所的なスタイル操作に生成モデルの潜在空間を利用する Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images ( http://arxiv.org/abs/2312.15037v1 ) ライセンス: Link先を確認	Snehal Singh Tomar, A.N. Rajagopalan	(参考訳) メタバースが徐々に現実のものとなり、デジタル人間の創造に向けた急速な発展のペースを考えると、人間の顔のための原理化されたスタイルの編集パイプラインの必要性は大きく高まるはずである。顔画像中の複数の関心領域(ROI)のスタイル属性の高度に局所化された編集を容易にする生成オートエンコーダモデルであるLatents2Semantics Autoencoder (L2SAE)を導入することで、このニーズに応える。 L2SAEは、符号化された画像の構造とスタイル情報に対する別個の潜在表現を学習する。これにより、選択したROIの構造保存スタイル編集が可能になる。符号化された構造表現は空間次元を小さくしたマルチチャネル2次元テンソルであり、局所構造特性と大域構造特性の両方をキャプチャする。スタイル表現はグローバルなスタイル属性をキャプチャする1Dテンソルである。フレームワークでは、構造表現をスライスして、異なるROIとの強く分離された対応を構築する。選択されたROIのスタイル編集は、(a)スライスされた構造表現から生成されるROIマスクと、(b)ガウスノイズで操作したグローバルスタイルと不変の構造テンソルから生成される、グローバルなスタイル変更を含むデコード画像との単純な組み合わせに相当する。追加の人的監督なしでスタイル編集ができることは、SOTAのスタイル編集パイプラインに対する大きな利点である。既存の研究の多くは、スタイル編集に意味を付与するために訓練後に追加の人的労力(監督)を必要とするためである。また、反復最適化に基づく反転や、計算コストのかかる演算を必要とする訓練後の潜在方向の制御を廃止する。複数のデータセットからサンプリングされたテスト画像を用いて、選択的なスタイル編集やスワップなど、複数のアプリケーションに対して、定性的かつ定量的な結果を提供する。 With the metaverse slowly becoming a reality and given the rapid pace of developments toward the creation of digital humans, the need for a principled style editing pipeline for human faces is bound to increase manifold. We cater to this need by introducing the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates highly localized editing of style attributes of several Regions of Interest (ROIs) in face images. The L2SAE learns separate latent representations for encoded images' structure and style information, thus allowing for structure-preserving style editing of the chosen ROIs. The encoded structure representation is a multichannel 2D tensor with reduced spatial dimensions, which captures both local and global structure properties. The style representation is a 1D tensor that captures global style attributes. 
In our framework, we slice the structure representation to build strong and disentangled correspondences with different ROIs. Consequently, style editing of the chosen ROIs amounts to a simple combination of (a) the ROI-mask generated from the sliced structure representation and (b) the decoded image with global style changes, generated from the manipulated (using Gaussian noise) global style and unchanged structure tensor. Style editing without additional human supervision is a significant win over SOTA style editing pipelines because most existing works require additional human effort (supervision) post-training for attributing semantic meaning to style edits. We also do away with iterative-optimization-based inversion or determining controllable latent directions post-training, which require additional computationally expensive operations. We provide qualitative and quantitative results across multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.	翻訳日:2023-12-27 20:13:00 公開日:2023-12-22
# SODA:オンデバイス機械学習モデルにおけるプライオリティ情報保護 SODA: Protecting Proprietary Information in On-Device Machine Learning Models ( http://arxiv.org/abs/2312.15036v1 ) ライセンス: Link先を確認	Akanksha Atrey, Ritwik Sinha, Saayan Mitra, Prashant Shenoy	(参考訳) ローエンドハードウェアの成長は、エッジアプリケーションにおける機械学習ベースのサービスの増加につながった。これらのアプリケーションはユーザに関するコンテキスト情報を収集し、マシンラーニング(ML)モデルを通じてパーソナライズされたオファーなどのサービスを提供する。このようなMLモデルをユーザのデバイスにデプロイすることで、レイテンシの低減、ユーザのプライバシの維持、集中的なソースへの継続的依存の最小化を実現している。しかし、ユーザのエッジデバイスにMLモデルをデプロイすると、サービスプロバイダに関するプロプライエタリな情報が漏洩する可能性がある。本研究では,モバイルサービス提供に使用されるオンデバイスMLモデルについて検討し,簡単な攻撃がサービスプロバイダのプロプライエタリな情報を漏洩させる可能性を実証する。異なる敵が容易にこのようなモデルを利用して利益を最大化し、コンテンツ盗難を達成できることを示す。このような攻撃を阻止する必要性に感銘を受け、敵の攻撃を防ぎながらエッジデバイス上でのデプロイとサービスを行うためのエンドツーエンドフレームワークであるSODAを提示する。以上の結果から,サービス性能,レイテンシ,ストレージへの影響を最小限に抑えつつ,50クエリ未満で89%の精度で敵使用を検出できることが示唆された。 The growth of low-end hardware has led to a proliferation of machine learning-based services in edge applications. These applications gather contextual information about users and provide some services, such as personalized offers, through a machine learning (ML) model. A growing practice has been to deploy such ML models on the user's device to reduce latency, maintain user privacy, and minimize continuous reliance on a centralized source. However, deploying ML models on the user's edge device can leak proprietary information about the service provider. In this work, we investigate on-device ML models that are used to provide mobile services and demonstrate how simple attacks can leak proprietary information of the service provider. We show that different adversaries can easily exploit such models to maximize their profit and accomplish content theft. Motivated by the need to thwart such attacks, we present an end-to-end framework, SODA, for deploying and serving on edge devices while defending against adversarial usage. 
Our results demonstrate that SODA can detect adversarial usage with 89% accuracy in less than 50 queries with minimal impact on service performance, latency, and storage.	翻訳日:2023-12-27 20:12:05 公開日:2023-12-22
# 2時間量子ゆらぎのアプローチとBethe-Salpeter方程式との関係 Two-Time Quantum Fluctuations Approach and its Relation to the Bethe-Salpeter Equation ( http://arxiv.org/abs/2312.15034v1 ) ライセンス: Link先を確認	Erik Schroedter and Michael Bonitz	(参考訳) 非平衡状態の相関量子多粒子系は、相関固体、超低温原子、高密度プラズマを含む多くの分野で高い関心を持たれている。これらのシステムの正確な理論記述は、概念的にも計算資源に関しても困難である。我々は最近、非平衡 $GW$ 近似と等価な量子ゆらぎのアプローチを提示した [E. Schroedter et al., Cond. Matt. Phys. 25, 23401 (2022)]。このアプローチは低い計算コストで高い精度を約束する。第二の論文 [E. Schroedter et al., Phys. Rev. B 108, 205109 (2023)] では、このアプローチは2時間交換相関関数と密度応答特性にまで拡張された。ここでは、このアプローチの特性をより詳細に分析する。一般化されたKadanoff-Baym ansatzとHartree-Fock伝播関数を適用した場合、この手法は2時間交換相関関数のBethe-Salpeter方程式と等価であることを示す。 Correlated quantum many-particle systems out of equilibrium are of high interest in many fields, including correlated solids, ultracold atoms or dense plasmas. An accurate theoretical description of these systems is challenging, both conceptually and with respect to computational resources. We have recently presented a quantum fluctuations approach which is equivalent to the nonequilibrium $GW$ approximation [E. Schroedter et al., Cond. Matt. Phys. 25, 23401 (2022)] and promises high accuracy at low computational cost. In a second publication [E. Schroedter et al., Phys. Rev. B 108, 205109 (2023)], this approach was extended to the two-time exchange-correlation functions and the density response properties. Here, we analyze the properties of this approach in more detail. We demonstrate that the method is equivalent to the Bethe-Salpeter equation for the two-time exchange-correlation function when the generalized Kadanoff-Baym ansatz with Hartree-Fock propagators is applied.	翻訳日:2023-12-27 20:11:46 公開日:2023-12-22
# 解釈可能な推論時間介入によるLLMのスパース性誘導ホリスティック説明法 Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention ( http://arxiv.org/abs/2312.15033v1 ) ライセンス: Link先を確認	Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理領域において前例のないブレークスルーを達成した。しかし、LLMの謎めいた「ブラックボックス」の性質は、透過的かつ説明可能な応用を妨げる、解釈可能性にとって重要な課題である。注目の可視化、重要なサブネットワーク抽出、概念に基づく分析といった過去のアプローチは、いくつかの洞察を与えるが、彼らはしばしば1次元内の局所的またはグローバルな説明に焦点を合わせ、時には包括的明確性の提供に不足する。そこで本研究では,LLMの全体的解釈を目的とし,スパース性誘導技術に係わる新たな方法論を提案する。我々のフレームワークは、SparseCBMと呼ばれ、スパース性を革新的に統合し、インプット、サブネットワーク、コンセプトレベルという3つの相互解釈層を解明する。さらに、新たに導入された解釈可能な推論時間介入の次元は、展開中のモデルに対する動的調整を容易にする。実世界のデータセットに対する厳密な経験的評価を通じて、SparseCBMはLLMの振る舞いを深く理解し、モデルの不正確な解釈と改善の両面で分離することを実証した。コードはサプリメントで提供される。 Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short in providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Code is provided in the supplementary material.	
翻訳日:2023-12-27 20:11:22 公開日:2023-12-22
# Federated Q-Learning: 通信コストの低い線形レグレット高速化 Federated Q-Learning: Linear Regret Speedup with Low Communication Cost ( http://arxiv.org/abs/2312.15023v1 ) ライセンス: Link先を確認	Zhong Zheng, Fengyu Gao, Lingzhou Xue, Jing Yang	(参考訳) 本稿では,中央サーバの協調の下で複数のエージェントが協調して環境を探索し,それらの生データを共有することなく最適な方針を学習する,表状エピソディックマルコフ決定プロセス(mdp)のためのフェデレート強化学習について検討する。収束率やサンプルの複雑さなどの指標では,エージェント数の線形スピードアップが達成されているが,通信コストの低い線形後悔スピードアップを実現するために,モデルフリーなアルゴリズムを設計できるかどうかは不明である。本稿では,FedQ-Hoeffding とFedQ-Bernstein という2つの連立Q-Learningアルゴリズムを提案し,時間的地平線が十分に大きい場合と比較して,対応する全後悔が線形なスピードアップを達成することを示し,通信コストは時間的ステップの総数$T$で対数的にスケールすることを示した。これらの結果は、エージェントとサーバ間のイベントトリガー同期機構、サーバがステートアクション値の局所的な見積を集約してグローバルな見積を形成する場合の新たなステップサイズ選択、および非マーチンゲール差の和を束縛する新しい濃度不等式に頼っている。これは、連帯強化学習におけるモデルフリーアルゴリズムによって線形後悔のスピードアップと対数コミュニケーションコストが達成できることを示す最初の研究である。 In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms termed as FedQ-Hoeffding and FedQ-Bernstein, respectively, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. 
These results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.	翻訳日:2023-12-27 20:10:59 公開日:2023-12-22
# 統一マルチモーダル推論フレームワークに向けて Towards a Unified Multimodal Reasoning Framework ( http://arxiv.org/abs/2312.15021v1 ) ライセンス: Link先を確認	Abhinav Arun and Dipendra Singh Mal and Mehul Soni and Tomohiro Sawada	(参考訳) 近年のディープラーニングの進歩は、様々なタスクに優れた強力な言語モデル(LM)の開発につながっている。これらの成果にもかかわらず、特に推論能力の向上とマルチモーダルデータの導入には改善の余地がある。本報告は,多肢選択問題の解答におけるLMの精度を向上させるために,CoT推論とVQA技術を組み合わせることによる潜在的影響について検討する。TextVQAとScienceQAを用いて、3つのテキスト埋め込み手法と3つの視覚埋め込み手法の有効性を評価した。本実験は,CoTとVQAの複合的影響を調査することによって,現在の研究のギャップを埋めることを目的としており,これらの技術がGPT-4のような最先端モデルの推論能力をいかに改善できるかの理解に寄与している。実験の結果は、LMの推論能力と質問応答能力の向上、この分野におけるさらなる研究と開発のための洞察の提供、および複数のモードにわたる複雑な推論タスクを処理可能なより正確で信頼性の高いAIシステムの実現における、これらのアプローチの可能性を実証した。 Recent advancements in deep learning have led to the development of powerful language models (LMs) that excel in various tasks. Despite these achievements, there is still room for improvement, particularly in enhancing reasoning abilities and incorporating multimodal data. This report investigates the potential impact of combining Chain-of-Thought (CoT) reasoning and Visual Question Answering (VQA) techniques to improve LMs' accuracy in solving multiple-choice questions. Using the TextVQA and ScienceQA datasets, we assessed the effectiveness of three text embedding methods and three visual embedding approaches. Our experiments aimed to fill the gap in current research by investigating the combined impact of CoT and VQA, contributing to the understanding of how these techniques can improve the reasoning capabilities of state-of-the-art models like GPT-4. Results from our experiments demonstrated the potential of these approaches in enhancing LMs' reasoning and question-answering capabilities, providing insights for further research and development in the field, and paving the way for more accurate and reliable AI systems that can handle complex reasoning tasks across multiple modalities.	翻訳日:2023-12-27 20:10:31 公開日:2023-12-22
# Gemini vs GPT-4V : 定性ケースによる視覚言語モデルの予備比較と組み合わせ Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases ( http://arxiv.org/abs/2312.15011v1 ) ライセンス: Link先を確認	Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao	(参考訳) MLLM(Multi-modal Large Language Models)の急速に発展する分野は、人工知能における言語処理と視覚処理の統合の最前線にある。本稿では,GoogleのGeminiとOpenAIのGPT-4V(ision)の2つのパイオニアモデルについて,詳細な比較研究を行った。本研究は,視覚言語能力,人間とのインタラクション,時間的理解,知性と感情的商の両方における評価など,両モデルの多面的評価を含む。分析の核心は、それぞれのモデルの視覚的理解能力に分解されます。各種産業応用シナリオにおける性能評価のための構造化実験を行い,実用性に関する総合的な考察を行った。直接的なパフォーマンス比較だけでなく、均衡と公正な分析を保証するためのプロンプトやシナリオの調整も含んでいます。我々の発見は、両方のモデルのユニークな強みとニッチを照らしている。 GPT-4Vは応答の正確さと簡潔さで自分自身を区別し、ジェミニは関連する画像とリンクを伴って詳細で拡張的な回答を提供する。これらの理解は、geminiとgpt-4vの比較的な利点に光を当てただけでなく、マルチモーダル基礎モデルの進化の風景を強調し、この分野における将来の進歩への道を開いた。比較後, 2つのモデルを組み合わせることにより, より良い結果を得ることができた。最後に、GPT-4VとGeminiの開発チームに、この分野への先駆的な貢献を感謝します。当社の認定は、Yang et al の 'Dawn' で示された包括的質的分析にまで拡張されている。本研究は, 画像サンプル, プロンプト, GPT-4V関連結果の広範な収集とともに, 解析の基礎となった。 The rapidly evolving sector of Multi-modal Large Language Models (MLLMs) is at the forefront of integrating linguistic and visual processing in artificial intelligence. This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision). Our study involves a multi-faceted evaluation of both models across key dimensions such as Vision-Language Capability, Interaction with Humans, Temporal Understanding, and assessments in both Intelligence and Emotional Quotients. The core of our analysis delves into the distinct visual comprehension abilities of each model. We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility. We not only involve direct performance comparisons but also include adjustments in prompts and scenarios to ensure a balanced and fair analysis. 
Our findings illuminate the unique strengths and niches of both models. GPT-4V distinguishes itself with its precision and succinctness in responses, while Gemini excels in providing detailed, expansive answers accompanied by relevant imagery and links. These understandings not only shed light on the comparative merits of Gemini and GPT-4V but also underscore the evolving landscape of multimodal foundation models, paving the way for future advancements in this area. After the comparison, we attempted to achieve better results by combining the two models. Finally, we would like to express our profound gratitude to the teams behind GPT-4V and Gemini for their pioneering contributions to the field. Our acknowledgments are also extended to the comprehensive qualitative analysis presented in 'Dawn' by Yang et al. This work, with its extensive collection of image samples, prompts, and GPT-4V-related results, provided a foundational basis for our analysis.	翻訳日:2023-12-27 20:10:12 公開日:2023-12-22
# SI-MIL:ギガピクセル病理における自己解釈性のための深部MILのモデリング SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology ( http://arxiv.org/abs/2312.15010v1 ) ライセンス: Link先を確認	Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna	(参考訳) ギガピクセルスライドの複雑さを考えると、全スライド画像(WSI)解析のための解釈可能性と推論をMIL(Multiple Instance Learning)手法に導入することは困難である。伝統的に、ミル解釈性は下流タスクに適していると考えられる突出した領域を特定することに限定されており、これらの選択の背景にある根拠についてエンドユーザー(病理学者)にほとんど洞察を与えていない。そこで本研究では,自己解釈型MIL(Self-Interpretable MIL, SI-MIL)を提案する。 SI-MILは、手作りの病理的特徴に基づく解釈可能な分岐をガイドし、線形予測を容易にする。 SI-MILは、正常な領域を識別する以外に、WSIの病理学的洞察に根ざした特徴レベルの解釈を提供する。特に、SI-MILは線形予測制約を伴い、モデル解釈可能性と性能の間の必然的なトレードオフの神話に挑戦し、3種類の癌に対してWSIレベルの予測タスクに関する最先端の手法と比較して、競争の結果を示す。さらに,si-milの局所的およびグローバル的解釈可能性について,統計的分析,ドメインエキスパート研究,解釈可能性のデシデラタ,すなわちユーザフレンドリーさと忠実性の観点から徹底的に評価した。 Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. 
In addition, we thoroughly benchmark the local- and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.	翻訳日:2023-12-27 20:09:44 公開日:2023-12-22
# ChatGPTの算数能力に及ぼすプロンプト, ペルソナ, および思考の連鎖手法の影響の評価 Assessing the Impact of Prompting, Persona, and Chain of Thought Methods on ChatGPT's Arithmetic Capabilities ( http://arxiv.org/abs/2312.15006v1 ) ライセンス: Link先を確認	Yuhao Chen, Chloe Wong, Hanwen Yang, Juan Aguenza, Sai Bhujangari, Benthan Vu, Xun Lei, Amisha Prasad, Manny Fluss, Eric Phuong, Minghao Liu, James Davis	(参考訳) 本研究は,OpenAIの言語モデルChatGPTの数学的習熟度を,デフォルトの計算能力と,戦略的プロンプト,ペルソナ実装,思考の連鎖(Chain of Thought)という3つの規範的手法の効率とを対比することで批判的に評価する。この評価は、数学的問題の広い範囲と複雑さのレベルを包含する、MATH、GSM8K、MMLUデータセットの多様で広範な問題集合を活用した。モデルの数学的精度を高めるためにこれらの介入の有効性を判断するために洗練されたグレーディングスクリプトが設計された。期待に反して,試行したいずれの手法もChatGPTのベースライン性能を大幅に向上させることはなかった。いくつかのケースでは、これらの介入は意図せずモデルの応答生成を妨害した。この調査は、言語モデルの性能向上のための革新的な戦略の追求は依然として重要であるが、本研究で検討した特定の手法はChatGPTの計算能力に大きな改善をもたらさなかったと結論づけている。これらの知見は、様々な領域にまたがるモデルの精度と信頼性を高めるために、より包括的な研究と新しい技術の探索の重要性を浮き彫りにしている。 This study critically evaluates the mathematical proficiency of OpenAI's language model, ChatGPT, by juxtaposing its default computational capabilities against the efficiency of three prescriptive methods: strategic prompting, persona implementation, and the Chain of Thought approach. The evaluation harnessed the diverse and extensive problem sets from the MATH, GSM8K, and MMLU data-sets, which encompass a broad spectrum of mathematical conundrums and levels of complexity. A sophisticated grading script was designed to determine the efficacy of these interventions in enhancing the model's mathematical precision. Contrary to expectations, our empirical analysis revealed that none of the trialed methods substantially improved ChatGPT's baseline performance. In some cases, these interventions inadvertently disrupted the model's response generation. This investigation concluded that while the pursuit of innovative strategies for augmenting language model performance remains crucial, the specific methods examined within this study did not induce significant improvements in ChatGPT's computational aptitude. 
These findings underscore the importance of further comprehensive research and exploration of novel techniques to enhance the precision and dependability of such models across diverse domains.	翻訳日:2023-12-27 20:09:18 公開日:2023-12-22
# FineMoGen: 細粒度の時空間モーション生成と編集 FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing ( http://arxiv.org/abs/2312.15004v1 ) ライセンス: Link先を確認	Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu	(参考訳) テキスト駆動モーション生成は拡散モデルの出現によって大きく進歩した。しかし、既存の手法では、詳細かつ正確な時空間的動作を描写する細かな記述に対応した複雑な動き列を生成するのに苦労している。この細かな制御性の欠如は、より広いユーザ層へのモーション生成の利用を制限する。このような課題に対処するために,ユーザの指示に応じた時空間的な構成で細粒度の動きを合成できる拡散型モーション生成・編集フレームワークであるFineMoGenを提案する。具体的には、FineMoGenはSAMI(Spatio-Temporal Mixture Attention)と呼ばれる新しいトランスフォーマーアーキテクチャで拡散モデルを構築している。 SAMIは2つの視点からグローバルアテンションテンプレートの生成を最適化する。 1)時空間構成の制約を明示的にモデル化し, 2) 細粒度の特徴を適応的に抽出するために, スパースに活性化されるmixture-of-expertsを利用する。この新しい細粒度モーション生成タスクに関する大規模な研究を促進するため,2,968本の動画と102,336本の細粒度の時空間記述からなるHuMMan-MoGenデータセットを寄贈する。大規模な実験により、FineMoGenは最先端の手法よりも優れたモーション生成品質を示すことが示された。特に、FineMoGenは、最新の大規模言語モデル(LLM)の助けを借りて、細粒度の命令で動きシーケンスを忠実に操作するゼロショットモーション編集を可能にする。プロジェクトページ: https://mingyuan-zhang.github.io/projects/FineMoGen.html Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions, depicting detailed and accurate spatio-temporal actions. This lack of fine controllability limits the usage of motion generation to a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions, with spatial-temporal composition to the user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. 
To facilitate a large-scale study on this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen exhibits superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models (LLM), which faithfully manipulates motion sequences with fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html	翻訳日:2023-12-27 20:08:54 公開日:2023-12-22
# 適応型ドメイン推論攻撃 Adaptive Domain Inference Attack ( http://arxiv.org/abs/2312.15088v1 ) ライセンス: Link先を確認	Yuechun Gu, Keke Chen	(参考訳) ディープニューラルネットワークは、医療やセキュリティといったセンシティブなアプリケーションドメインにますますデプロイされているため、これらのモデルからどのようなセンシティブな情報を推測できるかを理解する必要がある。既存のモデルターゲティング攻撃はすべて、攻撃者がアプリケーションドメインやトレーニングデータ分散を知っていると仮定する。これらの攻撃からモデルを保護するモデルAPIからドメイン情報を削除できるだろうか? 本稿では,この問題について考察する。残念なことに、最小限の知識、すなわち入力と出力の意味を漏らさずにモデルにアクセスしても、提案された適応ドメイン推論攻撃(ADI)はトレーニングデータの関連するサブセットをうまく推定することができる。抽出された関連データは,例えばモデル・インバージョン攻撃の性能が著しく向上することを示す。具体的には、利用可能な公開データセットとプライベートデータセットの集合の上に構築された概念階層と、未知のトレーニングデータに現れる葉の概念の可能性を適応的に調整する新しいアルゴリズムを利用する。 ADI攻撃は概念レベルで部分的なトレーニングデータを抽出するだけでなく、高速に収束し、他のドメイン推論攻撃であるGDIよりもはるかに少ないターゲットモデルアクセスを必要とする。 As deep neural networks are increasingly deployed in sensitive application domains, such as healthcare and security, it's necessary to understand what kind of sensitive information can be inferred from these models. Existing model-targeted attacks all assume the attacker has known the application domain or training data distribution, which plays an essential role in successful attacks. Can removing the domain information from model APIs protect models from these attacks? This paper studies this critical problem. Unfortunately, even with minimal knowledge, i.e., accessing the model as an unnamed function without leaking the meaning of input and output, the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data. We show that the extracted relevant data can significantly improve, for instance, the performance of model-inversion attacks. Specifically, the ADI method utilizes a concept hierarchy built on top of a large collection of available public and private datasets and a novel algorithm to adaptively tune the likelihood of leaf concepts showing up in the unseen training data. 
The ADI attack not only extracts partial training data at the concept level, but also converges fast and requires much fewer target-model accesses than another domain inference attack, GDI.	翻訳日:2023-12-27 20:02:27 公開日:2023-12-22
# HyperMix: 少数ショット設定における分布外検出と分類 HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings ( http://arxiv.org/abs/2312.15086v1 ) ライセンス: Link先を確認	Nikhil Mehta, Kevin J Liang, Jing Huang, Fu-Jen Chu, Li Yin, Tal Hassner	(参考訳) 分布外(OOD)検出は、現実世界の機械学習システムにとって重要なトピックであるが、分布内サンプルが限られた設定は十分に研究されていない。モデルがOODサンプルを識別する前にデータ分布を学習する機会が少ないため、このような少数ショットのOOD設定は難しい。実際、最近の最先端OOD法は、少数ショット設定で単純なベースラインを上回らないことを示す。そこで我々は、生成した分類器パラメータに対するMixupと、追加のoutlierデータセットを必要としない自然なout-of-episode outlierエクスポージャー手法を用いる、HyperMixと呼ばれるハイパーネットワークフレームワークを提案する。我々はCIFAR-FSとMiniImageNetで実験を行い、少数ショットの設定で他のOOD法を大幅に上回る。 Out-of-distribution (OOD) detection is an important topic for real-world machine learning systems, but settings with limited in-distribution samples have been underexplored. Such few-shot OOD settings are challenging, as models have scarce opportunities to learn the data distribution before being tasked with identifying OOD samples. Indeed, we demonstrate that recent state-of-the-art OOD methods fail to outperform simple baselines in the few-shot setting. We thus propose a hypernetwork framework called HyperMix, using Mixup on the generated classifier parameters, as well as a natural out-of-episode outlier exposure technique that does not require an additional outlier dataset. We conduct experiments on CIFAR-FS and MiniImageNet, significantly outperforming other OOD methods in the few-shot regime.	翻訳日:2023-12-27 20:02:06 公開日:2023-12-22
# 森林インベントリの自動化:3次元深層学習による高密度空中LiDAR点群の解析 Automated forest inventory: analysis of high-density airborne LiDAR point clouds with 3D deep learning ( http://arxiv.org/abs/2312.15084v1 ) ライセンス: Link先を確認	Binbin Xiang and Maciej Wielgosz and Theodora Kontogianni and Torben Peters and Stefano Puliti and Rasmus Astrup and Konrad Schindler	(参考訳) 詳細な森林インベントリは、森林資源の持続可能で柔軟な管理、および様々な生態系サービスの保全に不可欠である。現代の空中レーザースキャナーは、精細なスケールでの森林インベントリと分析に大きな可能性を持つ高密度の点群を提供するが、それらの点群を個々の木や木の構成要素のような有意義な実体に自動的に分割することは依然として課題である。本研究は,このギャップを埋めることを目的として,多様な森林タイプや地理的領域にまたがってそのようなセグメンテーションが可能なディープラーニングフレームワークを導入する。セグメント化されたデータから、個々の木および林分の関連する生物物理学的パラメータを導出する。このシステムは、調査用ドローンを使って5つの国で取得された点群のデータセットであるFOR-Instanceでテストされている。セグメンテーションのバックエンドは、個々の木について85%以上のFスコアを、地面、低植生、幹、生きた枝、枯れた枝の5つの意味カテゴリーについて73%以上の平均IoUを達成している。セグメンテーションの結果に基づいて、パイプラインは個々の木の生物物理特性(樹高、樹冠直径、樹冠体積、DBH、位置)と林分ごとの特性(デジタル地形モデルと林分密度)を密に計算する。特に樹冠関連の特徴はほとんどの場合高い精度で得られるが,DBHと位置の推定は,空中スキャンという構成のために信頼性が低い。 Detailed forest inventories are critical for sustainable and flexible management of forest resources, to conserve various ecosystem services. Modern airborne laser scanners deliver high-density point clouds with great potential for fine-scale forest inventory and analysis, but automatically partitioning those point clouds into meaningful entities like individual trees or tree components remains a challenge. The present study aims to fill this gap and introduces a deep learning framework that is able to perform such a segmentation across diverse forest types and geographic regions. From the segmented data, we then derive relevant biophysical parameters of individual trees as well as stands. The system has been tested on FOR-Instance, a dataset of point clouds that have been acquired in five different countries using surveying drones. The segmentation back-end achieves over 85% F-score for individual trees, and over 73% mean IoU across five semantic categories: ground, low vegetation, stems, live branches and dead branches. 
Building on the segmentation results, our pipeline then densely calculates biophysical features of each individual tree (height, crown diameter, crown volume, DBH, and location) and properties per stand (digital terrain model and stand density). Especially crown-related features are in most cases retrieved with high accuracy, whereas the estimates for DBH and location are less reliable, due to the airborne scanning setup.	翻訳日:2023-12-27 20:01:50 公開日:2023-12-22
# リッチランキングの学習 Learning Rich Rankings ( http://arxiv.org/abs/2312.15081v1 ) ライセンス: Link先を確認	Arjun Seshadri, Stephen Ragain, Johan Ugander	(参考訳) ランキングの基礎はよく確立されているが、ランキングに関する文献は主に、単一の全順序を中心とした分布を定義する単純なユニモーダルモデル(例えば、MallowsモデルやPlackett-Luceモデル)に焦点を当ててきた。明示的な混合モデルはマルチモーダルなランキングデータをモデル化するためのツールを提供しているが、そのようなモデルをデータから学習することは難しいことが多い。本研究では,最近の選択モデリングの進歩を活かし,ランキング空間に自然な多峰性と豊かさをもたらす,文脈的反復選択(CRS)モデルを提案する。構造依存のテールリスクおよび期待リスクの上界を通じて、このモデルの下での最尤推定に対する厳密な理論的保証を提供する。副産物として,多項ロジット(MNL)選択モデルとPlackett-Luce(PL)ランキングモデルに対する最尤推定量の期待リスクに関する最初のタイトな上界と,PLランキングモデルに関する最初のテールリスク上界も与える。 CRSモデルは、レースから順位選択投票まで、さまざまな設定で現実世界のランキングデータのモデル化において既存の手法を大幅に上回っている。 Although the foundations of ranking are well established, the ranking literature has primarily been focused on simple, unimodal models, e.g. the Mallows and Plackett-Luce models, that define distributions centered around a single total ordering. Explicit mixture models have provided some tools for modelling multimodal ranking data, though learning such models from data is often difficult. In this work, we contribute a contextual repeated selection (CRS) model that leverages recent advances in choice modeling to bring a natural multimodality and richness to the rankings space. We provide rigorous theoretical guarantees for maximum likelihood estimation under the model through structure-dependent tail risk and expected risk bounds. As a by-product, we also furnish the first tight bounds on the expected risk of maximum likelihood estimators for the multinomial logit (MNL) choice model and the Plackett-Luce (PL) ranking model, as well as the first tail risk bound on the PL ranking model. The CRS model significantly outperforms existing methods for modeling real world ranking data in a variety of settings, from racing to rank choice voting.	翻訳日:2023-12-27 20:01:21 公開日:2023-12-22
# クーパー対のペアリングが媒介する高次光子過程のスペクトルシグネチャ Spectral signature of high-order photon processes mediated by Cooper-pair pairing ( http://arxiv.org/abs/2312.15075v1 ) ライセンス: Link先を確認	W. C. Smith, A. Borgognoni, M. Villiers, E. Roverc'h, J. Palomo, M. R. Delbecq, T. Kontos, P. Campagne-Ibarcq, B. Douçot, Z. Leghtas	(参考訳) 個々の光子間の相互作用を誘導することは、フォトニック量子情報処理や多体光子状態に関する基礎研究に必須である。強い相互作用と低損失を組み合わせるのに適した分野は、超伝導回路を用いたマイクロ波量子光学である。光子は典型的には$LC$回路に格納され、回路がジョセフソントンネル接合でシャントされると相互作用が現れる。重要な点は、接合部を横切る超伝導位相の零点ゆらぎが誘導相互作用の強さと次数を制御することである。超伝導回路は、位相ゆらぎが1よりも小さく、カー効果として知られる2光子相互作用が支配的な領域において、ほぼ独占的に動作してきた。この実験では、クーパー対のペアのみをトンネルさせる双極子素子で、高インピーダンスの$LC$発振器をシャントした。このペアリングによって実効的に2倍になる位相ゆらぎは3.4に達する。この極端なゆらぎの領域では、非調和なはしごを登るにつれて、非単調にシフトする遷移周波数を観測する。この分光測定から, 同程度の振幅を持ち、すべて光子損失率を上回る2-, 3-, 4-光子相互作用エネルギーを抽出した。この研究は、多光子量子論理から高相関のマイクロ波放射の研究まで、マイクロ波量子光学における高次光子相互作用の新しい領域を探究する。 Inducing interactions between individual photons is essential for applications in photonic quantum information processing and fundamental research on many-body photon states. A field that is well suited to combine strong interactions and low losses is microwave quantum optics with superconducting circuits. Photons are typically stored in an $LC$ circuit, and interactions appear when the circuit is shunted by a Josephson tunnel junction. Importantly, the zero-point fluctuations of the superconducting phase across the junction control the strength and order of the induced interactions. Superconducting circuits have almost exclusively operated in the regime where phase fluctuations are smaller than unity, and two-photon interactions, known as the Kerr effect, dominate. In this experiment, we shunt a high-impedance $LC$ oscillator by a dipole that only allows pairs of Cooper pairs to tunnel. Phase fluctuations, which are effectively doubled by this pairing, reach the value of 3.4. In this regime of extreme fluctuations, we observe transition frequencies that shift non-monotonically as we climb the anharmonic ladder. 
From this spectroscopic measurement, we extract two-, three- and four-photon interaction energies of comparable amplitude, and all exceeding the photon loss rate. This work explores a new regime of high-order photon interactions in microwave quantum optics, with applications ranging from multi-photon quantum logic to the study of highly correlated microwave radiation.	翻訳日:2023-12-27 20:01:01 公開日:2023-12-22
# 技術投稿の重複検出のためのSiamese構造によるGPT-3埋め込みの改良 Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection ( http://arxiv.org/abs/2312.15068v1 ) ライセンス: Link先を確認	Xingfang Wu, Heng Li, Nobukazu Yoshioka, Hironori Washizaki, Foutse Khomh	(参考訳) 技術的オンラインコミュニティの1つのゴールは、開発者が一箇所で正しい答えを見つけるのを助けることである。一つの質問は異なる言葉で異なる方法で問うことができ、技術的フォーラムに重複するポストが存在する。重複投稿の発見とリンクに関する問題は、開発者コミュニティと研究者の両方の注目を集めている。例えばStack Overflowでは,重複投稿のマークとクローズに投票ベースのメカニズムを採用している。しかし、絶えず発生するこれらの重複投稿にタイムリーに対処することは、依然として課題である。そのため,技術フォーラムの重複投稿を自動的に検出する様々な手法が提案されている。既存の手法は、投稿の意味を十分に把握できない手作りの類似度メトリクスに依存しているか、性能を改善するための教師信号を欠いているために、限界を抱えている。さらに、これらの手法の効率は、大量のデータに対して実用的でないペアワイズ特徴生成への依存によって妨げられる。本研究では,重複検出タスクのためにGPT-3埋め込みを採用し,改良することを試みる。 GPT-3埋め込みは投稿のセマンティクスを正確に表現できると仮定する。さらに,GPT-3埋め込みに基づくSiameseベースのネットワークを訓練することにより,技術フォーラム投稿における重複関係を正確に捉えた潜在埋め込みを得る。ベンチマークデータセットを用いた実験により,提案手法の有効性を確認し,ベースライン法と比較して優れた性能を示す。最近のStack Overflowダンプで構築したデータセットに適用すると、Top-1、Top-5、Top-30の精度はそれぞれ23.1%、43.9%、68.9%に達する。手作業による調査により,技術フォーラムでラベル付けされていない重複を発見できる可能性を確認した。 One goal of technical online communities is to help developers find the right answer in one place. A single question can be asked in different ways with different wordings, leading to the existence of duplicate posts on technical forums. The question of how to discover and link duplicate posts has garnered the attention of both developer communities and researchers. For example, Stack Overflow adopts a voting-based mechanism to mark and close duplicate posts. However, addressing these constantly emerging duplicate posts in a timely manner continues to pose challenges. Therefore, various approaches have been proposed to detect duplicate posts on technical forums automatically. The existing methods suffer from limitations either due to their reliance on handcrafted similarity metrics which cannot sufficiently capture the semantics of posts, or their lack of supervision to improve the performance. 
Additionally, the efficiency of these methods is hindered by their dependence on pair-wise feature generation, which can be impractical for large amounts of data. In this work, we attempt to employ and refine the GPT-3 embeddings for the duplicate detection task. We assume that the GPT-3 embeddings can accurately represent the semantics of the posts. In addition, by training a Siamese-based network based on the GPT-3 embeddings, we obtain a latent embedding that accurately captures the duplicate relation in technical forum posts. Our experiment on a benchmark dataset confirms the effectiveness of our approach and demonstrates superior performance compared to baseline methods. When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively. With a manual study, we confirm our approach's potential for finding unlabelled duplicates on technical forums.	翻訳日:2023-12-27 20:00:38 公開日:2023-12-22
# 時間局所非Lindbladマスター方程式の最適形式 Optimal form of time-local non-Lindblad master equations ( http://arxiv.org/abs/2312.15066v1 ) ライセンス: Link先を確認	Tobias Becker and André Eckardt	(参考訳) 超弱系-バス結合の極限を超えた開量子系を記述する時間局所量子マスター方程式は、しばしばゴリーニ=コサコフスキー=スダルシャン=リンドブラッド(GKSL)形式ではない。代表的な例として、一般の開量子系を近似するレッドフィールド方程式や、減衰調和振動子を厳密に記述するHu-Paz-Zhang方程式がある。ここでは、前者だけでなく後者も擬似Lindblad形式、すなわち項のいくつかが負の重みを持つことを除いてGKSL方程式に類似した散逸子を持つ形式にできることを示す。さらに,擬似Lindblad方程式の散逸子を不変に保ちつつ,その正項と負項の相対重みを変化させる変換を体系的に検討した。これらは負項の重みを最小化するために使用でき、これは最近開発された擬似Lindblad方程式の量子軌道アンラベリングの収束と、GKSL方程式を得るための負項の切り捨ての両方にとって最適である。 Time-local quantum master equations that describe open quantum systems beyond the limit of ultraweak system-bath coupling are often not of Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) form. Prominent examples are the Redfield equation approximating general open quantum systems and the Hu-Paz-Zhang equation exactly describing a damped harmonic oscillator. Here, we show that not only the former, but also the latter can be brought to pseudo-Lindblad form, with a dissipator that resembles that of a GKSL equation, except for the fact that some of the terms have negative weights. Moreover, we systematically investigate transformations that leave the dissipator of pseudo-Lindblad equations unchanged, while changing the relative weight between its positive and negative terms. These can be used to minimize the weights of the negative terms, which is optimal both for the convergence of a recently developed quantum-trajectory unraveling of pseudo-Lindblad equations as well as for the truncation of the negative terms to obtain a GKSL equation.	翻訳日:2023-12-27 20:00:09 公開日:2023-12-22
# 多端子系に対する厳密な有限時間相関関数:量子輸送と熱力学の理論的枠組みの連結 Exact finite-time correlation functions for multi-terminal setups: Connecting theoretical frameworks for quantum transport and thermodynamics ( http://arxiv.org/abs/2312.15065v1 ) ライセンス: Link先を確認	Gianmichele Blasi, Shishir Khandelwal, and Géraldine Haack	(参考訳) 開量子系における輸送は、量子マスター方程式、散乱行列、ハイゼンベルク運動方程式など、様々な理論的な枠組みを通して研究することができる。フレームワークの選択は、相互作用の存在、システムと環境の結合強度、定常状態と過渡状態のどちらに焦点を当てるかといった要因に依存する。既存の文献はこれらの枠組みを独立して扱い、統一的な視点を欠いている。本研究は,電圧および温度バイアス下の2端子設定において,最小限の単一準位量子ドットモデルを用いて,これらのアプローチの役割と位置づけを明らかにすることで,このギャップに対処する。定常状態と過渡状態の両方における粒子流とエネルギー流およびそれらのゆらぎの解析式を導出する。ハイゼンベルク方程式の厳密な結果は、それぞれの有効範囲内で散乱行列とマスター方程式のアプローチと一致することが示されている。重要な点として,弱結合限界のプロトコルを確立し,弱結合におけるマスター方程式の適用範囲と,任意の結合強度におけるハイゼンベルクあるいは散乱行列アプローチとを橋渡しする。 Transport in open quantum systems can be explored through various theoretical frameworks, including the quantum master equation, scattering matrix, and Heisenberg equation of motion. The choice of framework depends on factors such as the presence of interactions, the coupling strength between the system and environment, and whether the focus is on steady-state or transient regimes. Existing literature treats these frameworks independently, lacking a unified perspective. Our work addresses this gap by clarifying the role and status of these approaches using a minimal single-level quantum dot model in a two-terminal setup under voltage and temperature biases. We derive analytical expressions for particle and energy currents and their fluctuations in both steady-state and transient regimes. Exact results from the Heisenberg equation are shown to align with scattering matrix and master equation approaches within their respective validity regimes. Crucially, we establish a protocol for the weak-coupling limit, bridging the applicability of master equations at weak-coupling with Heisenberg or scattering matrix approaches at arbitrary coupling strength.	翻訳日:2023-12-27 19:59:46 公開日:2023-12-22
# マルチモーダルMRIデータのための自己教師付き・教師付き統合コントラスト学習 : 異常神経発達予測に向けて Joint Self-Supervised and Supervised Contrastive Learning for Multimodal MRI Data: Towards Predicting Abnormal Neurodevelopment ( http://arxiv.org/abs/2312.15064v1 ) ライセンス: Link先を確認	Zhiyuan Li, Hailong Li, Anca L. Ralescu, Jonathan R. Dillman, Mekibib Altaye, Kim M. Cecil, Nehal A. Parikh, Lili He	(参考訳) 構造,拡散テンソル,機能的磁気共鳴画像などの異なる画像モダリティと深層学習モデルとの統合は,表現型特性の識別や疾患診断の向上において有望な結果をもたらしている。このような手法の開発は、当初は異なる表現空間内に存在する異種マルチモーダル特徴の効率的な融合にかかっている。マルチモーダルな特徴を単純に融合するだけでは相補的な情報を適切に捉えられず、冗長性さえ生み出しうる。本研究では,マルチモーダルMRIデータから頑健な潜在特徴表現を学習する,自己教師付きと教師付きを統合した新しいコントラスト学習法を提案する。これにより異種特徴を共通空間へ射影でき,様々なモダリティ間および類似した被験者間で相補的情報と類似的情報の両方を統合できる。提案手法と代替的な深層マルチモーダル学習手法の比較分析を行った。 2つの独立したデータセットに対する広範な実験により,本手法は異常な神経発達の予測において他のいくつかの深層マルチモーダル学習法よりも有意に優れていることが示された。本手法は,マルチモーダルデータのパワーを活用し,臨床におけるコンピュータ支援診断を容易にする能力を有する。 The integration of different imaging modalities, such as structural, diffusion tensor, and functional magnetic resonance imaging, with deep learning models has yielded promising outcomes in discerning phenotypic characteristics and enhancing disease diagnosis. The development of such a technique hinges on the efficient fusion of heterogeneous multimodal features, which initially reside within distinct representation spaces. Naively fusing the multimodal features does not adequately capture the complementary information and could even produce redundancy. In this work, we present a novel joint self-supervised and supervised contrastive learning method to learn the robust latent feature representation from multimodal MRI data, allowing the projection of heterogeneous features into a shared common space, and thereby amalgamating both complementary and analogous information across various modalities and among similar subjects. We performed a comparative analysis between our proposed method and alternative deep multimodal learning approaches. 
Through extensive experiments on two independent datasets, the results demonstrated that our method is significantly superior to several other deep multimodal learning methods in predicting abnormal neurodevelopment. Our method has the capability to facilitate computer-aided diagnosis within clinical practice, harnessing the power of multimodal data.	翻訳日:2023-12-27 19:59:29 公開日:2023-12-22
# 非線形抵抗ネットワークに対する普遍近似定理 A universal approximation theorem for nonlinear resistive networks ( http://arxiv.org/abs/2312.15063v1 ) ライセンス: Link先を確認	Benjamin Scellier, Siddhartha Mishra	(参考訳) レジストレータネットワークは近年、エネルギー効率のよい自己学習マシンの基盤として注目されている。この研究は、これらの抵抗ネットワークの計算能力を研究する。電圧源,リニア抵抗器,ダイオード,電圧制御電圧源(vcvs)からなる電気ネットワークは,任意の連続機能を実現することができることを示す。これを証明するために、回路要素は理想的であり、可変抵抗器のコンダクタンスとvcvsの増幅係数は任意に小さく、あるいは任意に大きい値を取ることができると仮定する。また,本稿では,このような自己学習型電気ネットワークの設計について述べる。 Resistor networks have recently had a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous functions. To prove it, we assume that the circuit elements are ideal and that the conductances of variable resistors and the amplification factors of the VCVS's can take arbitrary values -- arbitrarily small or arbitrarily large. The constructive nature of our proof could also inform the design of such self-learning electrical networks.	翻訳日:2023-12-27 19:59:05 公開日:2023-12-22
# アニマタブルヒトアバターのための変形可能な3次元ガウススプラッティング Deformable 3D Gaussian Splatting for Animatable Human Avatars ( http://arxiv.org/abs/2312.15059v1 ) ライセンス: Link先を確認	HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam	(参考訳) 近年のニューラルラディアンスフィールド(neural radiance fields)の進歩は、動的設定におけるフォトリアリスティック画像の新しいビュー合成を可能にし、人間のアニメーションのシナリオに適用できる。しかし、正確なモデルを確立するために一般的に使われる暗黙的なバックボーンは、多くの入力ビューと、人間のマスク、UVマップ、深度マップなどの追加アノテーションを必要とする。本研究では,わずか1つの単眼シーケンスからデジタルアバターを構築するための完全に明示的なアプローチであるParDy-Human (Parameterized Dynamic Human Avatar)を提案する。 ParDy-Humanは3D Gaussian Splattingにパラメータ駆動ダイナミクスを導入し、3Dガウシアンを人間のポーズモデルによって変形させてアバターをアニメーション化する。本手法は, 正準3DガウシアンをSMPL頂点に従って変形する第1モジュールと, 設計したジョイント符号化を更に取り入れてガウシアンごとの変形を予測し, SMPL頂点変形を超えるダイナミクスを扱う後続モジュールの2つの部分からなる。画像はラスタライザーによって合成される。 ParDy-Humanは、リアルな動的人間アバターのための明示的なモデルを構成し、必要な訓練ビューと画像が大幅に少ない。我々のアバター学習にはマスクなどの追加アノテーションが不要で,可変な背景でも訓練でき,コンシューマ向けハードウェア上でもフル解像度画像の推論を効率的に行うことができる。本稿では,ZJU-MoCap と THUman4.0 データセットにおいて,ParDy-Human が最先端の手法よりも定量的かつ視覚的に優れていることを示す実験的証拠を提供する。 Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: a first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per-Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. 
ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatar learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.	翻訳日:2023-12-27 19:58:53 公開日:2023-12-22
# サードパーティの機械学習モデルとデータセットのドキュメンテーションプラクティスの現状 The State of Documentation Practices of Third-party Machine Learning Models and Datasets ( http://arxiv.org/abs/2312.15058v1 ) ライセンス: Link先を確認	Ernesto Lang Oreamuno, Rohan Faiyaz Khan, Abdul Ali Bangash, Catherine Stinson, Bram Adams	(参考訳) モデルストアは、プロジェクトへの統合が容易なサードパーティのMLモデルとデータセットを提供し、コーディング作業を最小化する。モデルカードやデータセットカードなどのドキュメント標準を活用した、これらのモデルとデータセットの詳細な仕様がドキュメントに記載されていることが期待される。本研究では,現在使用されている最大級のモデルストアの1つであるHugging Face (HF)において,モデルカードとデータセットカードの文書化の実践状況を評価するために,統計解析とハイブリッドカードソートを用いる。その結果,21,902モデル (39.62%) と1,925データセット (28.48%) のみがドキュメントを持っていることがわかった。さらに,MLモデルやデータセットに対する倫理や透明性に関する文書の一貫性の欠如が観察された。 Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today--Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.	翻訳日:2023-12-27 19:58:25 公開日:2023-12-22
# 効率的なGWAS特徴選択のための深層学習 Deep Learning for Efficient GWAS Feature Selection ( http://arxiv.org/abs/2312.15055v1 ) ライセンス: Link先を確認	Kexuan Li	(参考訳) ゲノムワイド・アソシエーション研究(gwas)は、大きなゲノムデータの時代において、特に遺伝学的特徴の数が利用可能なサンプルを大幅に超える超高次元データセットを扱う際に、ユニークな課題に直面している。本稿では,超高次元gwasデータに関連する複雑な問題に対処するために,mirzaeiら(2020)によって提案された特徴選択手法の拡張を提案する。拡張アプローチは,学生ネットワークにフロベニウス規範のペナルティを導入し,多数の特徴と限られたサンプルで特徴付けられるシナリオに適応する能力を高めることで,元の手法を強化する。教師なし設定と教師なし設定の両方でシームレスに動作し、2つの重要なニューラルネットワークを用いる。 1つ目は、次元減少のためにオートエンコーダまたは教師付きオートエンコーダを利用し、超高次元ゲノムデータから顕著な特徴を抽出する。第2のネットワークは、単一の隠蔽層を持つ正規化フィードフォワードモデルであり、正確な特徴選択のために設計されている。学生ネットワークにおけるフロベニウスのノルムペナルティの導入は、超高次元GWASデータセットがもたらす課題に対する方法のレジリエンスを著しく向上させる。 GWASデータの特徴選択における提案手法の有効性を実験的に検証した。この手法は超高次元設定の複雑さを扱うだけでなく、ゲノムデータに存在するニュアンス構造に優れた適応性を示す。提案手法の柔軟性と汎用性は,提案手法が様々な実験で成功していることに起因している。 Genome-Wide Association Studies (GWAS) face unique challenges in the era of big genomics data, particularly when dealing with ultra-high-dimensional datasets where the number of genetic features significantly exceeds the available samples. This paper introduces an extension to the feature selection methodology proposed by Mirzaei et al. (2020), specifically tailored to tackle the intricacies associated with ultra-high-dimensional GWAS data. Our extended approach enhances the original method by introducing a Frobenius norm penalty into the student network, augmenting its capacity to adapt to scenarios characterized by a multitude of features and limited samples. Operating seamlessly in both supervised and unsupervised settings, our method employs two key neural networks. The first leverages an autoencoder or supervised autoencoder for dimension reduction, extracting salient features from the ultra-high-dimensional genomic data. The second network, a regularized feed-forward model with a single hidden layer, is designed for precise feature selection. 
The introduction of the Frobenius norm penalty in the student network significantly boosts the method's resilience to the challenges posed by ultra-high-dimensional GWAS datasets. Experimental results showcase the efficacy of our approach in feature selection for GWAS data. The method not only handles the inherent complexities of ultra-high-dimensional settings but also demonstrates superior adaptability to the nuanced structures present in genomics data. The flexibility and versatility of our proposed methodology are underscored by its successful performance across a spectrum of experiments.	翻訳日:2023-12-27 19:58:15 公開日:2023-12-22
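A minimal sketch of how a Frobenius norm penalty can enter the loss of a feed-forward network with a single hidden layer. The abstract does not give the exact objective, so the function name `penalized_loss`, the ReLU activation, the squared form of the penalty, the choice to penalize the first-layer weights `W1`, and the weight `lam` are all illustrative assumptions rather than the paper's formulation.

```python
import numpy as np

def penalized_loss(X, y, W1, b1, W2, b2, lam=1e-2):
    """Mean squared error of a one-hidden-layer network plus lam * ||W1||_F^2."""
    hidden = np.maximum(0.0, X @ W1 + b1)   # ReLU hidden layer (assumed activation)
    pred = hidden @ W2 + b2                 # linear output layer
    mse = np.mean((pred - y) ** 2)
    frob_sq = np.sum(W1 ** 2)               # squared Frobenius norm of W1
    return mse + lam * frob_sq

# Toy data with far more features than samples (hypothetical, not GWAS data).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))
y = rng.normal(size=(8, 1))
W1 = rng.normal(size=(50, 4)) * 0.1
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)) * 0.1
b2 = np.zeros(1)

print(penalized_loss(X, y, W1, b1, W2, b2))
```

Because the penalty grows with every entry of `W1`, minimizing this loss tends to shrink the weights attached to uninformative inputs, which is one common reason to add such a term when features greatly outnumber samples.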
# 変分量子アルゴリズムのための階層型マルチグリッドアンサッツ Hierarchical Multigrid Ansatz for Variational Quantum Algorithms ( http://arxiv.org/abs/2312.15048v1 ) ライセンス: Link先を確認	Christo Meriwether Keller, Stephan Eidenbenz, Andreas Bärtschi, Daniel O'Malley, John Golden, Satyajayant Misra	(参考訳) 量子コンピューティングは、基礎物理学を用いてスーパーコンピューティングを強化することを約束する工学の新しいトピックである。短期的には、この利点を達成する最良の候補アルゴリズムは変分量子アルゴリズム(VQA)である。本稿では,変分量子固有解法(VQE)を中心に,新しいVQAアンサッツの設計と数値評価を行う。このアンサッツは古典的なマルチグリッド階層法に着想を得ているため、「multigrid」アンサッツと呼ぶ。マルチグリッドアンサッツは、より小さなキュービット数 $j < n$ に対する回路を順次構築して最適化し、最適化されたパラメータ値を次の階層 $j+1$ の初期解として再利用することにより、$n$ キュービット上の量子問題に対するパラメータ化量子回路を生成する。数値シミュレーションにより,Laplacian 固有解器の解の品質や,MaxCut と Maximum $k$-Satisfiability を具体例とする幅広い組合せ最適化問題において,マルチグリッドアンサッツは標準的なハードウェア効率のアンサッツよりも優れていることを示す。本研究は,多くのVQAの有力な候補としてマルチグリッドアンサッツを確立し,特に組合せ最適化問題に対するQAOAアプローチの代替として有望であることを示す。 Quantum computing is an emerging topic in engineering that promises to enhance supercomputing using fundamental physics. In the near term, the best candidate algorithms for achieving this advantage are variational quantum algorithms (VQAs). We design and numerically evaluate a novel ansatz for VQAs, focusing in particular on the variational quantum eigensolver (VQE). As our ansatz is inspired by classical multigrid hierarchy methods, we call it "multigrid" ansatz. The multigrid ansatz creates a parameterized quantum circuit for a quantum problem on $n$ qubits by successively building and optimizing circuits for smaller qubit counts $j < n$, reusing optimized parameter values as initial solutions for the next hierarchy level at $j+1$. We show through numerical simulation that the multigrid ansatz outperforms the standard hardware-efficient ansatz in terms of solution quality for the Laplacian eigensolver as well as for a large class of combinatorial optimization problems with specific examples for MaxCut and Maximum $k$-Satisfiability. 
Our studies establish the multigrid ansatz as a viable candidate for many VQAs and in particular present a promising alternative to the QAOA approach for combinatorial optimization problems.	翻訳日:2023-12-27 19:57:49 公開日:2023-12-22
# 測位および通信のための最適ノイズ絡み合い試験 Optimal noisy entanglement testing for ranging and communication ( http://arxiv.org/abs/2312.15047v1 ) ライセンス: Link先を確認	Pengcheng Liao and Quntao Zhuang	(参考訳) 量子システム$S$が他のシステム$I$と絡み合うと、絡み合いテストの問題は発生し、$m \ge 2$の同一システム内のシステム$S$が識別される。このシナリオは、量子レンジングおよび絡み合い支援通信(Phys. Rev. Lett. 126, 240501, (2021)]で発生する測定タスクのモデルとして機能する。この文脈では、最適測定アプローチは典型的にはすべての$m+1$システムの共同測定を伴う。しかし、システム$s$を含むサブシステムが絡み合うノイズにさらされている場合、これはそうではないことを実証する。提案手法は,最近開発された相関-変位変換の計測手法を利用する。我々は,m+1$システム上での局所的操作と古典的通信(locc)で実装可能な,絡み合いテスト計測のための構造化設計を提案する。さらに, この測定手法は, 雑音条件下で漸近的に誤差確率の観点から最適性が得られることを示す。量子照明に適用すると, 信号の輝度が低く, ノイズのレベルが高いシナリオにおいて, 最適範囲の計測が可能となる。同様に、エンタングルメント支援の古典的通信に適用すると、測定設計は通信速度、特に信号の輝度が低いシナリオにおいて、相対的に有利となる。 Given a quantum system $S$ entangled with another system $I$, the entanglement testing problem arises, prompting the identification of the system $S$ within a set of $m \ge 2$ identical systems. This scenario serves as a model for the measurement task encountered in quantum ranging and entanglement-assisted communication [Phys. Rev. Lett. 126, 240501, (2021)]. In this context, the optimal measurement approach typically involves joint measurements on all $m+1$ systems. However, we demonstrate that this is not the case when the subsystems containing system $S$ are subjected to entanglement-breaking noise. Our approach utilizes the recently developed measurement technique of correlation-to-displacement conversion. We present a structured design for the entanglement testing measurement, implementable with local operations and classical communications (LOCC) on the $m+1$ systems. Furthermore, we prove that this measurement approach achieves optimality in terms of error probability asymptotically under noisy conditions. When applied to quantum illumination, our measurement design enables optimal ranging in scenarios with low signal brightness and high levels of noise. 
Similarly, when applied to entanglement-assisted classical communication, the measurement design leads to a significant relative advantage in communication rates, particularly in scenarios with low signal brightness.	翻訳日:2023-12-27 19:57:27 公開日:2023-12-22
# EGAIN: 拡張GANインバージョン EGAIN: Extended GAn INversion ( http://arxiv.org/abs/2312.15116v1 ) ライセンス: Link先を確認	Wassim Kabbani, Marcel Grimmer, Christoph Busch	(参考訳) GAN(Generative Adversarial Networks)は近年顕著な進歩を目の当たりにしており、より高品質な画像を生成している。近年のGANは、アンタングル空間における特徴を符号化し、ポーズ、照明、性別などの生成された顔画像の様々な意味的属性を正確に制御できることが証明されている。 GANの潜在空間に画像を投影するGANインバージョンは、実際の顔画像の顔意味論を操作するための扉を開く。これは顔認識システムの性能評価など、多くのアプリケーションで有用である。本稿では,GAN逆変換モデルを構築するためのアーキテクチャであるEGAINについて述べる。このアーキテクチャは、以前のganインバージョンモデルの欠点のいくつかを明示的に取り扱う。このアーキテクチャをベースとした同名の固有モデルも提案され、最先端モデルよりも優れた再構築品質を示し、EGAINアーキテクチャの有効性を示す。 Generative Adversarial Networks (GANs) have witnessed significant advances in recent years, generating increasingly higher quality images, which are non-distinguishable from real ones. Recent GANs have proven to encode features in a disentangled latent space, enabling precise control over various semantic attributes of the generated facial images such as pose, illumination, or gender. GAN inversion, which is projecting images into the latent space of a GAN, opens the door for the manipulation of facial semantics of real face images. This is useful for numerous applications such as evaluating the performance of face recognition systems. In this work, EGAIN, an architecture for constructing GAN inversion models, is presented. This architecture explicitly addresses some of the shortcomings in previous GAN inversion models. A specific model with the same name, egain, based on this architecture is also proposed, demonstrating superior reconstruction quality over state-of-the-art models, and illustrating the validity of the EGAIN architecture.	翻訳日:2023-12-27 19:51:58 公開日:2023-12-22
# 非退化パラメトリック増幅器のベリー位相とマンデルパラメータ Berry phase and the Mandel parameter of the non-degenerate parametric amplifier ( http://arxiv.org/abs/2312.15114v1 ) ライセンス: Link先を確認	J. C. Vega, E. Choreño, D. Ojeda-Guillén and R. D. Mota	(参考訳) 我々は、$SU(1,1)$群の代数的アプローチから非退化パラメトリック増幅問題を研究する。我々は、この問題のハミルトニアンを$SU(1,1)$群のボソン生成子と差分作用素の項で記述する。我々は、このハミルトニアンを正確に解くために傾き変換を適用し、そのエネルギースペクトルと固有関数を得る。そして、ハミルトニアンが時間の明示的な関数であると仮定することで、ベリー位相を計算する。最後に、光子数 $n_a$ と $n_b$ の Mandel $Q$-parameter を得る。 We study the non-degenerate parametric amplifier problem from an algebraic approach of the $SU(1,1)$ group. We write the Hamiltonian of this problem in terms of the boson generators of the $SU(1,1)$ group and the difference operator. We apply the tilting transformation to our results to exactly solve this Hamiltonian and obtain its energy spectrum and eigenfunctions. Then, by assuming that our Hamiltonian is an explicit function of time we calculate its Berry phase. Finally, we obtain the Mandel $Q$-parameter of the photon numbers $n_a$ and $n_b$.	翻訳日:2023-12-27 19:51:44 公開日:2023-12-22
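For reference, the Mandel $Q$-parameter mentioned above is usually defined from the mean and variance of a photon-number operator. The form below is the standard textbook definition, written for a generic mode with number operator $\hat{n}$; applying it separately to $n_a$ and $n_b$ is an assumption about how the abstract uses it, and the paper's closed-form result is not reproduced here.

```latex
\begin{equation}
Q \;=\; \frac{\langle \hat{n}^{2} \rangle - \langle \hat{n} \rangle^{2}}{\langle \hat{n} \rangle} \;-\; 1 ,
\qquad
\begin{cases}
Q < 0 & \text{sub-Poissonian statistics,} \\
Q = 0 & \text{Poissonian statistics (e.g. a coherent state),} \\
Q > 0 & \text{super-Poissonian statistics.}
\end{cases}
\end{equation}
```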
# ドライバーと歩行者の相互作用を理解してドライバーの利得を予測する:ミネソタ州で収集された自然主義的オープンソースデータセット Understanding driver-pedestrian interactions to predict driver yielding: naturalistic open-source dataset collected in Minnesota ( http://arxiv.org/abs/2312.15113v1 ) ライセンス: Link先を確認	Tianyi Li, Joshua Klavins, Te Xu, Niaz Mahmud Zafri, Raphael Stern	(参考訳) 交通量、車両速度、道路特性など、ドライバーとペデストリアンの相互作用の成果に影響を与える多くの要因がある。これらの相互作用の個々の側面は研究されているが、特に建設環境がドライバの利得行動に与える影響を考えると、包括的で自然主義的な研究は欠落している。このギャップに対処するために、ミネソタ州横断の18の未指定交差点でビデオデータから収集された広範なオープンソースデータセットを紹介した。このデータセットは3000以上のインタラクションを文書化し、ドライバとペデストリアンのインタラクションと50以上の異なるコンテキスト変数の詳細なビューを提供する。個々のドライバーと歩行者のインタラクションとコンテキスト要因をカバーするデータは、https://github.com/tianyi17/pedestrian_yielding_data_MNで公開されている。ロジスティック回帰法を用いて,特定変数に基づいてドライバの利得を予測する分類モデルを開発した。分析の結果,自動車の速度,駐車場の存在,公園や学校に近い距離,道路横断道路の幅は,未指定交差点での運転者収量に大きな影響を及ぼすことがわかった。この研究は、米国で最も包括的なドライバー-ペデストリアンデータセットの1つに寄与し、交通安全改善のための貴重な洞察を提供する。この情報を利用できるようにすることで、ミネソタ州と米国中のコミュニティが歩行者の道路安全を改善する努力を続けていることを支援します。 Many factors influence the yielding result of a driver-pedestrian interaction, including traffic volume, vehicle speed, roadway characteristics, etc. While individual aspects of these interactions have been explored, comprehensive, naturalistic studies, particularly those considering the built environment's influence on driver-yielding behavior, are lacking. To address this gap, our study introduces an extensive open-source dataset, compiled from video data at 18 unsignalized intersections across Minnesota. Documenting more than 3000 interactions, this dataset provides a detailed view of driver-pedestrian interactions and over 50 distinct contextual variables. The data, which covers individual driver-pedestrian interactions and contextual factors, is made publicly available at https://github.com/tianyi17/pedestrian_yielding_data_MN. Using logistic regression, we developed a classification model that predicts driver yielding based on the identified variables. 
Our analysis indicates that vehicle speed, the presence of parking lots, proximity to parks or schools, and the width of major road crossings significantly influence driver yielding at unsignalized intersections. This study contributes one of the most comprehensive driver-pedestrian datasets in the US, offering valuable insights for traffic safety improvements. By making this information available, our study will support communities across Minnesota and the United States in their ongoing efforts to improve road safety for pedestrians.	翻訳日:2023-12-27 19:51:35 公開日:2023-12-22
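To illustrate the kind of logistic-regression classifier the abstract refers to, here is a minimal sketch fitted by plain gradient descent. Everything in it is hypothetical: the two features (`speed`, `width`), the synthetic data, and the helper names are stand-ins, and none of it uses or reproduces the published Minnesota dataset or the authors' model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)              # predicted probability of yielding
        w -= lr * (X.T @ (p - y)) / len(y)  # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)            # gradient w.r.t. the bias
    return w, b

# Synthetic toy data: yielding is made less likely at higher (standardized) speed,
# while the second feature has no effect by construction.
rng = np.random.default_rng(0)
speed = rng.normal(size=500)
width = rng.normal(size=500)
X = np.column_stack([speed, width])
y = (rng.random(500) < sigmoid(-2.0 * speed)).astype(float)

w, b = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1.0))
print(w, b, accuracy)
```

The sign of each fitted weight indicates the direction in which a variable shifts the odds of yielding. Judging whether an effect is significant, as the abstract reports, would additionally require standard errors, which this sketch does not compute.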
# 教師の多かれ少なかれ--知識蒸留における三方幾何学の活用 Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation ( http://arxiv.org/abs/2312.15112v1 ) ライセンス: Link先を確認	Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu	(参考訳) 知識蒸留は、より大きな教師ネットワークからのソフトな監督と地上の真実からのハードな監督を用いて、コンパクトな学生ネットワークを訓練することを目的としている。しかし、これらの監視信号のバランスをとる最適な知識融合比を決定することは依然として困難である。従来の方法では、通常、一定のあるいはヒューリスティックな融合比を頼りにしており、しばしば適切なバランスに欠ける。本研究では,教師と生徒の正当性を生かし,各生徒が各サンプルに対していかにその教師を模倣しているかを生かし,サンプルの知識融合比を学習するための適応的手法を提案する。本手法は,学生の予測値(S$),教師の予測値(T$),基礎的真理値(G$)の3値内幾何学的関係を自然に導く。外れ値の影響を均衡させるため、教師のグローバル平均予測$\bar{t}$を同じクラス内のサンプルに組み込むことで、サンプル間関係をさらに拡張する。単純なニューラルネットワークは、サンプル内およびサンプル間関係から、適応的でサンプル単位の知識融合比への暗黙のマッピングをバイレベル最適化方式で学習する。我々のアプローチは、様々なアーキテクチャやモデルサイズにまたがって適用可能な、シンプルで実用的で適応可能な知識蒸留ソリューションを提供する。広範な実験により、画像分類、攻撃検出、クリックスルー率予測において、他の損失再重み付け方法よりも一貫した改善が示されている。 Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction ($S$), teacher prediction ($T$), and ground truth ($G$). To counterbalance the impact of outliers, we further extend to the inter-sample relations, incorporating the teacher's global average prediction $\bar{T}$ for samples within the same class. 
A simple neural network then learns the implicit mapping from the intra- and inter-sample relations to an adaptive, sample-wise knowledge fusion ratio in a bilevel-optimization manner. Our approach provides a simple, practical, and adaptable solution for knowledge distillation that can be employed across various architectures and model sizes. Extensive experiments demonstrate consistent improvements over other loss re-weighting methods on image classification, attack detection, and click-through rate prediction.	翻訳日:2023-12-27 19:51:07 公開日:2023-12-22
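The role of a sample-wise knowledge fusion ratio can be sketched as a per-sample weight between the soft (teacher) and hard (ground-truth) loss terms. In the sketch below the ratio `alpha` is simply passed in as an array. The paper learns it from the trilateral geometry with a small network, which is not reproduced here. The function names, the absence of a temperature, and the convention that `alpha` weights the teacher term are assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha):
    """Mean over samples of alpha_i * KL(T || S) + (1 - alpha_i) * CE(S, G)."""
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    kl = np.sum(t * (np.log(t) - np.log(s)), axis=1)  # soft supervision from the teacher
    ce = -np.log(s[np.arange(len(labels)), labels])   # hard supervision from ground truth
    return np.mean(alpha * kl + (1.0 - alpha) * ce)

student_logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
teacher_logits = np.array([[3.0, 0.2, 0.1], [0.1, 2.5, 0.2]])
labels = np.array([0, 1])
alpha = np.array([0.7, 0.3])  # hypothetical per-sample fusion ratios

print(distillation_loss(student_logits, teacher_logits, labels, alpha))
```

A constant `alpha` recovers the fixed-ratio baseline that the abstract contrasts against. Letting it vary per sample is what allows the student to rely less on the teacher where the teacher is wrong.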
# 視覚データ分析と最適化によるUASによる自動構造検査経路計画 UAS-based Automated Structural Inspection Path Planning via Visual Data Analytics and Optimization ( http://arxiv.org/abs/2312.15109v1 ) ライセンス: Link先を確認	Yuxiang Zhao, Benhao Lu, Mohamad Alipour	(参考訳) Unmanned Aerial Systems (UAS) はインフラ検査の分野で大きな注目を集めている。しかし、インフラの大規模かつ複雑な性質を考えると、自動化は検査作業の効率化と品質向上に不可欠である。この点において中心的な問題の1つは、飛行時間を最小化しながらミッション目標を達成できる最適な自動飛行経路を選択することである。本稿では,構造検査の文脈における経路計画問題の効果的な定式化について述べる。損傷検出性を確保するためにカバレッジを制約として保証し、パス長を目的関数として最小化することで、検査品質を確保しながら効率を最大化する。次に、視点の位置を決定する遺伝的アルゴリズムと、ポーズを計算する貪欲アルゴリズムからなる2段階のアルゴリズムを考案し、経路計画問題を解く。提案アルゴリズムの有効性と適用範囲を示すため,包括的感度解析を行った。また,飛行禁止ゾーンを伴う部分空間検査や集中検査などの応用例も提示し,実世界の構造検査要件を満たす提案手法の柔軟性を実証した。結論として,本研究の結果は提案手法の実現可能性を示し,UASに基づく構造検査ミッション計画に自動化を組み込むための基礎を確立する。 Unmanned Aerial Systems (UAS) have gained significant traction for their application in infrastructure inspections. However, considering the enormous scale and complex nature of infrastructure, automation is essential for improving the efficiency and quality of inspection operations. One of the core problems in this regard is selecting an optimal automated flight path that can achieve the mission objectives while minimizing flight time. This paper presents an effective formulation for the path planning problem in the context of structural inspections. Coverage is guaranteed as a constraint to ensure damage detectability and path length is minimized as an objective, thus maximizing efficiency while ensuring inspection quality. A two-stage algorithm is then devised to solve the path planning problem, composed of a genetic algorithm for determining the positions of viewpoints and a greedy algorithm for calculating the poses. A comprehensive sensitivity analysis is conducted to demonstrate the proposed algorithm's effectiveness and range of applicability. 
Applied examples of the algorithm, including partial space inspection with no-fly zones and focused inspection, are also presented, demonstrating the flexibility of the proposed method to meet real-world structural inspection requirements. In conclusion, the results of this study highlight the feasibility of the proposed approach and establish the groundwork for incorporating automation into UAS-based structural inspection mission planning.	翻訳日:2023-12-27 19:50:42 公開日:2023-12-22
# 生成AIと建築史 Generative AI and the History of Architecture ( http://arxiv.org/abs/2312.15106v1 ) ライセンス: Link先を確認	Joern Ploennigs and Markus Berger	(参考訳) 最近の生成aiプラットフォームは、単純なテキストプロンプトからテキストや印象的なイメージを作成できる。これにより、アーキテクチャ履歴に関する知識を要約したり、アイデア、スケッチ、モデリングといった初期のデザインタスクで新しい創造的な仕事を引き出す強力なツールになります。しかし、建築史における生成的AIモデルの理解は、どの程度優れているのか? スタイルを適切に区別することを学んだか、あるいは情報を幻覚させるか? 本章では,これらのツールの知識の能力と境界を理解するために,異なるアーキテクチャスタイルのテキストと画像生成のための生成AIプラットフォームに対するこの問題について検討する。また、1億100万のMidjourneyクエリのデータセットを分析して、実践者がすでに特定のアーキテクチャ概念を問合っているかどうか、どのように分析しています。 Recent generative AI platforms are able to create texts or impressive images from simple text prompts. This makes them powerful tools for summarizing knowledge about architectural history or deriving new creative work in early design tasks like ideation, sketching and modelling. But, how good is the understanding of the generative AI models of the history of architecture? Has it learned to properly distinguish styles, or is it hallucinating information? In this chapter, we investigate this question for generative AI platforms for text and image generation for different architectural styles, to understand the capabilities and boundaries of knowledge of those tools. We also analyze how they are already being used by analyzing a data set of 101 million Midjourney queries to see if and how practitioners are already querying for specific architectural concepts.	翻訳日:2023-12-27 19:50:22 公開日:2023-12-22
# アナログコンピューティングのためのエネルギーベース学習アルゴリズムの比較研究 Energy-based learning algorithms for analog computing: a comparative study ( http://arxiv.org/abs/2312.15103v1 ) ライセンス: Link先を確認	Benjamin Scellier, Maxence Ernoult, Jack Kendall, Suhas Kumar	(参考訳) エネルギーベースの学習アルゴリズムは最近、アナログ(ポストデジタル)ハードウェアとの互換性から、注目を集めている。既存のアルゴリズムには、コントラスト学習(CL)、平衡伝播(EP)、結合学習(CpL)があり、いずれも2つの状態を対比することからなり、第1の状態から第2の状態を得るのに使用される摂動の種類が異なる。しかし、これらのアルゴリズムは、同じモデルやデータセットを用いた同等の条件で明示的に比較されたことがないため、スケーラビリティを評価し、実際にどれを選ぶかを決めるのが困難である。本研究では, 摂動の符号に応じて, 7つの学習アルゴリズム,すなわちCLとEPとCpLの異なる変種を比較した。具体的には、これらの学習アルゴリズムを用いて、5つの視覚タスク(MNIST、F-MNIST、SVHN、CIFAR-10、CIFAR-100)で深層畳み込みホップフィールドネットワーク(DCHN)を訓練する。全てのアルゴリズムがMNISTでは同等の性能をもたらすが、タスクの難しさが増すにつれて、性能上の重要な違いが生じる。私たちの重要な発見は、負の摂動が正の摂動よりも良いことを示し、EP(反対符号の2つの摂動を用いる)の中心化変種を最も優れたアルゴリズムとして強調する。また、これらの発見を理論的議論で裏付ける。さらに、DCHNを用いて5つのデータセットすべてにおいて、性能と速度の両方で新しいSOTA結果を確立する。特に,我々のDCHNシミュレーションはLaborieux et al. (2021)と比べて13.5倍高速であり、これは非同期更新に基づく新しいエネルギー最小化アルゴリズムと、精度の低減(16ビット)の併用によって実現した。 Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these algorithms have never been explicitly compared on equal footing with same models and datasets, making it difficult to assess their scalability and decide which one to select in practice. In this work, we carry out a comparison of seven learning algorithms, namely CL and different variants of EP and CpL depending on the signs of the perturbations. Specifically, using these learning algorithms, we train deep convolutional Hopfield networks (DCHNs) on five vision tasks (MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100). 
We find that, while all algorithms yield comparable performance on MNIST, important differences in performance arise as the difficulty of the task increases. Our key findings reveal that negative perturbations are better than positive ones, and highlight the centered variant of EP (which uses two perturbations of opposite sign) as the best-performing algorithm. We also endorse these findings with theoretical arguments. Additionally, we establish new SOTA results with DCHNs on all five datasets, both in performance and speed. In particular, our DCHN simulations are 13.5 times faster with respect to Laborieux et al. (2021), which we achieve thanks to the use of a novel energy minimisation algorithm based on asynchronous updates, combined with reduced precision (16 bits).	翻訳日:2023-12-27 19:50:09 公開日:2023-12-22
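The distinction between positive, negative and centered perturbations can be written out for equilibrium propagation. The block below assumes the usual formulation, in which $E(\theta, s)$ is the energy, $C(s, y)$ the cost, $\beta > 0$ the nudging strength, and $s_{\beta}$ the equilibrium state of the nudged energy. The notation is an assumption for illustration and may differ from the paper's.

```latex
\begin{align}
s_{\beta} &= \arg\min_{s} \bigl[ E(\theta, s) + \beta\, C(s, y) \bigr], \\
\Delta\theta_{\text{positive}} &\propto -\frac{1}{\beta}
  \left( \frac{\partial E}{\partial \theta}(\theta, s_{+\beta})
       - \frac{\partial E}{\partial \theta}(\theta, s_{0}) \right), \\
\Delta\theta_{\text{negative}} &\propto -\frac{1}{\beta}
  \left( \frac{\partial E}{\partial \theta}(\theta, s_{0})
       - \frac{\partial E}{\partial \theta}(\theta, s_{-\beta}) \right), \\
\Delta\theta_{\text{centered}} &\propto -\frac{1}{2\beta}
  \left( \frac{\partial E}{\partial \theta}(\theta, s_{+\beta})
       - \frac{\partial E}{\partial \theta}(\theta, s_{-\beta}) \right).
\end{align}
```

Under this reading, the centered rule is a symmetric finite difference in $\beta$, whose leading error term is of higher order than that of the one-sided rules. That is one plausible reason for it to perform best.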
# 肌色を伴わない顔画像品質評価のためのロバスト・スクレラ・セグメンテーション Robust Sclera Segmentation for Skin-tone Agnostic Face Image Quality Assessment ( http://arxiv.org/abs/2312.15102v1 ) ライセンス: Link先を確認	Wassim Kabbani, Christoph Busch, Kiran Raja	(参考訳) 顔画像品質評価(FIQA)は、良好な顔認識性能を得るために重要である。 FIQAアルゴリズムは、人口統計要因に敏感で堅牢であるべきである。眼強膜は、年齢、民族、肌の色に関わらず、すべてのヒトにおいて一貫した白みがかった色をしている。本研究は,囲い込みにおける顔画像と境界制御顔認識シナリオに適した頑健な強膜分節法を提案する。このことは、スクレラピクセルの統計分析が、スキントーン、年齢、民族性に不変な特徴をいかに生み出すかを示し、したがって、人口統計学的要因に依存しないようにFIQAアルゴリズムに組み込むことができることを示している。 Face image quality assessment (FIQA) is crucial for obtaining good face recognition performance. FIQA algorithms should be robust and insensitive to demographic factors. The eye sclera has a consistent whitish color in all humans regardless of their age, ethnicity and skin-tone. This work proposes a robust sclera segmentation method that is suitable for face images in the enrolment and the border control face recognition scenarios. It shows how the statistical analysis of the sclera pixels produces features that are invariant to skin-tone, age and ethnicity and thus can be incorporated into FIQA algorithms to make them agnostic to demographic factors.	翻訳日:2023-12-27 19:49:35 公開日:2023-12-22
# Fix-Con: 自動フォールトローカライゼーションとディープラーニングモデル変換の修復 Fix-Con: Automatic Fault Localization and Repair of Deep Learning Model Conversions ( http://arxiv.org/abs/2312.15101v1 ) ライセンス: Link先を確認	Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan	(参考訳) ディープラーニングモデルをフレームワーク間で変換することは、デバイス間のモデル互換性を最大化し、ひとつのディープラーニングフレームワークでのみ提供される最適化機能を活用するための一般的なステップである。しかし、この変換プロセスではバグが多発することがあり、変換されたモデルはデプロイ不能または問題のあるものとなり、予測の正確性を著しく低下させる。本稿では,ディープラーニングフレームワーク間のモデル変換におけるフォールトローカライズと修復のための自動アプローチであるFix-Conを提案する。 Fix-Conは、変換中にモデル入力、パラメータ、ハイパーパラメータ、モデルグラフに導入された障害を検出し、修正することができる。 Fix-Conでは、報告された変換問題の調査から抽出した一連のフォールトタイプを使用して、変換対象モデルの潜在的な変換障害をローカライズし、例えばターゲットモデルのパラメータをソースモデルのパラメータに置き換えるなど、適切な修正を行う。これは、すべての差が解決されるまで、ソースモデルと変換対象モデルの間に出力ラベルの差があるデータセットのすべての画像に対して反復的に行われる。 4つの異なるディープラーニングフレームワーク間で変換された、広く使われている3つの画像認識モデルのモデル変換バグの修正におけるFix-Conの有効性を評価した。全体として、Fix-Conは15の誤変換ケースのうち14について、完全に修復するか、性能を大幅に改善することができた。 Converting deep learning models between frameworks is a common step to maximize model compatibility across devices and leverage optimization features that may be exclusively provided in one deep learning framework. However, this conversion process may be riddled with bugs, making the converted models either undeployable or problematic, considerably degrading their prediction correctness. We propose an automated approach for fault localization and repair, Fix-Con, during model conversion between deep learning frameworks. Fix-Con is capable of detecting and fixing faults introduced in model input, parameters, hyperparameters, and the model graph during conversion. Fix-Con uses a set of fault types mined from surveying conversion issues raised to localize potential conversion faults in the converted target model, and then repairs them appropriately, e.g. replacing the parameters of the target model with those from the source model. This is done iteratively for every image in the dataset with output label differences between the source model and the converted target model until all differences are resolved. 
We evaluate the effectiveness of Fix-Con in fixing model conversion bugs of three widely used image recognition models converted across four different deep learning frameworks. Overall, Fix-Con was able to either completely repair, or significantly improve the performance of 14 out of the 15 erroneous conversion cases.	翻訳日:2023-12-27 19:49:23 公開日:2023-12-22
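The starting point described above, finding inputs on which the source model and the converted model disagree, can be sketched in a few lines. This is only an illustration of that comparison step. The function `find_label_mismatches` and the two toy models are hypothetical and are not part of Fix-Con or of any deep learning framework's API.

```python
def find_label_mismatches(source_predict, target_predict, inputs):
    """Return the indices of inputs on which the two models output different labels."""
    return [i for i, x in enumerate(inputs) if source_predict(x) != target_predict(x)]

# Toy stand-ins for a source model and its converted counterpart.
def source_model(x):
    return "cat" if x >= 0.5 else "dog"

def converted_model(x):
    return "cat" if x >= 0.6 else "dog"  # shifted threshold mimics a conversion fault

inputs = [0.1, 0.55, 0.7, 0.58, 0.9]
mismatches = find_label_mismatches(source_model, converted_model, inputs)
print(mismatches)  # [1, 3]
```

In a repair loop of the kind the abstract describes, the list of mismatching inputs would shrink after each applied fix and the process would stop once it is empty.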
# 大規模言語モデルにおける連鎖推論によるオンラインヘイトの変化 Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models ( http://arxiv.org/abs/2312.15099v1 ) ライセンス: Link先を確認	Nishant Vishwamitra, Keyan Guo, Farhan Tajwar Romit, Isabelle Ondracek, Long Cheng, Ziming Zhao, Hongxin Hu	(参考訳) オンライン憎悪はインターネットユーザーの生活に悪影響を及ぼすエスカレートする問題であり、進化する出来事によって急激な変化を招き、新たなオンライン憎悪の波が重大な脅威をもたらす。これらの新たな波の検出と緩和は、ヘイトフルコンテンツの存在を判断するために推論に基づく複雑な意思決定を要求することと、トレーニングサンプルの可用性の制限によって検出モデルの更新が妨げられる、という2つの大きな課題をもたらす。この重要な問題に対処するために、オンライン憎悪の新しい波を効果的に緩和するHATEGUARDという新しいフレームワークを提案する。 HATEGUARDは、最近導入されたチェーン・オブ・ソート(CoT)プロンプト技術を利用して、大規模言語モデル(LLM)の機能を活用する推論ベースのアプローチを採用している。 hateguardはさらに、オンラインヘイトの新しい波に効果的に対応するために、新しいデロギ的用語とターゲットによる検出プロンプトを自動生成および更新することで、プロンプトベースのゼロショット検出を実現する。このアプローチの有効性を示すために、我々は、最近目撃された3つの新しい波、2022年のロシアによるウクライナ侵攻、2021年の米国議会議事堂の暴動、COVID-19パンデミックに関するツイートからなる新しいデータセットをコンパイルした。本研究は,イベントの進化と,それに対応するための既存のモデレーションツールを迅速に更新する技術の必要性について,これらの新しい波における重要な縦断パターンを明らかにした。最先端ツールに対する比較評価は、我々のフレームワークの優位性を示し、オンライン嫌悪の3つの新しい波の検出において、22.22%から83.33%の大幅な改善を示しました。我々の研究は、オンラインヘイトの新しい波の出現によって引き起こされる深刻な脅威を強調し、この脅威に現実的に対処するパラダイムシフトを表している。 Online hate is an escalating problem that negatively impacts the lives of Internet users, and is also subject to rapid changes due to evolving events, resulting in new waves of online hate that pose a critical threat. Detecting and mitigating these new waves present two key challenges: it demands reasoning-based complex decision-making to determine the presence of hateful content, and the limited availability of training samples hinders updating the detection model. To address this critical issue, we present a novel framework called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD employs a reasoning-based approach that leverages the recently introduced chain-of-thought (CoT) prompting technique, harnessing the capabilities of large language models (LLMs). 
HATEGUARD further achieves prompt-based zero-shot detection by automatically generating and updating detection prompts with new derogatory terms and targets in new wave samples to effectively address new waves of online hate. To demonstrate the effectiveness of our approach, we compile a new dataset consisting of tweets related to three recently witnessed new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection of the US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal patterns in these new waves concerning the evolution of events and the pressing need for techniques to rapidly update existing moderation tools to counteract them. Comparative evaluations against state-of-the-art tools illustrate the superiority of our framework, showcasing a substantial 22.22% to 83.33% improvement in detecting the three new waves of online hate. Our work highlights the severe threat posed by the emergence of new waves of online hate and represents a paradigm shift in addressing this threat practically.	翻訳日:2023-12-27 19:49:01 公開日:2023-12-22
# ディープニューラルネットワークを用いた教師なし聴覚・意味学習モデル Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks ( http://arxiv.org/abs/2312.15098v1 ) ライセンス: Link先を確認	Jay Kejriwal, Stefan Benus, Lina M. Rojas-Barahona	(参考訳) 話者は、発話のさまざまな側面において対話相手に類似していくとき、エントレインメントとして知られる適応行動をとる傾向がある。本稿では,意味的エントレインメントのモデル化に向けて、テキストの特徴から意味のある表現を導き出す教師なしのディープラーニングフレームワークを提案する。本研究では,BERT モデルの変種 (DistilBERT と XLM-RoBERTa) と Google の universal sentence encoder (USE) の埋め込みを用いて、2つの人間対人間 (HH) コーパス (The Fisher Corpus English Part 1, Columbia Games corpus) と1つの人間対機械 (HM) コーパス (Voice Assistant Conversation Corpus (VACC)) から特徴を抽出し,モデルの性能について検討する。意味的特徴に加えて、2つの聴覚埋め込み (TRIpLet Loss network (TRILL) ベクトル、低レベル記述子 (LLD) 特徴) と2つの分析単位 (Inter pausal unit と Turn) を用いてDNNベースのモデルを訓練した。その結果,本モデルで意味的エントレインメントを評価できること,モデルがHHとHMの相互作用を区別できること,音響特性を抽出する2つの分析単位が同等な結果をもたらすことが示された。 Speakers tend to engage in adaptive behavior, known as entrainment, when they become similar to their interlocutor in various aspects of speaking. We present an unsupervised deep learning framework that derives meaningful representation from textual features for developing semantic entrainment. We investigate the model's performance by extracting features using different variations of the BERT model (DistilBERT and XLM-RoBERTa) and Google's universal sentence encoder (USE) embeddings on two human-human (HH) corpora (The Fisher Corpus English Part 1, Columbia Games corpus) and one human-machine (HM) corpus (Voice Assistant Conversation Corpus (VACC)). In addition to semantic features, we also trained DNN-based models utilizing two auditory embeddings (TRIpLet Loss network (TRILL) vectors, Low-level descriptors (LLD) features) and two units of analysis (Inter pausal unit and Turn). The results show that semantic entrainment can be assessed with our model, that models can distinguish between HH and HM interactions and that the two units of analysis for extracting acoustic features provide comparable findings.	翻訳日:2023-12-27 19:48:11 公開日:2023-12-22
# 議論的アンサンブルによるモデル多重度下でのリコース Recourse under Model Multiplicity via Argumentative Ensembling ( http://arxiv.org/abs/2312.15097v1 ) ライセンス: Link先を確認	Junqi Jiang, Antonio Rago, Francesco Leofante, Francesca Toni	(参考訳) モデル多重度(model multiplicity, MM)は、同じ予測タスクを解決するために、同等の性能を持つ複数の機械学習モデルをトレーニングできる場合に発生する。近年の研究では、MMの下で得られたモデルが同一入力に対して一貫性のない予測を生成する可能性が示されている。これが起こると、モデル予測によって負の影響を受ける個人にリコースの推奨を提供する一般的な手段である、反実的説明(CE)の提供が困難になる。本稿では,recourse-aware ensemblingと名づけたこの問題を定式化し,その解決法が満たすべきいくつかの望ましい性質を明らかにする。既存のアンサンブル手法は、CEを提供するようにさまざまな方法で自然に拡張しても、これらの性質を満たさないことを示す。次に,MMに対するCEのロバスト性を保証し、カスタマイズ可能なユーザの嗜好にも対応するために、計算論的議論を用いる議論的アンサンブルを導入する。理論的および実験的に、議論的アンサンブルは既存の手法に欠けている性質を満足し、精度に関するトレードオフは最小限であることを示す。 Model Multiplicity (MM) arises when multiple, equally performing machine learning models can be trained to solve the same prediction task. Recent studies show that models obtained under MM may produce inconsistent predictions for the same input. When this occurs, it becomes challenging to provide counterfactual explanations (CEs), a common means for offering recourse recommendations to individuals negatively affected by models' predictions. In this paper, we formalise this problem, which we name recourse-aware ensembling, and identify several desirable properties which methods for solving it should satisfy. We show that existing ensembling methods, naturally extended in different ways to provide CEs, fail to satisfy these properties. We then introduce argumentative ensembling, deploying computational argumentation to guarantee robustness of CEs to MM, while also accommodating customisable user preferences. We show theoretically and experimentally that argumentative ensembling satisfies properties which the existing methods lack, and that the trade-offs are minimal with respect to accuracy.	翻訳日:2023-12-27 19:47:27 公開日:2023-12-22
# $\mathbb{Z}_3$対称性で保護される二次元トポロジカルパラマグネット:境界ハミルトニアンの性質 Two-dimensional topological paramagnets protected by $\mathbb{Z}_3$ symmetry: Properties of the boundary Hamiltonian ( http://arxiv.org/abs/2312.15095v1 ) ライセンス: Link先を確認	Hrant Topchyan, Vasilii Iugov, Mkhitar Mirumyan, Tigran S. Hakobyan, Tigran A. Sedrakyan, Ara G. Sedrakyan	(参考訳) 三角格子上に隙間のないエッジモードを持つ2次元$\mathbb{Z}_3$対称性保護トポロジー(SPT)3状態ポッツパラマグネットを体系的に構築する。まず, ギャップレスエッジの微視的格子モデルについて検討し, 密度行列再正規化群(dmrg)法を用いて, 低次励起スペクトルとエンタングルメントエントロピーの有限サイズスケーリングについて検討した。得られた結果に基づき、臨界エッジの普遍性クラス、すなわち対応する共形場理論と中心電荷を同定する。最後に、エッジモデルの固有対称性と2つのspt相を区別する創発的巻線対称性について考察する。その結果、二つの位相的に非自明な位相と自明な位相は、三重性をサポートする一般の1次元鎖を定義する。 We systematically construct two-dimensional $\mathbb{Z}_3$ symmetry-protected topological (SPT) three-state Potts paramagnets with gapless edge modes on a triangular lattice. First, we study microscopic lattice models for the gapless edge and, using the density-matrix renormalization group (DMRG) approach, investigate the finite size scaling of the low-lying excitation spectrum and the entanglement entropy. Based on the obtained results, we identify the universality class of the critical edge, namely the corresponding conformal field theory and the central charge. Finally, we discuss the inherent symmetries of the edge models and the emergent winding symmetry distinguishing between two SPT phases. As a result, the two topologically nontrivial and the trivial phases define a general one-dimensional chain supporting a tricriticality, which we argue supports a gapless SPT order in one dimension.	翻訳日:2023-12-27 19:46:56 公開日:2023-12-22
# 2つのステップと1つのステップバック:CPRAの下で販売をオプトアウトする権利 Two Steps Forward and One Step Back: The Right to Opt-out of Sale under CPRA ( http://arxiv.org/abs/2312.15094v1 ) ライセンス: Link先を確認	Jan Charatan and Eleanor Birrell	(参考訳) カリフォルニア州プライバシ・ライツ法(California Privacy Rights Act、CPRA)は、カリフォルニア州消費者プライバシ法(CCPA)を改正した法案である。プライバシの権利の拡大と強化をめざすことが多いが、以前の法律の変更とCPRAガイドラインの以前の草案の変更の両方で、テキストによる改訂の綿密な分析は、現実がより微妙なものになる可能性を示唆している。本研究では,cpraにおける販売オプトアウトの権利に悪影響を及ぼす可能性がある3つのテキストリビジョンを特定し,これらのリビジョンの効果を,(1)12ヶ月にわたる25,000サイトを対象とした大規模縦断調査,(2)多作で募集された775人の実験ユーザ調査を用いて評価した。すべてのリビジョンは、販売をオプトアウトする権利のユーザビリティ、スコープ、可視性に悪影響を及ぼすことが分かりました。その結果,インターネットのプライバシーに対するCPRAの影響を総合的に評価した。彼らはまた、法律が施行された後にガイドラインと事例法が進化するにつれて、法的要件の継続的な評価の重要性を強調している。 The California Privacy Rights Act (CPRA) was a ballot initiative that revised the California Consumer Privacy Act (CCPA). Although often framed as expanding and enhancing privacy rights, a close analysis of textual revisions -- both changes from the earlier law and changes from earlier drafts of the CPRA guidelines -- suggest that the reality might be more nuanced. In this work, we identify three textual revisions that have potential to negatively impact the right to opt-out of sale under CPRA and evaluate the effect of these textual revisions using (1) a large-scale longitudinal measurement study of 25,000 websites over twelve months and (2) an experimental user study with 775 participants recruited through Prolific. We find that all revisions negatively impacted the usability, scope, and visibility of the right to opt-out of sale. Our results provide the first comprehensive evaluation of the impact of CPRA on Internet privacy. They also emphasize the importance of continued evaluation of legal requirements as guidelines and case law evolve after a law goes into effect.	翻訳日:2023-12-27 19:46:15 公開日:2023-12-22
# 通信遅延のない非同期確率近似の安定性に関する一考察 A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays ( http://arxiv.org/abs/2312.15091v1 ) ライセンス: Link先を確認	Huizhen Yu, Yi Wan, Richard S. Sutton	(参考訳) 本稿では,通信遅延のない非同期確率近似アルゴリズムについて検討する。我々の主な貢献は、より一般的な雑音条件に対応することでBorkarとMeynの手法を拡張する、これらのアルゴリズムの安定性証明である。また, この安定性の結果から収束結果を導出し, 重要な平均報酬強化学習問題への応用について考察した。 In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that extends a method of Borkar and Meyn by accommodating more general noise conditions. We also derive convergence results from this stability result and discuss their application in important average-reward reinforcement learning problems.	翻訳日:2023-12-27 19:45:51 公開日:2023-12-22
# 敵対的模倣学習の自動エンコーディング Auto-Encoding Adversarial Imitation Learning ( http://arxiv.org/abs/2206.11004v4 ) ライセンス: Link先を確認	Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao	(参考訳) 強化学習(RL)は意思決定のための強力なフレームワークを提供するが、実際には注意深く設計された報酬関数を必要とすることが多い。 AIL(Adversarial Imitation Learning)は、環境からの報酬信号にアクセスせずに自動ポリシー取得に光を当てる。本稿では,堅牢でスケーラブルな AIL フレームワークである Auto-Encoding Adversarial Imitation Learning (AEAIL) を提案する。 AEAILは、実証から専門家ポリシーを誘導するため、オートエンコーダの再構成エラーを報酬信号として利用し、従来の識別器ベースのものよりも、ポリシーを最適化するための情報を多く提供する。その後、導出した目的関数を用いてオートエンコーダとエージェントポリシーを訓練する。実験の結果,AEAILは状態ベースおよび画像ベースの環境の両方において,最先端の手法よりも優れていることがわかった。さらに重要なのは、専門家によるデモがノイズを含む場合に、AEAILがはるかに優れた堅牢性を示すことである。 Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than the prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that our AEAIL outperforms state-of-the-art methods on both state-based and image-based environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy.	翻訳日:2023-12-25 19:13:39 公開日:2023-12-22
# 低リソース言語に対するテキスト正規化--リグリア語の場合 Text normalization for low-resource languages: the case of Ligurian ( http://arxiv.org/abs/2206.07861v2 ) ライセンス: Link先を確認	Stefano Lusito and Edoardo Ferrante and Jean Maillard	(参考訳) テキストの正規化は、厳格な綴り規則を欠いた低リソース言語や、複数の綴り改革を行った言語にとって重要な技術である。これまでのところ、低リソースのテキスト正規化は手作りのルールに依存しており、これはニューラルネットワークよりもデータ効率が高いと考えられている。本稿では,絶滅の危機にあるロマンス語であるリグリア語のテキスト正規化事例について検討する。正規化バージョンと対になった4,394のリグリア語文と、リグリア語用の最初のオープンソースモノリンガルコーパスを収集する。少ないデータ量にもかかわらず、逆翻訳や適切なトークン化を用いることで、コンパクトなトランスフォーマーベースのモデルを非常に低いエラー率を達成するように訓練できることを実証する。 Text normalization is a crucial technology for low-resource languages which lack rigid spelling conventions or that have undergone multiple spelling reforms. Low-resource text normalization has so far relied upon hand-crafted rules, which are perceived to be more data efficient than neural methods. In this paper we examine the case of text normalization for Ligurian, an endangered Romance language. We collect 4,394 Ligurian sentences paired with their normalized versions, as well as the first open source monolingual corpus for Ligurian. We show that, in spite of the small amounts of data available, a compact transformer-based model can be trained to achieve very low error rates by the use of backtranslation and appropriate tokenization.	翻訳日:2023-12-25 19:13:23 公開日:2023-12-22
# 新型コロナウイルス:パンデミックにおける病原体関連データ共有の連続的障害の探索 COVID-19: An exploration of consecutive systemic barriers to pathogen-related data sharing during a pandemic ( http://arxiv.org/abs/2205.12098v3 ) ライセンス: Link先を確認	Yo Yehudi, Lukas Hughes-Noehrer, Carole Goble and Caroline Jay	(参考訳) 2020年、新型コロナウイルスのパンデミックは世界中の政府や研究者による迅速な対応を引き起こした。 2023年後半の時点で、新型コロナウイルス(COVID-19)により数百万人が死亡し、多くの生存者が罹患後数週間、数ヶ月、数年にわたる長期的影響を経験している。この甚大な被害にもかかわらず、パンデミックに関連するデータを扱う人々は、このデータへのアクセス、共有、再利用において重大なシステム的障壁に直面していることが多い。本稿では、ソーシャルメディア、移動性、ウイルスゲノム、検査、感染、入院、死亡など、COVID-19関連のデータ型を扱うデータ専門家にインタビューを行った研究の結果について報告する。これらのデータタイプは、パンデミックの感染拡大モデリング、医療システムの逼迫状況の把握、およびCOVID-19治療法の考案など、様々な用途に使用される。 In 2020, the COVID-19 pandemic resulted in a rapid response from governments and researchers worldwide. As of late 2023, millions have died as a result of COVID-19, with many COVID-19 survivors going on to experience long-term effects weeks, months, or years after their illness. Despite this staggering toll, those who work with pandemic-relevant data often face significant systemic barriers to accessing, sharing or re-using this data. In this paper we report results of a study, where we interviewed data professionals working with COVID-19-relevant data types including social media, mobility, viral genome, testing, infection, hospital admission, and deaths. These data types are variously used for pandemic spread modelling, healthcare system strain awareness, and devising therapeutic treatments for COVID-19. 
Barriers to data access, sharing and re-use include the cost of access to data (primarily certain healthcare sources and mobility data from mobile phone carriers), human throughput bottlenecks, unclear pathways to request access to data, unnecessarily strict access controls and data re-use policies, unclear data provenance, inability to link separate data sources that could collectively create a more complete picture, poor adherence to metadata standards, and a lack of computer-suitable data formats.	翻訳日:2023-12-25 19:13:12 公開日:2023-12-22
# 密度行列を用いた量子密度推定:量子異常検出への応用 Quantum density estimation with density matrices: Application to quantum anomaly detection ( http://arxiv.org/abs/2201.10006v4 ) ライセンス: Link先を確認	Diego H. Useche, Oscar A. Bustos-Brinez, Joseph A. Gallego, Fabio A. González	(参考訳) 密度推定は統計学と機械学習の中心的なタスクである。この問題は、観測されたデータセットに最もよく適合する基礎となる確率密度関数を決定することを目的としている。応用例としては、統計的推論、教師なし学習、異常検出などがある。その関連性にもかかわらず、量子コンピューティングの密度推定への応用を探求した研究は少ない。本稿では,密度行列の期待値と量子フーリエ特徴と呼ばれる新しい量子埋め込みに基づく,量子古典的密度行列密度推定モデルQ-DEMDEを提案する。量子ハードウェアを用いて、混合量子状態によるトレーニングデータの確率分布を構築する。コアサブルーチンとして,量子コンピュータ上でのスペクトル分解から混合密度行列の期待値を推定する新しいアルゴリズムを提案する。さらに,本手法の量子古典的異常検出への応用について述べる。量子シミュレータと実量子コンピュータの異なるデータセット上の量子ランダムおよび量子適応フーリエ特徴を用いた密度推定モデルの評価を行った。この研究の重要な結果は、現在の量子コンピュータで高い性能で密度推定と異常検出を行うことができることを示すことである。 Density estimation is a central task in statistics and machine learning. This problem aims to determine the underlying probability density function that best aligns with an observed data set. Some of its applications include statistical inference, unsupervised learning, and anomaly detection. Despite its relevance, few works have explored the application of quantum computing to density estimation. In this article, we present a novel quantum-classical density matrix density estimation model, called Q-DEMDE, based on the expected values of density matrices and a novel quantum embedding called quantum Fourier features. The method uses quantum hardware to build probability distributions of training data via mixed quantum states. As a core subroutine, we propose a new algorithm to estimate the expected value of a mixed density matrix from its spectral decomposition on a quantum computer. In addition, we present an application of the method for quantum-classical anomaly detection. We evaluated the density estimation model with quantum random and quantum adaptive Fourier features on different data sets on a quantum simulator and a real quantum computer. 
An important result of this work is to show that it is possible to perform density estimation and anomaly detection with high performance on present-day quantum computers.	翻訳日:2023-12-25 19:12:50 公開日:2023-12-22
# 非ランダム欠測(MNAR)データを伴うモデルベースクラスタリング Model-based Clustering with Missing Not At Random Data ( http://arxiv.org/abs/2112.10425v4 ) ライセンス: Link先を確認	Aude Sportisse (UCA, MAASAI), Matthieu Marbac (UR, ENSAI, CNRS, CREST), Fabien Laporte (Nantes Univ, CNRS, ITX-lab), Gilles Celeux (CELESTE), Claire Boyer (SU, LPSM (UMR_8001), MOKAPLAN), Julie Josse (IDESP, PREMEDICAL), Christophe Biernacki (CNRS, MODAL)	(参考訳) モデルベースの教師なし学習は、他の学習タスクと同様に、欠測データが生じるとすぐに行き詰まる。これは、欠測が情報を持つ場合、すなわちランダムでない欠測(MNAR)の場合にはなおさらである。本稿では、MNARデータを含む非常に一般的な種類の欠測データを扱うように設計されたモデルベースクラスタリングアルゴリズムを提案する。そこで本研究では,データ分布とMNAR機構を同時にモデル化するために,異なる種類のデータ(連続,計数,カテゴリカル,混合)に対する混合モデルを導入し,それぞれの相対的な自由度に注意を払う。いくつかのMNARモデルについて議論し、そこでは欠測の原因は欠測した変数自体の値とクラスメンバシップの両方に依存しうる。しかし、我々はMNARzと呼ばれる特定のMNARモデルに焦点をあてる。このモデルでは欠測はクラスメンバーシップにのみ依存する。まず, 欠測マスクと連結したデータ行列上で、最終的に標準的なMAR機構を考慮して統計的推論を行えることを示すことで, 推定の容易さを強調する。そこで我々は,この単純化された再解釈のために開発された期待値最大化アルゴリズムを用いてクラスタリングを行うことを提案する。最後に,提案手法の数値的性能を、合成データおよび実際の医療レジストリであるTraumaBase上で評価した。 Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, i.e., missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the relative degrees of freedom of each. Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variables themselves and on the class membership. However, we focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership. We first underline its ease of estimation, by showing that the statistical inference can be carried out on the data matrix concatenated with the missing mask considering finally a standard MAR mechanism. 
Consequently, we propose to perform clustering using the Expectation Maximization algorithm, specially developed for this simplified reinterpretation. Finally, we assess the numerical performances of the proposed methods on synthetic data and on the real medical registry TraumaBase as well.	翻訳日:2023-12-25 19:12:35 公開日:2023-12-22
# 説明可能な深層学習による壁面乱流の重要領域の同定 Identifying regions of importance in wall-bounded turbulence through explainable deep learning ( http://arxiv.org/abs/2302.01250v3 ) ライセンス: Link先を確認	Andres Cremades, Sergio Hoyas, Rahul Deshpande, Pedro Quintero, Martin Lellep, Will Junghoon Lee, Jason Monty, Nicholas Hutchins, Moritz Linkmann, Ivan Marusic, Ricardo Vinuesa	(参考訳) その科学的、技術的重要性にもかかわらず、壁境界乱流は古典物理学において未解決の問題であり、新しい視点に取り組む必要がある。重要な戦略の1つは、流れ中のエネルギーを含むコヒーレント構造間の相互作用を研究することである。このような相互作用を,説明可能な深層学習法を用いて初めて検討した。乱流流シミュレーションから得られた瞬時速度場を用いて,U-netアーキテクチャを用いて時間内速度場を予測する。予測フローに基づいて,SHAP(SHapley Additive exPlanations)のゲーム理論アルゴリズムを用いて,この予測における各構造の重要性を評価する。この研究は、文献における以前の観測結果と一致し、フローにおける最も重要な構造が必ずしもレイノルズせん断応力に最も寄与した構造であるとは限らないことを明らかにすることでそれらを拡張した。また,本手法を実験データベースに適用し,その重要度に基づいて全く新しい構造を同定する。この枠組みは、流れ制御の新しい戦略を含む多数の壁境界乱流の基本的な現象に光を当てる可能性がある。 Despite its great scientific and technological importance, wall-bounded turbulence is an unresolved problem in classical physics that requires new perspectives to be tackled. One of the key strategies has been to study interactions among the energy-containing coherent structures in the flow. Such interactions are explored in this study for the first time using an explainable deep-learning method. The instantaneous velocity field obtained from a turbulent channel flow simulation is used to predict the velocity field in time through a U-net architecture. Based on the predicted flow, we assess the importance of each structure for this prediction using the game-theoretic algorithm of SHapley Additive exPlanations (SHAP). This work provides results in agreement with previous observations in the literature and extends them by revealing that the most important structures in the flow are not necessarily the ones with the highest contribution to the Reynolds shear stress. We also apply the method to an experimental database, where we can identify completely new structures based on their importance score. 
This framework has the potential to shed light on numerous fundamental phenomena of wall-bounded turbulence, including novel strategies for flow control.	翻訳日:2023-12-25 19:08:37 公開日:2023-12-22
# テキストスタイル転送のためのプロンプトベース編集 Prompt-Based Editing for Text Style Transfer ( http://arxiv.org/abs/2301.11997v2 ) ライセンス: Link先を確認	Guoqing Luo, Yu Tong Han, Lili Mou, Mauajama Firdaus	(参考訳) テキストプロンプト(textual prompt)は、事前学習された言語モデルにクエリし、スタイル変換されたテキストを単語毎に自己回帰的に生成するために使用される。しかし、このような生成プロセスは制御しにくく、早期予測エラーは将来の単語予測に影響を及ぼす可能性がある。本稿では,テキストスタイル転送のためのプロンプトベースの編集手法を提案する。具体的には,事前学習した言語モデルを用いてスタイル分類を行い,分類確率を用いてスタイルスコアを計算する。次に,単語レベルの編集による離散探索を行い,スタイル変換タスクの総合的スコアリング関数を最大化する。このように、プロンプトに基づく生成問題を、学習フリーなプロセスであり、文の自己回帰生成よりも制御しやすい分類問題に変換する。私たちの実験では、3つのスタイル転送ベンチマークデータセットで自動評価とヒューマン評価の両方を行い、このアプローチが20倍のパラメータを持つ最先端システムを大きく上回っていることを示した。さらなる実証分析は、我々のアプローチの有効性をさらに示します。 Prompting approaches have been recently explored in text style transfer, where a textual prompt is used to query a pretrained language model to generate style-transferred texts word by word in an autoregressive manner. However, such a generation process is less controllable and early prediction errors may affect future word predictions. In this paper, we present a prompt-based editing approach for text style transfer. Specifically, we prompt a pretrained language model for style classification and use the classification probability to compute a style score. Then, we perform discrete search with word-level editing to maximize a comprehensive scoring function for the style-transfer task. In this way, we transform a prompt-based generation problem into a classification one, which is a training-free process and more controllable than the autoregressive generation of sentences. In our experiments, we performed both automatic and human evaluation on three style-transfer benchmark datasets, and show that our approach largely outperforms the state-of-the-art systems that have 20 times more parameters. Additional empirical analyses further demonstrate the effectiveness of our approach.	翻訳日:2023-12-25 19:07:59 公開日:2023-12-22
# 相空間と水素原子における一般化力学理論 Generalized dynamical theories in phase space and the hydrogen atom ( http://arxiv.org/abs/2212.12267v2 ) ライセンス: Link先を確認	Martin Plávala and Matthias Kleinmann	(参考訳) 一般確率論の位相空間定式化は一般化された時間発展を含むように拡張でき、安定で離散エネルギー準位を持ちゼーマン効果を含む非量子の水素様系を記述できることを示す。これにより、共鳴レーザーによる水素様系の励起やラザフォード散乱などの動的効果を研究することができる。我々の構成は、古典理論と量子論は位相空間における一般確率論の特定の選択と見なすことができ、他の確率論も測定可能な予測をもたらすことを示している。 We show that the phase-space formulation of general probabilistic theories can be extended to include a generalized time-evolution and that it can describe a nonquantum hydrogen-like system which is stable, has discrete energy levels, and includes the Zeeman effect. This allows us to study dynamical effects such as excitations of the hydrogen-like system by a resonant laser and Rutherford scattering. Our construction demonstrates that classical theory and quantum theory can be seen as specific choices of general probabilistic theory in phase space and that other probabilistic theories also lead to measurable predictions.	翻訳日:2023-12-25 19:07:43 公開日:2023-12-22
# Reduce&chop: より深い問題のための浅回路 Reduce&chop: Shallow circuits for deeper problems ( http://arxiv.org/abs/2212.11862v3 ) ライセンス: Link先を確認	Adrián Pérez-Salinas, Radoica Draškić, Jordi Tura, Vedran Dunjko	(参考訳) 最先端の量子コンピュータは、量子ビット数と計算深度に制限のある回路しか確実に実行できない。これにより、実行可能なアルゴリズムの範囲が大幅に削減される。少数量子ビットのデバイスを活用するために多くの技術が発明されているが、深さ制限計算に対応するスキームはあまり研究されていない。本研究は、より浅いデバイスを繰り返し使用することにより、より深い量子計算の性能をどの程度模倣できるかを考察する。この目的のために、与えられた回路を2つに切断する、Feynmanシミュレーションにインスパイアされた手法を提案する。第1片は早期に実行され測定され、第2片は前の結果に基づいて実行される。この方法は、可能な結果の数が多いため、直接的に適用した場合は非効率である。この問題を軽減するために,手法の複雑さを事前に定義した許容限界内に維持することを目的とした浅い変分回路を提案し,そのような回路を見つけるための新しい最適化手法を提案する。これらの構成要素の組み合わせを reduce&chop と呼ぶ。私たちが議論するとおり、このアプローチは特定の関心のあるケースで有効である。この研究は、浅い量子コンピュータの可能性を活用するための新しい研究を刺激する可能性がある。 State-of-the-art quantum computers can only reliably execute circuits with limited qubit numbers and computational depth. This severely reduces the scope of algorithms that can be run. While numerous techniques have been invented to exploit few-qubit devices, corresponding schemes for depth-limited computations are less explored. This work investigates to what extent we can mimic the performance of a deeper quantum computation by repeatedly using a shallower device. We propose a method for this purpose, inspired by Feynman simulation, where a given circuit is chopped in two pieces. The first piece is executed and measured early on, and the second piece is run based on the previous outcome. This method is inefficient if applied in a straightforward manner due to the high number of possible outcomes. To mitigate this issue, we propose a shallow variational circuit, whose purpose is to maintain the complexity of the method within pre-defined tolerable limits, and provide a novel optimisation method to find such a circuit. The combination of these components is called reduce&chop. As we discuss, this approach works for certain cases of interest. 
We believe this work may stimulate new research towards exploiting the potential of shallow quantum computers.	翻訳日:2023-12-25 19:07:34 公開日:2023-12-22
# 統計的推論としての説明可能性 Explainability as statistical inference ( http://arxiv.org/abs/2212.03131v2 ) ライセンス: Link先を確認	Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei	(参考訳) 近年、様々なモデル説明アプローチが提案されており、いずれも非常に異なる理論とヒューリスティックによって導かれている。本稿では,新しい道を取り、解釈可能性を統計的推論問題として定式化する。本稿では,解釈可能な予測を生成するために設計された一般の深層確率モデルを提案する。モデルパラメータは最尤法で学習でき、この方法は任意の予測器ネットワークアーキテクチャと任意の種類の予測問題に適用することができる。本手法は,ニューラルネットワークをセレクタとして使用し,推論時の解釈を高速に行う償却型(amortized)解釈可能性モデルの一例である。いくつかの一般的な解釈可能性手法は、我々の一般モデルに対する正則化最尤法の特別な場合であることが示される。また本稿では,特徴重要度マップの評価を可能にする,正解の選択(ground truth selection)を備えた新しいデータセットを提案する。これらのデータセットを用いて、多重代入(multiple imputation)を用いることでより合理的な解釈が得られることを実験的に示す。 A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground truth selection which allow for the evaluation of feature importance maps. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.	翻訳日:2023-12-25 19:07:16 公開日:2023-12-22
# 開量子多体系におけるデコヒーレンス過程の準粒子:インコヒーレントン Quasiparticles of Decoherence Processes in Open Quantum Many-Body Systems: Incoherentons ( http://arxiv.org/abs/2211.14991v2 ) ライセンス: Link先を確認	Taiki Haga, Masaya Nakagawa, Ryusuke Hamazaki, Masahito Ueda	(参考訳) 開量子系の緩和ダイナミクスは、系のコヒーレントハミルトン力学と環境との相互作用による散逸力学との競合によって決定される。したがって、コヒーレント領域から非コヒーレント領域への遷移を理解することは基本的な関心事である。我々は、これまで認識されていなかった準粒子(インコヒーレントン)が、開量子多体系の力学を支配するリウヴィリアン超作用素の固有モードにおけるコヒーレント-非コヒーレント遷移を記述することを見出した。ここで、インコヒーレントンは、系の密度行列を表す補助ラダー系において、鎖間結合状態として定義される。リウヴィリアン固有モードは、関連するインコヒーレントンの数を反映する異なる減衰率を持つ群に分類される。また、固有モードの異なるグループを分離するスペクトルギャップ(量子コヒーレンスギャップ)も導入する。我々は, 位相緩和(dephasing)を受ける格子ボソンモデルにおけるインコヒーレントンの存在を実証し, インコヒーレントンが非閉じ込め状態になると量子コヒーレンスギャップが閉じることを示す。これは、指数的減衰を伴う非コヒーレント緩和からコヒーレントな振動的緩和への動的遷移を示す。さらに, 量子多体系のデコヒーレンスダイナミクスが, インコヒーレントンの生成, 局在, 拡散の観点でどのように理解できるかを考察する。 The relaxation dynamics of an open quantum system is determined by the competition between the coherent Hamiltonian dynamics of a system and the dissipative dynamics due to interactions with environments. It is therefore of fundamental interest to understand the transition from the coherent to incoherent regimes. We find that hitherto unrecognized quasiparticles -- incoherentons -- describe this coherent-to-incoherent transition in eigenmodes of a Liouvillian superoperator that governs the dynamics of an open quantum many-body system. Here, an incoherenton is defined as an interchain bound state in an auxiliary ladder system that represents the density matrix of a system. The Liouvillian eigenmodes are classified into groups with different decay rates that reflect the number of incoherentons involved therein. We also introduce a spectral gap -- quantum coherence gap -- that separates the different groups of eigenmodes. 
We demonstrate the existence of incoherentons in a lattice boson model subject to dephasing, and show that the quantum coherence gap closes when incoherentons are deconfined, which signals a dynamical transition from incoherent relaxation with exponential decay to coherent oscillatory relaxation. Furthermore, we discuss how the decoherence dynamics of quantum many-body systems can be understood in terms of the generation, localization, and diffusion of incoherentons.	翻訳日:2023-12-25 19:06:55 公開日:2023-12-22
# FI-ODE:Neural ODEにおける証明可能なロバストな前方不変性 FI-ODE: Certifiably Robust Forward Invariance in Neural ODEs ( http://arxiv.org/abs/2210.16940v4 ) ライセンス: Link先を確認	Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue	(参考訳) 前方不変性(Forward invariance)とは、制御理論において長く研究されてきた性質であり、力学系が常に事前に指定された状態の集合内に留まることを証明するために用いられ、ロバスト性の保証も可能にする(例えば、証明書は摂動の下でも成立する)。本稿では,Neural ODEにおけるロバストな前方不変性を学習し、証明可能な形で認証するための一般的なフレームワークを提案する。このフレームワークを適用し、ロバストな連続制御において認証された安全性を提供する。私たちの知る限りでは、このような非自明(non-vacuous)な認証付き保証を伴ってNeural ODEポリシーをトレーニングする最初の例である。さらに,画像分類の敵対的ロバスト性を認証するために用いることで、このフレームワークの汎用性について検討する。 Forward invariance is a long-studied property in control theory that is used to certify that a dynamical system stays within some pre-specified set of states for all time, and also admits robustness guarantees (e.g., the certificate holds under perturbations). We propose a general framework for training and provably certifying robust forward invariance in Neural ODEs. We apply this framework to provide certified safety in robust continuous control. To our knowledge, this is the first instance of training Neural ODE policies with such non-vacuous certified guarantees. In addition, we explore the generality of our framework by using it to certify adversarial robustness for image classification.	翻訳日:2023-12-25 19:06:28 公開日:2023-12-22
# 量子ソボレフ不等式について On Quantum Sobolev Inequalities ( http://arxiv.org/abs/2210.03013v3 ) ライセンス: Link先を確認	Laurent Lafleche	(参考訳) 位相空間における古典ソボレフ不等式(英語版)の量子アナログを、可換子のシャッテンノルムによって定義される量子ソボレフノルムを用いて検討する。これらの不等式はウィグナー・ヤネーゼのスキュー情報に対する不確実性原理を提供し、またその記号の観点からワイル量子化のシャッテンノルムに新しい境界をもたらす。中間ツールとして、畳み込みの半古典的なアナログに対するハーディ・リトルウッド・ソボレフの不等式の類似を取得し、量子ベソフ空間を導入する。明示的な推定は最適定数で得られる。 We investigate the quantum analogue of the classical Sobolev inequalities in the phase space, with the quantum Sobolev norms defined in terms of Schatten norms of commutators. These inequalities provide an uncertainty principle for the Wigner-Yanase skew information, and also lead to new bounds on the Schatten norms of the Weyl quantization in terms of its symbol. As an intermediate tool, we obtain the analogue of Hardy-Littlewood-Sobolev's inequalities for a semiclassical analogue of the convolution, and introduce quantum Besov spaces. Explicit estimates are obtained on the optimal constants.	翻訳日:2023-12-25 19:05:40 公開日:2023-12-22
# 2つの両複素および1つの多重複素最小平均平方アルゴリズム Two Bicomplex and One Multicomplex Least Mean Square algorithms ( http://arxiv.org/abs/2209.11899v2 ) ライセンス: Link先を確認	Daniel Alpay, Kamal Diki, Mihaela Vajiac	(参考訳) 我々は、1960年にWidrowとHoffが適応線形ニューロン(ADALINE)のために発明したよく知られた最小平均平方(LMS)アルゴリズムから着想を得て、複素および両複素の設定における新しい勾配作用素を導入し、研究する。これらの勾配作用素は、両複素最小平均平方(BLMS)アルゴリズムの新しい学習規則を定式化するために使用され、また、多重複素LMSアルゴリズム(MLMS)の場合についてもこれらの学習規則を定式化する。このアプローチは古典的な実数および複素LMSアルゴリズムの両方を拡張する。 We study and introduce new gradient operators in the complex and bicomplex settings, inspired by the well-known Least Mean Square (LMS) algorithm invented in 1960 by Widrow and Hoff for Adaptive Linear Neuron (ADALINE). These gradient operators will be used to formulate new learning rules for the Bicomplex Least Mean Square (BLMS) algorithms and we will also formulate these learning rules for the case of multicomplex LMS algorithms (MLMS). This approach extends both the classical real and complex LMS algorithms.	翻訳日:2023-12-25 19:05:27 公開日:2023-12-22
# NELLIE: グラウンドド、コンポジション、説明可能な推論のためのニューロシンボリック推論エンジン NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning ( http://arxiv.org/abs/2209.07662v4 ) ライセンス: Link先を確認	Nathaniel Weir, Peter Clark, and Benjamin Van Durme	(参考訳) 我々のゴールは,nlコーパスに根拠のある人間の解釈可能な証明木によって回答が支持される体系的推論を通じて,疑問に答える現代的なアプローチである。このようなシステムは、現代のlmsによる解釈可能性と幻覚の課題の緩和と、現在の説明方法(例えば、連鎖的思考)の根拠の欠如に役立つ。本稿では,手作りのルールを,ニューラルネットワークのモデリング,誘導生成,半パラメトリックな高密度検索の組み合わせに置き換える,prologに基づく推論エンジンの新たなアプローチを提案する。我々の実装であるNELLIEは、テキストから既知の事実を解説する以前の研究を超えて、包括木証明探索として完全に解釈可能でエンドツーエンドの接地されたQAを示す最初のシステムである。実験では、NELLIEは知識に基づく説明をしながら、同様の大きさの最先端の推論器(Tafjord et al., 2022)より優れています。また、NELLIEは半構造化テキストコーパスとNLテキストコーパスの両方を利用して推論を導くことができる。これらを組み合わせることで、現代のニューラルメソッドと伝統的なシンボリック推論の両方の利点を共同で享受する新しい方法が示唆される。 Our goal is a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of authoritative facts. Such a system would help alleviate the challenges of interpretability and hallucination with modern LMs, and the lack of grounding of current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work explaining known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner [Tafjord et al., 2022] while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.	翻訳日:2023-12-25 19:05:16 公開日:2023-12-22
# 部分的ラベル学習のためのメタ客観指導型曖昧さ解消 Meta Objective Guided Disambiguation for Partial Label Learning ( http://arxiv.org/abs/2208.12459v2 ) ライセンス: Link先を確認	Bo-Shi Zou, Ming-Kun Xie, Sheng-Jun Huang	(参考訳) 部分ラベル学習(PLL)は典型的な弱い教師付き学習フレームワークであり、各トレーニングインスタンスは候補ラベルセットに関連付けられ、その中の1つのラベルのみが有効である。 PLL問題を解決するには、訓練データの構造情報などの事前知識を用いるか、自己学習方式でモデル出力を洗練することによって、候補集合の曖昧さを解こうとする手法が一般的である。残念なことに、これらの手法は、モデルトレーニングの初期段階において、事前情報の欠如や信頼できない予測のため、望ましい性能を得ることができないことが多い。本稿では,小さな検証セット上でのメタ目的を解いて,候補ラベル集合から正解ラベルを回復することを目的とした,メタ目的誘導型曖昧さ解消(MoGD)を用いた部分ラベル学習のための新しい枠組みを提案する。具体的には、偽陽性ラベルの悪影響を軽減するため、検証セットのメタ損失に基づいて各候補ラベルを再重み付けする。そして、重み付きクロスエントロピー損失を最小化して分類器を訓練する。提案手法は,通常のSGDオプティマイザを用いた各種深層ネットワークを用いて容易に実装できる。 Partial label learning (PLL) is a typical weakly supervised learning framework, where each training instance is associated with a candidate label set, among which only one label is valid. To solve PLL problems, typical methods try to perform disambiguation for candidate sets by either using prior knowledge, such as structure information of training data, or refining model outputs in a self-training manner. Unfortunately, these methods often fail to obtain a favorable performance due to the lack of prior information or unreliable predictions in the early stage of model training. In this paper, we propose a novel framework for partial label learning with meta objective guided disambiguation (MoGD), which aims to recover the ground-truth label from the candidate label set by solving a meta objective on a small validation set. Specifically, to alleviate the negative impact of false positive labels, we re-weight each candidate label based on the meta loss on the validation set. Then, the classifier is trained by minimizing the weighted cross entropy loss. The proposed method can be easily implemented by using various deep networks with the ordinary SGD optimizer. 
Theoretically, we prove the convergence property of meta objective and derive the estimation error bounds of the proposed method. Extensive experiments on various benchmark datasets and real-world PLL datasets demonstrate that the proposed method can achieve competent performance when compared with the state-of-the-art methods.	翻訳日:2023-12-25 19:04:53 公開日:2023-12-22
# InterGen: 複雑相互作用下での拡散に基づくマルチヒューマンモーション生成 InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions ( http://arxiv.org/abs/2304.05684v2 ) ライセンス: Link先を確認	Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu	(参考訳) 最近、現実的な人間の動きを生成するための拡散モデルが著しく進歩している。しかし、それらは複数人の相互作用をほとんど無視している。本稿では,人間同士のインタラクションを動作拡散プロセスに組み込んだ効果的な拡散ベースのアプローチであるInterGenを提案する。これにより、一般ユーザがテキストによるガイダンスのみで高品質な2人のインタラクション動作をカスタマイズできる。まず、InterHumanというマルチモーダルデータセットを提供する。これは様々な2人インタラクションのための約1億700万フレームで構成され、正確な骨格運動と23,337の自然言語記述を含む。アルゴリズム側では、動作拡散モデルを2人のインタラクション設定に注意深く調整する。相互作用中の人間のアイデンティティの対称性を扱うために,重みを明示的に共有する2つの協調的なTransformerベースのデノイザと,2つのデノイジング過程をさらに接続するための相互注意機構を提案する。次に,世界座標系における2人の演者間の大域的関係を明示的に定式化した、インタラクション拡散モデルの新たな動作入力表現を提案する。 We have recently seen tremendous progress in diffusion models for generating realistic human motions. Yet, these methods largely disregard multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables layman users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. For the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. 
We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.	翻訳日:2023-12-25 18:57:40 公開日:2023-12-22
# 拡散橋の混合輸送, Schrödinger bridge問題と生成モデル Diffusion Bridge Mixture Transports, Schrödinger Bridge Problems and Generative Modeling ( http://arxiv.org/abs/2304.00917v2 ) ライセンス: Link先を確認	Stefano Peluchetti	(参考訳) 動的Schrödinger bridge問題(英語版)は、2つの目標確率測度間の移動を定義する確率過程を求め、クルバック・リーバーの発散の観点から最接近の基準を最適に満たしている。本稿では,動的Schrödinger bridge問題を解くために,新しいサンプリングベース反復アルゴリズムである反復拡散橋混合法(IDBM)を提案する。 IDBM手順は、各イテレーションにおける目標確率測度間の有効な輸送を実現するという魅力的な性質を示す。我々はIDBM手順に関する最初の理論的研究を行い、その収束特性を確立した。理論的結果は,IDBM法の競争性能を示す数値実験によって補完される。生成モデリングの最近の進歩は、拡散過程の時間反転を用いて、単純な分布をデータ分布に大まかに輸送する生成過程を定義する。代替案として, IDBM 手続きの最初のイテレーションを, このトランスポートを実現する近似フリー手法として利用することを提案する。このアプローチは、生成過程のダイナミクスを選択する際の柔軟性を向上し、より大きな離散化間隔よりも加速されたトレーニングと優れたサンプル品質を示す。実装面では、必要な修正は最小限の侵入的であり、トレーニング損失の定義に限定される。 The dynamic Schrödinger bridge problem seeks a stochastic process that defines a transport between two target probability measures, while optimally satisfying the criteria of being closest, in terms of Kullback-Leibler divergence, to a reference process. We propose a novel sampling-based iterative algorithm, the iterated diffusion bridge mixture (IDBM) procedure, aimed at solving the dynamic Schrödinger bridge problem. The IDBM procedure exhibits the attractive property of realizing a valid transport between the target probability measures at each iteration. We perform an initial theoretical investigation of the IDBM procedure, establishing its convergence properties. The theoretical findings are complemented by numerical experiments illustrating the competitive performance of the IDBM procedure. Recent advancements in generative modeling employ the time-reversal of a diffusion process to define a generative process that approximately transports a simple distribution to the data distribution. As an alternative, we propose utilizing the first iteration of the IDBM procedure as an approximation-free method for realizing this transport. 
This approach offers greater flexibility in selecting the generative process dynamics and exhibits accelerated training and superior sample quality over larger discretization intervals. In terms of implementation, the necessary modifications are minimally intrusive, being limited to the training loss definition.	翻訳日:2023-12-25 18:56:36 公開日:2023-12-22
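As a hedged illustration of the diffusion-bridge building block mentioned in the abstract above, the sketch below samples a one-dimensional Brownian bridge pinned at both endpoints. It is not the IDBM procedure from the paper: the choice of Brownian motion as reference process, the function name `sample_brownian_bridge`, and all parameters are assumptions made purely for illustration.

```python
import math
import random


def sample_brownian_bridge(x0, x1, n_steps, sigma=1.0, rng=None):
    """Sample a Brownian bridge from x0 (t=0) to x1 (t=1) on a uniform grid.

    Each step draws from the exact conditional law of the next grid point
    given the current point and the pinned endpoint, so the path ends at
    x1 up to floating-point rounding.
    """
    rng = rng or random.Random()
    dt = 1.0 / n_steps
    x = float(x0)
    path = [x]
    for i in range(n_steps):
        k = n_steps - i  # grid steps remaining until t = 1
        mean = x + (x1 - x) / k
        var = sigma ** 2 * dt * (k - 1) / k  # zero on the final step
        x = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path
```

Drawing pairs (x0, x1) from the two target distributions and connecting each pair with such a bridge yields a mixture of bridges; how the paper fits and iterates on that mixture is not reproduced here.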
# ChatGPTは良いキーワード生成器か? 予備的研究 Is ChatGPT A Good Keyphrase Generator? A Preliminary Study ( http://arxiv.org/abs/2303.13001v3 ) ライセンス: Link先を確認	Mingyang Song, Haiyun Jiang, Shuming Shi, Songfang Yao, Shilong Lu, Yi Feng, Huafeng Liu, Liping Jing	(参考訳) ChatGPTの出現は、最近、計算言語学コミュニティから大きな注目を集めている。キーフレーズ生成器としての機能を実証するために,キーフレーズ生成タスクにおけるchatgptの予備評価を行う。我々は,キーフレーズ生成プロンプト,キーフレーズ生成多様性,長い文書理解など,様々な面でその性能を評価する。評価は6つのベンチマークデータセットに基づいており、OpenAIが提案するプロンプトを6つの候補プロンプトに拡張しながら採用しています。 chatgptは6つの候補プロンプトすべてにおいて非常によく機能しており、データセット全体では小さなパフォーマンスの違いが観察されている。以上の結果から,chatgptはキーフレーズ生成に大きな可能性があると結論づけた。さらに,チャットgptではキーフレーズの欠落が問題となっていることも判明した。一方,最終節では,本報告の限界と今後の拡張についても紹介する。 The emergence of ChatGPT has recently garnered significant attention from the computational linguistics community. To demonstrate its capabilities as a keyphrase generator, we conduct a preliminary evaluation of ChatGPT for the keyphrase generation task. We evaluate its performance in various aspects, including keyphrase generation prompts, keyphrase generation diversity, and long document understanding. Our evaluation is based on six benchmark datasets, and we adopt the prompt suggested by OpenAI while extending it to six candidate prompts. We find that ChatGPT performs exceptionally well on all six candidate prompts, with minor performance differences observed across the datasets. Based on our findings, we conclude that ChatGPT has great potential for keyphrase generation. Moreover, we discover that ChatGPT still faces challenges when it comes to generating absent keyphrases. Meanwhile, in the final section, we also present some limitations and future expansions of this report.	翻訳日:2023-12-25 18:56:14 公開日:2023-12-22
# 量子鍵分布系保護のための光パワーリミッタのセキュリティ境界 Security boundaries of an optical power limiter for protecting quantum key distribution systems ( http://arxiv.org/abs/2303.12355v3 ) ライセンス: Link先を確認	Qingquan Peng, Binwu Gao, Konstantin Zaitsev, Dongyang Wang, Jiangfang Ding, Yingwen Liu, Qin Liao, Ying Guo, Anqi Huang and Junjie Wu	(参考訳) 無認可光注入は、量子鍵分布(QKD)システムの実用的セキュリティにとって、常に重要な脅威である。熱・光デフォーカス効果に基づく光パワーリミッタ (OPL) が提案・実装されており, 注入されたハッキング光を制限する。ハードウェア対策として、広く展開される前に、様々な光注入攻撃の下でのOPLの性能を試験し、セキュリティ境界を明らかにする必要がある。量子暗号におけるOPLのセキュリティ境界を調べるために、連続波(c.w.)光注入攻撃と、パルス繰り返し率が0.5 Hz、40 MHz、1 GHzのパルス照明攻撃の下でのOPLの挙動を総合的に試験し分析する。テスト結果は、OPLのセキュリティ境界を明らかにし、ユースケースでOPLを適切に利用することを可能にする。ここで提案する試験と解析の方法論は,QKDシステムにおける他のパワーリミテーションコンポーネントに適用可能である。 Unauthorized light injection has always been a vital threat to the practical security of a quantum key distribution (QKD) system. An optical power limiter (OPL) based on the thermo-optical defocusing effect has been proposed and implemented, limiting the injected hacking light. As a hardware countermeasure, the performance of the OPL under various light-injection attacks shall be tested to clarify the security boundary before being widely deployed. To investigate the OPL's security boundary in quantum cryptography, we comprehensively test and analyse the behavior of the OPL under continuous-wave (c.w.) light-injection attacks and pulse illumination attacks with pulse repetition rates of 0.5 Hz, 40 MHz, and 1 GHz. The testing results illuminate the security boundary of the OPL, which allows one to properly employ the OPL in the use cases. The methodology of testing and analysis proposed here is applicable to other power-limitation components in a QKD system.	翻訳日:2023-12-25 18:55:33 公開日:2023-12-22
# マルチエージェント強化学習による量的市場における取引戦略の最適化 Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2303.11959v2 ) ライセンス: Link先を確認	Hengxi Zhang, Zhendong Shi, Yuanquan Hu, Wenbo Ding, Ercan E. Kuruoglu, Xiao-Ping Zhang	(参考訳) 量的市場は、迅速なダイナミクスと豊富な不確実性によって特徴づけられ、利益主導の株式取引行動の追求は本質的に困難である。この文脈の中では、最適制御のための報酬中心のメカニズムで機能する強化学習(RL)が、提示される複雑な金融意思決定の難問に対する潜在的に効果的な解決策として浮上している。本論文は、固定比率ポートフォリオ保険(CPPI)と時間不変ポートフォリオ保護(TIPP)の2つの確立された金融トレーディング戦略と、マルチエージェントディープ決定主義政策勾配(MADDPG)フレームワークの融合について述べる。その結果、量的市場における戦略的取引の探索に適した2つの新しいマルチエージェントRL(MARL)手法、CPPI-MADDPGとTIPP-MADDPGを導入した。これらのイノベーションを検証するため、我々は100のリアルマーケット株を多種多様に選別して実装した。実証実験の結果,CPPI-MADDPGとTIPP-MADDPGの戦略は従来よりも一貫して優れており,定量取引の分野での有効性が確認された。 Quantitative markets are characterized by swift dynamics and abundant uncertainties, making the pursuit of profit-driven stock trading actions inherently challenging. Within this context, reinforcement learning (RL), which operates on a reward-centric mechanism for optimal control, has surfaced as a potentially effective solution to the intricate financial decision-making conundrums presented. This paper delves into the fusion of two established financial trading strategies, namely the constant proportion portfolio insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the multi-agent deep deterministic policy gradient (MADDPG) framework. As a result, we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. To validate these innovations, we implemented them on a diverse selection of 100 real-market shares. Our empirical findings reveal that the CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional counterparts, affirming their efficacy in the realm of quantitative trading.	翻訳日:2023-12-25 18:55:01 公開日:2023-12-22
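The two classical strategies named in the abstract above can be sketched with their textbook allocation rules. This is a minimal sketch and not the paper's CPPI-MADDPG or TIPP-MADDPG method: the function names, the no-leverage cap, and the parameter names are assumptions introduced here for illustration.

```python
def cppi_exposure(portfolio_value, floor, multiplier):
    """Risky-asset exposure under CPPI: multiplier times the cushion above the floor."""
    cushion = max(portfolio_value - floor, 0.0)
    # Cap at the portfolio value, i.e. no leverage (an assumption of this sketch).
    return min(multiplier * cushion, portfolio_value)


def tipp_floor(previous_floor, portfolio_value, floor_fraction):
    """TIPP floor: ratchets up with the portfolio value and never decreases."""
    return max(previous_floor, floor_fraction * portfolio_value)
```

With a portfolio worth 100, a floor of 80, and a multiplier of 3, the cushion is 20 and the risky exposure is 60; under TIPP the floor would additionally be raised whenever the portfolio reaches a new high.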
# SPSysML:シミュレーション物理システムの定量的評価のためのメタモデル SPSysML: A meta-model for quantitative evaluation of Simulation-Physical Systems ( http://arxiv.org/abs/2303.09565v3 ) ライセンス: Link先を確認	Wojciech Dudek, Narcis Miguel, Tomasz Winiarski	(参考訳) ロボットシステムは、複数のセンサーとエフェクターを備えた複雑なサイバー物理システム(CPS)である。最近のシミュレーション手法は、Digital Twin(DT)の概念の実現を可能にする。しかし、ロボットシステム開発におけるDTの雇用、例えば開発内テストは不明確である。システム開発の間、その部品は模擬モックアップから実際のハードウェアにデプロイされたソフトウェアを実行する物理部品へと進化する。したがって、シミュレーション部品と物理部品の整合性を確保するための設計ツールとフレキシブルな開発手順が必要である。我々は,CPSのシミュレーションと物理部品の統合を,様々な設定で最大化することを目的としている。統合性の向上、物理部分(ハードウェアとソフトウェア)のシミュレーションベースのテストカバレッジの向上。本稿では、SPSysML(Simulation-Physical System Modeling Language)と呼ばれるシステムモデリング言語(SysML)に基づくドメイン仕様言語(DSL)を提案する。 SPSysMLは、シミュレーション・物理システム(SPSys)の分類を定義し、少なくとも物理的またはシミュレートされた部分からなるCPSである。特に、シミュレーションされたものはDTである。本稿では,SPSys のシミュレーション・物理的整合性を最大化できる SPSys 開発手法を提案する。 SPSysDPはINCAREプロジェクトのための複雑なロボットシステムの開発に使用されている。その後のSPSysDPでは、システムのシミュレーションと物理の整合性が最大化される。結果として、システムモデルは少ないコンポーネントで構成され、システムコンポーネントの大部分は、さまざまなシステムセットアップ間で共有される。本稿では,ロボットオペレーティング・システム(ROS)とガゼボシミュレータを用いて,システムの実装とテストを行う。 SPSysDPを使用したSPSysMLは、SPSys(DTとCPSを含む)の設計を可能にし、シミュレーションと物理部品間の最大整合性を特徴とするマルチセットアップシステムの開発を可能にする。 Robotic systems are complex cyber-physical systems (CPS) commonly equipped with multiple sensors and effectors. Recent simulation methods enable the Digital Twin (DT) concept realisation. However, DT employment in robotic system development, e.g. in-development testing, is unclear. During the system development, its parts evolve from simulated mockups to physical parts which run software deployed on the actual hardware. Therefore, a design tool and a flexible development procedure ensuring the integrity of the simulated and physical parts are required. We aim to maximise the integration between a CPS's simulated and physical parts in various setups. The better integration, the better simulation-based testing coverage of the physical part (hardware and software). 
We propose a Domain Specification Language (DSL) based on Systems Modeling Language (SysML) that we refer to as SPSysML (Simulation-Physical System Modeling Language). SPSysML defines the taxonomy of a Simulation-Physical System (SPSys), being a CPS consisting of at least a physical or simulated part. In particular, the simulated ones can be DTs. We propose a SPSys Development Procedure (SPSysDP) that enables the maximisation of the simulation-physical integrity of SPSys by evaluating the proposed factors. SPSysDP is used to develop a complex robotic system for the INCARE project. In subsequent iterations of SPSysDP, the simulation-physical integrity of the system is maximised. As a result, the system model consists of fewer components, and a greater fraction of the system components are shared between various system setups. We implement and test the system with popular frameworks, Robot Operating System (ROS) and Gazebo simulator. SPSysML with SPSysDP enables the design of SPSys (including DT and CPS), multi-setup system development featuring maximised integrity between simulation and physical parts in its setups.	翻訳日:2023-12-25 18:54:31 公開日:2023-12-22
# 畳み込み型クロスビューポーズ推定 Convolutional Cross-View Pose Estimation ( http://arxiv.org/abs/2303.05915v3 ) ライセンス: Link先を確認	Zimin Xia, Olaf Booij, and Julian F. P. Kooij	(参考訳) 本稿では,新しいエンドツーエンドの視点間ポーズ推定手法を提案する。クエリの近傍領域をカバーする空中画像と地上レベルのクエリ画像が与えられた場合、クエリの3自由度カメラポーズは、その画像ディスクリプタと、空中画像内のローカル領域のディスクリプタとのマッチングにより推定される。方向認識ディスクリプタは、並進同変な畳み込み地上画像エンコーダとコントラスト学習とを用いて得られる。ローカライズデコーダは、新しいローカライズマッチングアップサンプリングモジュールにより、粗から密へ(coarse-to-fine)の方法で高密度確率分布を生成する。より小さなオリエンテーションデコーダは、向き推定をローカライゼーションに条件付けるベクトル場を生成する。提案手法は,VIGORとKITTIのデータセットで検証され,同等の方向推定精度のもとで,中央値のローカライゼーション誤差において最先端のベースラインをそれぞれ72%と36%上回っている。予測確率分布は局所化の曖昧さを表すことができ、誤っている可能性のある予測を棄却できる。再トレーニングを行わなくても、モデルは異なる視野を持つ地上画像に対して推論でき、利用可能であれば方向の事前情報を利用できる。オックスフォード・ロボットカーデータセットでは,14 FPSで中央値の位置推定誤差1 m未満,中央値の方向誤差約1度を達成し,経時的にエゴ車両の姿勢を確実に推定できる。 We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36% in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity, and enables rejecting possible erroneous predictions. Without re-training, the model can infer on ground images with different fields of view and utilize orientation priors if available. 
On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.	翻訳日:2023-12-25 18:54:03 公開日:2023-12-22
# 任意形状の位相物体の光学パラメータ推定精度の量子限界 Quantum limits for the precision of optical parameter estimation of arbitrarily shaped phase objects ( http://arxiv.org/abs/2302.14504v2 ) ライセンス: Link先を確認	Arturo Villegas, Marcello H. M. Passos, Silvania F. Pereira, Juan P. Torres	(参考訳) 位相物体を特徴付けるパラメータの集合を、最適な精度、すなわち光と物質の相互作用過程によって決まる最良の精度で推定する一般的な方法を示す。この方法はPezzeらによって提唱された[Phys. Rev. Lett. 119, 130504 (2017)]アイデアに由来する。我々のゴールは、量子推定理論に関する研究で通常使われる形式的な量子言語にはおそらく馴染みのない物理学コミュニティに向けて、この方法の主な特徴と応用を明らかにすることである。まず、位相物体を特徴付けるパラメータの集合を推定するための精度境界を導出する。我々は、平均光子数 N の多重モードコヒーレント状態と、多重モード単一光子量子状態の N コピーという、実験的に重要な2種類の照明に対して、Cramér-Rao の下界を計算する。この2つのモデルがどのような条件で等価かを示す。第2に, 物体から反射・透過された光を, 空間形状を工夫したモード群に投影することにより, 最適精度が得られることを示す。これらのモードの構築方法を説明し、これらの測定値を用いた推定精度が最適であることを明示的に示す。例として, ナノファブリケーション技術の評価のために半導体産業で重要な物体である崖状ナノ構造の高さと側壁角度の推定にこれらの結果を適用する。 We show a general method to estimate with optimum precision, i.e., the best precision determined by the light-matter interaction process, a set of parameters that characterize a phase object. The method derives from ideas presented by Pezze et al. [Phys. Rev. Lett. 119, 130504 (2017)]. Our goal is to illuminate the main characteristics of this method as well as its applications to the physics community, probably not familiar with the formal quantum language usually employed in works related to quantum estimation theory. First, we derive precision bounds for the estimation of the set of parameters characterizing the phase object. We compute the Cramér-Rao lower bound for two experimentally relevant types of illumination: a multimode coherent state with mean photon number N, and N copies of a multimode single-photon quantum state. We show under which conditions these two models are equivalent. Second, we show that the optimum precision can be achieved by projecting the light reflected/transmitted from the object onto a set of modes with engineered spatial shape. 
We describe how to construct these modes, and demonstrate explicitly that the precision of the estimation using these measurements is optimum. As an example, we apply these results to the estimation of the height and sidewall angle of a cliff-like nanostructure, an object relevant to the semiconductor industry for the evaluation of nanofabrication techniques.	翻訳日:2023-12-25 18:53:17 公開日:2023-12-22
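For readers less familiar with the bound this abstract refers to, the standard multiparameter (quantum) Cramér-Rao inequality can be written as below. This is the general textbook statement for unbiased estimators, not the paper's specific result; the symbols $\boldsymbol{\theta}$ (parameter vector), $\nu$ (number of independent repetitions), $F$ (classical Fisher information matrix of a chosen measurement), $F_Q$ (quantum Fisher information matrix), and $|\psi\rangle$ (a pure probe state depending on $\boldsymbol{\theta}$) are notation assumed here.

```latex
\mathrm{Cov}\big(\hat{\boldsymbol{\theta}}\big)
  \;\succeq\; \frac{1}{\nu}\, F(\boldsymbol{\theta})^{-1}
  \;\succeq\; \frac{1}{\nu}\, F_Q(\boldsymbol{\theta})^{-1},
\qquad
[F_Q]_{ij} = 4\,\mathrm{Re}\!\left[
  \langle \partial_i \psi | \partial_j \psi \rangle
  - \langle \partial_i \psi | \psi \rangle \langle \psi | \partial_j \psi \rangle
\right].
```

A measurement is optimal in this sense when its classical Fisher information matrix $F$ reaches $F_Q$, which is the kind of statement the abstract makes about projections onto engineered spatial modes.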
# 位置依存有効質量を持つ半閉じ込め調和振動子モデルのウィグナー関数 The Wigner function of a semiconfined harmonic oscillator model with a position-dependent effective mass ( http://arxiv.org/abs/2302.12673v5 ) ライセンス: Link先を確認	S.M. Nagiyev, A.M. Jafarova and E.I. Jafarov	(参考訳) 質量が位置によって変化することで半閉じ込め効果を示す量子調和振動子モデルに対して、ウィグナー関数による位相空間表現の概念を提案する。新しい手法は、そのような半閉じ込め量子系に対して正確にウィグナー分布関数を計算するために用いられる。この方法は、量子分布関数の定義における被積分関数の発散を抑制し、半閉じ込め振動子モデルの定常状態に対する解析式の計算を可能にする。この量子系では、外部一様場が印加されている場合と印加されていない場合の両方が研究されている。得られたウィグナー分布関数の厳密な表現は、第一種ベッセル関数とラゲール多項式を用いて表される。さらに、いくつかの特殊な場合や極限についても詳細に論じている。 We propose a phase-space representation concept in terms of the Wigner function for a quantum harmonic oscillator model that exhibits the semiconfinement effect through its mass varying with the position. The new method is used to compute the Wigner distribution function exactly for such a semiconfinement quantum system. This method suppresses the divergence of the integrand in the definition of the quantum distribution function and leads to the computation of its analytical expressions for the stationary states of the semiconfined oscillator model. For this quantum system, both the presence and absence of the applied external homogeneous field are studied. The exact expressions obtained for the Wigner distribution function are expressed through the Bessel function of the first kind and Laguerre polynomials. Furthermore, some of the special cases and limits are discussed in detail.	翻訳日:2023-12-25 18:52:53 公開日:2023-12-22
# 持続可能なオンデマンドライドプーリングのための将来を考慮した価格設定とマッチング Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling ( http://arxiv.org/abs/2302.10510v3 ) ライセンス: Link先を確認	Xianjie Zhang and Pradeep Varakantham and Hao Jiang	(参考訳) オンデマンドのライドプーリングの人気は、顧客(低価格)、タクシードライバー(高い収入)、環境(少ない車両によるカーボンフットプリントの低減)、そしてUberのような集約企業(高い収入)に提供される利点による。これらの利点を達成するには、2つの重要な相互リンク課題を効果的に解決する必要がある。 (a)価格 --タクシーの顧客要求に価格を設定すること (b)マッチング -- タクシー・車への顧客(価格を受け入れた)の割り当て。伝統的に、これら2つの課題は、将来の要求に対する現在のマッチングの影響を考慮せずに、個別に研究され、(現在の要求のみを考慮する)近視眼的なアプローチが用いられてきた。本稿では,価格とマッチングの問題を同時に取り扱うとともに,価格とマッチング決定の今後の影響も考慮する新たな枠組みを提案する。実世界のタクシーデータセットにおける実験結果では、一定の固定収益を得るために必要な車両数(最大14%、平均10.6%)と、車両の総走行距離(最大11.1%、平均3.7%)を削減しつつ、持続可能な形で収益(最大17%、平均6.4%)を大幅に改善できることを実証した。つまり、顧客、ドライバー、アグリゲータ(ライドプール会社)に対して高い収益を得ると同時に、環境(道路上の車両の数が少なく、燃料消費も少ないため)に適している、すべての利害関係者(顧客、ドライバー、アグリゲータ、環境)に理想的なウィンウィンシナリオを提供することができるのです。 The popularity of on-demand ride pooling is owing to the benefits offered to customers (lower prices), taxi drivers (higher revenue), environment (lower carbon footprint due to fewer vehicles) and aggregation companies like Uber (higher revenue). To achieve these benefits, two key interlinked challenges have to be solved effectively: (a) pricing -- setting prices to customer requests for taxis; and (b) matching -- assignment of customers (that accepted the prices) to taxis/cars. Traditionally, both these challenges have been studied individually and using myopic approaches (considering only current requests), without considering the impact of current matching on addressing future requests. In this paper, we develop a novel framework that handles the pricing and matching problems together, while also considering the future impact of the pricing and matching decisions. 
In our experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly improve revenue (up to 17% and on average 6.4%) in a sustainable manner by reducing the number of vehicles (up to 14% and on average 10.6%) required to obtain a given fixed revenue and the overall distance travelled by vehicles (up to 11.1% and on average 3.7%). That is to say, we are able to provide an ideal win-win scenario for all stakeholders (customers, drivers, aggregator, environment) involved by obtaining higher revenue for customers, drivers, aggregator (ride pooling company) while being good for the environment (due to fewer vehicles on the road and less fuel consumed).	翻訳日:2023-12-25 18:52:42 公開日:2023-12-22
# フレームワーク税:NLP研究と展開における推論効率の相違 The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment ( http://arxiv.org/abs/2302.06117v2 ) ライセンス: Link先を確認	Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell	(参考訳) NLPシステムの計算効率への注目の高まりは、効率的なモデルアーキテクチャの設計と基盤となるハードウェアアクセラレータの改善を動機付けている。しかし、その結果得られた計算スループットの向上と浮動小数点演算の削減は、ウォールクロックの推論遅延の改善に直接はつながっていない。これらの差異は、ディープラーニングフレームワークがもたらしたボトルネックが大きな原因であることを実証する。我々は、この現象を「framework tax」と表現し、ハードウェアの速度が時間とともに増加するにつれて差が大きくなることを観察する。本稿では,モデル設計決定,フレームワークパラダイム,ハードウェアプラットフォームが全体のモデル遅延に与える影響を分析する一連のケーススタディを通して,この現象を考察する。コードはhttps://github.com/JaredFern/Framework-Tax で入手できる。 Increased focus on the computational efficiency of NLP systems has motivated the design of efficient model architectures and improvements to underlying hardware accelerators. However, the resulting increases in computational throughput and reductions in floating point operations have not directly translated to improvements in wall-clock inference latency. We demonstrate that these discrepancies can be largely attributed to bottlenecks introduced by deep learning frameworks. We denote this phenomenon as the "framework tax", and observe that the disparity is growing as hardware speed increases over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. Code is available at https://github.com/JaredFern/Framework-Tax.	翻訳日:2023-12-25 18:52:13 公開日:2023-12-22
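The abstract above distinguishes operation counts from wall-clock latency. A minimal sketch of how wall-clock latency of an arbitrary callable might be measured is given below; it is not the benchmarking code from the paper's repository, and the function name `median_latency_seconds` and its parameters are assumptions for illustration.

```python
import statistics
import time


def median_latency_seconds(fn, n_warmup=3, n_runs=20):
    """Return the median wall-clock time of calling fn() over n_runs calls."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Timing the whole call, rather than counting operations, is what exposes per-call overhead of the kind the abstract attributes to frameworks. For workloads that run asynchronously on an accelerator, the callable would also need to wait for the device to finish, which this sketch does not handle.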
# 量子ビット格子モデルにおけるニューラルネットワークによる符号規則学習の原理 Principle of learning sign rules by neural networks in qubit lattice models ( http://arxiv.org/abs/2302.02523v3 ) ライセンス: Link先を確認	Jin Cao, Shijie Hu, Zhiping Yin, and Ke Xia	(参考訳) ニューラルネットワークは、人間の直感を超えた隠された法則を発見できる強力なツールだ。しかし、複雑な非線形構造のため、しばしばブラックボックスとして現れる。 Gutzwiller平均場理論を参考にすることで、量子ビット格子モデルにおける秩序状態の符号規則の原理を示すことができる。これらの符号規則を示すために、単一の隠れニューロンを持つ浅いフィードフォワードニューラルネットワークを導入する。一般化Ising, スピン$1/2$ XY, (フラストレートした)Heisenberg環, トーラス上の三角格子XY反強磁性体, 任意の充填でのFermi-Hubbard環など,様々なモデルで系統的なベンチマークを行う。これらのベンチマークは、すべての主要次数の符号規則の特徴がピッチ角などの古典的な形式で可視化可能であることを示している。さらに、量子揺らぎにより、精度は定量的には不完全になりうる。 A neural network is a powerful tool that can uncover hidden laws beyond human intuition. However, it often appears as a black box due to its complicated nonlinear structures. By drawing upon the Gutzwiller mean-field theory, we can showcase a principle of sign rules for ordered states in qubit lattice models. We introduce a shallow feed-forward neural network with a single hidden neuron to present these sign rules. We conduct systematic benchmarks in various models, including the generalized Ising, spin-$1/2$ XY, (frustrated) Heisenberg rings, triangular XY antiferromagnet on a torus, and the Fermi-Hubbard ring at an arbitrary filling. These benchmarks show that all the leading-order sign rule characteristics can be visualized in classical forms, such as pitch angles. Besides, quantum fluctuations can result in an imperfect accuracy rate quantitatively.	翻訳日:2023-12-25 18:52:02 公開日:2023-12-22
# 設計によるIT/OT統合 IT/OT Integration by Design ( http://arxiv.org/abs/2305.19735v2 ) ライセンス: Link先を確認	Georg Schäfer, Hannes Waclawek, Sarah Riedmann, Christoph Binder, Christian Neureiter and Stefan Huber	(参考訳) インダストリー4.0の4つの設計原則である情報透明性、技術援助、相互接続、分散化決定は、産業システムに情報技術(IT)と運用技術(OT)のソリューションを統合する際の課題を提起している。これらの異なるソリューションには矛盾する要件があり、システムと組織の両方でインターフェースが問題になる。 ITとOTの領域の仲介役として機能するIBPT(Industrial Business Process Twin)エンティティは、この状況を克服するために必要なIT/OTインターフェースの量を効果的に削減するために、以前の研究で提案されている。本研究では,設計段階におけるこのアプローチの効果について検討する。システム設計における IT と OT コンポーネント間のインターフェースを排除することによって,このアプローチは組織のコミュニケーション構造内の競合する通信チャネルを排除する,と我々は主張する。議論を検証するため、インダストリー4.0の4つの重要な設計原則に対処するインダストリー4.0シナリオを用いて、参照アーキテクチャモデルインダストリー4.0(RAMI4.0)に従ってIBPT概念のモデルを開発する。結果は、IBPTアプローチがシステム設計フェーズにおいて潜在的に競合するIT/OTインターフェースを排除していることを示している。 The four Industry 4.0 design principles (information transparency, technical assistance, interconnection, and decentralized decisions) pose challenges in integrating information technology (IT) and operational technology (OT) solutions in industrial systems. These different solutions have conflicting requirements, making interfaces between them problematic for both systems and organizations. An Industrial Business Process Twin (IBPT) entity, acting as an intermediary between the realms of IT and OT, has been proposed in a previous work to effectively reduce the amount of required IT/OT interfaces in an attempt to overcome this situation. In this work, we investigate the effects of this approach during the design phase. We argue that, by eliminating interfaces between IT and OT components in the system design, this approach eliminates conflicting communication channels within the organization's communication structure. In order to verify our argument, we develop a model of our IBPT concept according to the Reference Architecture Model Industrie 4.0 (RAMI4.0) using an Industry 4.0 scenario addressing the four essential Industry 4.0 design principles. 
Results show that the IBPT approach indeed eliminates potentially conflicting IT/OT interfaces during the system design phase.	翻訳日:2023-12-25 18:46:23 公開日:2023-12-22
# 教師なしメロディ-歌詞生成 Unsupervised Melody-to-Lyric Generation ( http://arxiv.org/abs/2305.19228v2 ) ライセンス: Link先を確認	Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng	(参考訳) メロディと歌詞の自動生成は、与えられたメロディと共に歌詞を生成するタスクである。音楽が歌詞に追加の制約を課すため、これは、制約のない歌詞生成よりも重要な実践的関心と挑戦である。ほとんどの楽曲は著作権を侵害されるため、トレーニングデータは制限され、メロディと歌詞の複雑な相互モーダル関係に不適合なモデルとなる。本研究では,任意のメロディ・歌詞データを訓練することなく高品質な歌詞を生成する手法を提案する。具体的には、まず歌の輪郭を生成し、次に完全な歌詞を生成する階層的歌詞生成フレームワークを設計する。このフレームワークは、(純粋にテキストに基づく)トレーニングを推論(メロディ誘導テキスト生成)から切り離すことで、並列データの不足を回避する。我々はメロディと歌詞のセグメンテーションとリズムアライメントを活用し、そのメロディを推論中の指示としてデコード制約にコンパイルする。 2段階の階層デザインは、共同曲作成を民主化するための非常に望ましい機能である、歌詞概要によるコンテンツ制御を可能にする。実験結果から,本モデルは,例えば,並列データセットを用いたSOTAモデルであるSongMASSや,人間の評価に基づく全体的な品質改善率の24%といった,強靭なベースラインよりもオントピー的,歌声的,知的な,一貫性のある高品質な歌詞を生成することができることがわかった。 Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and second the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. 
The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.	翻訳日:2023-12-25 18:46:02 公開日:2023-12-22
# 条件不変意味セグメンテーション Condition-Invariant Semantic Segmentation ( http://arxiv.org/abs/2305.17349v2 ) ライセンス: Link先を確認	Christos Sakaridis, David Bruggemann, Fisher Yu, Luc Van Gool	(参考訳) セマンティクスセグメンテーションネットワークの異なる視覚条件への適応は、自律走行車やロボットのロバストな知覚に不可欠である。しかし、従来の研究は、敵対的トレーニングを採用し合成から現実への適応で検証されているほとんどの特徴レベル適応法が、条件レベル適応ではわずかな改善しか与えず、スタイリゼーションによる単純なピクセルレベル適応に性能で劣ることを示した。これらの結果から,ネットワークのエンコーダが各入力画像の元のビューとスタイライズされたビューから抽出した内部ネットワーク特徴を,新たな特徴不変性損失によって整合させることにより,特徴レベルの適応を行う上でスタイリゼーションを活用することを提案する。このようにして、エンコーダが入力のスタイルに対してすでに不変な特徴を抽出するよう促し、デコーダは入力の特定のスタイルをさらに抽象化することではなく、これらの特徴の解析に集中できる。本研究では,現状の最先端ドメイン適応アーキテクチャ上に条件不変セマンティックセグメンテーション (CISS) という手法を実装し,条件レベル適応において優れた結果を得る。特に、CISSは、広く使われている昼から夜へのCityscapes$\to$Dark Zurichベンチマークで、新たな最高性能を達成している。さらに,本手法は,通常条件から悪条件へのCityscapes$\to$ACDCベンチマークにおいて2番目に高い性能を実現する。 CISSはBDD100K-nightのようなトレーニング中に見ていない領域にもよく一般化することが示されている。コードはhttps://github.com/SysCV/CISS で公開されている。 Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. 
We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes$\to$Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes$\to$ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night. Code is publicly available at https://github.com/SysCV/CISS .	翻訳日:2023-12-25 18:45:36 公開日:2023-12-22
# 変圧器ニューラルプロセスを用いたエンドツーエンドメタベイズ最適化 End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes ( http://arxiv.org/abs/2305.15930v4 ) ライセンス: Link先を確認	Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, Haitham Bou Ammar	(参考訳) Meta-Bayesian optimization (Meta-BO)は、関連するタスクからのデータを活用することで、ベイズ最適化のサンプル効率を改善することを目的としている。従来の手法はサロゲートモデルまたは獲得関数を独立にメタ学習することに成功したが、両コンポーネントの共同トレーニングは依然としてオープンな課題である。本稿では、トランスフォーマーアーキテクチャを介して獲得関数を学ぶために、神経過程を一般化する最初のエンドツーエンドの微分可能メタボフレームワークを提案する。強化学習(rl)を用いたこのエンドツーエンドフレームワークにより,ラベル付き取得データの欠如に対処できる。初期の段階では、特に報酬が不足している場合、RLでスクラッチからトランスフォーマーベースのニューラルプロセスのトレーニングが困難であることに気付きました。この主張を,報奨信号として広く用いられている後悔の概念が,軌道長の対数間隔パターンを示すことを示す組合せ解析で定式化した。この問題に対処するため,アーキテクチャの一部を指導し,帰納的バイアスとして有効な確率モデルを学習する補助的なタスクでRLの目的を増強する。提案手法は, 標準的なハイパーパラメータ最適化タスクの実験において, 様々なベースラインに対して, 最先端の後悔結果を達成するとともに, 混合整数プログラミングチューニング, 抗体設計, 電子設計自動化のための論理合成の現実的問題において, 他よりも優れていることを示す。 Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data. Early on, we notice that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that the widely used notion of regret as a reward signal exhibits a logarithmic sparsity pattern in trajectory lengths. 
To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret results against various baselines in experiments on standard hyperparameter optimisation tasks and also outperforms others in the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.	翻訳日:2023-12-25 18:45:12 公開日:2023-12-22
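The abstract above uses regret as the reward signal for the learned acquisition function. As a minimal sketch of that notion, assuming a maximisation problem with a known optimum, the simple regret after each evaluation can be computed as follows; the function name and the toy numbers in the note are illustrative and not taken from the paper.

```python
def simple_regret_trajectory(observed_values, optimum):
    """Simple regret after each evaluation of a maximisation problem."""
    regrets = []
    best_so_far = float("-inf")
    for value in observed_values:
        best_so_far = max(best_so_far, value)
        regrets.append(optimum - best_so_far)
    return regrets
```

For the observations 1, 3, 2, 5 with optimum 5, the trajectory is 4, 2, 2, 0: the regret changes only at steps where a new best value is found, which gives some intuition for why a regret-based reward can be sparse along a trajectory.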
# ウエハスケールMgB2超電導デバイス Wafer-Scale MgB2 Superconducting Devices ( http://arxiv.org/abs/2305.15190v2 ) ライセンス: Link先を確認	Changsub Kim, Christina Bell, Jake Evans, Jonathan Greenfield, Emma Batson, Karl Berggren, Nathan Lewis, Daniel Cunnane	(参考訳) 過去10年間の超伝導デバイスと検出器技術の進歩は、量子コンピュータ、遠赤外線望遠鏡用検出器、光通信における実用的な応用を実現している。しかし、超伝導薄膜材料は依然としてほとんど変化がなく、超伝導量子ビットにはアルミニウムが、高周波・高カイネティックインダクタンスデバイスにはニオブ化合物が依然として用いられている。二ホウ化マグネシウム ($\mathrm{MgB}_2$) は、金属超伝導体の中で最も高い遷移温度 ($\mathrm{T}_c$ = 39 K) で知られており、THz周波数に向かう高温・高周波の超伝導デバイスのための有望な材料である。しかし、ウェハスケール薄膜の合成の難しさは超伝導エレクトロニクスの応用基盤への$\mathrm{MgB}_2$デバイスの導入を妨げている。本稿では,直径100 mmにわたる超平滑(二乗平均平方根粗さ0.5 nm未満)かつ均一な$\mathrm{MgB}_2$薄膜(100 nm未満)を初めて報告し,これらの薄膜を用いて作製した試作デバイスにおいて,4.5 Kで$10^4$を超える内部Q値や,40 nm膜で数十pH/sq程度の高い可変カイネティックインダクタンスなどの主要な超伝導特性を示す。この画期的な進歩は、高温、高周波超伝導量子回路およびデバイスの開発を可能にする。 Progress in superconducting device and detector technologies over the past decade has enabled practical applications in quantum computers, detectors for far-infrared telescopes, and optical communications. Superconducting thin film materials, however, have remained largely unchanged, with aluminum still being the material of choice for superconducting qubits, and niobium compounds for high frequency/high kinetic inductance devices. Magnesium diboride ($\mathrm{MgB}_2$), known for its highest transition temperature ($\mathrm{T}_c$ = 39 K) among metallic superconductors, is a viable material for elevated temperature and higher frequency superconducting devices moving towards THz frequencies. However, difficulty in synthesizing wafer-scale thin films has prevented implementation of $\mathrm{MgB}_2$ devices into the application base of superconducting electronics. 
Here, we report ultra-smooth (< 0.5 nm root-mean-square roughness) and uniform $\mathrm{MgB}_2$ thin (< 100 nm) films over 100 mm in diameter for the first time and present prototype devices fabricated with these films demonstrating key superconducting properties including an internal quality factor over $10^4$ at 4.5 K and high tunable kinetic inductance on the order of tens of pH/sq in a 40 nm film. This groundbreaking advancement will enable development of elevated-temperature, high-frequency superconducting quantum circuits and devices.	翻訳日:2023-12-25 18:44:44 公開日:2023-12-22
# In-Context Probing:大規模言語モデルによるロバスト分類器の構築に向けて In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models ( http://arxiv.org/abs/2305.14171v3 ) ライセンス: Link先を確認	Afra Amini and Massimiliano Ciaramita	(参考訳) 大きな言語モデルは、新しいタスクをコンテキストで学習することができ、命令といくつかの注釈付きの例が提供されている。しかし、文脈内学習の有効性は提供されたコンテキストに依存しており、下流タスクのパフォーマンスは命令によって大きく異なる可能性がある。重要なのは、このようなコンテキストへの依存が予測不能な方法で現れる可能性があることだ。本稿では, In-Context Probing (ICP) という代替手法を提案する。文脈内学習と同様に、入力の表現を命令でコンテキスト化するが、出力予測をデコードする代わりに、文脈化された表現を探索してラベルを予測する。多様な分類タスクの一連の実験を通して、文脈内探索は命令の変化に対してはるかに堅牢であることを示す。さらに、ICPは微調整よりも優れた性能を示し、より小さなモデルの上に分類器を構築するのに特に役立ち、訓練例は100に満たない。 Large language models are able to learn new tasks in context, where they are provided with instructions and a few annotated examples. However, the effectiveness of in-context learning is dependent on the provided context, and the performance on a downstream task can vary considerably, depending on the instruction. Importantly, such dependency on the context can surface in unpredictable ways, e.g., a seemingly more informative instruction might lead to a worse performance. In this paper, we propose an alternative approach, which we term In-Context Probing (ICP). Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label. Through a series of experiments on a diverse set of classification tasks, we show that in-context probing is significantly more robust to changes in instructions. We further show that ICP performs competitive or superior to finetuning and can be particularly helpful to build classifiers on top of smaller models, with less than a hundred training examples.	翻訳日:2023-12-25 18:44:04 公開日:2023-12-22
# 異なるランダム性をもつスパースランダム行列とガウスアンサンブル Sparse random matrices and Gaussian ensembles with varying randomness ( http://arxiv.org/abs/2305.07505v2 ) ライセンス: Link先を確認	Takanori Anegawa, Norihiro Iizuka, Arkaprava Mukherjee, Sunil Kumar Sake, Sandip P. Trivedi	(参考訳) ガウス分布から結合定数を様々な方法で抽出して得られるランダムハミルトニアンを持つ $N$ 量子ビットの系について検討した。この結果、GUEと固定 $q$ のSYK理論を含む豊富な系のクラスが得られる。私たちのモチベーションは、大きな $N$ における系を理解することにあります。実際には、我々の計算のほとんどは、厳密対角化の手法を用いて行われる(最大 $N=24$)。 GUE から始めて,ランダム性が低下するにつれて生じる振る舞いについて検討する。一般に、ランダム性が低下するにつれて、システムはカオスからより秩序だった状態になるが、状態密度、スペクトル形状因子、準位統計、非時間順序相関関数などの様々な特性の変化は興味深いパターンを明らかにする。主に数値的である我々の解析の限界の範囲内で、ハミルトニアンにおける非ゼロ独立項の数が $N$ に対して指数関数的に大きいときに、その振る舞いが突然に変化するという証拠がいくつか見つかる。また,結合数が $N$ に対して線形にスケールするSYKモデルの局所バージョンで得られる、ランダム性が大きく低下した逆の極限についても検討し,その挙動を特徴付けた。我々の調査は、このタイプのシステムのより完全な理論解析は、かなり価値があることを示唆している。 We study a system of $N$ qubits with a random Hamiltonian obtained by drawing coupling constants from Gaussian distributions in various ways. This results in a rich class of systems which include the GUE and the fixed $q$ SYK theories. Our motivation is to understand the system at large $N$. In practice most of our calculations are carried out using exact diagonalisation techniques (up to $N=24$). Starting with the GUE, we study the resulting behaviour as the randomness is decreased. While in general the system goes from being chaotic to being more ordered as the randomness is decreased, the changes in various properties, including the density of states, the spectral form factor, the level statistics and out-of-time-ordered correlators, reveal interesting patterns. Subject to the limitations of our analysis which is mainly numerical, we find some evidence that the behaviour changes in an abrupt manner when the number of non-zero independent terms in the Hamiltonian is exponentially large in $N$. We also study the opposite limit of much reduced randomness obtained in a local version of the SYK model where the number of couplings scales linearly in $N$, and characterise its behaviour. 
Our investigation suggests that a more complete theoretical analysis of this class of systems will prove quite worthwhile.	翻訳日:2023-12-25 18:43:46 公開日:2023-12-22
# 大規模言語モデルによるスピアフィッシング Spear Phishing With Large Language Models ( http://arxiv.org/abs/2305.06972v3 ) ライセンス: Link先を確認	Julian Hazell	(参考訳) 人工知能(AI)の最近の進歩、特に大規模言語モデル(LLM)の領域は、強力で汎用的なデュアルユースシステムを生み出している。この知能は、様々な有益なタスクに向けられるが、害を引き起こすためにも使用できる。本研究は,標的を操り,機密情報を漏洩させるサイバー犯罪の一種であるスピアフィッシングに対して,llmがいかに利用できるかを調べることで,そのような害を探求する。まず,LLMが槍フィッシング攻撃の偵察およびメッセージ生成を補助する能力について検討し,その上で,槍フィッシング攻撃の電子メール生成フェーズを支援できることを見出した。次に、OpenAIのGPT-3.5およびGPT-4モデルを使用して、600人以上の英国議会議員に対して、LLMのスピアフィッシングキャンペーンの規模を拡大する可能性を探るため、ユニークなスピアフィッシングメッセージを作成しました。私の調査結果は、これらのメッセージが現実的なだけでなく、コスト効率も高く、それぞれのメールが生成するのにわずか1セントしかかからないことを示しています。次に、基本的なプロンプトエンジニアリングがllmsにインストールされたセーフガードを回避し、モデルの誤用を防ぐロバストな介入に関するさらなる研究の必要性を強調する。これらの進化するリスクにさらに対処するために、アプリケーションプログラミングインタフェースのような構造化アクセススキームとLLMベースの防御システムという2つの潜在的なソリューションを検討します。 Recent progress in artificial intelligence (AI), particularly in the domain of large language models (LLMs), has resulted in powerful and versatile dual-use systems. This intelligence can be put towards a wide variety of beneficial tasks, yet it can also be used to cause harm. This study explores one such harm by examining how LLMs can be used for spear phishing, a form of cybercrime that involves manipulating targets into divulging sensitive information. I first explore LLMs' ability to assist with the reconnaissance and message generation stages of a spear phishing attack, where I find that LLMs are capable of assisting with the email generation phase of a spear phishing attack. To explore how LLMs could potentially be harnessed to scale spear phishing campaigns, I then create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models. My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate. 
Next, I demonstrate how basic prompt engineering can circumvent safeguards installed in LLMs, highlighting the need for further research into robust interventions that can help prevent models from being misused. To further address these evolving risks, I explore two potential solutions: structured access schemes, such as application programming interfaces, and LLM-based defensive systems.	翻訳日:2023-12-25 18:43:24 公開日:2023-12-22
# ランダムlpノルム劣化を伴う画像分類器の破壊ロバスト性の検討 Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions ( http://arxiv.org/abs/2305.05400v3 ) ライセンス: Link先を確認	Georg Siedel, Weijia Shao, Silvia Vock, Andrey Morozov	(参考訳) 堅牢性は、安全性と信頼性を達成するために必要な機械学習分類器の基本特性である。画像分類器の対向ロバストネスの分野では、ロバストネスはp-ノルム距離内の全ての入力変化に対するモデルの安定性として定義される。しかしながら、ランダムな腐敗の堅牢性の分野では、現実世界で観測される変動が使われ、p-ノルムの腐敗はめったに考慮されない。本研究では,画像分類器のトレーニングとテストデータを強化するために,ランダムなpノルム腐敗の利用を検討する。既視的ランダムpノルム破壊に対するモデルロバスト性を評価し,新しいロバストネス指標を提案する。 p-ノルム間のロバスト性伝達とモデルがp-ノルム崩壊を訓練し評価すべき結論を導出するかどうかを実証的に検討する。 p-ノルムの汚職の組み合わせによるトレーニングデータの増大は、最先端のデータ増補スキームにおいても、汚職の堅牢性を大幅に向上させる。 Robustness is a fundamental property of machine learning classifiers required to achieve safety and reliability. In the field of adversarial robustness of image classifiers, robustness is commonly defined as the stability of a model to all input changes within a p-norm distance. However, in the field of random corruption robustness, variations observed in the real world are used, while p-norm corruptions are rarely considered. This study investigates the use of random p-norm corruptions to augment the training and test data of image classifiers. We evaluate the model robustness against imperceptible random p-norm corruptions and propose a novel robustness metric. We empirically investigate whether robustness transfers across different p-norms and derive conclusions on which p-norm corruptions a model should be trained and evaluated. We find that training data augmentation with a combination of p-norm corruptions significantly improves corruption robustness, even on top of state-of-the-art data augmentation schemes.	翻訳日:2023-12-25 18:42:59 公開日:2023-12-22
# プロトタイプベース多段階学習による半教師付きドメイン適応 Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning ( http://arxiv.org/abs/2305.02693v3 ) ライセンス: Link先を確認	Xinyang Huang, Chuang Zhu and Wenkai Chen	(参考訳) 半教師付きドメイン適応(ssda)では、各クラスのラベル付きターゲットサンプルが、モデルが完全なラベル付きソースドメインからターゲットドメインへの知識表現の転送を支援する。既存の多くのメソッドは、ラベル付きターゲットサンプルをマルチレベルから完全に利用する利点を無視している。この追加データをよりよく活用するために,ラベル付き対象サンプルの可能性をうまく活用するためのプロトタイプベース多段階学習(ProML)フレームワークを提案する。ドメイン内適応を実現するために,まず,ドメイン内最適移動に基づく擬似ラベルアグリゲーションを導入し,ラベルなしのターゲットサンプルとプロトタイプの特徴分布をモデル化する。ドメイン間レベルでは、モデルがドメイン間知識転送のターゲットプロトタイプを使用するのを助けるために、クロスドメインアライメントロスを提案する。さらに,プロトタイプ類似性と線形分類器に基づく2重一貫性を提案し,バッチレベルでのコンパクトな特徴表現の識別学習を促進する。 DomainNet, VisDA2017, Office-Homeの3つのデータセットに対する大規模な実験により,提案手法がSSDAの最先端性能を実現することを示す。 In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples from multi-level. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home demonstrate that our proposed method achieves state-of-the-art performance in SSDA.	翻訳日:2023-12-25 18:42:40 公開日:2023-12-22
# FlightBERT++: 非自己回帰型マルチホライゾン飛行軌道予測フレームワーク FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework ( http://arxiv.org/abs/2305.01658v2 ) ライセンス: Link先を確認	Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, and Yi Lin	(参考訳) 飛行軌道予測(FTP)は、航空管制(ATC)における重要なタスクであり、航空管制官がより安全かつ効率的に空域を管理するのを助ける。既存のアプローチは、通常、自己回帰的にマルチホライゾンFTPタスクを実行するため、エラーの蓄積や低効率の問題に悩まされる。本稿では,FlightBERT++と呼ばれる新しいフレームワークを提案する。その目的は、 i) 非自己回帰的な方法で直接マルチホライゾン飛行軌道を予測すること、 ii) FlightBERTにおけるバイナリエンコーディング(BE)表現の制限を緩和することである。特に、FlightBERT++は、一般化されたエンコーダ-デコーダアーキテクチャによって実装され、エンコーダは過去の観測から時間空間パターンを学習し、デコーダは将来のホライゾンの飛行状態を予測する。従来のアーキテクチャと比較して,事前のホライゾン情報を考慮するために,革新的なホライゾン認識コンテキスト生成器が設計されており、これにより非自己回帰的なマルチホライゾン予測が可能になる。さらに、差分列の定常性を利用して、差分予測の能力を高めるために、差分プロンプト型デコーダを提案する。実世界のデータセット実験の結果、FlightBERT++はFTP性能と計算効率の両面で競合するベースラインを上回った。 Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP tasks in an autoregressive manner, thereby suffering from error accumulation and low-efficiency problems. In this paper, a novel framework, called FlightBERT++, is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) alleviate the limitation of the binary encoding (BE) representation in FlightBERT. Specifically, FlightBERT++ is implemented by a generalized encoder-decoder architecture, in which the encoder learns the temporal-spatial patterns from historical observations and the decoder predicts the flight status for the future horizons. Compared with conventional architectures, an innovative horizon-aware contexts generator is specifically designed to consider the prior horizon information, which further enables non-autoregressive multi-horizon prediction. Moreover, a differential prompted decoder is proposed to enhance the capability of the differential predictions by leveraging the stationarity of the differential sequence. 
The experimental results on a real-world dataset demonstrated that the FlightBERT++ outperformed the competitive baselines in both FTP performance and computational efficiency.	翻訳日:2023-12-25 18:42:19 公開日:2023-12-22
# 屈曲軟導波路の束縛状態 Bound States in Bent Soft Waveguides ( http://arxiv.org/abs/2304.14776v2 ) ライセンス: Link先を確認	Pavel Exner and Semjon Vugalter	(参考訳) 本論文の目的は,固定プロファイルの「溝(ditch)」形式のポテンシャルを持つ2次元Schrödinger演算子が幾何学的に誘起される離散スペクトルを持ちうることを示すことである。これは、そのようなポテンシャルチャネルが1つまたは複数の屈曲を持ち、コンパクト集合の外側で直線である場合に起こる。さらに、より強い幾何学的制約の下では、この主張はチャネルの「バンク」の1つにポテンシャルバイアスが存在する場合にも真である。 The aim of this paper is to show that a two-dimensional Schrödinger operator with the potential in the form of a 'ditch' of a fixed profile can have a geometrically induced discrete spectrum; this happens if such a potential channel has a single bend or multiple bends and is straight outside a compact set. Moreover, under stronger geometric restrictions the claim remains true in the presence of a potential bias at one of the channel 'banks'.	翻訳日:2023-12-25 18:41:54 公開日:2023-12-22
# AutoNeRF: 自律エージェントによる暗黙のシーン表現のトレーニング AutoNeRF: Training Implicit Scene Representations with Autonomous Agents ( http://arxiv.org/abs/2304.11241v2 ) ライセンス: Link先を確認	Pierre Marza, Laetitia Matignon, Olivier Simonin, Dhruv Batra, Christian Wolf, Devendra Singh Chaplot	(参考訳) ニューラルレージアンス場(NeRF)のような入射表現は、新規なビュー合成に非常に有効であることが示されている。しかし、これらのモデルは通常、トレーニングのために手動で注意深い人的データ収集を必要とする。本稿では,自律型エンボディエージェントを用いたNeRF訓練に必要なデータ収集手法であるAutoNeRFを提案する。本手法では,エージェントが未知の環境を効率的に探索し,その経験を用いて暗黙の地図表現を自律的に構築できる。我々は,手作りのフロンティア探索や,訓練された高レベルプランナーと古典的な低レベルパスフォロワーからなるエンドツーエンドおよびモジュラーアプローチなど,さまざまな探索戦略の影響を比較した。我々は,この問題に適応した異なる報酬関数を持つこれらのモデルを訓練し,古典的視点レンダリング,地図再構成,計画,ポーズリファインメントという4つの下流タスクにおける学習表現の品質を評価する。実験結果から,nerfsは未発見の環境において1回の体験のみを使用して,アクティブに収集されたデータに対してトレーニングすることが可能であり,いくつかの下流ロボットタスクに使用できること,モジュール型学習された探索モデルは,他の古典的およびエンドツーエンドのベースラインよりも優れることが示された。最後に,AutoNeRFは大規模シーンの再構成が可能であり,生成した3D環境モデルをシミュレータにロードし,興味のあるポリシーを微調整できるため,シーン固有の適応を行う上で有用なツールであることを示す。 Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies including handcrafted frontier-based exploration, end-to-end and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. 
Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models outperform other classical and end-to-end baselines. Finally, we show that AutoNeRF can reconstruct large-scale scenes, and is thus a useful tool to perform scene-specific adaptation as the produced 3D environment models can be loaded into a simulator to fine-tune a policy of interest.	翻訳日:2023-12-25 18:41:24 公開日:2023-12-22
# ランダム回路サンプリングにおける相転移 Phase transition in Random Circuit Sampling ( http://arxiv.org/abs/2304.11119v2 ) ライセンス: Link先を確認	A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, J. Chau, G. Laun, R. Movassagh, A. Asfaw, L. T.A.N. Brandão, R. Peralta, D. Abanin, R. Acharya, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, J. Atalaya, J. C. Bardin, A. Bilmes, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, B. B. Buckley, D. A. Buell, T. Burger, B. Burkett, N. Bushnell, J. Campero, H. S. Chang, B. Chiaro, D. Chik, C. Chou, J. Cogan, R. Collins, P. Conner, W. Courtney, A. L. Crook, B. Curtin, D. M. Debroy, A. Del Toro Barba, S. Demura, A. Di Paolo, A. Dunsworth, L. Faoro, E. Farhi, R. Fatemi, V. S. Ferreira, L. Flores Burgos, E. Forati, A. G. Fowler, B. Foxen, G. Garcia, E. Genois, W. Giang, C. Gidney, D. Gilboa, M. Giustina, R. Gosula, A. Grajales Dau, J. A. Gross, S. Habegger, M. C. Hamilton, M. Hansen, M. P. Harrigan, S. D. Harrington, P. Heu, M. R. Hoffmann, T. Huang, A. Huff, W. J. Huggins, L. B. Ioffe, S. V. Isakov, J. Iveland, E. Jeffrey, Z. Jiang, C. Jones, P. Juhas, D. Kafri, T. Khattar, M. Khezri, M. Kieferová, S. Kim, A. Kitaev, A. R. Klots, A. N. Korotkov, F. Kostritsa, J. M. Kreikebaum, D. Landhuis, P. Laptev, K.-M. Lau, L. Laws, J. Lee, K. W. Lee, Y. D. Lensky, B. J. Lester, A. T. Lill, W. Liu, W. P. Livingston, A. Locharla, F. D. Malone, O. Martin, S. Martin, J. R. McClean, M. McEwen, K. C. Miao, A. Mieszala, S. Montazeri, W. Mruczkiewicz, O. Naaman, M. Neeley, C. Neill, A. Nersisyan, M. Newman, J. H. Ng, A. Nguyen, M. Nguyen, M. Yuezhen Niu, T. E. O'Brien, S. Omonije, A. Opremcak, A. Petukhov, R. Potter, L. P. Pryadko, C. Quintana, D. M. Rhodes, E. Rosenberg, C. Rocque, P. Roushan, N. C. Rubin, N. Saei, D. Sank, K. Sankaragomathi, K. J. Satzinger, H. F. Schurkus, C. Schuster, M. J. Shearn, A. Shorter, N. Shutty, V. Shvarts, V. Sivak, J. Skruzny, W. C. 
Smith, R. D. Somma, G. Sterling, D. Strain, M. Szalay, D. Thor, A. Torres, G. Vidal, C. Vollgraff Heidweiller, T. White, B. W. K. Woo, C. Xing, Z. J. Yao, P. Yeh, J. Yoo, G. Young, A. Zalcman, Y. Zhang, N. Zhu, N. Zobrist, E. G. Rieffel, R. Biswas, R. Babbush, D. Bacon, J. Hilton, E. Lucero, H. Neven, A. Megrant, J. Kelly, I. Aleiner, V. Smelyanskiy, K. Kechedzhi, Y. Chen, S. Boixo	(参考訳) 周囲環境への望ましくない結合は、量子プロセッサ上の長距離相関を破壊し、名目上利用可能な計算空間におけるコヒーレント進化を妨げる。この非コヒーレントノイズは、短期量子プロセッサの計算能力を完全に活用する際、顕著な課題である。ランダム回路サンプリング (RCS) とクロスエントロピーベンチマーク (XEB) のベンチマークにより、ヒルベルト空間の有効サイズを確実に推定できることが示されている。雑音の存在が与えられた量子アルゴリズムの出力を自明にできる程度、すなわち古典的計算によってスポアブル化できる程度は、解き放たれた問題である。ここでは、RCSアルゴリズムの実装により、XEBで観測可能な2つの相転移が存在することを実験的に実証し、統計的モデルを用いて理論的に説明する。 1つ目はサイクルの数の関数としての動的遷移であり、無騒音の場合の反集中点の継続である。 2つ目は1サイクルあたりの誤差によって制御される量子相転移であり、解析的および実験的に識別するために、ノイズの強さとコヒーレントな進化を両立させる弱いリンクモデルを作成する。さらに, 67キュービットのRCS実験を32サイクルで行うことにより, 従来のスーパーコンピュータの計算コストが, ノイズの存在を考慮に入れた場合でも, 従来のスーパーコンピュータの能力を超えることを示した。我々の実験的および理論的研究は、現在の量子プロセッサで到達可能な安定な計算複雑相への遷移の存在を確立する。 Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benchmarking (XEB) can provide a reliable estimate of the effective size of the Hilbert space coherently available. The extent to which the presence of noise can trivialize the outputs of a given quantum algorithm, i.e. making it spoofable by a classical computation, is an unanswered question. Here, by implementing an RCS algorithm we demonstrate experimentally that there are two phase transitions observable with XEB, which we explain theoretically with a statistical model. 
The first is a dynamical transition as a function of the number of cycles and is the continuation of the anti-concentration point in the noiseless case. The second is a quantum phase transition controlled by the error per cycle; to identify it analytically and experimentally, we create a weak link model which allows varying the strength of noise versus coherent evolution. Furthermore, by presenting an RCS experiment with 67 qubits at 32 cycles, we demonstrate that the computational cost of our experiment is beyond the capabilities of existing classical supercomputers, even when accounting for the inevitable presence of noise. Our experimental and theoretical work establishes the existence of transitions to a stable computationally complex phase that is reachable with current quantum processors.	翻訳日:2023-12-25 18:41:00 公開日:2023-12-22
# 開放シュウィンガー模型のリウビリアンダイナミクス:熱媒質における弦破断と運動散逸 Liouvillian Dynamics of the Open Schwinger Model: String Breaking and Kinetic Dissipation in a Thermal Medium ( http://arxiv.org/abs/2308.03878v3 ) ライセンス: Link先を確認	Kyle Lee, James Mulligan, Felix Ringer and Xiaojun Yao	(参考訳) 境界状態形成のダイナミクスを理解することは、量子色力学(qcd)のような量子場理論を閉じ込める基本的な問題の1つである。最初にフェルミオンと反フェルミオンをつなぐ弦の破断が大きな注目を集めたハドロン化機構の1つである。シュウィンガーモデルのようなより単純で低次元のモデルでリアルタイムの弦破れ力学の理解を深めることにより、凝縮物質や統計システムで見られるQCDやその他の凝縮系におけるハドロン化過程の理解を深めることができる。本稿では,シュウィンガーモデルにおける弦破壊のダイナミクスを考察し,熱媒質中での修正を考察し,シュウィンガーモデルを熱環境に結合した開量子系として扱う。システムと環境の間の弱い結合の仕組みの中で、システムのリアルタイムな進化はリンドブラッド進化方程式によって説明できる。このリンドブラッド方程式のリウヴィリアンギャップとシステムのフォン・ノイマンエントロピーの時間依存性を解析した。環境相関時間の増加に伴い, 後期緩和速度は低下する。さらに、環境相関長が無限であるとき、系は2つの定常状態を示し、各々のチャージ共役パリティ(cp)量子数を持つセクタに1つずつを示す。初期弦が真空で壊れるパラメータ状態に対しては, 運動的消散効果により, 媒体内の弦破壊の遅れが観察される。逆に、真空時間進化において初期弦がそのまま残る状態においては、熱媒体内の弦の破れ(融解)が観察される。さらに,オープンシュウィンガーモデルのリウビリアンダイナミクスを量子コンピュータ上でシミュレートし,関連するトロッター誤差を推定する方法についても検討した。 Understanding the dynamics of bound state formation is one of the fundamental questions in confining quantum field theories such as Quantum Chromodynamics (QCD). One hadronization mechanism that has garnered significant attention is the breaking of a string initially connecting a fermion and an anti-fermion. Deepening our understanding of real-time string-breaking dynamics with simpler, lower dimensional models like the Schwinger model can improve our understanding of the hadronization process in QCD and other confining systems found in condensed matter and statistical systems. In this paper, we consider the string-breaking dynamics within the Schwinger model and investigate its modification inside a thermal medium, treating the Schwinger model as an open quantum system coupled to a thermal environment. Within the regime of weak coupling between the system and environment, the real-time evolution of the system can be described by a Lindblad evolution equation. 
We analyze the Liouvillian gaps of this Lindblad equation and the time dependence of the system's von Neumann entropy. We observe that the late-time relaxation rate decreases as the environment correlation length increases. Moreover, when the environment correlation length is infinite, the system exhibits two steady states, one in each of the sectors with definite charge-conjugation-parity (CP) quantum numbers. For parameter regimes where an initial string breaks in vacuum, we observe a delay of the string breaking in the medium, due to kinetic dissipation effects. Conversely, in regimes where an initial string remains intact in vacuum time evolution, we observe string breaking (melting) in the thermal medium. We further discuss how the Liouvillian dynamics of the open Schwinger model can be simulated on quantum computers and provide an estimate of the associated Trotter errors.	翻訳日:2023-12-25 18:35:22 公開日:2023-12-22
# UnIVAL: 画像、ビデオ、オーディオ、言語タスクのための統一モデル UnIVAL: Unified Model for Image, Video, Audio and Language Tasks ( http://arxiv.org/abs/2307.16184v2 ) ライセンス: Link先を確認	Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord	(参考訳) 大規模言語モデル(LLM)により、汎用エージェントの野心的な探求はもはや幻想ではなくなっている。このような一般的なモデルを構築する上で重要なハードルは、タスクとモダリティの多様性と不均一性である。有望な解決策は統一であり、一つの統一フレームワーク内で多数のタスクとモダリティをサポートすることができる。大規模なデータセットで訓練されたFlamingo (Alayrac et al., 2022)のような少数の大規模モデルは2つを超えるモダリティをサポートできるが、現在の小型から中規模の統一モデルはまだ2つのモダリティ(通常は画像テキストまたはビデオテキスト)に制限されている。すべてのモダリティをサポートする統一モデルを効率的に構築することは可能だろうか? そこで我々は,この野心的な目標に向けての一歩として,UnIVALを提案する。巨大なデータセットサイズや数十億のパラメータを持つモデルに頼ることなく、約0.25BパラメータのUnIVALモデルは2つのモダリティを超えて、テキスト、イメージ、ビデオ、オーディオを1つのモデルに統合する。我々のモデルはタスクバランスとマルチモーダルカリキュラム学習に基づいて,多くのタスクで効率的に事前学習される。 UnIVALは、画像テキストおよびビデオテキストタスクにおいて、既存の最先端アプローチと競合するパフォーマンスを示す。画像テキストとビデオテキストのモダリティから学んだ特徴表現により、オーディオで事前学習されていないにもかかわらず、オーディオテキストタスクで微調整された場合、モデルは競合性能を達成することができる。統一モデルにより,異なるマルチモーダルタスクで訓練されたモデルの重み補間によるマルチモーダルモデルマージに関する新しい研究を提案し,特に分布外一般化における効果を示している。最後に,タスク間の相乗効果を示すことによって,統合の動機付けを行う。モデルウェイトとコードは以下にリリースされている。 Large Language Models (LLMs) have made the ambitious quest for generalist agents far from a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is unification, allowing the support of a myriad of tasks and modalities within one unified framework. While a few large models (e.g., Flamingo (Alayrac et al., 2022)), trained on massive datasets, can support more than two modalities, current small to mid-scale unified models are still limited to two modalities, usually image-text or video-text. The question that we ask is: is it possible to efficiently build a unified model that can support all modalities? To answer this, we propose UnIVAL, a step further towards this ambitious goal. 
Without relying on fancy dataset sizes or models with billions of parameters, the ~0.25B-parameter UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model. Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning. UnIVAL shows performance competitive with existing state-of-the-art approaches across image-text and video-text tasks. The feature representations learned from the image-text and video-text modalities allow the model to achieve competitive performance when finetuned on audio-text tasks, despite not being pretrained on audio. Thanks to the unified model, we propose a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing its benefits in particular for out-of-distribution generalization. Finally, we motivate unification by showing the synergy between tasks. The model weights and code are released here: https://github.com/mshukor/UnIVAL.	翻訳日:2023-12-25 18:34:49 公開日:2023-12-22
# 時間相関ノイズを有する量子デバイスの圧縮ゲート特性評価 Compressed gate characterization for quantum devices with time-correlated noise ( http://arxiv.org/abs/2307.14432v2 ) ライセンス: Link先を確認	M. J. Gullans, M. Caranti, A. R. Mills, and J. R. Petta	(参考訳) 量子デバイスは、中間スケールとフォールトトレラントな量子コンピューティングに向けて着実に進歩するので、既知のノイズ源を説明する厳密で効率的な測定プロトコルを開発することが不可欠である。ゲートセットトモグラフィやランダム化ベンチマークのような既存の量子特徴づけプロトコルの多くは、量子ビットに作用するノイズがマルコビアンであると仮定する。しかし、1/fの電荷ノイズや超微細核スピンノイズの場合のように、この仮定はしばしば有効ではない。本稿では,時間関連ノイズの存在下での量子プロセストモグラフィ(QPT)の一般的な枠組みについて述べる。さらに,マルコフ音源と非マルコフノイズの相対強度を定量化する忠実度ベンチマークも導入する。本手法の適用例として,シリコンスピン量子ビットの比較理論的および実験的解析を行った。まず, 支配的雑音源を考慮した詳細なノイズモデルを開発し, 実験データに対する評価を行った。時間関連QPTの枠組みを適用すると、完全汎用の場合と比較して、1と2のキュービットゲートを特徴付けるのに必要な独立パラメータの数を10倍、100倍圧縮できることがわかった。これらの圧縮は実験に必要なトモグラフィ測定量を減少させると同時に、時間依存のハミルトニアンシミュレーションと比較してノイズ量子回路ダイナミクスの数値シミュレーションを著しく高速化する。この圧縮雑音モデルを用いて, シリコンスピン量子ビットに関する最近の実験において, 理論的に予測されたプロセスフィデリティと2つの量子ビット間ランダム化ベンチマークフィデリティの99.8%との一致が確認された。より広範に、我々のフォーマリズムは直接拡張することができ、非マルコフノイズを持つ大規模量子デバイスの高忠実性制御のための効率的でスケーラブルなチューニングプロトコルを開発することができる。 As quantum devices make steady progress towards intermediate scale and fault-tolerant quantum computing, it is essential to develop rigorous and efficient measurement protocols that account for known sources of noise. Most existing quantum characterization protocols such as gate set tomography and randomized benchmarking assume the noise acting on the qubits is Markovian. However, this assumption is often not valid, as for the case of 1/f charge noise or hyperfine nuclear spin noise. Here, we present a general framework for quantum process tomography (QPT) in the presence of time-correlated noise. We further introduce fidelity benchmarks that quantify the relative strength of different sources of Markovian and non-Markovian noise. As an application of our method, we perform a comparative theoretical and experimental analysis of silicon spin qubits. 
We first develop a detailed noise model that accounts for the dominant sources of noise and validate the model against experimental data. Applying our framework for time-correlated QPT, we find that the number of independent parameters needed to characterize one and two-qubit gates can be compressed by 10x and 100x, respectively, when compared to the fully generic case. These compressions reduce the amount of tomographic measurements needed in experiment, while also significantly speeding up numerical simulations of noisy quantum circuit dynamics compared to time-dependent Hamiltonian simulation. Using this compressed noise model, we find good agreement between our theoretically predicted process fidelities and two qubit interleaved randomized benchmarking fidelities of 99.8% measured in recent experiments on silicon spin qubits. More broadly, our formalism can be directly extended to develop efficient and scalable tuning protocols for high-fidelity control of large-arrays of quantum devices with non-Markovian noise.	翻訳日:2023-12-25 18:34:18 公開日:2023-12-22
# 単一qudit符号化によるフォールトトレラント計算 Fault-Tolerant Computing with Single Qudit Encoding ( http://arxiv.org/abs/2307.10761v3 ) ライセンス: Link先を確認	Matteo Mezzadri, Alessandro Chiesa, Luca Lepori and Stefano Carretta	(参考訳) 我々は、複数の量子ビット符号の典型的なリソースエスカレーションを回避するため、単一のマルチレベルquditに実装されたスタビライザー量子誤り訂正符号について議論する。これらの符号はquditの特定の物理的エラーに合わせてカスタマイズすることができ、それらのエラーを効果的に抑制することができる。分子スピンquditに対するフォールトトレラントな実装を実証し,quditサイズの線形な増加のみでほぼ指数関数的な誤差抑制を示す。特にこれは、数千単位を用いるqubit符号よりも優れている。また,これら組込み符号をフォールトトレラントに実装するために汎用物理システムに必要な特性についても概説する。 We discuss stabilizer quantum-error correction codes implemented in a single multi-level qudit to avoid the resource escalation typical of multi-qubit codes. These codes can be customized to the specific physical errors on the qudit, effectively suppressing them. We demonstrate a fault-tolerant implementation on molecular spin qudits, showcasing nearly exponential error suppression with only linear growth in qudit size. Notably, this outperforms qubit codes using thousands of units. We also outline the properties required for a generic physical system to implement these embedded codes fault-tolerantly.	翻訳日:2023-12-25 18:33:52 公開日:2023-12-22
# 連合基盤モデルに向けて: グループ構造学習のためのスケーラブルなデータセットパイプライン Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning ( http://arxiv.org/abs/2307.09619v2 ) ライセンス: Link先を確認	Zachary Charles, Nicole Mitchell, Krishna Pillutla, Michael Reneer, Zachary Garrett	(参考訳) 我々は,大規模なグループ構造化(フェデレート)データセットを作成するためのライブラリであるDataset Grouperを導入し,基礎モデルのスケールでのフェデレーション学習シミュレーションを可能にする。このライブラリは、ユーザ指定のパーティションに基づいて、既存のデータセットのグループ構造バージョンの作成を容易にするとともに、既存のソフトウェアフレームワークにプラグイン可能な、さまざまな有用な異種データセットに直接つながる。 Dataset Grouperには3つの利点がある。まず、単一のグループのデータセットでさえメモリに収まるには大きすぎる設定にスケールします。第2に、基本(非分割)データセットの選択とパーティション定義の両方において、柔軟性を提供します。最後に、フレームワークに依存しない。我々は、Dataset Grouperが、以前の作業よりも桁違いに大きいデータセット上で、大規模なフェデレートされた言語モデリングシミュレーションを可能にし、数十億のパラメータを持つ言語モデルのフェデレーショントレーニングを可能にすることを実証的に実証した。実験の結果,FedAvgのようなアルゴリズムは,この規模の経験的リスク最小化手法よりもメタラーニング手法として機能し,下流のパーソナライズやタスク固有の適応に有用であることが示唆された。 dataset grouperはhttps://github.com/google-research/dataset_grouperで入手できる。 We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper enables large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work, allowing for federated training of language models with hundreds of millions, and even billions, of parameters. 
Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation. Dataset Grouper is available at https://github.com/google-research/dataset_grouper.	翻訳日:2023-12-25 18:33:42 公開日:2023-12-22
# S.T.A.R.トラック:適応時空間表現を用いたエンドツーエンド3次元物体追跡のための潜在運動モデル S.T.A.R.-Track: Latent Motion Models for End-to-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations ( http://arxiv.org/abs/2306.17602v2 ) ライセンス: Link先を確認	Simon Doll, Niklas Hanselmann, Lukas Schneider, Richard Schulz, Markus Enzweiler, Hendrik P.A. Lensch	(参考訳) 本稿では,トラッキング・バイ・アテンションのパラダイムに従って,オブジェクト中心のトランスフォーマーベースの3D追跡フレームワークを提案する。従来のモデルに基づく追跡手法は、幾何運動モデルを用いたフレーム間のオブジェクトとエゴの動きの幾何学的効果を取り入れている。そこで,我々はS.T.A.R.-Trackを提案する。S.T.A.R.-Trackは,幾何学的な運動を明示的にモデル化しつつ、新しい潜在運動モデル (LMM) を用いて,潜在空間における視方向や照明条件の変化を考慮したオブジェクトクエリの調整を行う。トラックの存在確率をモデル化する新しい学習可能なトラック埋め込みと組み合わせることで、任意のクエリベースの検出器と統合可能な汎用的なトラッキングフレームワークが実現される。 nuScenes ベンチマークによる大規模な実験により,DETR3D ベースのトラッカーにおける最先端(SOTA)性能を示すとともに,トラックの同一性スイッチ数を劇的に削減した。 Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D. Traditional model-based tracking approaches incorporate the geometric effect of object and ego motion between frames with a geometric motion model. Inspired by this, we propose S.T.A.R.-Track, which uses a novel latent motion model (LMM) to additionally adjust object queries to account for changes in viewing direction and lighting conditions directly in the latent space, while still modeling the geometric motion explicitly. Combined with a novel learnable track embedding that aids in modeling the existence probability of tracks, this results in a generic tracking framework that can be integrated with any query-based detector. Extensive experiments on the nuScenes benchmark demonstrate the benefits of our approach, showing state-of-the-art (SOTA) performance for DETR3D-based trackers while drastically reducing the number of identity switches of tracks at the same time.	翻訳日:2023-12-25 18:33:20 公開日:2023-12-22
# 人間中心の生成AIの次のステップ:技術的視点 Next Steps for Human-Centered Generative AI: A Technical Perspective ( http://arxiv.org/abs/2306.15774v2 ) ライセンス: Link先を確認	Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou	(参考訳) 繰り返し、学際的な議論を通じて、我々はHuman-centered Generative AI(HGAI)の次のステップを定義し、提案する。我々は、人的価値の整合性、人間の意図の同化、人間の能力の増強という3つのレベルにまたがるジェネレーティブAIの今後の方向性を示す包括的な研究課題に貢献する。これらの次のステップを特定することで、学際的な研究チームがHGAIにおける一貫したアイデアの集合を追求し、その関心事に焦点を合わせながら、将来的な作業環境の全体像を維持していくことを目指しています。 Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary research teams to pursue a coherent set of emergent ideas in HGAI, focusing on their interested topics while maintaining a coherent big picture of the future work landscape.	翻訳日:2023-12-25 18:33:00 公開日:2023-12-22
# 量子最適輸送と弱位相 Quantum Optimal Transport and Weak Topologies ( http://arxiv.org/abs/2306.12944v3 ) ライセンス: Link先を確認	Laurent Lafleche	(参考訳) 古典的最適輸送距離の量子設定へのいくつかの拡張が提案されている。本稿では、Golse, Mouhot, Paul [Commun Math Phys 343:165-205, 2016] と Golse, Paul [Arch Ration Mech Anal 223:57-94, 2017] によって導入された擬距離について検討する。これらの擬距離は、位相空間上の$2$次のモンジュ-カントロヴィッチ-ワッサーシュタイン距離の量子アナログとして機能する。これらの擬距離は、半古典近似における正の「自己距離」に起因する小さな項を除いて負のソボレフノルムと同等であり、その項はウィグナー-ヤナセのスキュー情報を用いて上から評価できることを証明する。これにより、初期データに要求する正則性を弱めることで、平均場極限と半古典極限の文脈における既知の結果を改善することができる。 Several extensions of the classical optimal transport distances to the quantum setting have been proposed. In this paper, we investigate the pseudometrics introduced by Golse, Mouhot and Paul in [Commun Math Phys 343:165-205, 2016] and by Golse and Paul in [Arch Ration Mech Anal 223:57-94, 2017]. These pseudometrics serve as a quantum analogue of the Monge-Kantorovich-Wasserstein distances of order $2$ on the phase space. We prove that they are comparable to negative Sobolev norms up to a small term due to a positive "self-distance" in the semiclassical approximation, which can be bounded above using the Wigner-Yanase skew information. This enables us to improve the known results in the context of the mean-field and semiclassical limits by requiring less regularity on the initial data.	翻訳日:2023-12-25 18:32:49 公開日:2023-12-22
# RoboCat:ロボットマニピュレーションのための自己改善型ジェネリストエージェント RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation ( http://arxiv.org/abs/2306.11706v2 ) ライセンス: Link先を確認	Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess	(参考訳) 異なるロボットやタスクから得られる異種のロボット経験を活用し、新しいスキルやエンボディメント(身体性)を素早く習得できる能力は、ロボット学習を変革する可能性がある。視覚と言語の基礎モデルの最近の進歩に触発されて,ロボット操作のためのマルチエンボディメント・マルチタスク汎用エージェントを提案する。このエージェントはRoboCatと呼ばれ、アクションラベル付きの視覚経験を利用できる視覚目標条件付き決定トランスフォーマーである。このデータは、さまざまな観測と行動のセットを持つシミュレーションおよび実機のロボットアームから得られた、運動制御スキルの大規模なレパートリーにまたがる。 RoboCatでは、ゼロショットだけでなく、ターゲットタスクの100-1000例のみを使用して適応することで、新しいタスクやロボットに一般化する能力を示す。また、トレーニングされたモデル自体が、その後のトレーニングイテレーションでデータを生成するためにどのように使われるかを示し、自律的な改善ループのための基本的な構築ブロックを提供する。本研究は,シミュレーションと3種類の実ロボットを用いた大規模評価を行い,エージェントの能力について検討する。トレーニングデータの拡大と多様化が進むにつれ、RoboCatはクロスタスク転送の兆候を示すだけでなく、新しいタスクへの適応もより効率的になります。 The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. 
With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.	翻訳日:2023-12-25 18:32:13 公開日:2023-12-22
# Sparse and Invisible Trigger によるバックドアアタック Backdoor Attack with Sparse and Invisible Trigger ( http://arxiv.org/abs/2306.06209v2 ) ライセンス: Link先を確認	Yinghua Gao, Yiming Li, Xueluan Gong, Zhifeng Li, Shu-Tao Xia, Qian Wang	(参考訳) ディープニューラルネットワーク(DNN)は、バックドア攻撃に対して脆弱であり、敵は、被害者モデルが良性サンプルでは正常に予測するが、トリガー付きサンプルをターゲットクラスに分類するように、少数のトレーニングデータを操作する。バックドア攻撃は、新たに出現した深刻なトレーニングフェーズの脅威であり、DNNベースのアプリケーションに深刻なリスクをもたらす。本稿では,既存のバックドア攻撃のトリガパターンを再検討する。私たちは、それらが可視であるかスパースでないかのいずれかであり、そのため十分にステルスではないことを明らかにします。さらに重要なのは、既存の手法を単純に組み合わせるだけでは、効果的でスパースかつ不可視なバックドア攻撃を設計できないことである。この問題に対処するために、スパース性と不可視性の制約を伴う二段階最適化問題としてトリガ生成を定式化し、それを解決する効果的な方法を提案する。提案手法はsparse and invisible backdoor attack (SIBA)と呼ばれる。異なる設定下でベンチマークデータセットを広範囲に実験し、攻撃の有効性と既存のバックドア防御に対する耐性を検証する。主な実験を再現するためのコードは https://github.com/YinghuaGao/SIBA で入手できる。 Deep neural networks (DNNs) are vulnerable to backdoor attacks, where the adversary manipulates a small portion of training data such that the victim model predicts normally on the benign samples but classifies the triggered samples as the target class. The backdoor attack is an emerging yet threatening training-phase threat, leading to serious risks in DNN-based applications. In this paper, we revisit the trigger patterns of existing backdoor attacks. We reveal that they are either visible or not sparse and therefore are not stealthy enough. More importantly, it is not feasible to simply combine existing methods to design an effective sparse and invisible backdoor attack. To address this problem, we formulate the trigger generation as a bi-level optimization problem with sparsity and invisibility constraints and propose an effective method to solve it. The proposed method is dubbed sparse and invisible backdoor attack (SIBA). We conduct extensive experiments on benchmark datasets under different settings, which verify the effectiveness of our attack and its resistance to existing backdoor defenses. The codes for reproducing main experiments are available at https://github.com/YinghuaGao/SIBA.	翻訳日:2023-12-25 18:31:53 公開日:2023-12-22
# スケッチ美化:学習部人工物体のスケッチの美化と構造洗練 Sketch Beautification: Learning Part Beautification and Structure Refinement for Sketches of Man-made Objects ( http://arxiv.org/abs/2306.05832v2 ) ライセンス: Link先を確認	Deng Yu, Manfred Lau, Lin Gao, Hongbo Fu	(参考訳) 本稿では,人工物体の自由なスケッチを入力し,幾何学的にも構造的にも自動的に美化する,新しいフリーハンドスケッチ美化手法を提案する。スケッチの美化は、非常に抽象的で多彩な描画方法のため、難しい。既存の手法は通常、限られた訓練サンプルの分布に制限されるため、豊かなバリエーションで自由に描かれたスケッチを美化することはできない。この課題に対処するために、分割・組み合わせ戦略を採用します。具体的には、まず、入力スケッチを意味成分にパースし、部分レベルの暗黙多様体に基づく学習部美化モジュールにより個々のコンポーネントを美化し、次に構造美化モジュールを介して美化コンポーネントを再評価する。この戦略により,本手法はトレーニングサンプルを超えて,新しいフリーハンドスケッチを処理できる。本システムの有効性を広範な実験と知覚的研究で実証する。 We present a novel freehand sketch beautification method, which takes as input a freely drawn sketch of a man-made object and automatically beautifies it both geometrically and structurally. Beautifying a sketch is challenging because of its highly abstract and heavily diverse drawing manner. Existing methods are usually confined to the distribution of their limited training samples and thus cannot beautify freely drawn sketches with rich variations. To address this challenge, we adopt a divide-and-combine strategy. Specifically, we first parse an input sketch into semantic components, beautify individual components by a learned part beautification module based on part-level implicit manifolds, and then reassemble the beautified components through a structure beautification module. With this strategy, our method can go beyond the training samples and handle novel freehand sketches. We demonstrate the effectiveness of our system with extensive experiments and a perceptive study.	翻訳日:2023-12-25 18:31:33 公開日:2023-12-22
# 予測と統計のパリティの調和:因果的アプローチ Reconciling Predictive and Statistical Parity: A Causal Approach ( http://arxiv.org/abs/2306.05059v2 ) ライセンス: Link先を確認	Drago Plecko, Elias Bareinboim	(参考訳) 公正な機械学習が調査の重要分野として台頭して以来、差別の定量化と測定方法に関する多くの異なる概念が文献で提案されている。しかし、これらの概念のいくつかは互いに相容れないことが示されている。このような結果から,多種多様な公平性が存在することが明らかとなり,公平性に関する適切な尺度についてのコンセンサスが困難となり,実用上のツールの適用が妨げられた。本稿では,統計的および予測的パリティの概念を関連づけた,これらの重要な不可能な結果の1つについて検討する。具体的には,予測パリティに関連する公平度尺度の新たな因果分解式を導出し,この基準が,異質な待遇,異質な影響,ビジネスの必要性という法的ドクトリンを通じて,統計的パリティとどのように関連しているか,新たな知見を得る。以上の結果から, 統計的・予測パリティの概念は, より慎重な因果分析を通じて, 相互排他的ではなく, ビジネスニーズという概念を通じて, 公正な概念のスペクトルを補完し, 分散していることが明らかとなった。最後に,実例における発見の重要性を実証する。 Since the rise of fair machine learning as a critical field of inquiry, many different notions on how to quantify and measure discrimination have been proposed in the literature. Some of these notions, however, were shown to be mutually incompatible. Such findings make it appear that numerous different kinds of fairness exist, thereby making a consensus on the appropriate measure of fairness harder to reach, hindering the applications of these tools in practice. In this paper, we investigate one of these key impossibility results that relates the notions of statistical and predictive parity. Specifically, we derive a new causal decomposition formula for the fairness measures associated with predictive parity, and obtain a novel insight into how this criterion is related to statistical parity through the legal doctrines of disparate treatment, disparate impact, and the notion of business necessity. Our results show that through a more careful causal analysis, the notions of statistical and predictive parity are not really mutually exclusive, but complementary and spanning a spectrum of fairness notions through the concept of business necessity. Finally, we demonstrate the importance of our findings on a real-world example.	翻訳日:2023-12-25 18:30:10 公開日:2023-12-22
# クロスモーダル検索のためのプロトタイプベースアレエータ不確かさ定量化 Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval ( http://arxiv.org/abs/2309.17093v2 ) ライセンス: Link先を確認	Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen	(参考訳) クロスモーダル検索手法は、共通表現空間を共同学習することにより、視覚と言語モダリティの類似性関係を構築する。しかし、この予測は、腐敗した画像、速いペースの動画、未詳のテキストなど、低品質のデータによって引き起こされるアリータティックな不確実性によって、しばしば信頼性が低下する。本稿では,不確実性から生じる不確かさを定量化することにより,信頼性の高い予測を実現するための新しいプロトタイプベースアレエータ型不確実性定量化(pau)フレームワークを提案する。具体的には、セマンティクス部分空間全体を表現するために、まず様々な学習可能なプロトタイプを各モダリティ向けに構築する。次に、デンプスター・シェーファー理論と主観論理理論を用いて、証拠とディリクレ分布パラメータを関連付けた実証的理論的枠組みを構築する。 PAUモデルは、クロスモーダル検索のための正確な不確実性と信頼性のある予測を誘導する。 MSR-VTT, MSVD, DiDeMo, MS-COCOの4つの主要なベンチマークデータセットを用いて実験を行い, 本手法の有効性を実証した。コードはhttps://github.com/leolee99/PAUでアクセスできる。 Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to the Aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arisen from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets of MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.	翻訳日:2023-12-25 18:22:45 公開日:2023-12-22
# 分散抑制によるシャープネス認識最適化の強化 Enhancing Sharpness-Aware Optimization Through Variance Suppression ( http://arxiv.org/abs/2309.15639v3 ) ライセンス: Link先を確認	Bingcong Li, Georgios B. Giannakis	(参考訳) シャープネスを意識した最小化(SAM)は、大規模なデータ拡張がなくても、ディープニューラルネットワークの一般化を向上させるメリットが十分に実証されている。「平坦なミニマ」の近傍が一般化能力を高めるという損失関数の幾何学を取り入れたSAMは、近傍内でパラメータを摂動させる敵対者が引き起こす最大損失を最小化することで「平坦な谷」を求める。損失関数の鋭さを考慮に入れることは重要であるが、このような「過度に友好的な敵対者」は一般化の到達可能な水準を制限しうる。本研究の新しいアプローチは、そのような友好性を避けるために、分散抑制(VaSSO)を通じて敵対者の安定化を促す。 VaSSOの証明可能な安定性は、画像分類や機械翻訳を含むモデル非依存のタスクにおいて、SAMに対する数値的な改善を裏付ける。さらに、実験により、VaSSOがSAMに高レベルのラベルノイズに対する頑健性を与えることを確認した。 Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.	翻訳日:2023-12-25 18:22:14 公開日:2023-12-22
# PrNet:Android Raw GNSS測定による測位を改善するために擬似距離を補正するニューラルネットワーク PrNet: A Neural Network for Correcting Pseudoranges to Improve Positioning with Android Raw GNSS Measurements ( http://arxiv.org/abs/2309.12204v2 ) ライセンス: Link先を確認	Xu Weng, Keck Voon Ling, Haochen Liu	(参考訳) 本稿では,携帯電話から収集したデータによる測位性能を向上させるために,擬似距離のバイアス誤差を軽減するニューラルネットワークを提案する。衛星ごとの多層パーセプトロン (MLP) は, Androidの生の全球測位衛星システム (GNSS) 測定から得られる衛星・受信機・コンテキストに関連する6つの特徴から擬似距離バイアス補正量を回帰するように設計されている。 MLPを訓練するために,位置の真値と平滑化手法を用いて擬似距離バイアスの目標値を慎重に算出し,スマートフォンのクロックバイアスの推定残差を含む損失関数を最適化する。補正された擬似距離は、モデルベースの測位エンジンによって位置を計算するために使用される。評価には、農村部と都市部の両方で収集されたAndroidスマートフォンデータを含むGoogle Smartphone Decimeter Challenge (GSDC)データセットを用いる。フィンガープリンティングとクロストレースの双方の測位結果から,提案手法はモデルベース手法および最先端のデータ駆動手法より優れていることが示された。 We present a neural network for mitigating biased errors in pseudoranges to improve localization performance with data collected from mobile phones. A satellite-wise Multilayer Perceptron (MLP) is designed to regress the pseudorange bias correction from six satellite, receiver, context-related features derived from Android raw Global Navigation Satellite System (GNSS) measurements. To train the MLP, we carefully calculate the target values of pseudorange bias using location ground truth and smoothing techniques and optimize a loss function involving the estimation residuals of smartphone clock bias. The corrected pseudoranges are then used by a model-based localization engine to compute locations. The Google Smartphone Decimeter Challenge (GSDC) dataset, which contains Android smartphone data collected from both rural and urban areas, is utilized for evaluation. Both fingerprinting and cross-trace localization results demonstrate that our proposed method outperforms model-based and state-of-the-art data-driven approaches.	翻訳日:2023-12-25 18:21:55 公開日:2023-12-22
# ホログラフィーの限界と量子情報プロトコルの補正 Holographic Limitations and Corrections to Quantum Information Protocols ( http://arxiv.org/abs/2309.09939v3 ) ライセンス: Link先を確認	Stefano Pirandola	(参考訳) 我々は、ベッケンシュタイン境界やサスキンドの球面エントロピー境界のようなホログラフィック境界によって、絡み合い分布、量子テレポーテーション、および量子通信に課される制限について論じる。連続変数(CV)量子情報に対して、ホログラフィック補正の単純な適用が確立された結果を覆すことを示す。これらの補正は完全なCVテレポーテーションを不可能にし、損失のある量子チャネルのテレポーテーションシミュレーションにおける一様収束を妨げ、量子通信に修正されたPLOB限界を課す。これらの数学的補正は、実用的な量子技術に直ちには影響しないが、量子情報理論のより深い理論的理解には重要である。 We discuss the limitations imposed on entanglement distribution, quantum teleportation, and quantum communication by holographic bounds, such as the Bekenstein bound and Susskind's spherical entropy bound. For continuous-variable (CV) quantum information, we show how the naive application of holographic corrections disrupts well-established results. These corrections render perfect CV teleportation impossible, preclude uniform convergence in the teleportation simulation of lossy quantum channels, and impose a revised PLOB bound for quantum communication. While these mathematical corrections do not immediately impact practical quantum technologies, they are critical for a deeper theoretical understanding of quantum information theory.	翻訳日:2023-12-25 18:21:35 公開日:2023-12-22
# 拡張コンパスモデルにおけるサブシステム対称性、臨界ボース表面および非移動励起 Subsystem symmetries, critical Bose surface and immobile excitations in an extended compass model ( http://arxiv.org/abs/2309.08300v2 ) ライセンス: Link先を確認	Zhidan Li, Chun-Jiong Huang, Changle Liu and Hai-Zhou Lu	(参考訳) サブシステム対称性を持つ拡張コンパスモデルを提案する。このモデルは3d遷移金属化合物との実験的関連性を持つ可能性がある。サブシステム対称性はスピン励起の移動性を強く制限し、重大な帰結をもたらす。量子臨界点では、$k_x$ 軸と $k_y$ 軸の全体に沿って「臨界ボース曲面」が存在することが分かる。この臨界点を越えると、低温でネマティック不安定性を起こすノーダルラインスピン液体が現れる。フェロ四極子相では、ある励起が「フラクトン」と同様に単独では移動できないことが分かる。 We propose an extended compass model that hosts subsystem symmetries and has potential experimental relevance with 3d transition metal compounds. The subsystem symmetries strongly constrain the mobility of spin excitations and lead to profound consequences. At the quantum critical point we find the presence of "critical Bose surface" along the entire $k_x$ and $k_y$ axis. Across which we find a nodal-line spin liquid that undergoes nematic instability at low temperatures. In the ferro-quadrupole phase, we find that one excitation is immobile individually analogous to "fractons".	翻訳日:2023-12-25 18:21:22 公開日:2023-12-22
# 微分可能なJPEG:悪魔は細部に宿る Differentiable JPEG: The Devil is in the Details ( http://arxiv.org/abs/2309.06978v4 ) ライセンス: Link先を確認	Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar	(参考訳) JPEGは依然として最も広く普及している非可逆画像符号化方式の1つである。しかし、JPEGの微分不可能な性質は、ディープラーニングパイプラインへの適用を制限する。この問題に対処するために、JPEGの微分可能な近似が最近いくつか提案されている。本稿では、既存の微分可能JPEG手法を包括的にレビューし、従来手法が見落としていた重要な詳細を特定する。この目的のために、従来の制限を克服する新しい微分可能JPEG手法を提案する。提案手法は、入力画像、JPEG品質、量子化テーブル、色変換パラメータに関して微分可能である。我々は、提案する微分可能JPEG手法の順方向および逆方向の性能を既存手法と比較して評価する。さらに、重要な設計上の選択を評価するために広範なアブレーションを行う。提案する微分可能JPEGは(微分不可能な)参照実装に最もよく一致し、これまでで最良の微分可能手法を平均$3.47$dB(PSNR)上回る。強い圧縮率では、PSNRを$9.51$dB改善することさえできる。提案する微分可能JPEGにより強力な敵対的攻撃の結果が得られ、勾配近似が有効であることが示される。私たちのコードはhttps://github.com/necla-ml/Diff-JPEGで公開されています。 JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.	翻訳日:2023-12-25 18:21:13 公開日:2023-12-22
# 自己教師あり事前学習によるLiDARデータのセマンティックシーンセグメンテーションの向上 Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR Data ( http://arxiv.org/abs/2309.02139v2 ) ライセンス: Link先を確認	Mariona Carós, Ariadna Just, Santi Seguí, Jordi Vitrià	(参考訳) 航空機搭載LiDARシステムは、主に3D座標で定義された点からなる広範囲な点群データを生成することで、地球表面を捉えることができる。しかし、教師あり学習タスクのためにそのような点にラベルを付けるのは時間を要する。その結果,ラベルなしデータから学習し,注釈付きサンプルの数を大幅に削減できる技術を検討する必要がある。本研究では,Barlow Twins を用いて自己教師ありエンコーダを訓練し,セマンティックシーンセグメンテーションのタスクにおいて事前学習済みネットワークとして使用することを提案する。実験の結果,教師なし事前学習は,教師ありタスクで微調整した際の性能を向上させ,特にサンプル数の少ないカテゴリで効果が大きいことが示された。 Airborne LiDAR systems have the capability to capture the Earth's surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.	翻訳日:2023-12-25 18:21:01 公開日:2023-12-22
# 構音障害音声に対する音響-調音インバージョン:事前学習済みの自己教師あり表現は有利か? Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable? ( http://arxiv.org/abs/2309.01108v3 ) ライセンス: Link先を確認	Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava	(参考訳) 音響-調音インバージョン(AAI)は、音響空間から調音空間へのマッピングである。 MFCCのような信号処理特徴量は、AAIタスクに広く使われている。構音障害のある話者では、発音が不正確で不明瞭なため、AAIは困難である。本研究では,事前学習済みの自己教師あり学習(SSL)モデルから得られる表現を用いて,構音障害音声のAAIを行う。我々は、この挑戦的なAAIタスクに対する様々な事前学習済み特徴量の影響を、低リソース条件下で示す。さらに、抽出したSSL特徴量にxベクトルを条件付けて、BLSTMネットワークを訓練する。既知(seen)条件では、3つのAAI学習スキーム(話者固有、プール、微調整)を実験する。学習スキーム間で一貫した結果から、DeCoARは微調整スキームにおいて、MFCCに対してピアソン相関係数(CC)を健常対照者で約1.81%、患者で約4.56%相対的に改善することが明らかになった。未知(unseen)条件でも、さまざまなSSL特徴量について同様の平均的傾向が観察される。全体として、特徴量の再構成や将来のタイムステップ予測タスクで訓練されたwav2vec、APC、DeCoARといったSSLネットワークは、構音障害音声の調音軌跡の予測において良好に機能する。 Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic to the articulatory space. Signal-processing features like the MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ~1.81% and ~4.56% for healthy controls and patients, respectively, over MFCCs. We observe similar average trends for different SSL features in the unseen case. 
Overall, SSL networks like wav2vec, APC, and DeCoAR, trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.	翻訳日:2023-12-25 18:20:48 公開日:2023-12-22
# 医用画像レジストレーションのためのオンザフライ・ガイダンス学習 On-the-Fly Guidance Training for Medical Image Registration ( http://arxiv.org/abs/2308.15216v4 ) ライセンス: Link先を確認	Yicheng Chen, Shengxiang Ji, Yuelin Xin, Kun Han, Xiaohui Xie	(参考訳) 本研究は,学習ベースの画像レジストレーションの分野において,弱教師ありおよび教師なしの手法に固有の制限に対処する新しいアプローチを探求する。弱教師あり手法は希少なラベル付きデータに大きく依存する一方、教師なし戦略は画像類似度による間接的な精度尺度に依存する。特に、医用画像では正確な変形の真値が得られないため、従来の教師あり学習は用いられない。本研究は,既存のモデルを強化するために,OFG(On-the-Fly Guidance)を用いた独自のトレーニングフレームワークを提案する。このフレームワークは、トレーニング中に、我々のカスタムオプティマイザで現在の変形予測を洗練することで、数ステップ先の擬似的な真値を生成する。この擬似真値は、教師あり学習の枠組みでモデルを直接監督するために用いられる。このプロセスでは、予測された変形を限られたステップ数で最適化することで、トレーニング効率を確保し、各トレーニングフェーズに達成可能な目標を設定する。 OFGは、学習ベースの手法の速度を維持しながら、既存の画像レジストレーション技術の精度を著しく向上させる。提案手法は,既存のレジストレーションモデルの予測や最適化出力を含む様々な擬似真値生成戦略を用いて評価した。実験は3つのベンチマークデータセットと3つの最先端モデルにまたがって行われ、OFGは有意かつ一貫した向上を示し、この分野の従来の最先端手法を上回った。 OFGは、学習ベースの画像レジストレーションモデルのトレーニング効果を高めるために、容易に統合可能なプラグアンドプレイソリューションを提供する。コード: https://github.com/miraclefactory/on-the-fly-guidance This research explores a novel approach in the realm of learning-based image registration, addressing the limitations inherent in weakly-supervised and unsupervised methods. Weakly-supervised techniques depend heavily on scarce labeled data, while unsupervised strategies rely on indirect measures of accuracy through image similarity. Notably, traditional supervised learning is not utilized due to the lack of precise deformation ground-truth in medical imaging. Our study introduces a unique training framework with On-the-Fly Guidance (OFG) to enhance existing models. This framework, during training, generates pseudo-ground truth a few steps ahead by refining the current deformation prediction with our custom optimizer. This pseudo-ground truth then serves to directly supervise the model in a supervised learning context. The process involves optimizing the predicted deformation with a limited number of steps, ensuring training efficiency and setting achievable goals for each training phase. 
OFG notably boosts the precision of existing image registration techniques while maintaining the speed of learning-based methods. We assessed our approach using various pseudo-ground truth generation strategies, including predictions and optimized outputs from established registration models. Our experiments spanned three benchmark datasets and three cutting-edge models, with OFG demonstrating significant and consistent enhancements, surpassing previous state-of-the-arts in the field. OFG offers an easily integrable plug-and-play solution to enhance the training effectiveness of learning-based image registration models. Code at https://github.com/miraclefactory/on-the-fly-guidance.	翻訳日:2023-12-25 18:20:23 公開日:2023-12-22
# 音声・言語・聴覚科学における一般化可能な機械学習モデルに向けて : サンプルサイズの推定とオーバーフィッティングの低減 Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting ( http://arxiv.org/abs/2308.11197v3 ) ライセンス: Link先を確認	Hamzeh Ghasemzadeh, Robert E. Hillman, Daryush D. Mehta	(参考訳) この研究の第一の目的は、研究者がより堅牢なネストクロスバリデーション法を代わりに使う動機となる定量的証拠を提供することである。第2の目的は,研究の設計段階でMLに基づく解析の検出力分析を行うための方法とMATLABコードを提供することである。モンテカルロシミュレーションを用いて、採用したクロスバリデーション法、特徴の判別力、特徴空間の次元、モデルの次元の間の相互作用を定量化した。 MLモデルの統計的検出力と統計的信頼度に基づいて,4種類のクロスバリデーション(シングルホールドアウト,10分割,訓練・検証・テスト,ネスト10分割)を比較した。統計的に有意な結果を得るために必要な最小サンプルサイズを決定するために、帰無仮説と対立仮説の分布を用いた($\alpha=0.05$, $1-\beta=0.8$)。モデルの統計的信頼度は、正しい特徴が選択され、最終モデルに含まれる確率として定義された。分析の結果,シングルホールドアウト法に基づくモデルは統計的検出力と統計的信頼度が非常に低く,精度を著しく過大評価することが示された。逆に、ネストした10分割クロスバリデーションは、最も高い統計的信頼度と最も高い統計的検出力をもたらし、精度の偏りのない推定を提供した。シングルホールドアウトで必要なサンプルサイズは、ネストされたクロスバリデーションを使用する場合に必要なものよりも50%多くなる可能性がある。ネストされたクロスバリデーションに基づくモデルの信頼度は、シングルホールドアウトに基づくモデルの信頼度の最大4倍であった。将来の研究の設計時に研究者がサンプルサイズを推定できるよう、計算モデル、MATLABコード、およびルックアップテーブルを提供する。 This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust method of nested cross-validation. The second purpose is to present methods and MATLAB codes for doing power analysis for ML-based analysis during the design of a study. Monte Carlo simulations were used to quantify the interactions between the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, and the dimensionality of the model. Four different cross-validations (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and statistical confidence of the ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome ($\alpha=0.05$, $1-\beta=0.8$). 
Statistical confidence of the model was defined as the probability of correct features being selected and hence being included in the final model. Our analysis showed that the model generated based on the single holdout method had very low statistical power and statistical confidence and that it significantly overestimated the accuracy. Conversely, the nested 10-fold cross-validation resulted in the highest statistical confidence and the highest statistical power, while providing an unbiased estimate of the accuracy. The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used. Confidence in the model based on nested cross-validation was as much as four times higher than the confidence in the single holdout-based model. A computational model, MATLAB codes, and lookup tables are provided to assist researchers with estimating the sample size during the design of their future studies.	翻訳日:2023-12-25 18:19:56 公開日:2023-12-22
# 機械学習のためのトレーニングデータの分布特性検証 Attesting Distributional Properties of Training Data for Machine Learning ( http://arxiv.org/abs/2308.09552v2 ) ライセンス: Link先を確認	Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan	(参考訳) 機械学習(ML)の成功は、その信頼性に対する懸念が高まっている。いくつかの管轄区域がML規制の枠組みを準備している。そのような懸念の1つは、モデルトレーニングデータが特定の機密属性に対して望ましい分布特性を持つことである。例えば、ドラフト規則は、トレーニングデータセットが人口の多様性を反映するなど、特定の分布特性を持つことを示すためにモデルトレーナーが必要であることを示している。本研究では,証明者(例えばモデルトレーナー)が,学習データの適切な分布特性を検証者(例えば,顧客)に公開することなく示すことができる特性証明の概念を提案する。本稿では,プロパティ推論と暗号機構を組み合わせた効果的なハイブリッド特性証明を提案する。 The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.	翻訳日:2023-12-25 18:19:25 公開日:2023-12-22
# テキスト認識のための自己蒸留正規化コネクショニスト時間的分類損失:単純かつ効果的なアプローチ Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach ( http://arxiv.org/abs/2308.08806v3 ) ライセンス: Link先を確認	Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang and Wei Peng	(参考訳) テキスト認識手法は急速に発展しつつある。強力なモジュール、言語モデル、教師なしおよび半教師あり学習スキームなど、いくつかの高度なテクニックは、公開ベンチマークのパフォーマンスを継続的に押し上げている。しかし、損失関数の観点から、テキスト認識モデルをいかにより良く最適化するかという問題は概ね見過ごされている。 CTCに基づく手法は、性能と推論速度のバランスが良いため実際に広く用いられているが、依然として精度の低下に悩まされている。これは、CTC損失がシーケンスターゲット全体の最適化を重視する一方で、個々の文字の学習を軽視するためである。本稿では,この問題に対処するために、CTCベースのモデルのための自己蒸留方式を提案する。この方式は、フレーム単位の正則化項をCTC損失に組み込んで個々の文字に対する監督を強調し、潜在アライメントの事後確率最大化を活用して、CTCベースのモデル間の蒸留で生じる不整合問題を解決する。正則化されたCTC損失を蒸留コネクショニスト時間的分類 (DCTC) 損失と呼ぶ。 DCTCの損失はモジュールフリーで、余分なパラメータ、推論遅延の増加、追加のトレーニングデータやフェーズを必要としない。公開ベンチマークの大規模な実験は、DCTCがこれらの欠点を一切伴わずに、テキスト認識モデルの精度を最大2.6%向上させることができることを示した。 Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based model to address this issue. It incorporates a framewise regularization term in CTC loss to emphasize individual supervision, and leverages the maximizing-a-posteriori of latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. 
DCTC loss is module-free, requiring no extra parameters, longer inference lag, or additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks.	翻訳日:2023-12-25 18:19:14 公開日:2023-12-22
# Rydberg量子アニール上の局所光シフト符号化による最適化問題の解法 Solving optimization problems with local light shift encoding on Rydberg quantum annealers ( http://arxiv.org/abs/2308.07798v2 ) ライセンス: Link先を確認	Kapil Goswami, Rick Mukherjee, Herwig Ott, Peter Schmelcher	(参考訳) 最大カット(max-cut)や最大独立集合(mis)といった組合せ最適化問題をrydberg量子アニーラー上で解くための非単位ディスクフレームワークを提供する。我々の構成は、グラフ問題をイジングスピンモデルにマッピングするために、局所制御可能な光シフトを個々のキュービットに適用する多体相互作用Rydbergシステムからなる。光トワイザーが空間配置で提供する柔軟性を生かした数値シミュレーションでは、rydberg annealerを所望の多体基底状態へとグローバルに駆動しながら局所調整プロトコルを実装し、最適化問題への解決策でもある。最適制御法を用いて, システムの寿命内, 近似比が1に近い時間スケールのプロトタイプグラフに対して, これらの解を求める。非ブロッケードアプローチは、2次元のRydberg構成で実現でき、非重み付きグラフと重み付きグラフの両方に適用できる特定のトポロジーによるグラフ問題の符号化を容易にする。システムサイズ, グラフの硬度, 解に収束するのに要するイテレーション数の観点から, 提案手法の利点を浮き彫りにした, 高速な模擬焼鈍による比較解析が提供される。 We provide a non-unit disk framework to solve combinatorial optimization problems such as Maximum Cut (Max-Cut) and Maximum Independent Set (MIS) on a Rydberg quantum annealer. Our setup consists of a many-body interacting Rydberg system where locally controllable light shifts are applied to individual qubits in order to map the graph problem onto the Ising spin model. Exploiting the flexibility that optical tweezers offer in terms of spatial arrangement, our numerical simulations implement the local-detuning protocol while globally driving the Rydberg annealer to the desired many-body ground state, which is also the solution to the optimization problem. Using optimal control methods, these solutions are obtained for prototype graphs with varying sizes at time scales well within the system lifetime and with approximation ratios close to one. The non-blockade approach facilitates the encoding of graph problems with specific topologies that can be realized in two-dimensional Rydberg configurations and is applicable to both unweighted as well as weighted graphs. 
A comparative analysis with fast simulated annealing is provided which highlights the advantages of our scheme in terms of system size, hardness of the graph, and the number of iterations required to converge to the solution.	翻訳日:2023-12-25 18:18:51 公開日:2023-12-22
# ディープラーニングを用いたカスタム熱力学の構築 Constructing Custom Thermodynamics Using Deep Learning ( http://arxiv.org/abs/2308.04119v3 ) ライセンス: Link先を確認	Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li	(参考訳) ai(artificial intelligence)の最もエキサイティングな応用の1つは、以前に蓄積されたデータに基づく自動科学的発見であり、対称性や保存則など、既知の物理原理による制限と組み合わせられている。このような自動仮説作成と検証は、従来の物理的直観が失敗する複雑な現象の研究を支援する。本稿では,任意の確率的散逸系の巨視的力学記述を,その微視的軌跡の観察から直接学習するための一般化オンザガー原理に基づくプラットフォームを開発する。本手法は, 還元された熱力学的座標を同時に構築し, それらの座標のダイナミクスを解釈する。提案手法の有効性を理論的に検証し, 外部応用分野における長鎖の伸長を実験的に検証した。具体的には、3つの解釈可能な熱力学座標を学習し、安定状態と遷移状態の同定と伸縮速度の制御を含む、ポリマー伸長の動的景観を構築する。我々の一般的な方法論は、幅広い科学的・技術的応用に利用できる。 One of the most exciting applications of artificial intelligence (AI) is automated scientific discovery based on previously amassed data, coupled with restrictions provided by known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Here we develop a platform based on a generalized Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. Our method simultaneously constructs reduced thermodynamic coordinates and interprets the dynamics on these coordinates. We demonstrate its effectiveness by studying theoretically and validating experimentally the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including the identification of stable and transition states and the control of the stretching rate. Our general methodology can be used to address a wide range of scientific and technological applications.	翻訳日:2023-12-25 18:18:26 公開日:2023-12-22
# テキスト条件拡散モデルに基づくシーンテキスト画像の超解像 Scene Text Image Super-resolution based on Text-conditional Diffusion Models ( http://arxiv.org/abs/2311.09759v2 ) ライセンス: Link先を確認	Chihiro Noguchi, Shun Fukuda, Masao Yamanaka	(参考訳) シーンテキスト画像超解像(STISR)は,シーンテキスト認識のための前処理手法として最近大きな成功を収めている。 STISRは、現実世界の設定でぼやけた低解像度(LR)テキストイメージを、シーンテキスト認識に適した鮮明な高解像度(HR)テキストイメージに変換することを目的としている。本研究では,テキストから画像への印象的な合成能力で知られるdms(text-conditional diffusion model)をstisrタスクに活用する。実験の結果,テキスト条件DMは既存のSTISR法をはるかに上回ることがわかった。特にLRテキスト画像からのテキストが入力として与えられると、テキスト条件DMは高品質な高解像度テキスト画像を生成することができる。この機能を利用して、LR-HRペアテキスト画像データセットを合成する新しいフレームワークを提案する。このフレームワークは3つの特殊なテキスト条件DMで構成され、それぞれがテキスト画像合成、超解像、画像劣化に特化している。これらの3つのモジュールは、STISR法の訓練に適している異なるLRとHRのペア画像の合成に不可欠である。実験により,これらの合成画像対はテキストZoom評価におけるSTISR法の性能を大幅に向上させることを確認した。 Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition. STISR aims to transform blurred and noisy low-resolution (LR) text images in real-world settings into clear high-resolution (HR) text images suitable for scene text recognition. In this study, we leverage text-conditional diffusion models (DMs), known for their impressive text-to-image synthesis capabilities, for STISR tasks. Our experimental results revealed that text-conditional DMs notably surpass existing STISR methods. Especially when texts from LR text images are given as input, the text-conditional DMs are able to produce superior quality super-resolution text images. Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets. This framework consists of three specialized text-conditional DMs, each dedicated to text image synthesis, super-resolution, and image degradation. These three modules are vital for synthesizing distinct LR and HR paired images, which are more suitable for training STISR methods. 
Our experiments confirmed that these synthesized image pairs significantly enhance the performance of STISR methods in the TextZoom evaluation.	翻訳日:2023-12-25 18:12:41 公開日:2023-12-22
# 医用画像分類のためのAlexNetのレビュー Review of AlexNet for Medical Image Classification ( http://arxiv.org/abs/2311.08655v2 ) ライセンス: Link先を確認	Wenhao Tang, Junding Sun, Shuihua Wang, Yudong Zhang	(参考訳) 近年, 深層学習の急速な発展が, 医用画像の分類分野に幅広い応用をもたらしている。オーバーフィッティングの緩和、一般化の改善、勾配の消失と爆発の回避など、常にパフォーマンスが向上しているニューラルネットワークモデルの変種には、いくつかの共通点がある。 AlexNetは最初にドロップアウト技術を使ってオーバーフィッティングを緩和し、ReLUアクティベーション機能を使って勾配の消滅を回避する。そこで我々は2012年のcnn開発に大きく貢献したalexnetに関する議論に焦点を当てた。ジャーナル論文やカンファレンス論文を含む40以上の論文をレビューした後、AlexNetの技術的な詳細、利点、応用分野について解説する。 In recent years, the rapid development of deep learning has led to a wide range of applications in the field of medical image classification. The variants of neural network models with ever-increasing performance share some commonalities: to try to mitigate overfitting, improve generalization, avoid gradient vanishing and exploding, etc. AlexNet first utilizes the dropout technique to mitigate overfitting and the ReLU activation function to avoid gradient vanishing. Therefore, we focus our discussion on AlexNet, which has contributed greatly to the development of CNNs in 2012. After reviewing over 40 papers, including journal papers and conference papers, we give a narrative on the technical details, advantages, and application areas of AlexNet.	翻訳日:2023-12-25 18:12:20 公開日:2023-12-22
# キーストローク検証チャレンジ(KVC: Biometric and Fairness Benchmark Evaluation) Keystroke Verification Challenge (KVC): Biometric and Fairness Benchmark Evaluation ( http://arxiv.org/abs/2311.06000v3 ) ライセンス: Link先を確認	Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Naser Damer, Julian Fierrez, Javier Ortega-Garcia	(参考訳) 生体認証のためのキーストロークダイナミクス(KD)の分析にはいくつかの利点がある:最も差別的な行動特性の一つであり、キーボードはユーザーがテキストデータを入力するための主要な手段であり、その獲得には追加のハードウェアが必要であり、その処理は比較的軽量であり、透過的に被験者を認識することができる。しかし、実験プロトコルとメトリクスの不均一性と、文献で採用されているデータベースのサイズが限られているため、異なるシステム間の直接比較が妨げられ、キーストロークバイオメトリックスの進歩の障害となっている。そこで本稿では,Aalto Keystroke Databases から抽出したデスクトップおよびモバイルキーボードを用いて取得した185,000件以上の可変転写テキストのツイート長シーケンスに基づいて,KD に基づく生体認証性能と公平性をベンチマークする実験フレームワークを提案する。このフレームワークは、Keystroke Verification Challenge (KVC)という形でCodaLab上で動作する。さらに,新しい公平度指標であるsweted impostor ratio (sir) を導入し,検証スコアにおけるデム間およびデム内群バイアスパターンを捉えた。提案手法は,2つの最先端キーストローク検証システム「typenet」と「typeformer」を用いて異なる入力特徴の比較を行い,時間領域に拡張された特徴を優先してテキスト内容(押したキーのascii符号)の分析を破棄することで,プライバシーを侵害しないシステムを実現する。我々の実験は、このアプローチが満足なパフォーマンスを維持することができることを示している。 Analyzing keystroke dynamics (KD) for biometric verification has several advantages: it is among the most discriminative behavioral traits; keyboards are among the most common human-computer interfaces, being the primary means for users to enter textual data; its acquisition does not require additional hardware, and its processing is relatively lightweight; and it allows for transparently recognizing subjects. However, the heterogeneity of experimental protocols and metrics, and the limited size of the databases adopted in the literature impede direct comparisons between different systems, thus representing an obstacle in the advancement of keystroke biometrics. 
To address this, we present a new experimental framework to benchmark KD-based biometric verification performance and fairness based on tweet-long sequences of variable transcript text from over 185,000 subjects, acquired through desktop and mobile keyboards, extracted from the Aalto Keystroke Databases. The framework runs on CodaLab in the form of the Keystroke Verification Challenge (KVC). We also introduce a novel fairness metric, the Skewed Impostor Ratio (SIR), to capture inter- and intra-demographic group bias patterns in the verification scores. We demonstrate the usefulness of the proposed framework by employing two state-of-the-art keystroke verification systems, TypeNet and TypeFormer, to compare different sets of input features, achieving a less privacy-invasive system by discarding the analysis of text content (ASCII codes of the keys pressed) in favor of extended features in the time domain. Our experiments show that this approach maintains satisfactory performance.	翻訳日:2023-12-25 18:12:08 公開日:2023-12-22
# 静的リーク検出のためのLLMに基づくリソース指向意図推論 LLM-based Resource-Oriented Intention Inference for Static Resource Leak Detection ( http://arxiv.org/abs/2311.04448v2 ) ライセンス: Link先を確認	Chong Wang, Jianan Liu, Xin Peng, Yang Liu, Yiling Lou	(参考訳) リソースリークは、買収後にリリースされないリソースによって引き起こされ、しばしばパフォーマンス上の問題やシステムクラッシュにつながる。既存の静的検出技術は、事前定義されたリソース獲得/リリースapiの機械的マッチング、事前定義されたapiの完全性、到達可能性の検証の特定、分析の複雑さなど、その有効性への挑戦に依存する。これらの課題を克服するために,我々は,機械的なapiマッチングではなく,リソース管理知識とコードコンテキスト理解に基づいて,コード内のリソース指向の意図(獲得,リリース,到達可能性検証)を直接推論するために,大規模言語モデル(llm)を活用する新しいアプローチであるinferroiを提案する。 InferROI は LLM に与えられたコードスニペットから関連する意図を推論するように指示するプロンプトを使用し、それを形式表現に変換する。これらの推論された意図を集約することにより、InferROIは軽量な静的解析に基づくアルゴリズムを使用して、コードから抽出された制御-フローパスを分析し、リソースリークを検出する。 InferROIをJavaプログラム上で評価し、リソース指向の意図推論とリソースリーク検出の両面での有効性を検討する。実験の結果、InferROIは74.6%の精度で、DroidLeaksデータセットから172のコードスニペットを意図的に推論して81.8%のリコールを達成した。さらに、InferROIは、データセットにリストされているAndroidリソースのかなりの部分をカバーしている。 DroidLeaksデータセットの86のバグに適用すると、InferROIは8つのベースライン検出器と比較して高いバグ検出率(53.5%)と低い偽アラーム率(8.1%)を示す。さらに,実世界のオープンソースプロジェクトからの100メソッドのリソースリーク検出にinferroiを適用し,未知の12のリソースリークバグを特定し,そのうち7つを開発者が確認した。 Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs, posing challenges to their effectiveness, including completeness of predefined APIs, identification of reachability validation, and analysis complexity. To overcome these challenges, we propose InferROI, a novel approach that leverages large language models (LLMs) to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code, based on resource management knowledge and code context understanding, rather than mechanical API matching. InferROI uses a prompt to instruct the LLM in inferring involved intentions from a given code snippet, which are then translated into formal expressions. 
By aggregating these inferred intentions, InferROI utilizes a lightweight static-analysis-based algorithm to analyze control-flow paths extracted from the code, thereby detecting resource leaks. We evaluate InferROI on Java programs and investigate its effectiveness in both resource-oriented intention inference and resource leak detection. Experimental results demonstrate that InferROI achieves a precision of 74.6% and a recall of 81.8% in intention inference on 172 code snippets from the DroidLeaks dataset. Additionally, InferROI covers a significant portion of the Android resources of concern listed in the dataset. When applied to 86 bugs from the DroidLeaks dataset, InferROI exhibits a high bug detection rate (53.5%) and a low false alarm rate (8.1%) compared to eight baseline detectors. Moreover, we apply InferROI to resource leak detection in 100 methods from real-world open-source projects, where it identifies 12 unknown resource leak bugs, with 7 of them being confirmed by developers.	翻訳日:2023-12-25 18:11:39 公開日:2023-12-22
# PriPrune: Pruned Federated Learningにおけるプライバシの定量化と保存 PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning ( http://arxiv.org/abs/2310.19958v2 ) ライセンス: Link先を確認	Tianyue Chu, Mengwei Yang, Nikolaos Laoutaris, Athina Markopoulou	(参考訳) Federated Learning(FL)は、複数のクライアントデバイスとサーバが、ローカルなトレーニングデータを共有することなく、モデル更新のみを交換することで、グローバルモデルを協調的にトレーニングできるパラダイムである。これらのデバイスは通信や計算リソースの面で制約されることが多く、モデルプルーニング(モデルのサイズと複雑さを減らすために広く使用されるパラダイム)の恩恵を受けることができる。直観的には、ローカルモデルをより粗いものにすることで、pruningはflのコンテキストにおけるプライバシ攻撃に対する保護を提供するものと期待される。しかし、この保護は以前にも正式にも実験的にも特徴づけられておらず、最先端の攻撃に対して十分なものかどうかは不明である。本稿では,flにおけるモデルプルーニングのプライバシ保証に関する最初の調査を行う。我々は,pruned flモデルによって漏洩した情報量に関する情報理論上の上限を導出する。我々はこれらの理論的な知見を補完し、ベンチマークデータセットを用いて、最先端のプライバシー攻撃を含む包括的な実験により検証する。この評価は、プルーニングによって提供されるプライバシー保護に影響を与える可能性のある選択とパラメータに関する貴重な洞察を提供する。このアルゴリズムでは、パーソナライズされたクライアント毎の防御マスクを使用し、防御プルーニング率を適用して、プライバシとモデルパフォーマンスを共同で最適化する。 PriPruneは、クライアント上でプラインドされたFLスキームを変更せずに適用し、サーバによる逆攻撃から保護する、普遍的な方法である。私たちの経験的評価は、プライバシを考慮しない最先端のpruned flスキームと比較して、pripruneがプライバシ-精度のトレードオフを大幅に改善していることを示しています。 Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model, by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning -- a paradigm that is widely used to reduce the size and complexity of models. Intuitively, by making local models coarser, pruning is expected to also provide some protection against privacy attacks in the context of FL. However this protection has not been previously characterized, formally or experimentally, and it is unclear if it is sufficient against state-of-the-art attacks. In this paper, we perform the first investigation of privacy guarantees for model pruning in FL. We derive information-theoretic upper bounds on the amount of information leaked by pruned FL models. 
We complement and validate these theoretical findings with comprehensive experiments that involve state-of-the-art privacy attacks, on several state-of-the-art FL pruning schemes, using benchmark datasets. This evaluation provides valuable insights into the choices and parameters that can affect the privacy protection provided by pruning. Based on these insights, we introduce PriPrune -- a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that it can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server. Our empirical evaluation demonstrates that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes that do not take privacy into account.	翻訳日:2023-12-25 18:11:04 公開日:2023-12-22
# 全光相関ノイズチャネルとその量子コヒーレンス回復への応用 All-optical correlated noisy channel and its application in recovering quantum coherence ( http://arxiv.org/abs/2310.16342v2 ) ライセンス: Link先を確認	Dan Lei, Disheng Guo, Jun Xin, and Xiao-Ming Lu	(参考訳) 減衰と増幅は光通信の最も一般的なプロセスである。増幅は、光学場の複素振幅の減衰を補償するために用いられるが、減衰チャネルと増幅チャネルが独立であることから、失われたコヒーレンスを回復することができない。そこで本研究では,減衰チャネルと増幅チャネルが相関したノイズを発生させると,印加した光の量子コヒーレンスを回復できることを示す。本研究では, 4波混合過程に基づく全光相関雑音チャネルを提案し, 連続変数系における量子コヒーレンス回復の可能性を示す。我々はコヒーレント状態と2モード圧縮状態のコヒーレンス回復現象を定量的に検討した。さらに,回復チャネルに依存しない他の光子損失が回復コヒーレンス性能に及ぼす影響について解析した。従来提案した電気光学変換に基づく相関ノイズチャネルとは違って,本プロトコルの相関ノイズチャネルは全光学的であり,より大きな動作帯域を有する。 Attenuation and amplification are the most common processes for optical communications. Amplification can be used to compensate the attenuation of the complex amplitude of an optical field, but is unable to recover the coherence lost, provided that the attenuation channel and the amplification channel are independent. In this work, we show that the quantum coherence of an optical field can be regained if the attenuation channel and the amplification channel share correlated noise. We propose an all-optical correlated noisy channel relying on a four-wave mixing process and demonstrate its capability of recovering quantum coherence within continuous-variable systems. We quantitatively investigate the coherence recovery phenomena for coherent states and two-mode squeezed states. Moreover, we analyze the effect of other photon losses that are independent of the recovery channel on the performance of recovering coherence. Different from correlated noisy channels previously proposed based on electro-optic conversions, the correlated noisy channel in our protocol is all-optical and thus offers larger operational bandwidths.	翻訳日:2023-12-25 18:10:35 公開日:2023-12-22
# 絶対政策最適化 Absolute Policy Optimization ( http://arxiv.org/abs/2310.13230v3 ) ライセンス: Link先を確認	Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu	(参考訳) 近年,信頼領域のオンポリシー強化学習は,複雑な制御タスクやゲームシナリオに対処する上で,目覚ましい成果を上げている。しかし、このカテゴリの現代の最先端のアルゴリズムは、期待されるパフォーマンスの改善を強調し、最悪のパフォーマンス結果を制御する能力が欠如している。この制限に対処するため、我々は新しい目的関数を導入し、その最適化により、ほぼ全ての性能サンプル(絶対性能)の下限における単調な改善が保証される。この画期的な理論の進歩を考えると、我々はこの理論的に基礎付けられたアルゴリズムを一連の近似によって洗練し、絶対政策最適化 (APO) と呼ばれる実用的な解法を生み出した。本実験は,継続制御ベンチマークタスクに挑戦する手法の有効性を実証し,Atariゲームのマスタリングへの適用性を拡張する。以上の結果から,APOは最先端のポリシー勾配アルゴリズムよりも大幅に優れており,期待される性能と最悪の性能の両方が大幅に向上することがわかった。 In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in the lower bound of near-total performance samples (absolute performance). Building on this theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.	翻訳日:2023-12-25 18:10:19 公開日:2023-12-22
# 構造概念はトランスフォーマー言語モデルに普遍的か? 解釈可能な言語間一般化に向けて Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization ( http://arxiv.org/abs/2310.12794v2 ) ライセンス: Link先を確認	Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang	(参考訳) 大規模言語モデル(llm)は、言語間の知識を暗黙的に伝達する、言語横断的一般化能力を示している。しかし、この転送はすべての言語、特に低リソース言語に対して等しく成功していないため、現在進行中の課題となっている。暗黙の言語間一般化の限界に達したのか、明示的な知識伝達が可能かどうかは不明だ。本稿では,言語間の概念対応を明確に整合させ,言語間の一般化を促進する可能性を検討する。言語構文的側面をテストベッドとして用いた43言語の解析により,エンコーダのみおよびデコーダのみのLLMに対して,言語内構造概念空間間で高い整合性を示す。次に,メタラーニングに基づく概念空間の整合学習手法を提案し,概念分類におけるゼロショットおよび少数ショットの一般化を促進するとともに,言語間相互学習現象に関する洞察を提供する。構文解析タスクの実験により,本手法は最先端の手法で競争的な結果を達成し,言語間の性能ギャップを狭め,特に資源の少ない者にとって有益であることが示された。 Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for low-resource ones, which poses an ongoing challenge. It is unclear whether we have reached the limits of implicit cross-lingual generalization and if explicit knowledge transfer is viable. In this paper, we investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability among the spaces of structural concepts within each language for both encoder-only and decoder-only LLMs. We then propose a meta-learning-based method to learn to align conceptual spaces of different languages, which facilitates zero-shot and few-shot generalization in concept classification and also offers insights into the cross-lingual in-context learning phenomenon. 
Experiments on syntactic analysis tasks show that our approach achieves competitive results with state-of-the-art methods and narrows the performance gap between languages, particularly benefiting those with limited resources.	翻訳日:2023-12-25 18:10:01 公開日:2023-12-22
# シリコンマイクロリング型貯水池計算における空洞非線形性と線形損失の影響 Effects of cavity nonlinearities and linear losses on silicon microring-based reservoir computing ( http://arxiv.org/abs/2310.09433v2 ) ライセンス: Link先を確認	Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros	(参考訳) マイクロリング共振器(MRR)は、時間遅延フォトニック貯水池コンピューティングに有望な装置であるが、MRRにおける異なる物理効果が貯水池演算性能に与える影響は、まだ完全には理解されていない。時系列タスクnarma-10の予測誤差に対する線形損失と熱光学および自由キャリア効果緩和時間の影響を数値的に解析した。入力電力と光源とマイクロリング共鳴の周波数差で定義される3つの領域の存在を実証し、線形状態から非線形状態へのキャビティ遷移を明らかにする。これらの領域の1つは、比較的低い入力パワーとノード数の下での時系列予測において非常に低いエラーを提供する一方、他の領域は非線形性を欠いているか不安定になる。本研究は,mrrの設計と物理特性の最適化に関する知見を提供し,時間分解型貯留層計算の予測性能を向上させる。 Microring resonators (MRRs) are promising devices for time-delay photonic reservoir computing, but the impact of the different physical effects taking place in the MRRs on the reservoir computing performance is yet to be fully understood. We numerically analyze the impact of linear losses as well as thermo-optic and free-carrier effects relaxation times on the prediction error of the time-series task NARMA-10. We demonstrate the existence of three regions, defined by the input power and the frequency detuning between the optical source and the microring resonance, that reveal the cavity transition from linear to nonlinear regimes. One of these regions offers very low error in time-series prediction under relatively low input power and number of nodes while the other regions either lack nonlinearity or become unstable. This study provides insight into the design of the MRR and the optimization of its physical properties for improving the prediction performance of time-delay reservoir computing.	翻訳日:2023-12-25 18:09:38 公開日:2023-12-22
# 運動誘起スピン移動の最適化 Optimising motion-induced spin transfer ( http://arxiv.org/abs/2310.08200v2 ) ライセンス: Link先を確認	Daigo Oue, Matsuo Mamoru	(参考訳) 本稿では、2つの強磁性絶縁体間のスピン移動について検討する。強磁性絶縁体の間には狭い隙間があり、互いに弱い相互作用をしている。強磁性絶縁体のうちの1つは一定速度で動き、もう1つは静止している。せん断運動の存在下では、相互作用振幅はドップラー周波数で周期的に変調される。ユニタリ変換により、相互作用振幅の周期的変調を、スピン移動を駆動する有効なポテンシャルと考えることができる。スピン電流の量は、2つの強磁性媒体間のスペクトルオーバーラップとキャリア集団差によって制御される。 2つの強磁性体のスペクトルが適度に広がると、スペクトル領域の重なりが増加し、スピン電流が増大する。しかし、過度の拡大はスペクトルの重なりを損なうため、スピン電流は低下する。これは、スピン移動を最大化する最適条件が存在することを意味する。 In this paper, the spin transfer between two ferromagnetic insulators is studied. There is a narrow gap between the ferromagnetic insulators so that they are weakly interacting with each other. One of the ferromagnetic insulators is moving at a constant speed while the other is at rest; hence, the system is out of equilibrium. In the presence of the shearing motion, the interaction amplitude is periodically modulated at the Doppler frequency. A unitary transformation allows us to regard the periodic modulation of the interaction amplitude as an effective potential, which drives the spin transfer. The amount of the spin current is controlled by the spectral overlap and the carrier population difference between the two ferromagnetic media. If the spectra of the two ferromagnets are moderately broadened, the overlap in the spectral domain increases, enlarging the spin current. However, too much broadening spoils the spectral overlap and, hence, the spin current. This implies that there is an optimal condition for maximising the spin transfer.	翻訳日:2023-12-25 18:09:22 公開日:2023-12-22
# ベイズ的アプローチによる人選好言語モデルの調整 Aligning Language Models with Human Preferences via a Bayesian Approach ( http://arxiv.org/abs/2310.05782v2 ) ライセンス: Link先を確認	Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li	(参考訳) 人間中心の自然言語生成(NLG)システムを推し進めるためには、NLGモデルと人間の嗜好の整合性を確保することが不可欠である。このアライメントのために、現在の一般的な方法は、人間からのフィードバックに基づいて訓練された報酬モデルで強化学習(RL)アプローチを利用する。しかし,人間の嗜好の主観的性質による内在的な不一致は,報酬モデルの訓練において大きな課題となり,nlgパフォーマンスの低下を招いた。この問題に対処するため、従来のアプローチは通常、複数の一貫性のない選好をマージしたものに集約するために、多数決または平均化に依存していた。理解と実行は容易であるが、このような手法は人間の不合理さを捉えることができず、個人の特別なサブセットのみを表現できるため、人間の嗜好の普遍性を定量的に開示する能力が欠如している。この課題に対処するために, ベイズ的枠組みを用いて, 選好モデルのトレーニングとして, 人選好間の不一致の分布を考慮し, d-PMと命名する手法を提案する。さらに,学習効率よりもRL戦略の非効率で複雑な訓練プロセスを考えると,NLGモデルをd-PMモデルから導出した選好スコアで学習するためのコントラスト学習戦略も提案する。感情的支援会話と整合性(Rule-of-Thumb)生成という2つの人間中心型NLGタスクに対する広範囲な実験により,本手法が従来のSOTAモデルを上回る結果が得られた。 In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of the NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods suffer from an inability to capture the nuanced degrees of disaggregation among humans and may only represent a specialized subset of individuals, thereby lacking the ability to quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach, which employs a Bayesian framework to account for the distribution of disagreements among human preferences as training a preference model, and names it as d-PM. 
In addition, because the RL strategy's training process is inefficient and complex, we further propose using a contrastive learning strategy to train the NLG model with the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.	翻訳日:2023-12-25 18:09:10 公開日:2023-12-22
# 計画トークンを用いた言語モデル推論の指導 Guiding Language Model Reasoning with Planning Tokens ( http://arxiv.org/abs/2310.05707v2 ) ライセンス: Link先を確認	Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, Alessandro Sordoni	(参考訳) 大規模言語モデル(LLM)は、最近、連鎖推論のような複雑な推論タスクを実行する能力に対して、かなりの関心を集めている。しかしながら、この能力を強化する既存のアプローチのほとんどは、モデルの推論能力の構造的な側面を無視しながら、データ駆動型メソッドに大きく依存しています。 LLMは個々の推論ステップをうまく管理できますが、すべての推論チェーンの一貫性を維持するのに苦労しています。これを解決するために,各推論ステップの始めに「計画トークン」を導入し,モデルのガイドとして機能する。これらのトークン埋め込みは、残りのモデルパラメータとともに微調整される。我々のアプローチでは、トレーニング可能なパラメータ(わずか0.001%)の無視可能な増加が必要であり、完全な微調整またはよりパラメータ効率の良いスキームによって適用できる。提案手法の有効性を3つの異なるLLMに適用し,3つの算術語問題データセットにおいて顕著な精度向上を示す。 Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce 'planning tokens' at the start of each reasoning step, serving as a guide for the model. These token embeddings are then fine-tuned along with the rest of the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. plain chain-of-thought fine-tuning baselines.	翻訳日:2023-12-25 18:08:42 公開日:2023-12-22
# 塩分誘導特徴の相関による一般化エージェントの学習 Learning Generalizable Agents via Saliency-Guided Features Decorrelation ( http://arxiv.org/abs/2310.05086v2 ) ライセンス: Link先を確認	Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi Chang, Lichao Sun, Bo Yang	(参考訳) 視覚に基づく強化学習(Reinforcement Learning, RL)では、エージェントは訓練中に観察されなかった状態空間の環境変動によく適応するのに苦労する。この変化は、背景雑音などのタスク非関連特徴と、最適決定に関連するロボット構成のようなタスク関連特徴の両方に生じる可能性がある。両状況の一般化を実現するために,エージェントは変化した特徴が決定に与える影響,すなわち変化した特徴と政策モデルにおける決定との真の関連性を確立することを正確に理解する必要がある。しかし、国家空間の特徴間の固有の相関関係のため、特徴と決定の関連が絡み合っており、政策がそれらの区別を困難にしている。そこで本研究では,これらの相関を除去すべく,sgfd(saliency-guided features decorrelation)を提案する。具体的には、SGFDはランダムフーリエ関数(RFF)とサリエンシマップの2つのコア技術から構成される。 RFFは高次元画像における複雑な非線形相関を推定するために利用され、サリエンシマップは変化した特徴を識別するために設計されている。サリエンシマップの指導のもと、SGFDはサンプル再重み付けを用いて、変化した特徴に関する推定相関を最小化し、視覚的RLタスクにおけるデコリレーションを実現する。実験の結果,sgfdは幅広いテスト環境において十分に一般化でき,タスクの無関係なバリエーションとタスク関連のバリエーションの両方を扱う場合,最先端の手法を著しく上回ることがわかった。 In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. 
RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.	翻訳日:2023-12-25 18:08:24 公開日:2023-12-22
# フレキシブル、スケーラブル、マシンラーニング対応のマルチモーダルoncologyデータセットの構築 Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets ( http://arxiv.org/abs/2310.01438v2 ) ライセンス: Link先を確認	Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool	(参考訳) データ取得、ストレージ、処理技術の進歩は、異種医療データの急速な成長をもたらした。放射線スキャン,病理像,分子情報を臨床データと統合することは,疾患の総合的理解と治療の最適化に不可欠である。複数のソースからのデータを統合する必要性はさらに、精密医療やパーソナライズされた治療を可能にするために、がんなどの複雑な疾患で顕著である。本研究は,がん研究データコモンズ (CRDC) などの公開ソースからの異種データを相互接続型で患者中心のフレームワークに効率的に融合するための,柔軟でスケーラブルで費用対効果の高いメタデータフレームワークであるマルチモーダル・インテグレーション・オブ・オンコロジー・データ・システム (MINDS) を提案する。 MINDSはデータ型間の関係を探索し、大規模マルチモーダル機械学習モデルを開発するためのコホートを構築するためのインターフェースを提供する。 MINDSはマルチモーダルデータを調和させることで、研究者に診断と予後の洞察を明らかにし、エビデンスベースのパーソナライズされたケアを可能にする分析能力を高めることを目指している。 MINDSは詳細なエンドツーエンドのデータプロファイランスを追跡し、再現性と透明性を確保する。 MINDSのクラウドネイティブアーキテクチャは、大幅なストレージ最適化、レプリケーション回避、動的アクセス機能を確保しながら、安全でコスト最適化された方法で指数関数的なデータ成長を処理することができる。自動スケーリング、アクセス制御、その他のメカニズムは、パイプラインのスケーラビリティとセキュリティを保証する。 MINDSは、オンコロジーデータ統合の将来に向けた重要なステップである相互運用可能なメタデータ駆動アプローチを通じて、既存のバイオメディカルデータサイロの限界を克服する。 The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. 
By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.	翻訳日:2023-12-25 18:07:57 公開日:2023-12-22
# 密度汎関数理論の凸条件 The Convexity Condition of Density-Functional Theory ( http://arxiv.org/abs/2309.17443v2 ) ライセンス: Link先を確認	Andrew C. Burgess, Edward Linscott, and David D. O'Regan	(参考訳) 密度汎関数理論(DFT)では、有限電子系の全エネルギーは電子数に対して凸であり、したがって 2 E_v[N_0] <= E_v[N_0 - 1] + E_v[N_0 + 1] が成り立つと長い間仮定されてきた。本稿では無限分離極限の手法を用いて、(1)すべてのv表現可能密度に対して厳密、(2)サイズ整合、(3)並進不変であるDFTの任意の定式化に対して凸条件を証明する。類似の結果は、一体還元密度行列汎関数理論でも証明される。基底状態が常にアクセス可能であるとは限らない既知の DFT の定式化があり、そのような場合には凸性が成り立たないことを示しているが、それでもこの証明は厳密な交換相関汎関数に対する厳しい制約を確認する。また,密度汎関数近似の開発に役立ちうる、近似DFTにおける凸性の十分条件を提供する。この結果は、Kohn-ShamバンドギャップとDFTの交換相関微分不連続性を理解する上で中心となる、電子数に関する区分線形性条件の証明における長年の仮定を不要にする。 It has long been postulated that within density-functional theory (DFT) the total energy of a finite electronic system is convex with respect to electron count, so that 2 E_v[N_0] <= E_v[N_0 - 1] + E_v[N_0 + 1]. Using the infinite-separation-limit technique, this article proves the convexity condition for any formulation of DFT that is (1) exact for all v-representable densities, (2) size-consistent, and (3) translationally invariant. An analogous result is also proven for one-body reduced density matrix functional theory. While there are known DFT formulations in which the ground state is not always accessible, indicating that convexity does not hold in such cases, this proof nonetheless confirms a stringent constraint on the exact exchange-correlation functional. We also provide sufficient conditions for convexity in approximate DFT, which could aid in the development of density-functional approximations. This result lifts a standing assumption in the proof of the piecewise linearity condition with respect to electron count, which has proven central to understanding the Kohn-Sham band-gap and the exchange-correlation derivative discontinuity of DFT.	翻訳日:2023-12-25 18:07:32 公開日:2023-12-22
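The convexity inequality in the abstract above can be restated as a comparison of two energy differences. The symbols $I_v$ and $A_v$ are shorthand introduced here for illustration (ionization energy and electron affinity of the $N_0$-electron system); they do not appear in the abstract.

$$
\begin{aligned}
I_v(N_0) &:= E_v[N_0 - 1] - E_v[N_0], \qquad A_v(N_0) := E_v[N_0] - E_v[N_0 + 1], \\
2E_v[N_0] \le E_v[N_0 - 1] + E_v[N_0 + 1]
\;&\Longleftrightarrow\; E_v[N_0] - E_v[N_0 + 1] \le E_v[N_0 - 1] - E_v[N_0]
\;\Longleftrightarrow\; A_v(N_0) \le I_v(N_0).
\end{aligned}
$$

Read this way, convexity says that removing an electron never costs less energy than is gained by adding one.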
# 山火事ダイナミクスのための多変量地球システムデータキューブ SeasFire SeasFire as a Multivariate Earth System Datacube for Wildfire Dynamics ( http://arxiv.org/abs/2312.07199v2 ) ライセンス: Link先を確認	Ilektra Karasante, Lazaro Alonso, Ioannis Prapas, Akanksha Ahuja, Nuno Carvalhais and Ioannis Papoutsis	(参考訳) 山火事の世界的な発生、規模、頻度は、生態系サービスや人間の生活に大きな脅威をもたらす。山火事の前兆条件を効果的に定量化し、要因を特定するには、地球システム力学の徹底的な理解が不可欠である。そこで,本研究では,地球観測に基づく全球のサブシーズンから季節スケールの山火事モデリングのために整備された時空間データセットであるSeasFireデータキューブを紹介する。SeasFireデータキューブは、気候、植生、海洋指数、人的要因を含む59の変数で構成され、8日間の時間分解能を持ち、空間分解能は0.25$^{\circ}$であり、2001年から2021年までの期間にわたる。SeasFireの汎用性を示すため,山火事の駆動要因の変動性と季節性の探究,海洋・気候テレコネクションと山火事の因果関係のモデル化,および深層学習モデルによる複数の時間スケールにわたるサブシーズンの山火事パターンの予測を行う。私たちは、SeasFireデータキューブを公開し、地球システム科学者や機械学習の実践者に、山火事の理解と予測の改善に利用するよう呼びかける。 The global occurrence, scale, and frequency of wildfires pose significant threats to ecosystem services and human livelihoods. To effectively quantify and attribute the antecedent conditions for wildfires, a thorough understanding of Earth system dynamics is imperative. In response, we introduce the SeasFire datacube, a meticulously curated spatiotemporal dataset tailored for global sub-seasonal to seasonal wildfire modeling via Earth observation. The SeasFire datacube comprises 59 variables encompassing climate, vegetation, oceanic indices, and human factors, has an 8-day temporal resolution and a spatial resolution of 0.25$^{\circ}$, and spans from 2001 to 2021. We showcase the versatility of SeasFire for exploring the variability and seasonality of wildfire drivers, modeling causal links between ocean-climate teleconnections and wildfires, and predicting sub-seasonal wildfire patterns across multiple timescales with a Deep Learning model. We publicly release the SeasFire datacube and appeal to Earth system scientists and Machine Learning practitioners to use it for an improved understanding and anticipation of wildfires.	翻訳日:2023-12-25 18:00:37 公開日:2023-12-22
# 人間のデータを超えた: 言語モデルによる問題解決のための自己学習のスケーリング Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models ( http://arxiv.org/abs/2312.06585v3 ) ライセンス: Link先を確認	Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel	(参考訳) 人間の生成したデータに対する微調整言語モデル~(lms)が普及している。しかし、これらのモデルの性能はしばしば高品質な人間のデータの量と多様性によって制限される。本稿では,スカラーフィードバックにアクセスできるタスク,例えば正当性を検証できる数学問題において,人間のデータを超えることができるかどうかを考察する。そこで我々は,(1)モデルからサンプルを生成し,二元フィードバックを用いてフィルタリングし,(2)これらのサンプル上でモデルを微調整し,(3)このプロセスを数回繰り返す。 PaLM-2モデルを用いた高度なMATH推論とAPPS符号化ベンチマークを用いて、ReST$^{EM}$はモデルサイズに好適にスケールし、人間のデータのみによる微調整を大幅に上回る。総じて,フィードバックによる自己学習は,人間生成データへの依存を大幅に低減できることが示唆された。 Fine-tuning language models~(LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. 
Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.	翻訳日:2023-12-25 17:59:38 公開日:2023-12-22
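The three numbered steps in the abstract above can be sketched as a plain loop. This is a toy illustration under stated assumptions, not the authors' implementation: `rest_em`, the three callables, and the toy "model" (a single probability of answering correctly) are all hypothetical names introduced here. Whether each round fine-tunes the base model or the previous iterate is left to the `fine_tune` callable, since the abstract does not say.

```python
import random


def rest_em(model, problems, generate, is_correct, fine_tune,
            iterations=3, samples_per_problem=4):
    """Run a generate / filter / fine-tune loop a fixed number of times."""
    for _ in range(iterations):
        kept = []
        for problem in problems:
            for _ in range(samples_per_problem):
                # (1) Sample from the current model and keep only the
                #     candidates that pass the binary correctness check.
                candidate = generate(model, problem)
                if is_correct(problem, candidate):
                    kept.append((problem, candidate))
        # (2) Fine-tune on the filtered samples; (3) repeat.
        model = fine_tune(model, kept)
    return model


# Toy stand-ins (hypothetical): problems are pairs of integers to add.
rng = random.Random(0)


def toy_generate(model, problem):
    a, b = problem
    return a + b if rng.random() < model["p_correct"] else a + b + 1


def toy_is_correct(problem, candidate):
    return candidate == sum(problem)


def toy_fine_tune(model, kept):
    if not kept:
        return model  # nothing passed the filter, so leave the model unchanged
    p = model["p_correct"]
    return {"p_correct": p + 0.5 * (1.0 - p)}


final_model = rest_em({"p_correct": 0.5}, [(1, 2), (3, 4), (5, 6)],
                      toy_generate, toy_is_correct, toy_fine_tune)
```

In a real setting `generate` would sample from a language model, `is_correct` would be an answer checker or a unit-test runner, and `fine_tune` would run supervised training on the kept pairs.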
# DG-TTA:ドメイン一般化とテスト時間適応による領域外医療画像分割 DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation ( http://arxiv.org/abs/2312.06275v2 ) ライセンス: Link先を確認	Christian Weihsbach, Christian N. Kruse, Alexander Bigalke, Mattias P. Heinrich	(参考訳) ドメイン外の画像に事前訓練された医療セグメンテーションモデルを適用すると、品質の不足を予測できる。微調整や教師なしおよびソースフリーなドメイン適応など、モデルパフォーマンスを維持するためのいくつかの戦略が提案されている。これらの戦略はデータ可用性に対する制限的な要件を設定した。本研究では,未熟な対象領域における事前学習モデルの再使用に対して,ドメインの一般化とテスト時間適応を組み合わせることを提案する。ソースデータに対するドメイン一般化事前トレーニングは、ターゲット領域で最高の初期性能を得るために使用される。本稿では,これまで画像登録タスクで用いられてきたマインドディスクリプタを,従来の手法と比較して,汎用化を実現し,小規模データセットの優れた性能を示す技術として紹介する。テスト時には、画像の増大に応じてモデルの重み付けを最適化することで、1回のスキャン毎に高品質なセグメンテーションが保証される。これにより、ソースとターゲットデータの分離使用が可能となり、現在のデータ可用性の障壁が排除される。さらに、提案手法は、特定のモデルアーキテクチャや関連するドメインやラベルの事前知識を必要としないため、高度にモジュール化されている。我々は、現在医療画像セグメンテーションの最もポピュラーで正確なフレームワークであるnnUNetに統合することでこれを実証する。本研究では,腹部,心臓,腰椎を対象とする複数のデータセットを用い,いくつかの領域外シナリオを構築した。本手法は, 事前訓練した全身CTモデルと組み合わせることで, 上記すべてのシナリオにおいて, MR画像を高精度に分割できることを実証する。オープンソースコードは以下のとおりである。 https://github.com/multimodallearning/dg-tta Applying pre-trained medical segmentation models on out-of-domain images often yields predictions of insufficient quality. Several strategies have been proposed to maintain model performance, such as finetuning or unsupervised- and source-free domain adaptation. These strategies set restrictive requirements for data availability. In this study, we propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce the MIND descriptor previously used in image registration tasks as a further technique to achieve generalization and present superior performance for small-scale datasets compared to existing approaches. 
At test-time, high-quality segmentation for every single unseen scan is ensured by optimizing the model weights for consistency given different image augmentations. That way, our method enables separate use of source and target data and thus removes current data availability barriers. Moreover, the presented method is highly modular as it does not require specific model architectures or prior knowledge of involved domains and labels. We demonstrate this by integrating it into the nnUNet, which is currently the most popular and accurate framework for medical image segmentation. We employ multiple datasets covering abdominal, cardiac, and lumbar spine scans and compose several out-of-domain scenarios in this study. We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy in all of the aforementioned scenarios. Open-source code can be found here: https://github.com/multimodallearning/DG-TTA	翻訳日:2023-12-25 17:59:21 公開日:2023-12-22
# Particle Swarm Optimization-Back Propagation Neural Network と Multivariate Gaussian-Hidden Markov Model に基づく銘柄選択とタイミングの定量的融合戦略 A quantitative fusion strategy of stock picking and timing based on Particle Swarm Optimized-Back Propagation Neural Network and Multivariate Gaussian-Hidden Markov Model ( http://arxiv.org/abs/2312.05756v3 ) ライセンス: Link先を確認	Huajian Li, Longjian Li, Jiajian Liang, Weinan Dai	(参考訳) 近年、機械学習(ML)は経済的意思決定、投資予測、リスク管理などに効果的なアプローチと新しい技術をもたらし、経済・金融環境の可変かつ複雑な性質に対処している。本研究は,多変量ガウス隠れマルコフモデル (MGHMM) と粒子群最適化で最適化したバックプロパゲーションニューラルネットワーク (PSO-BPNN) を活用することで,株式のタイミング戦略と銘柄選択戦略を組み合わせた定量的融合モデルを提案する。ウィンザー化・中立化・標準化を施した52の因子とCSI 300指数のリターンとの間の情報係数(IC)を算出した後、上位にランクされた所定数の因子を候補因子として選択し、主成分分析(PCA)による次元削減を経てPSO-BPNNに入力し、一定数の構成銘柄を出力する。その後,選別された銘柄と、Box-Cox変換後のCSI 300指数データで訓練したMGHMMが出力する株式市場の状態に基づいて予測と取引を行い,過去4年間で優れたパフォーマンスを示した。最後に、いくつかの従来の予測・取引手法を中国株式市場において本戦略と比較する。本論文で提示する銘柄選択とタイミングを取り入れた融合戦略は、金融分析の革新的な技術を提供する。 In recent years, machine learning (ML) has brought effective approaches and novel techniques to economic decision-making, investment forecasting, risk management, and related tasks, coping with the variable and intricate nature of economic and financial environments. For investment in the stock market, this research introduces a quantitative fusion model that combines stock timing and stock picking strategies by leveraging the Multivariate Gaussian-Hidden Markov Model (MGHMM) and a Back Propagation Neural Network optimized by Particle Swarm (PSO-BPNN). After the information coefficients (IC) between fifty-two factors, which have been winsorized, neutralized and standardized, and the return of the CSI 300 index are calculated, a given number of top-ranked factors are chosen as candidate factors; after dimension reduction by Principal Component Analysis (PCA), they form the input of the PSO-BPNN, which outputs a certain number of constituent stocks. 
Subsequently, we conduct prediction and trading on the basis of the screened stocks and the stock market state output by the MGHMM, which is trained on CSI 300 index data after Box-Cox transformation; the strategy performs strongly over the past four years. Finally, some conventional forecasting and trading methods are compared with our strategy in the Chinese stock market. The fusion strategy presented in this article, which incorporates stock picking and timing, provides an innovative technique for financial analysis.	翻訳日:2023-12-25 17:58:53 公開日:2023-12-22
# 大規模言語モデルを用いた脆弱性検出にどこまで関わったか How Far Have We Gone in Vulnerability Detection Using Large Language Models ( http://arxiv.org/abs/2311.12420v3 ) ライセンス: Link先を確認	Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang	(参考訳) ソフトウェアはますます複雑になり、脆弱性が生じる傾向にあるため、自動脆弱性検出は極めて重要でありながら困難である。様々なタスクにおける大規模言語モデル(llm)の著しい成功を考えると、脆弱性検出においてその効果が期待されている。しかし、脆弱性検出におけるその可能性の定量的理解はいまだに欠けている。このギャップを埋めるために,包括的脆弱性ベンチマークvulbenchを導入する。このベンチマークは、幅広いCTF(Capture-the-Flag)課題と実世界のアプリケーションからの高品質なデータを集約し、脆弱性タイプとその根本原因を詳述した各脆弱性関数に対するアノテーションを提供する。 16のLLMと6つの最先端(SOTA)ディープラーニングベースモデルと静的アナライザを含む実験により、複数のLLMが脆弱性検出において従来のディープラーニングアプローチよりも優れており、LLMの未解決の可能性を明らかにしていることがわかった。この作業は、ソフトウェアセキュリティ強化のためのllmの理解と利用に寄与する。 As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.	翻訳日:2023-12-25 17:56:15 公開日:2023-12-22
# 直接クリフォード+T格子手術による実用量子回路の実用化 Realistic Cost to Execute Practical Quantum Circuits using Direct Clifford+T Lattice Surgery Compilation ( http://arxiv.org/abs/2311.10686v2 ) ライセンス: Link先を確認	Tyler LeBlond, Christopher Dean, George Watkins, and Ryan S. Bennink	(参考訳) 本稿では,clifford+tゲートで表現された量子回路を表面コード格子手術命令セットに明示的にコンパイルする資源推定パイプラインについて報告する。コンパイルされた回路からのマジック状態要求のケイデンスにより、ポストホック解析においてマジック状態の蒸留と貯蔵要求の最適化が可能となる。論理回路を格子状手術操作にコンパイルするために,オープンソースのLattice Surgery Compilerを構築した。修正されたコンパイラは、論理ゲートを抽象的なレイアウトに依存しない命令セットに変換し、第2は、特定のリソースレイアウトに従ってハードウェアタイルに割り当てられる局所格子手術命令にコンパイルする。第2段階では、フォールトトレラント層でのリソース競合を避けながら論理並列性を維持し、リアリズムを支援する。さらに、ユーザーはマジック状態が補充される専用のタイルを指定することができ、論理計算からのリソースコストはマジック状態の蒸留と貯蔵とは独立に考慮できる。分子の基底状態推定のための資源推定を提供することにより、パイプラインを大規模で実用的な量子回路に適用する可能性を示す。注意して考慮しなければ、マジック状態の消費率が異なる実回路において、マジック状態のストレージのリソースコストが支配的であることが分かる。 In this article, we report a resource estimation pipeline that explicitly compiles quantum circuits expressed using the Clifford+T gate set into a surface code lattice surgery instruction set. The cadence of magic state requests from the compiled circuit enables the optimization of magic state distillation and storage requirements in a post-hoc analysis. To compile logical circuits into lattice surgery operations, we build upon the open-source Lattice Surgery Compiler. The revised compiler operates in two stages: the first translates logical gates into an abstract, layout-independent instruction set; the second compiles these into local lattice surgery instructions that are allocated to hardware tiles according to a specified resource layout. The second stage retains logical parallelism while avoiding resource contention in the fault-tolerant layer, aiding realism. Additionally, users can specify dedicated tiles at which magic states are replenished, enabling resource costs from the logical computation to be considered independently from magic state distillation and storage. 
We demonstrate the applicability of our pipeline to large, practical quantum circuits by providing resource estimates for the ground state estimation of molecules. We find that, unless carefully considered, the resource costs of magic state storage can dominate in real circuits which have variable magic state consumption rates.	翻訳日:2023-12-25 17:55:58 公開日:2023-12-22
# 多エージェントPOMDPにおけるファクタド・オンライン・プランニング Factored Online Planning in Many-Agent POMDPs ( http://arxiv.org/abs/2312.11434v2 ) ライセンス: Link先を確認	Maris F.L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen	(参考訳) しばしばマルチエージェント部分観測可能マルコフ決定過程 (MPOMDP) としてモデル化される集中型マルチエージェントシステムでは、行動空間と観測空間がエージェント数とともに指数関数的に増大し、単一エージェント向けオンライン計画の価値推定と信念推定が有効でなくなる。先行研究は、いわゆるコーディネーショングラフを通じてマルチエージェント設定の固有の構造を利用することで、価値推定に部分的に取り組んでいる。さらに、近似に観測の尤度を組み込むことで、信念の推定が改善されている。しかし、価値推定と信念推定の課題は個別にしか取り組まれておらず、これが既存の手法の多数エージェントへのスケーリングを妨げている。そこで本研究では、これらの課題に同時に取り組む。第一に,MPOMDPのサンプルベースオンラインプランナに重み付き粒子フィルタリングを導入する。第二に、信念のスケーラブルな近似を提示する。第三に, エージェント間相互作用の典型的な局所性を活用する手法を, いわゆるスパース粒子フィルタツリー上で動作するMPOMDPの新しいオンライン計画アルゴリズムに取り入れる。いくつかの最先端のベースラインに対する実験的な評価は、本手法が(1)少数のエージェントの設定では競争力があり、(2)多数のエージェントが存在する場合にはベースラインを上回ることを示している。 In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation has been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. 
Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.	翻訳日:2023-12-25 17:48:01 公開日:2023-12-22
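As a rough illustration of weighting particles by the likelihood of an observation, the following sketch shows one generic importance-weighting step for a particle-based belief. It is a textbook-style update written for this listing, not the planner from the paper; `reweight_particles`, `toy_likelihood`, and the two-state example are hypothetical.

```python
def reweight_particles(particles, weights, observation, likelihood):
    """Scale each particle weight by the observation likelihood, then normalize."""
    scaled = [w * likelihood(observation, state)
              for state, w in zip(particles, weights)]
    total = sum(scaled)
    if total <= 0.0:
        raise ValueError("observation has zero likelihood under every particle")
    return [w / total for w in scaled]


def toy_likelihood(observation, state):
    # Hypothetical sensor model: the observation matches the true state 80% of the time.
    return 0.8 if observation == state else 0.2


particles = ["left", "right"]
belief = reweight_particles(particles, [0.5, 0.5], "left", toy_likelihood)
```

Starting from a uniform belief over the two states, observing "left" shifts the weight toward the "left" particle while the weights still sum to one.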
# OsmLocator:非学習的生成的視点による重なり合う散乱点の探索 OsmLocator: locating overlapping scatter marks with a non-training generative perspective ( http://arxiv.org/abs/2312.11146v2 ) ライセンス: Link先を確認	Yuming Qiu, Aleksandra Pizurica, Qi Ming, Nicolas Nadisic	(参考訳) 散乱画像におけるマークの自動位置特定は、膨大な文書画像からの知識発見と理解や、視覚的質問応答AIシステムにおける推論に大いに役立つが、重なり合うマークが至るところに存在するため、非常に難しい問題である。重なり合うマークの位置特定には、テクスチャの欠如、文脈情報の少なさ、中空形状、小さなサイズなど、多くの困難がある。本稿では,非学習的な生成的視点からクラスタリングに基づく再可視化に関する組合せ最適化問題として定式化し,目的関数が最小値に達するときの多変数の状態を見つけることで散乱マークの位置を特定する。目的関数は、2値化散乱画像とそれに対応するクラスタリングに基づいて生成された再可視化との差に基づいて構成される。基本的に、再可視化はラスタ化された散乱画像のみを入力として新しい散乱グラフを生成しようとするものであり、クラスタリングはその再可視化のための情報を提供するために用いられる。この方法は、トレーニングデータセットや参照に依存することなく、散乱画像中の重なりの激しい、サイズや形状が可変のマークを安定的に位置特定することができる。あわせて,様々な連結領域で動作するシミュレーテッドアニーリングの適応的変種を提案する。さらに,異なるマーカーと様々な重なりの程度を持つ数百の散乱画像を含むSML2023というデータセットを構築し,提案手法をテストして既存の手法と比較した。その結果,重なりの程度やマーカータイプが異なる散乱画像において大部分のマークを正確に位置特定でき,割当コストに基づく指標で最先端法と比較して約0.3の絶対的な向上が得られた。この研究は、膨大なウェブページや文献のデータマイニングに価値があり、気泡計数などの画像計測に新たな光を当てる。 Automated mark localization in scatter images, greatly helpful for discovering knowledge and understanding enormous document images and reasoning in visual question answering AI systems, is a highly challenging problem because of the ubiquity of overlapping marks. Locating overlapping marks faces many difficulties such as no texture, less contextual information, hollow shape and tiny size. Here, we formulate it as a combinatorial optimization problem on clustering-based re-visualization from a non-training generative perspective, to locate scatter marks by finding the status of multi-variables when an objective function reaches a minimum. The objective function is constructed on the difference between binarized scatter images and the corresponding re-visualization generated from their clustering. Fundamentally, re-visualization tries to generate a new scatter graph only taking a rasterized scatter image as an input, and clustering is employed to provide the information for such re-visualization. 
This method can stably locate severely overlapping, variable-size and variable-shape marks in scatter images without depending on any training dataset or reference. Meanwhile, we propose an adaptive variant of simulated annealing which can work on various connected regions. In addition, we built a dataset named SML2023 containing hundreds of scatter images with different markers and various levels of overlapping severity, tested the proposed method on it, and compared it to existing methods. The results show that it can accurately locate most marks in scatter images with different overlapping severity and marker types, with about 0.3 absolute increase on an assignment-cost-based metric in comparison with state-of-the-art methods. This work is of value to data mining on massive web pages and literature, and sheds new light on image measurement such as bubble counting.	翻訳日:2023-12-25 17:47:40 公開日:2023-12-22
# トランスフォーマーの数学的展望 A mathematical perspective on Transformers ( http://arxiv.org/abs/2312.10794v2 ) ライセンス: Link先を確認	Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet	(参考訳) トランスフォーマーは、大規模言語モデルの内部動作において中心的な役割を果たす。本研究では,相互作用する粒子系としての解釈に基づいてトランスフォーマーを解析するための数学的枠組みを構築し,長時間の極限でクラスタが出現することを明らかにする。我々の研究は基礎となる理論を探求し、数学者と計算機科学者に新しい視点を提供する。 Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge at long times. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.	翻訳日:2023-12-25 17:47:09 公開日:2023-12-22
# 長期公平性制約付きオンライン・レストレス多腕バンディット Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints ( http://arxiv.org/abs/2312.10303v2 ) ライセンス: Link先を確認	Shufan Wang, Guojun Xiong, Jian Li	(参考訳) Restless Multi-armed bandits (RMAB) は、制約のある逐次決定問題をモデル化するために広く用いられている。意思決定者(DM)は、任意の決定エポックにおいて最大B本のアームしか活性化できないという「即時活性化制約」の下で、無限の地平線上で期待される総報酬を最大化することを目指しており、各アームの状態はマルコフ決定過程(MDP)に従って確率的に進化する。しかし、この基本モデルはアーム間の公平性を保証することができない。本稿では, 「長期公平性制約」を持つ新しいRMABモデルであるRMAB-Fを導入する。その目的は、各アームに対する最小の長期活性化率を満たしつつ長期報酬を最大化することである。オンラインRMAB-F設定(つまり、各アームに付随するMDPがDMに未知である)に対して、Fair-UCRLという新しい強化学習(RL)アルゴリズムを開発する。 Fair-UCRLが、報酬リグレットと公平性違反リグレットの両方について確率的な劣線形上界を保証することを証明する。既製のRL法と比較して、Fair-UCRLは、意思決定に低複雑度のインデックスポリシーを利用する新しい活用(exploitation)手法を含んでいるため、計算効率がはるかに高い。実験の結果,Fair-UCRLの有効性がさらに示された。 Restless multi-armed bandits (RMAB) have been widely used to model sequential decision making problems with constraints. The decision maker (DM) aims to maximize the expected total reward over an infinite horizon under an "instantaneous activation constraint" that at most B arms can be activated at any decision epoch, where the state of each arm evolves stochastically according to a Markov decision process (MDP). However, this basic model fails to provide any fairness guarantee among arms. In this paper, we introduce RMAB-F, a new RMAB model with "long-term fairness constraints", where the objective now is to maximize the long-term reward while a minimum long-term activation fraction for each arm must be satisfied. For the online RMAB-F setting (i.e., the underlying MDPs associated with each arm are unknown to the DM), we develop a novel reinforcement learning (RL) algorithm named Fair-UCRL. We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret. Compared with off-the-shelf RL methods, our Fair-UCRL is much more computationally efficient since it contains a novel exploitation that leverages a low-complexity index policy for making decisions. 
Experimental results further demonstrate the effectiveness of our Fair-UCRL.	翻訳日:2023-12-25 17:46:44 公開日:2023-12-22
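The two constraints named in the abstract above (at most B arms active per decision epoch, and a minimum long-term activation fraction per arm) can be checked on a finite activation schedule as follows. This is a small sketch for illustration only: the function names and the per-arm "shortfall" quantity are assumptions made here and are not the paper's definition of fairness violation regret.

```python
def respects_budget(schedule, budget):
    """True if no decision epoch activates more than `budget` arms."""
    return all(len(active) <= budget for active in schedule)


def activation_fractions(schedule, num_arms):
    """Fraction of epochs in which each arm was activated."""
    counts = [0] * num_arms
    for active in schedule:
        for arm in active:
            counts[arm] += 1
    return [c / len(schedule) for c in counts]


def fairness_shortfall(schedule, min_fractions):
    """Per-arm gap between the required and the achieved activation fraction."""
    achieved = activation_fractions(schedule, len(min_fractions))
    return [max(0.0, required - got)
            for required, got in zip(min_fractions, achieved)]


# Hypothetical schedule: 3 arms, 4 epochs, each entry is the set of active arms.
schedule = [{0, 1}, {0, 2}, {0, 1}, {1, 2}]
within_budget = respects_budget(schedule, budget=2)
shortfall = fairness_shortfall(schedule, min_fractions=[0.5, 0.5, 0.75])
```

Here arms 0 and 1 meet their minimum fraction, while arm 2 is active in half of the epochs against a required 0.75, so only arm 2 has a positive shortfall.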
# 平衡および非平衡の相互作用鎖における2つの非連結区間のエンタングルメントエントロピーとスピン構造 Entanglement entropy of two disjoint intervals and spin structures in interacting chains in and out of equilibrium ( http://arxiv.org/abs/2312.10028v2 ) ライセンス: Link先を確認	Vanja Marić, Saverio Bocini, Maurizio Fagotti	(参考訳) 我々は、相互作用するスピン鎖のパラダイムであるハイゼンベルクスピン-$\frac{1}{2}$ XXZモデルを基準系として、ジョルダン-ウィグナー変換と部分鎖への制限によってそれに関連づけられる相互作用モデルを検討する。例えば、ギャップレスな XXZ ハミルトニアンのフェルミオン類似体は、連続的なスケーリング極限において、無質量サーリング模型によって記述される。基底状態における非連結ブロックの Rényi-$\alpha$ エントロピーを求め、無限長の極限において Rényi-$\alpha$ 三成分情報を記述する普遍的スケーリング関数を抽出する。また、フォン・ノイマンのエントロピーも考えるが、大距離の極限のみを扱う。スピンブロックのエントロピーを用いて、基礎となる無質量サーリング模型のスピン構造を明らかにする方法を示す。最後に,大域的クエンチ後の三成分情報について考察し,無限時間かつ小クエンチの極限におけるその漸近的挙動を予想する。結果として得られる「残留三成分情報」の予想は、区間の長さが(大きな)距離よりも無限に大きい極限に対応するもので、最近、非相互作用スピン鎖の研究でなされた普遍性(universality)の主張を支持する。我々の軽微な仮定は、XXZのギャップレス相における異方性の小さなクエンチ後の残留三成分情報が $-\log 2$ に等しいことを示唆している。 We take the paradigm of interacting spin chains, the Heisenberg spin-$\frac{1}{2}$ XXZ model, as a reference system and consider interacting models that are related to it by Jordan-Wigner transformations and restrictions to sub-chains. An example is the fermionic analogue of the gapless XXZ Hamiltonian, which, in a continuum scaling limit, is described by the massless Thirring model. We work out the Rényi-$\alpha$ entropies of disjoint blocks in the ground state and extract the universal scaling functions describing the Rényi-$\alpha$ tripartite information in the limit of infinite lengths. We consider also the von Neumann entropy, but only in the limit of large distance. We show how to use the entropies of spin blocks to unveil the spin structures of the underlying massless Thirring model. Finally, we speculate about the tripartite information after global quenches and conjecture its asymptotic behaviour in the limit of infinite time and small quench. 
The resulting conjecture for the "residual tripartite information", which corresponds to the limit in which the intervals' lengths are infinitely larger than their (large) distance, supports the claim of universality recently made in studies of noninteracting spin chains. Our mild assumptions imply that the residual tripartite information after a small quench of the anisotropy in the gapless phase of XXZ is equal to $-\log 2$.	翻訳日:2023-12-25 17:46:18 公開日:2023-12-22
# Q-Segment: 血管に基づく医療診断のためのセンサ内画像セグメンテーション Q-Segment: Segmenting Images In-Sensor for Vessel-Based Medical Diagnosis ( http://arxiv.org/abs/2312.09854v2 ) ライセンス: Link先を確認	Pietro Bonazzi, Julian Moosmann, Yawei Li, Sizhen Bian, Michele Magno	(参考訳) 本稿は,ディープラーニングモデルをセンサ内に直接展開することへの関心の高まりに応えるものである。量子化リアルタイムセグメンテーションアルゴリズム"Q-Segment"を提案し,センサ内プロセッサを備えた低消費電力エッジビジョンプラットフォームであるSony IMX500上で包括的評価を行う。このモデルの主な目的の1つは、血管に基づく医療診断のためのエンドツーエンドの画像セグメンテーションを実現することである。 IMX500プラットフォーム上に展開されたQ-Segmentは、センサ内でわずか0.23msという超低推論時間と、わずか72mWの消費電力を実現している。提案したネットワークを浮動小数点および量子化の両方の最先端モデルと比較し,提案手法が計算効率の面で,例えばERFNetに対して75倍と,様々なプラットフォーム上の既存ネットワークより優れていることを示す。このネットワークはスキップ接続を持つエンコーダ・デコーダ構造を採用しており、CHASEデータセットにおいて二値精度97.25%、受信者動作特性曲線下面積(AUC)96.97%を達成する。また、IMX500の処理コアを、低消費電力のマルチコアARM Cortex-MマイクロコントローラであるSony SpresenseおよびシングルコアARM Cortex-M4と比較し、エンドツーエンドの低レイテンシ(17ms)と消費電力(254mW)でセンサ内処理を実現できることを示す。この研究は、エッジベースのイメージセグメンテーションに関する貴重な洞察をもたらし、低消費電力環境に適した効率的なアルゴリズムの基礎を築く。 This paper addresses the growing interest in deploying deep learning models directly in-sensor. We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with an in-sensor processor, the Sony IMX500. One of the main goals of the model is to achieve end-to-end image segmentation for vessel-based medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72mW. We compare the proposed network with state-of-the-art models, both float and quantized, demonstrating that the proposed solution outperforms existing networks on various platforms in computing efficiency, e.g., by a factor of 75x compared to ERFNet. The network employs an encoder-decoder structure with skip connections, and results in a binary accuracy of 97.25% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset. 
We also present a comparison of the IMX500 processing core with the Sony Spresense, a low-power multi-core ARM Cortex-M microcontroller, and a single-core ARM Cortex-M4 showing that it can achieve in-sensor processing with end-to-end low latency (17 ms) and power consumption (254mW). This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.	翻訳日:2023-12-25 17:45:52 公開日:2023-12-22
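For a rough sense of scale, the latency and power figures quoted in the abstract above can be combined into an energy per frame. This is a back-of-the-envelope estimate and assumes that each quoted power is the average power drawn over the corresponding quoted time, which the abstract does not state.

$$
E = P \cdot t, \qquad
E_{\text{inference}} \approx 72\,\mathrm{mW} \times 0.23\,\mathrm{ms} \approx 16.6\,\mu\mathrm{J}, \qquad
E_{\text{end-to-end}} \approx 254\,\mathrm{mW} \times 17\,\mathrm{ms} \approx 4.3\,\mathrm{mJ}.
$$

Under that assumption, the in-sensor inference step would account for only a small share of the end-to-end energy per frame.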
# 冷却・コヒーレンス転移機構としてのフォノン光子変換 Phonon-photon conversion as mechanism for cooling and coherence transfer ( http://arxiv.org/abs/2312.09837v2 ) ライセンス: Link先を確認	Alessandro Ferreri, David Edward Bruschi, Frank K. Wilhelm, Franco Nori and Vincenzo Macrì	(参考訳) 力学カシミール効果(dynamical Casimir effect)は、量子場を閉じ込めた空洞の可動壁の機械的エネルギーを場の量子に変換することができる物理現象である。この効果は、量子場理論の最も驚くべき予測の1つとして認識されている。量子スケールでは、エネルギー変換はインコヒーレントに、すなわち壁の物理的運動なしでも起こりうる。量子熱力学を用いて, 壁とキャビティの間の温度勾配がゼロでない場合, この現象を壁を冷却する道具として用いることができることを示す。同時に、熱伝達の過程によって、レーザーによって駆動される1つのキャビティモードから壁へコヒーレンスを移し、壁にコヒーレント振動を強制することができる。最後に、他のサブシステムで構成される場合を含むシステム全体を冷却するために、1つのレーザードライブを使用する方法を示す。 The dynamical Casimir effect is the physical phenomenon where the mechanical energy of a movable wall of a cavity confining a quantum field can be converted into quanta of the field itself. This effect has been recognized as one of the most astonishing predictions of quantum field theory. At the quantum scale, the energy conversion can also occur incoherently, namely without any physical motion of the wall. We employ quantum thermodynamics to show that this phenomenon can be employed as a tool to cool down the wall when there is a non-vanishing temperature gradient between the wall and the cavity. At the same time, the heat-transfer process makes it possible to transfer coherence from one cavity mode, driven by a laser, to the wall, thereby forcing its coherent oscillation. Finally, we show how to employ one laser drive to cool the entire system including the case when it is composed of other subsystems.	翻訳日:2023-12-25 17:45:21 公開日:2023-12-22
# 大規模量子ネットワークのための真空ビームガイド Vacuum Beam Guide for Large-Scale Quantum Networks ( http://arxiv.org/abs/2312.09372v2 ) ライセンス: Link先を確認	Yuexun Huang, Francisco Salces-Carcoba, Rana X Adhikari, Amir H. Safavi-Naeini, Liang Jiang	(参考訳) 真空ビームガイド(VBG)は、長距離量子通信における既存のファイバーや衛星技術の限界を克服するための、量子チャネルの全く異なるソリューションを提供する。 VBGは、数キロメートル間隔で整列したレンズの配列により、幅広い光波長に対して超高透過性を提供します。現実的なパラメータでは、VBGは減衰率の点で最良のファイバーを3桁上回ります。その結果、VBGは、最先端の量子衛星通信レートよりも桁違いに高い$10^{13}$ qubit/sec以上の量子チャネル容量を持つ数千km以上の長距離量子通信を可能にする。驚くべきことに、量子リピータを使わずに、VBGは地上ベース、低損失、高帯域幅の量子チャネルを提供し、コンピューティング、通信、センシングのための新しい分散量子情報アプリケーションを可能にする。 The vacuum beam guide (VBG) presents a completely different solution for quantum channels to overcome the limitations of existing fiber and satellite technologies for long-distance quantum communication. With an array of aligned lenses spaced kilometers apart, the VBG offers ultra-high transparency over a wide range of optical wavelengths. With realistic parameters, the VBG can outperform the best fiber by three orders of magnitude in terms of attenuation rate. Consequently, the VBG can enable long-range quantum communication over thousands of kilometers with quantum channel capacity beyond $10^{13}$ qubit/sec, orders of magnitude higher than the state-of-the-art quantum satellite communication rate. Remarkably, without relying on quantum repeaters, the VBG can provide a ground-based, low-loss, high-bandwidth quantum channel that enables novel distributed quantum information applications for computing, communication, and sensing.	翻訳日:2023-12-25 17:45:05 公開日:2023-12-22
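The "three orders of magnitude" attenuation claim in the abstract above can be made concrete with a short arithmetic sketch. The 0.2 dB/km fiber figure is an assumed, typical telecom value and does not come from the abstract; the VBG figure is simply that number divided by 1000, and the function name is illustrative.

```python
def transmission(attenuation_db_per_km: float, distance_km: float) -> float:
    """Fraction of optical power that survives after distance_km."""
    total_loss_db = attenuation_db_per_km * distance_km
    return 10.0 ** (-total_loss_db / 10.0)


# Assumption: typical telecom fiber loss, not a value from the abstract.
FIBER_DB_PER_KM = 0.2
# Assumption: "three orders of magnitude" lower attenuation than the fiber.
VBG_DB_PER_KM = FIBER_DB_PER_KM / 1000.0

for distance_km in (100, 1000):
    fiber = transmission(FIBER_DB_PER_KM, distance_km)
    vbg = transmission(VBG_DB_PER_KM, distance_km)
    print(f"{distance_km:>5} km  fiber: {fiber:.3e}  VBG: {vbg:.3f}")
```

Under these assumed numbers, a 1000 km fiber link accumulates 200 dB of loss while the guide accumulates 0.2 dB, which is why a repeater-free ground channel becomes plausible at that scale.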
# 夜間UAV追跡のための相互学習知識蒸留 Mutual-Learning Knowledge Distillation for Nighttime UAV Tracking ( http://arxiv.org/abs/2312.07884v2 ) ライセンス: Link先を確認	Yufeng Liu	(参考訳) 夜間無人航空機(UAV)の追跡は、必要不可欠なプラグアンドプレイの低照度エンハンサーによって促進されている。しかし、低照度エンハンサーの導入は、UAVの余分な計算負担を増大させ、リアルタイムUAVアプリケーションの開発を著しく妨げている。一方、これらの最先端のSOTA(State-of-the-art)エンハンサーは、高度な日中UAVトラッキングアプローチと密結合を欠いている。そこで本研究では,夜間UAV追跡のための新たな相互学習知識蒸留フレームワークであるMLKDを提案する。本フレームワークは,教師からの知識伝達と学生間の知識共有を通じて,コンパクトで迅速な夜間トラッカーを学習するために構築されている。具体的には,SOTAエンハンサーと優れたトラッキングバックボーンとに基づく上級教師を,タイトな結合認識トラッキングバックボーンのみに基づいて指導し,夜間のオブジェクト特徴を直接抽出する。一人の生徒のバイアス学習に対処するために,多様な蒸留方法を持つ多様な軽量の生徒が,教師の知識の様々な側面に焦点を合わせるように構築されている。さらに、先進的な相互学習室を設計し、上位の学生候補を選抜し、訓練段階において残りの学生をフレーム単位で支援する。さらに、テストデータセットから最後の最高の学生であるMLKD-Trackが選択される。 MLKDとMLKD-Trackの有効性と優位性を示す。 MLKD-Trackの実用性は、異なる課題のある実世界のテストで検証される。コードはhttps://github.com/lyfeng001/MLKDで公開されている。 Nighttime unmanned aerial vehicle (UAV) tracking has been facilitated with indispensable plug-and-play low-light enhancers. However, the introduction of low-light enhancers increases the extra computational burden for the UAV, significantly hindering the development of real-time UAV applications. Meanwhile, these state-of-the-art (SOTA) enhancers lack tight coupling with the advanced daytime UAV tracking approach. To solve the above issues, this work proposes a novel mutual-learning knowledge distillation framework for nighttime UAV tracking, i.e., MLKD. This framework is constructed to learn a compact and fast nighttime tracker via knowledge transferring from the teacher and knowledge sharing among various students. Specifically, an advanced teacher based on a SOTA enhancer and a superior tracking backbone is adopted for guiding the student based only on the tight coupling-aware tracking backbone to directly extract nighttime object features. To address the biased learning of a single student, diverse lightweight students with different distillation methods are constructed to focus on various aspects of the teacher's knowledge. 
Moreover, an innovative mutual-learning room is designed to elect the superior student candidate to assist the remaining students frame-by-frame in the training phase. Furthermore, the final best student, i.e., MLKD-Track, is selected through the testing dataset. Extensive experiments demonstrate the effectiveness and superiority of MLKD and MLKD-Track. The practicality of the MLKD-Track is verified in real-world tests with different challenging situations. The code is available at https://github.com/lyfeng001/MLKD.	翻訳日:2023-12-25 17:44:24 公開日:2023-12-22
# NeuSurf: スパースインプットビューからのニューラルサーフェスリコンストラクションのためのオンサーフェス NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views ( http://arxiv.org/abs/2312.13977v2 ) ライセンス: Link先を確認	Han Huang, Yulun Wu, Junsheng Zhou, Ge Gao, Ming Gu, Yu-Shen Liu	(参考訳) 近年,多視点再構成の分野では,神経暗黙関数が顕著な成果を上げている。しかし、既存のほとんどの手法は密集したビュー用に調整されており、スパースビューを扱う際に不満足なパフォーマンスを示す。スパースビュー再構築タスクに対処するために暗黙的再構成を一般化するために、いくつかの最新の方法が提案されているが、それらは依然として高いトレーニングコストを被り、慎重に選択された観点でのみ有効である。本稿では,表面上の事前情報を利用して高度に忠実な表面再構成を実現する新しいスパースビュー再構築フレームワークを提案する。具体的には,大域的幾何アライメントと局所幾何洗練に関する制約を設計し,粗い形状と細部を協調的に最適化する。これを実現するために、ニューラルネットワークをトレーニングし、SfMから得られる地上点からグローバルな暗黙の場を学習し、粗い幾何学的制約として活用する。局所的な幾何的整合性を利用するために、我々は地上の点を見かけや見えない視点に投影し、投影された特徴の一貫した損失を微細な幾何学的制約として扱う。 dtu と blendedmvs データセットによる2つの分散設定の実験結果は、最先端の方法よりも大幅に改善されていることを示している。 Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several latest methods have been proposed for generalizing implicit reconstruction to address the sparse view reconstruction task, but they still suffer from high training costs and are merely valid under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistent loss of projected features as a fine geometric constraint. 
The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.	翻訳日:2023-12-25 17:37:13 公開日:2023-12-22
# 部分最適輸送について:シンクホーンの実用性の改善と効率的な勾配法 On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods ( http://arxiv.org/abs/2312.13970v2 ) ライセンス: Link先を確認	Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Lam M. Nguyen, Kim-Chuan Toh	(参考訳) 本稿では、最大$n$の非バランスな2つの測度間の部分最適輸送(POT)問題と、色移動やドメイン適応といった様々なAIタスクへの応用について検討する。したがって、アプリケーションの原因となる問題のサイズがますます大きくなるPOTの高速な近似が必要である。我々はまず,ポットに対する最先端のシンクホーンアルゴリズムの非互換な丸め手順による実現不可能性を理論的に実験的に検討し,ポイントクラウド登録のような実世界のアプリケーションにおける質的性能を低下させる。そこで本研究では,POT の新たなラウンドリングアルゴリズムを提案し,計算複雑性を$\mathcal{\widetilde O}(n^2/\varepsilon^4)$に修正した,実行可能な Sinkhorn プロシージャを提案する。丸めアルゴリズムはポット問題を近似する2つの一階法の開発も可能にしている。最初のアルゴリズムであるadaptive primal-dual accelerated gradient descent (apdagd) は、修正されたシンクホーンよりも$\varepsilon$の方が良い$\mathcal{\widetilde o}(n^{2.5}/\varepsilon)$のポット問題に対する$\varepsilon$-approximate solutionを見つける。 2つ目の方法であるDual Extrapolationは、$\mathcal{\widetilde O}(n^2/\varepsilon)$の計算複雑性を実現する。さらに,ポットの柔軟性を標準otと比較し,二つの限界分布が不均衡な実アプリケーションにおけるアルゴリズムの実用性を示す。 This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. 
The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.	翻訳日:2023-12-25 17:36:52 公開日:2023-12-22
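For readers unfamiliar with the Sinkhorn procedure that the abstract above revises, the following is a minimal sketch of the classic matrix-scaling iteration for balanced entropic optimal transport. It is only that baseline: it is not the paper's POT variant, its rounding algorithm, APDAGD, or Dual Extrapolation, and the function name, regularization value, and toy data are all assumptions made for illustration.

```python
import math


def sinkhorn(cost, r, c, reg=0.5, n_iters=500):
    """Entropic OT plan between histograms r and c of equal total mass."""
    n, m = len(r), len(c)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the row marginals match r.
        u = [r[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the column marginals match c.
        v = [c[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem: three source points and two target points on a line.
xs = [0.0, 0.5, 1.0]
ys = [0.25, 0.75]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
r = [1 / 3, 1 / 3, 1 / 3]
c = [0.5, 0.5]
plan = sinkhorn(cost, r, c)
for row in plan:
    print([round(p, 4) for p in row])
```

In the balanced case shown here both marginals carry the same mass. The partial setting studied in the paper transports only part of the mass between unbalanced measures, which is where the abstract locates the feasibility problem.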
# paint3d: ライティングレステクスチャ拡散モデルによる3dペイント Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models ( http://arxiv.org/abs/2312.13913v2 ) ライセンス: Link先を確認	Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu	(参考訳) 本研究では,テキストや画像の入力に条件付された非テクスチャ3Dメッシュに対して,高分解能,光レス,多彩な2KUVテクスチャマップを作成可能な,粗大かつ微細な生成フレームワークであるPaint3Dを提案する。対処すべき重要な課題は、組み込み照明情報なしで高品質なテクスチャを生成することだ。そこで本手法では,まず,事前学習した深度認識2次元拡散モデルを用いて視条件画像を生成し,マルチビューテクスチャ融合を行い,初期粗いテクスチャマップを生成する。しかし, 2次元モデルでは3次元形状を完全に表現できず, 照明効果が損なわれるため, 粗いテクスチャマップは不完全領域と照明アーチファクトを呈する。これを解決するために,不完全領域の形状認識と照明器具の除去に特化したUV塗装とUVHD拡散モデルを個別に訓練する。この粗いプロセスを通じて、Paint3Dは3Dオブジェクトのテクスチャ化において、セマンティック一貫性を維持しながらセマンティック一貫性を維持する高品質な2KUVテクスチャを生成することができる。 This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within modern graphics pipelines. To achieve this, our method first leverages a pre-trained depth-aware 2D diffusion model to generate view-conditional images and perform multi-view texture fusion, producing an initial coarse texture map. However, as 2D models cannot fully represent 3D shapes and disable lighting effects, the coarse texture map exhibits incomplete areas and illumination artifacts. To resolve this, we train separate UV Inpainting and UVHD diffusion models specialized for the shape-aware refinement of incomplete areas and the removal of illumination artifacts. Through this coarse-to-fine process, Paint3D can produce high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state-of-the-art in texturing 3D objects.	翻訳日:2023-12-25 17:36:20 公開日:2023-12-22
# AppAgent: スマートフォンユーザとしてのマルチモーダルエージェント AppAgent: Multimodal Agents as Smartphone Users ( http://arxiv.org/abs/2312.13771v2 ) ライセンス: Link先を確認	Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、複雑なタスクを実行できるインテリジェントエージェントの開発につながっている。本稿では,スマートフォンアプリケーションを操作するための新しいLLMベースのマルチモーダルエージェントフレームワークを提案する。本フレームワークは,タッピングやスワイプなどのヒューマンライクなインタラクションを模倣した,簡易なアクションスペースによるスマートフォンアプリケーションの操作を可能にする。この新しいアプローチは、システムバックエンドアクセスの必要性を回避し、様々なアプリに適用性を広げる。エージェントの機能の中心は、その革新的な学習方法です。エージェントは、自律的な探索または人間のデモを観察することで、ナビゲートと新しいアプリの使用を学習する。このプロセスは、エージェントが異なるアプリケーション間で複雑なタスクを実行するために参照する知識ベースを生成する。エージェントの実用性を実証するため,ソーシャルメディア,メール,地図,ショッピング,高度な画像編集ツールなど10種類のアプリケーションで50以上のタスクを広範囲にテストした。以上の結果から,エージェントの多種多様なハイレベルタスクの処理能力が確認できた。 Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.	翻訳日:2023-12-25 17:35:53 公開日:2023-12-22
# NeRFをベースとした色と不透明度を持つガウススプラッティング Gaussian Splatting with NeRF-based Color and Opacity ( http://arxiv.org/abs/2312.13729v2 ) ライセンス: Link先を確認	Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek	(参考訳) neural radiance fields (nerfs) は、3dオブジェクトの複雑さを捉えるためのニューラルネットワークの驚くべき可能性を実証している。ニューラルネットワークの重みの中に形状と色情報をエンコードすることで、NeRFは3Dオブジェクトの驚くほどシャープな新しいビューを生み出すのに優れています。近年, 生成モデルを用いたNeRFの一般化が数多く現れ, その汎用性が高まっている。対照的に、gaussian splatting (gs) はニューラルネットワークを必要とせず、より高速なトレーニングと推論で同様のレンダリング品質を提供する。ガウス分布の集合に3Dオブジェクトに関する情報をエンコードし、古典的メッシュと同様に3Dで描画できる。残念ながら、GSは通常数十万のガウス成分を必要とするため、条件付けが難しい。両モデルの欠点を軽減するために、3Dオブジェクトの形状のGS表現とNeRFによる色と不透明度の符号化を用いたハイブリッドモデルを提案する。我々のモデルは、ガウス分布とトレーニング可能な位置(すなわちガウスの手段)、形状(ガウスの共分散)、色と不透明度、ニューラルネットワークを用いており、ガウス分布と視方向のパラメータを使って色と不透明度の変化を生成する。その結果、3dオブジェクトのシャドウ、光反射、透明性をよりよく記述した。 Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference, as it does not need neural networks to work. We encode information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since it usually requires around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. 
covariance of Gaussian), color and opacity, and neural network, which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects.	翻訳日:2023-12-25 17:35:39 公開日:2023-12-22
# 運転シーンに対する弱監督型セマンティックセグメンテーション Weakly Supervised Semantic Segmentation for Driving Scenes ( http://arxiv.org/abs/2312.13646v2 ) ライセンス: Link先を確認	Dongseob Kim, Seungho Lee, Junsuk Choe, Hyunjung Shim	(参考訳) 画像レベルラベルを用いたweakly supervised semantic segmentation(wsss)における最先端技術は、都市景観などの運転シーンデータセットにおいて深刻な性能低下を示す。この課題に対処するため、シーンデータセットの駆動に適した新しいWSSSフレームワークを開発しました。データセットの特徴を広範囲に分析し,提案するベースラインとしてコントラスト言語画像事前学習(CLIP)を用いて擬似マスクを得る。しかし、CLIPは、(1)CLIPの擬似マスクが小さなオブジェクトクラスを表現していないこと、(2)これらのマスクが顕著なノイズを含んでいること、の2つの主要な課題を紹介している。それぞれの問題に対する解決策を次のように提案する。 1)モデルトレーニング中に小規模パッチをシームレスに組み込んだグローバルローカルビュートレーニングを考案し,モデルが運転シーン(例えば交通信号)において小型で重要なオブジェクトを扱う能力を高める。 2)CLIPマスクとセグメンテーション予測の整合性を評価することによって,信頼性と雑音の領域を識別する新しい手法であるCARBを導入する。適応的な損失重み付けによってノイズの多いピクセルよりも信頼性の高いピクセルを優先する。特に,提案手法はCityscapesテストデータセット上で51.8\% mIoUを達成し,シーンデータセットを駆動するWSSSベースラインとしての可能性を示した。 camvidとwilddash2の実験結果は、小規模のデータセットや視覚的に困難な状況でも、さまざまなデータセットにまたがる手法の有効性を示しています。コードはhttps://github.com/k0u-id/CARBで公開されている。 State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on extensive analysis of dataset characteristics, we employ Contrastive Language-Image Pre-training (CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP lack in representing small object classes, and (2) these masks contain notable noise. We propose solutions for each issue as follows. (1) We devise Global-Local View Training that seamlessly incorporates small-scale patches during model training, thereby enhancing the model's capability to handle small-sized yet critical objects in driving scenes (e.g., traffic light). 
(2) We introduce Consistency-Aware Region Balancing (CARB), a novel technique that discerns reliable and noisy regions through evaluating the consistency between CLIP masks and segmentation predictions. It prioritizes reliable pixels over noisy pixels via adaptive loss weighting. Notably, the proposed method achieves 51.8% mIoU on the Cityscapes test dataset, showcasing its potential as a strong WSSS baseline on driving scene datasets. Experimental results on CamVid and WildDash2 demonstrate the effectiveness of our method across diverse datasets, even with small-scale datasets or visually challenging conditions. The code is available at https://github.com/k0u-id/CARB.	翻訳日:2023-12-25 17:35:16 公開日:2023-12-22
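The idea of prioritizing reliable pixels over noisy ones through loss weighting can be illustrated with a generic sketch. This is not the CARB formulation from the paper or its repository: the agreement rule, the two weight values, and the function names below are hypothetical simplifications chosen only to show how consistency between a pseudo-mask and the model's prediction can drive per-pixel weights.

```python
import math


def consistency_weights(clip_mask, predicted, w_reliable=1.0, w_noisy=0.1):
    """Hypothetical rule: a pixel is reliable when both label sources agree."""
    return [w_reliable if c == p else w_noisy for c, p in zip(clip_mask, predicted)]


def weighted_loss(prob_of_pseudo_label, weights):
    """Weighted mean negative log-likelihood of the pseudo-labels."""
    total = sum(-w * math.log(p) for p, w in zip(prob_of_pseudo_label, weights))
    return total / sum(weights)


# Four pixels: pseudo-labels from the CLIP mask vs. the segmentation prediction.
clip_mask = ["road", "road", "car", "traffic light"]
predicted = ["road", "road", "road", "traffic light"]
# Model probability assigned to each pixel's pseudo-label.
probs = [0.9, 0.8, 0.2, 0.6]

weights = consistency_weights(clip_mask, predicted)
print(weights)
print(weighted_loss(probs, weights))
print(weighted_loss(probs, [1.0] * len(probs)))
```

The third pixel, where the two sources disagree, contributes far less to the weighted loss than to the unweighted one, so a likely-noisy pseudo-label has limited influence on training.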
# 光とマイクロ波フォトニック量子ビットの量子絡み合い Quantum entanglement between optical and microwave photonic qubits ( http://arxiv.org/abs/2312.13559v2 ) ライセンス: Link先を確認	Srujan Meesala, David Lake, Steven Wood, Piero Chiappina, Changchun Zhong, Andrew D. Beyer, Matthew D. Shaw, Liang Jiang, and Oskar Painter	(参考訳) 絡み合いは量子力学の異常な特徴である。絡み合った光子源はベルの不等式を破って量子物理学の基礎をテストするのに不可欠であった。近年、マイクロ波回路と超伝導量子ビットの強い非線形相互作用により、絡み合った多体状態が実現されている。ここでは、光およびマイクロ波フォトニック量子ビットを絡み合うチップスケールの源を示す。我々のデバイスプラットフォームは、圧電オプトメカニカルトランスデューサと、光照射下で頑健な超伝導共振器を統合している。我々は光子対生成過程を駆動し、マイクロ波および光子の絡み合った状態を作成するために、本システムに固有のデュアルレール符号化を用いる。 2つの直交基底におけるマイクロ波および光光子を測定することにより、絡み合う状態の忠実度を低くする。この絡み合い源は、量子通信と計算のための確立された2つのプラットフォームである通信波長のタイムビン量子ビットとghz周波数超伝導量子ビットを直接接続することができる。 Entanglement is an extraordinary feature of quantum mechanics. Sources of entangled optical photons were essential to test the foundations of quantum physics through violations of Bell's inequalities. More recently, entangled many-body states have been realized via strong non-linear interactions in microwave circuits with superconducting qubits. Here we demonstrate a chip-scale source of entangled optical and microwave photonic qubits. Our device platform integrates a piezo-optomechanical transducer with a superconducting resonator which is robust under optical illumination. We drive a photon-pair generation process and employ a dual-rail encoding intrinsic to our system to prepare entangled states of microwave and optical photons. We place a lower bound on the fidelity of the entangled state by measuring microwave and optical photons in two orthogonal bases. This entanglement source can directly interface telecom wavelength time-bin qubits and GHz frequency superconducting qubits, two well-established platforms for quantum communication and computation, respectively.	翻訳日:2023-12-25 17:34:47 公開日:2023-12-22
# 対話型観光計画システムの開発 : 大規模言語モデルを用いた対話ロボットシステム Developing Interactive Tourism Planning: A Dialogue Robot System Powered by a Large Language Model ( http://arxiv.org/abs/2312.13545v2 ) ライセンス: Link先を確認	Katsumasa Yoshikawa and Takato Yamazaki and Masaya Ohagi and Tomoya Mizumoto and Keiya Sato	(参考訳) 近年,大規模言語モデル (LLM) が急速に普及し,対話システムの研究など,様々なタスクに活用されている。我々は, LLMの柔軟な会話能力を活用するだけでなく, 人間の会話負荷を低減し, 旅行を効率的に計画できるシステムを構築することを目指していた。さらに,旅行代理店の複雑なタスクを複数のサブタスクに分割し,それぞれを個別のフェーズとして管理し,効果的にタスクを実現する手法を提案する。提案システムは,対話ロボットコンペティション2023のプリリミナリーラウンドにおいて,第4位に到達し,一定の成功を収めた。競技を通して特定した課題について報告する。 In recent years, large language models (LLMs) have rapidly proliferated and have been utilized in various tasks, including research in dialogue systems. We aimed to construct a system that leverages not only the flexible conversational abilities of LLMs but also their advanced planning capabilities, reducing the speaking load on human interlocutors and planning trips efficiently. Furthermore, we propose a method that divides the complex task of a travel agency into multiple subtasks, managing each as a separate phase to effectively accomplish the task. Our proposed system achieved a measure of success, placing fourth in the preliminary round of the Dialogue Robot Competition 2023. We report on the challenges identified through the competition.	翻訳日:2023-12-25 17:34:31 公開日:2023-12-22
# Zero-1-to-3:3つの診断対象に対する早期学生の1バッチによるドメインレベルのゼロショット認知診断 Zero-1-to-3: Domain-level Zero-shot Cognitive Diagnosis via One Batch of Early-bird Students towards Three Diagnostic Objectives ( http://arxiv.org/abs/2312.13434v2 ) ライセンス: Link先を確認	Weibo Gao, Qi Liu, Hao Wang, Linan Yue, Haoyang Bi, Yin Gu, Fangzhou Yao, Zheng Zhang, Xin Li, Yuanjing He	(参考訳) 認知診断は、記録された実践クイズデータを探索することで、学生の認知状態を推定しようとする。知的教育システムにおけるパーソナライズされた学習指導において重要な役割を果たす。本稿では,新たに立ち上げられたドメインに学生の実践ログがないために生じる,ドメインレベルのゼロショット認知診断(DZCD)という,重要かつ実用的だがしばしば未発見の課題に焦点を当てる。最近のクロスドメイン診断モデルはDZCDにとって有望な戦略であることが示されている。これらの手法は主に、ドメイン間で学生状態を転送する方法に焦点を当てている。しかし、生徒の表現に不注意な情報を組み込むことで、知識伝達の有効性を制限できる。そこで本研究では,早期学習者の3つの診断目的に向けて,ドメインレベルのゼロショット認知診断フレームワークZero-1-to-3を提案する。本手法は, 学生状態をドメイン共有部分とドメイン固有部分に分離する2つの正則化器を用いた診断モデルの事前学習から始める。共有された認知信号は対象領域に転送することができ、新しい領域の認知的事前を豊かにすることにより、認知状態の伝播目標が保証される。その後,早期学習者の行動パターンを解析し,ドメイン適応目標を達成し,冷間開始学生のための模擬実践ログを作成する戦略を考案した。その結果, コールドスタート学生の認知状態は, 仮想データによる診断結果として洗練され, 診断目標と一致した。最後に、実世界の6つのデータセットに対する広範な実験により、DZCDに対する我々のモデルの有効性と、その課題に対する実践的応用を強調した。 Cognitive diagnosis seeks to estimate the cognitive states of students by exploring their logged practice quiz data. It plays a pivotal role in personalized learning guidance within intelligent education systems. In this paper, we focus on an important, practical, yet often underexplored task: domain-level zero-shot cognitive diagnosis (DZCD), which arises due to the absence of student practice logs in newly launched domains. Recent cross-domain diagnostic models have been demonstrated to be a promising strategy for DZCD. These methods primarily focus on how to transfer student states across domains. However, they might inadvertently incorporate non-transferable information into student representations, thereby limiting the efficacy of knowledge transfer. To tackle this, we propose Zero-1-to-3, a domain-level zero-shot cognitive diagnosis framework via one batch of early-bird students towards three diagnostic objectives. 
Our approach begins by pre-training a diagnosis model with dual regularizers, which decouples student states into domain-shared and domain-specific parts. The shared cognitive signals can be transferred to the target domain, enriching the cognitive priors for the new domain, which ensures the cognitive state propagation objective. Subsequently, we devise a strategy to generate simulated practice logs for cold-start students through analyzing the behavioral patterns from early-bird students, fulfilling the domain-adaptation goal. Consequently, we refine the cognitive states of cold-start students as diagnostic outcomes via virtual data, aligning with the diagnosis-oriented goal. Finally, extensive experiments on six real-world datasets highlight the efficacy of our model for DZCD and its practical application in question recommendation.	翻訳日:2023-12-25 17:34:18 公開日:2023-12-22
# アンシラ量子ビットを用いない多対数深さの制御NOTゲート Polylogarithmic-depth controlled-NOT gates without ancilla qubits ( http://arxiv.org/abs/2312.13206v2 ) ライセンス: Link先を確認	Baptiste Claudon, Julien Zylberman, César Feniou, Fabrice Debbasch, Alberto Peruzzo, Jean-Philip Piquemal	(参考訳) 制御された操作は量子アルゴリズムの基本構成要素である。 $n$-control-NOT ゲート($C^n(X)$) を任意のシングルキュービットと CNOT ゲートに分解することは、重要ではあるが非自明な作業である。本研究は、漸近的および非漸近的レジームにおいて、従来の方法を上回る$C^n(X)$回路を導入する。 3つの異なる分解を示す。1つの借用アンシラを用いた回路深度$\Theta\left(\log(n)^{\log_2(12)}\right)$の厳密な分解、アンシラ量子ビットを用いない回路深度$\mathcal O \left(\log(n)^{\log_2(12)}\log(1/\epsilon)\right)$の近似的な分解、そして$m\leq n$個のアンシラ量子ビットを用いた深度調整可能な回路による厳密な分解である。その結果生じる指数関数的スピードアップは、量子化学から物理学、ファイナンス、量子機械学習に至るまで、無数の量子アルゴリズムの複雑さを改善することによって、フォールトトレラントな量子コンピューティングに大きな影響を与える可能性がある。 Controlled operations are fundamental building blocks of quantum algorithms. Decomposing $n$-control-NOT gates ($C^n(X)$) into arbitrary single-qubit and CNOT gates is a crucial but non-trivial task. This study introduces $C^n(X)$ circuits outperforming previous methods in the asymptotic and non-asymptotic regimes. Three distinct decompositions are presented: an exact one using one borrowed ancilla with a circuit depth $\Theta\left(\log(n)^{\log_2(12)}\right)$, an approximating one without ancilla qubits with a circuit depth $\mathcal O \left(\log(n)^{\log_2(12)}\log(1/\epsilon)\right)$ and an exact one with an adjustable-depth circuit using $m\leq n$ ancilla qubits. The resulting exponential speedup is likely to have a substantial impact on fault-tolerant quantum computing by improving the complexities of countless quantum algorithms with applications ranging from quantum chemistry to physics, finance and quantum machine learning.	翻訳日:2023-12-25 17:33:50 publication placeholder
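As a reminder of what the $C^n(X)$ gate in the abstract above computes, the sketch below applies it to computational basis states and evaluates the growth term of the stated depth bound. It does not implement any of the paper's decompositions. The bit ordering (target last), the choice of base-2 logarithm, and the function names are assumptions, and the constants hidden by the $\Theta$ notation are omitted.

```python
import math


def cnx(bits):
    """Apply C^n(X) to a basis state: flip the last bit iff all others are 1."""
    *controls, target = bits
    if all(controls):
        target ^= 1
    return (*controls, target)


def polylog_depth_term(n):
    """Growth term log2(n)^{log2(12)} from the stated depth bound."""
    return math.log2(n) ** math.log2(12)


print(cnx((1, 1, 1, 0)))  # all three controls set, so the target flips
print(cnx((1, 0, 1, 0)))  # one control unset, so the state is unchanged

for n in (4, 64, 1024):
    print(n, round(polylog_depth_term(n), 1))
```

Because $\log_2(12) \approx 3.58$, the depth term grows as a fixed power of $\log n$, which is the sense in which the circuits are polylogarithmic in depth.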
# MoSAR:微分シェーディングを用いた単眼アバター再構成モデル MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading ( http://arxiv.org/abs/2312.13091v2 ) ライセンス: Link先を確認	Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-Andre Carbonneau	(参考訳) ポートレート画像からアバターを再構築することはマルチメディアに多くの応用があるが、依然として困難な研究課題である。 1つの画像から反射率マップと幾何を抽出することは誤りであり、幾何の復元は1対多のマッピング問題であり、反射率と光の分離は困難である。正確な幾何学と反射率を光段の制御条件下で捉えることはできるが、この方法で大規模なデータセットを取得するにはコストがかかる。さらに、この種のデータのみでのトレーニングは、Wildイメージによる一般化の貧弱につながる。これはモノクロ画像から3Dアバターを生成するMoSARの導入を動機付けている。そこで本研究では,光ステージと地中データセットの両方から学習することで,一般化を向上する半教師付きトレーニング手法を提案する。これは、新しい微分可能なシェーディング式を用いて達成される。提案手法は,本質的な顔パラメータを効果的に切り離し,照らしやすいアバターを生成する。その結果、MoSARはよりリッチな皮膚反射マップを推定し、既存の最先端手法よりも現実的なアバターを生成する。 FFHQ-UV-Intrinsicsという名の新しいデータセットも導入しました。これは10万件の被験者に対して、内在的な顔属性をスケールで提供する最初の公開データセットです。 https://ubisoft-laforge.github.io/character/mosar/ プロジェクトのwebサイトとデータセットは以下のリンクで利用可能である。 Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. 
As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available at the following link: https://ubisoft-laforge.github.io/character/mosar/	翻訳日:2023-12-25 17:33:21 公開日:2023-12-22
# DiffPortrait3D:ゼロショットポートレートビュー合成のための制御可能な拡散 DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis ( http://arxiv.org/abs/2312.13016v3 ) ライセンス: Link先を確認	Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo	(参考訳) 本稿では,DiffPortrait3Dという条件付き拡散モデルについて述べる。具体的には、単一のRGB入力を前提として、アイデンティティと表情の両方を保持する新しいカメラビューから、可塑性だが一貫した顔の詳細を合成することを目的としている。時間を要する最適化と微調整に代えて,ゼロショット方式は,不適切なカメラビュー,極端な表情,多彩な芸術的描写を備えた任意の顔のポートレートにうまく一般化する。その中心となるのが,大規模画像データセットで事前学習した2次元拡散モデルの生成前処理をレンダリングバックボーンとして活用すると同時に,外観とカメラの姿勢の無角な注意制御によって雑音を誘導する手法である。そこで我々はまず,凍結したユニセットの自己注意層に参照画像から外観コンテキストを注入する。そして、レンダリングビューを、同じビューから横断被写体の条件画像を見て、カメラポーズを解釈する新しい条件制御モジュールで操作する。さらに,学習可能なクロスビューアテンションモジュールを挿入することで,新たな3dアウェアノイズ生成プロセスによってさらに強化され,ビュー一貫性が向上する。我々は,本研究の課題であるマルチビュー・イン・ザ・ワイルドベンチマークを質的かつ定量的に評価し,最新結果を実証する。 We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views with retained both identity and facial expression. In lieu of time-consuming optimization and fine-tuning, our zero-shot method generalizes well to arbitrary face portraits with unposed camera views, extreme facial expressions, and diverse artistic depictions. At its core, we leverage the generative prior of 2D diffusion models pre-trained on large-scale image datasets as our rendering backbone, while the denoising is guided with disentangled attentive control of appearance and camera pose. To achieve this, we first inject the appearance context from the reference image into the self-attention layers of the frozen UNets. The rendering view is then manipulated with a novel conditional control module that interprets the camera pose by watching a condition image of a crossed subject from the same view. 
Furthermore, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened with a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.	翻訳日:2023-12-25 17:32:56 公開日:2023-12-22
# 内部状態、制約のない接続、離散的アクティベーションを用いたニューラルネットワークのトレーニング Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations ( http://arxiv.org/abs/2312.14359v1 ) ライセンス: Link先を確認	Alexander Grushin	(参考訳) 今日の最も強力な機械学習アプローチは、通常、事前に定義されたレイヤと異なるアクティベーション機能を持つステートレスアーキテクチャをトレーニングするために設計されている。これらのアプローチは、自然言語処理や画像認識といった分野で前例のない成功を収める一方で、トレーニングされたモデルは、人間がしないような間違いを犯しやすい。本稿では、真の知性は内部状態を管理するために機械学習モデルの能力を必要とするかもしれないが、そのようなモデルを訓練するための最も効果的なアルゴリズムはまだ発見されていない。我々はさらに、そのようなアルゴリズムは必ずしも深いアーキテクチャ上の勾配降下に基づくものではなく、むしろ、離散的なアクティベーションと、(複数の事前定義された層のような)初期トポロジー的制約の少ないアーキテクチャが最もうまく機能するかもしれないと仮定する。我々は,このような学習アルゴリズムの設計を継続する試みの1つとして,バイナリアクティベーションと重みの行列のみを持つアーキテクチャに適用し,自然言語テキストの有用な表現を生成できるが,大量のトレーニングデータを活用する能力に制限があることを示す。次に、アルゴリズムの改善と、類似したアーキテクチャのための他のトレーニングアルゴリズムを設計するためのアイデアを提供する。最後に,効果的な学習アルゴリズムが見つかると得られる潜在的な利点について議論し,その効果が実際に存在するかどうかを評価する実験を提案する。 Today's most powerful machine learning approaches are typically designed to train stateless architectures with predefined layers and differentiable activation functions. While these approaches have led to unprecedented successes in areas such as natural language processing and image recognition, the trained models are also susceptible to making mistakes that a human would not. In this paper, we take the view that true intelligence may require the ability of a machine learning model to manage internal state, but that we have not yet discovered the most effective algorithms for training such models. We further postulate that such algorithms might not necessarily be based on gradient descent over a deep architecture, but rather, might work best with an architecture that has discrete activations and few initial topological constraints (such as multiple predefined layers). 
We present one attempt in our ongoing efforts to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights, and show that it is able to form useful representations of natural language text, but is also limited in its ability to leverage large quantities of training data. We then provide ideas for improving the algorithm and for designing other training algorithms for similar architectures. Finally, we discuss potential benefits that could be gained if an effective training algorithm is found, and suggest experiments for evaluating whether these benefits exist in practice.	翻訳日:2023-12-25 16:39:37 公開日:2023-12-22
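To picture the kind of architecture the abstract above describes (internal state, binary activations, a single weight matrix), here is a minimal forward-dynamics sketch. It does not reproduce the paper's training algorithm, which the abstract does not specify; the threshold rule, the hand-written weights, and the function names are all assumptions for illustration.

```python
def step(weights, state, threshold=0.5):
    """One update: each binary unit fires if its weighted input exceeds the threshold."""
    return [
        1 if sum(w * s for w, s in zip(row, state)) > threshold else 0
        for row in weights
    ]


def run(weights, state, n_steps):
    """Iterate the update and record every visited state."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(weights, state)
        trajectory.append(state)
    return trajectory


# Hand-written (not learned) weights: unit i copies unit i-1, so a single
# active bit circulates and the state encodes the position within a cycle.
RING = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

trajectory = run(RING, [1, 0, 0], 3)
for s in trajectory:
    print(s)
```

The update is not differentiable because of the hard threshold, which is consistent with the abstract's point that training such models may call for something other than gradient descent.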
# Kac-Luttingerモデルにおける相互作用するボース気体中のボース・アインシュタイン凝縮について On Bose-Einstein condensation in interacting Bose gases in the Kac-Luttinger model ( http://arxiv.org/abs/2312.14357v1 ) ライセンス: Link先を確認	Chiara Boccato, Joachim Kerner, and Maximilian Pechmann	(参考訳) 次元 $2\le d \in \mathbb N$ のゼロ温度で相互作用するボース気体を、Kac-Luttingerモデルとして知られるランダムモデルで研究した。平均場型であるボソン間の対相互作用を選択することで、ハートリー型汎関数の最小化元への(完全)ボース・アインシュタイン凝縮が、確率収束の意味で、あるいはほぼ1の確率で起こることを証明する。我々は,非相互作用ボース気体のスペクトルギャップに関するAlain-Sol Sznitmanのごく最近の結果に基づいて,これを達成する。 We study interacting Bose gases of dimensions $2\le d \in \mathbb N$ at zero temperature in a random model known as the Kac-Luttinger model. Choosing the pair-interaction between the bosons to be of a mean-field type, we prove (complete) Bose-Einstein condensation in probability or with probability almost one into the minimizer of a Hartree-type functional. We accomplish this by building upon very recent results by Alain-Sol Sznitman on the spectral gap of the noninteracting Bose gas.	翻訳日:2023-12-25 16:39:15 公開日:2023-12-22
# すべてを信じるな - 大言語モデルにおける幻覚の自動識別による要約解釈可能性の向上 Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models ( http://arxiv.org/abs/2312.14346v1 ) ライセンス: Link先を確認	Priyesh Vakharia, Devavrat Joshi, Meenal Chavan, Dhananjay Sonawane, Bhrigu Garg, Parsa Mazaheri, Ian Lane	(参考訳) 大規模言語モデル(LLM)は、機械翻訳やテキスト要約といったタスクのテキスト操作に適しています。しかし、これらのモデルは幻覚を引き起こす傾向があり、それはモデルが提供する答えの忠実さを損なう可能性がある。 llmにおける幻覚と闘う最近の研究は、幻覚文の同定と、モデルが幻覚を起こす異なる方法の分類を扱う。本稿では,幻覚に対する LLM の振る舞いを深く掘り下げ,異なる種類の幻覚を識別するためのトークンレベルのアプローチを定義し,さらに,このトークンレベルのタグ付けを用いて対話要約タスクにおける LLM の解釈性と忠実性を改善する。そこで本稿では,新たな拡張データセットと新たなトレーニングパラダイムを提案する。 Large Language Models (LLMs) are adept at text manipulation -- tasks such as machine translation and text summarization. However, these models can also be prone to hallucination, which can be detrimental to the faithfulness of any answers that the model provides. Recent works in combating hallucinations in LLMs deal with identifying hallucinated sentences and categorizing the different ways in which models hallucinate. This paper takes a deep dive into LLM behavior with respect to hallucinations, defines a token-level approach to identifying different kinds of hallucinations, and further utilizes this token-level tagging to improve the interpretability and faithfulness of LLMs in dialogue summarization tasks. Through this, the paper presents a new, enhanced dataset and a new training paradigm.	翻訳日:2023-12-25 16:39:04 公開日:2023-12-22
# Logic-Scaffolding: LLMを用いたパーソナライズされたアスペクト誘導型推奨説明生成 Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs ( http://arxiv.org/abs/2312.14345v1 ) ライセンス: Link先を確認	Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras and Branislav Kveton	(参考訳) 自然言語テキスト生成能力のようなLarge Language Models(LLMs)のユニークな能力は、レコメンデーションの説明を提供する強力な候補としてそれらを位置づけている。しかし、LLMのサイズにもかかわらず、既存のモデルのほとんどはゼロショットの説明を確実に作成するのに苦労している。この問題に対処するために、アスペクトベースの説明と思考の連鎖(chain-of-thought)プロンプティングのアイデアを組み合わせ、中間推論ステップを通じて説明を生成するLogic-Scaffoldingというフレームワークを提案する。本稿では,フレームワーク構築の経験を共有し,その結果を探索するためのインタラクティブなデモンストレーションを行う。 The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanations for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.	翻訳日:2023-12-25 16:38:50 公開日:2023-12-22
# 量子多重グラフ状態と多重ハイパーグラフ状態 Quantum multigraph states and multihypergraph states ( http://arxiv.org/abs/2312.14399v1 ) ライセンス: Link先を確認	Xiao-Dong Zhang, Bin-Bin Cai, and Song Lin	(参考訳) 我々は、エッジとハイパーエッジのユニークな操作によって定義される2種類の多粒子交絡状態、マルチグラフ状態とマルチハイパーグラフ状態を提案した。重要な発見は、提案された多重ハイパーグラフ状態と、dが素数であるときの一般化実重み付け状態との1対1対応である。合成 d に対して、多重超グラフ状態は一般化された実重み付け状態の部分集合を形成する。一方,超グラフ状態から実等重み付け状態を構築する方法を詳述し,d-次元超グラフ状態から生成できない一般化実重み付け状態を明らかにする。 We proposed two classes of multiparticle entangled states, the multigraph states and multihypergraph states, defined by unique operations on the edges and hyperedges. A key discovery is the one-to-one correspondence between the proposed multihypergraph states and the generalized real equally weighted states when d is prime. While for composite d, multihypergraph states form a subset of the generalized real equally weighted states. Meanwhile, we detailed a method for constructing real equally weighted states from hypergraph states and revealed the generalized real equally weighted states which cannot be generated from d-dimensional hypergraph states.	翻訳日:2023-12-25 16:28:33 公開日:2023-12-22
# 教師なし深層学習画像検証法 Unsupervised Deep Learning Image Verification Method ( http://arxiv.org/abs/2312.14395v1 ) ライセンス: Link先を確認	Enoch Solomon, Abraham Woubie and Eyael Solomon Emiru	(参考訳) ディープラーニングは一般的に画像認識に使用されるが、通常は大量のラベル付きトレーニングデータが必要であり、それが常に容易に入手できるとは限らない。これにより、最先端の教師なし顔認証技術と比較すると、顕著な性能格差が生じる。本研究では,顔画像ベクトルを新しい表現に変換するオートエンコーダを利用して,このギャップを狭める手法を提案する。特に、オートエンコーダは、元の入力画像ベクトルではなく、隣接する顔画像ベクトルを再構成するように訓練される。これらの隣接顔画像ベクトルは、訓練顔画像ベクトルとの最高コサインスコアに基づいて教師なしプロセスにより選択される。提案手法は,Labeled Faces in the Wild (LFW) データセットにおいて,ベースラインシステムに対してEERで56%の相対的改善を達成する。これにより、コサインとPLDAスコアリングシステムのパフォーマンスギャップを狭めることに成功した。 Although deep learning is commonly employed for image recognition, it usually requires a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by leveraging an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This has successfully narrowed down the performance gap between cosine and PLDA scoring systems.	翻訳日:2023-12-25 16:28:22 公開日:2023-12-22
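The neighbor-selection step described in the abstract above (picking, for each face vector, the training vector with the highest cosine score as the reconstruction target) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the function names `cosine` and `neighbor_targets`, the toy 2-D vectors, and the choice of a single nearest neighbor per vector are all assumptions made here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def neighbor_targets(vectors):
    """For each vector, return the index of the OTHER vector with the highest cosine score.

    Hypothetical helper: in the setting of the abstract, the selected neighbor
    would serve as the autoencoder's reconstruction target instead of the input.
    """
    targets = []
    for i, u in enumerate(vectors):
        candidates = [j for j in range(len(vectors)) if j != i]
        best = max(candidates, key=lambda j: cosine(u, vectors[j]))
        targets.append(best)
    return targets

# Toy "face vectors": two loose clusters, one along each axis (made-up data).
faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pairs = neighbor_targets(faces)
```

Each entry of `pairs` is the index of the vector that the corresponding input would be trained to reconstruct; no labels are used at any point, which is what makes the selection unsupervised.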
# AdapTraj:マルチエージェント軌道予測のためのマルチソースドメイン一般化フレームワーク AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2312.14394v1 ) ライセンス: Link先を確認	Tangwen Qian, Yile Chen, Gao Cong, Yongjun Xu, Fei Wang	(参考訳) 近年,動的システムにおけるオブジェクトの複雑な相互作用をモデル化するための重要な課題として,マルチエージェント軌道予測が注目されている。有望な進歩にもかかわらず、既存の研究はすべて、実際のデプロイメントで遭遇したモデル学習中に観測されたデータ分布が一致しているという仮定に従っている。しかし、本質的な分散シフトが配置環境のモビリティパターンに存在する可能性があり、ドメインの一般化とパフォーマンスの低下に繋がるので、この仮定はしばしば現実には成り立たない。したがって、マルチエージェント軌道予測タスクにおけるそのような不一致を緩和するために、複数のソースドメインの軌跡を利用するのが望ましい。しかし,本課題におけるマルチソース領域一般化の開発は,(1)負の伝達,(2)外部要因のモデリングが不十分な2つの課題を提起している。これらの課題に対処するために、焦点エージェントと隣接エージェントの両方に対して、ドメイン不変およびドメイン固有の4種類の特徴を明示的にモデル化する新しい因果式を提案する。新たな定式化に基づいて,マルチエージェント軌道予測に特化したマルチソースドメイン一般化フレームワークadaptrajを提案する。 adaptrajは様々なモデルに適応可能なプラグアンドプレイモジュールとして機能する。異なるドメインを持つ4つのデータセットに対する大規模な実験は、AdapTrajが他のベースラインをかなり上回っていることを示している。 Multi-agent trajectory prediction, as a critical task in modeling complex interactions of objects in dynamic systems, has attracted significant research attention in recent years. Despite the promising advances, existing studies all follow the assumption that data distribution observed during model learning matches that encountered in real-world deployments. However, this assumption often does not hold in practice, as inherent distribution shifts might exist in the mobility patterns for deployment environments, thus leading to poor domain generalization and performance degradation. Consequently, it is appealing to leverage trajectories from multiple source domains to mitigate such discrepancies for multi-agent trajectory prediction task. However, the development of multi-source domain generalization in this task presents two notable issues: (1) negative transfer; (2) inadequate modeling for external factors. 
To address these issues, we propose a new causal formulation to explicitly model four types of features: domain-invariant and domain-specific features for both the focal agent and neighboring agents. Building upon the new formulation, we propose AdapTraj, a multi-source domain generalization framework specifically tailored for multi-agent trajectory prediction. AdapTraj serves as a plug-and-play module that is adaptable to a variety of models. Extensive experiments on four datasets with different domains demonstrate that AdapTraj consistently outperforms other baselines by a substantial margin.	翻訳日:2023-12-25 16:28:10 公開日:2023-12-22
# 平面符号による二項符号の結合 Concatenating Binomial Codes with the Planar Code ( http://arxiv.org/abs/2312.14390v1 ) ライセンス: Link先を確認	Juliette Soule, Andrew C. Doherty, Arne L. Grimsmo	(参考訳) 回転対称ボソニック符号は、特に超伝導量子ビット実験において、量子ビットを振動子の自由度に符号化する魅力的な方法である。これらの符号はかなりの損失と位相緩和を許容するが、大規模なデバイスを実現するには、より高いレベルの符号と組み合わせる必要がある。耐故障性量子計算のための測定ベースの方式において,これらの符号と平面符号の連接を検討する。我々は, 基本レベルの符号化として二項符号に着目し, 各種測定プロトコルにおいて損失を受ける符号化の損益分岐点を推定する。これらの符号は光子損失誤りへの耐性が高いが、ゲート操作や測定にはより高い平均光子数とより高い位相分解能が必要である。二項符号量子ビットを用いた平面符号で良好な性能を得るには,適応位相測定,最尤量子状態推定,重み付き最小重み復号を実装する必要がある。 Rotation symmetric bosonic codes are an attractive encoding for qubits into oscillator degrees of freedom, particularly in superconducting qubit experiments. While these codes can tolerate considerable loss and dephasing, they will need to be combined with higher level codes to achieve large-scale devices. We investigate concatenating these codes with the planar code in a measurement-based scheme for fault-tolerant quantum computation. We focus on binomial codes as the base level encoding, and estimate break-even points for such encodings under loss for various types of measurement protocols. These codes are more resistant to photon loss errors, but require both higher mean photon numbers and higher phase resolution for gate operations and measurements. We find that it is necessary to implement adaptive phase measurements, maximum likelihood quantum state inference, and weighted minimum weight decoding to obtain good performance for a planar code using binomial code qubits.	翻訳日:2023-12-25 16:27:48 公開日:2023-12-22
# StyleRetoucher:GAN事前分布による汎用的なポートレート画像のリタッチ StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors ( http://arxiv.org/abs/2312.14389v1 ) ライセンス: Link先を確認	Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao	(参考訳) 丁寧にリタッチされたポートレート画像の作成は、プロのアーティストにとっても退屈で時間がかかる。自動リタッチ手法は存在するが、過度な平滑化によるアーティファクトに悩まされるか、一般化能力に欠ける。そこで本研究では,StyleGANの生成能力と一般化能力を活かし,顔の細部を保ちつつ入力画像の肌状態を改善するための,新しい自動ポートレート画像リタッチフレームワークであるStyleRetoucherを提案する。事前訓練したStyleGANの事前分布を利用することで,本手法は (a) 少ないトレーニングサンプルでも安定して動作し,(b) ドメイン外のデータにもうまく一般化するという,優れた堅牢性を示す。さらに,入力画像の空間的特徴とStyleGAN層の中間特徴を混合することにより,入力特性を最大限に保持する。さらに,肌の欠点(ブレミッシュ)を効果的に識別して除去し,画像の肌状態を改善する新しいブレミッシュ認識特徴選択機構を提案する。定性的かつ定量的な評価は,本手法の高い一般化能力を検証する。さらなる実験により、StyleRetoucherはイメージリタッチタスクの代替ソリューションよりも優れたパフォーマンスを示している。また,既存の最先端手法よりも優れたリタッチ性能を確認するために,ユーザ知覚調査を実施している。 Creating fine-retouched portrait images is tedious and time-consuming even for professional artists. There exist automatic retouching methods, but they either suffer from over-smoothing artifacts or lack generalization ability. To address such issues, we present StyleRetoucher, a novel automatic portrait image retouching framework, leveraging StyleGAN's generation and generalization ability to improve an input portrait image's skin condition while preserving its facial details. Harnessing the priors of pretrained StyleGAN, our method shows superior robustness: (a) performing stably with fewer training samples and (b) generalizing well on the out-domain data. Moreover, by blending the spatial features of the input image and intermediate features of the StyleGAN layers, our method preserves the input characteristics to the largest extent. We further propose a novel blemish-aware feature selection mechanism to effectively identify and remove the skin blemishes, improving the image skin condition. Qualitative and quantitative evaluations validate the great generalization capability of our method. 
Further experiments show StyleRetoucher's superior performance to the alternative solutions in the image retouching task. We also conduct a user perceptive study to confirm the superior retouching performance of our method over the existing state-of-the-art alternatives.	翻訳日:2023-12-25 16:27:32 公開日:2023-12-22
# 対話型画像セグメンテーションのための可変非感性および目標保存マスク微細化 Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation ( http://arxiv.org/abs/2312.14387v1 ) ライセンス: Link先を確認	Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li	(参考訳) ポイントベースのインタラクティブな画像セグメンテーションは、セマンティックセグメンテーションや画像編集といったアプリケーションにおけるマスクアノテーションの負担を軽減できる。しかし,ユーザ入力を限定したターゲットマスクの完全抽出は依然として困難である。本稿では,ユーザ入力の少ないセグメンテーション品質を向上する新しい手法である可変無感・ターゲット保存マスクリファインメントを提案する。初期マスクとしての最後のセグメンテーション結果については、初期マスクを継続的に強化する反復精錬工程が一般的である。それにもかかわらず、従来の手法は初期マスクのばらつきに敏感である。この問題を回避するため,提案手法では,異なる種類の初期マスクからの一貫した推論を保証するマスクマッチングアルゴリズムを組み込んだ。また,ターゲット認識型ズームアルゴリズムを導入し,ダウンサンプリング時のオブジェクト情報保存,効率のバランス,正確性について述べる。 GrabCut、バークレー、SBD、DAVISデータセットの実験は、インタラクティブな画像セグメンテーションにおける我々の手法の最先端性能を実証している。 Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing. However, fully extracting the target mask with limited user inputs remains challenging. We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs. Regarding the last segmentation result as the initial mask, an iterative refinement process is commonly employed to continually enhance the initial mask. Nevertheless, conventional techniques suffer from sensitivity to the variance in the initial mask. To circumvent this problem, our proposed method incorporates a mask matching algorithm for ensuring consistent inferences from different types of initial masks. We also introduce a target-aware zooming algorithm to preserve object information during downsampling, balancing efficiency and accuracy. Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.	翻訳日:2023-12-25 16:27:12 公開日:2023-12-22
# LLMを超えた生成AI:マルチモーダル生成のシステム意味 Generative AI Beyond LLMs: System Implications of Multi-Modal Generation ( http://arxiv.org/abs/2312.14385v1 ) ライセンス: Link先を確認	Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu	(参考訳) 大規模な生成AIモデルの開発がテキスト(1D)生成を超えて進化し、画像(2D)とビデオ(3D)生成を含むようになると、空間的および時間的情報の処理は品質、パフォーマンス、効率に固有の課題をもたらす。本稿では,マルチモーダルテキスト・ツー・イメージ(TTI)とテキスト・ツー・ビデオ(TTV)生成モデルに対する新しいシステム設計空間の理解に向けた最初の取り組みを示す。現在のモデルアーキテクチャ設計は、拡散モデルとトランスフォーマーモデルという2つのカテゴリに分けられる。 8種類のTTI/TTVモデルの系統的性能評価では,Flash Attentionのような最先端の最適化手法を適用した後,ConvolutionはDiffusionベースのTTIモデルの実行時間の最大44%を占め,Linear層はTransformerベースのモデルの実行時間の最大49%を消費している。また,Diffusion ベースの TTI モデルは LLM 推論の Prefill 段階に似ており,Decode フェーズに類似した Transformer ベースの TTI モデルよりも Flash Attention の 1.1-2.5 倍の高速化が期待できる。 LLM向けに設計された最適化は、直接TTI/TTVモデルにマッピングされないため、新たな最適化機会を得るために、これらのワークロードを徹底的に評価する必要がある。このようにして、TTI/TTVモデルの文脈でシーケンス長を定義し、拡散モデル推論において、シーケンス長は最大4倍まで変化する。さらに,ttvワークロードの時間的側面がユニークなシステムボトルネックをもたらし,時間的注意が全体の注意時間の60%以上を占めることを観察した。全体として、当社のシステムパフォーマンス評価は、新たなTTI/TTVワークロードのために効率的でデプロイ可能なシステムを設計するための重要な第一歩です。 As the development of large-scale Generative AI models evolve beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation models. Current model architecture designs are bifurcated into 2 categories: Diffusion- and Transformer-based models. 
Our systematic performance characterization on a suite of eight representative TTI/TTV models shows that after state-of-the-art optimization techniques such as Flash Attention are applied, Convolution accounts for up to 44% of execution time for Diffusion-based TTI models, while Linear layers consume up to 49% of execution time for Transformer-based models. We additionally observe that Diffusion-based TTI models resemble the Prefill stage of LLM inference, and benefit from 1.1-2.5x greater speedup from Flash Attention than Transformer-based TTI models that resemble the Decode phase. Since optimizations designed for LLMs do not map directly onto TTI/TTV models, we must conduct a thorough characterization of these workloads to gain insights for new optimization opportunities. In doing so, we define sequence length in the context of TTI/TTV models and observe sequence length can vary up to 4x in Diffusion model inference. We additionally observe temporal aspects of TTV workloads pose unique system bottlenecks, with Temporal Attention accounting for over 60% of total Attention time. Overall, our in-depth system performance characterization is a critical first step towards designing efficient and deployable systems for emerging TTI/TTV workloads.	翻訳日:2023-12-25 16:26:57 公開日:2023-12-22
# van der Waals位相強磁性金属テルル化鉄の破壊反転対称性 Broken inversion symmetry in van der Waals topological ferromagnetic metal iron germanium telluride ( http://arxiv.org/abs/2312.14384v1 ) ライセンス: Link先を確認	Kai-Xuan Zhang, Hwiin Ju, Hyuncheol Kim, Jingyuan Cui, Jihoon Keum, Je-Geun Park, and Jong Seok Lee	(参考訳) 反転対称性の破れは多くの量子効果にとって重要であり、次世代のスピントロニクスにとって重要なスピン軌道トルクの基礎である。近年, トポロジカルファンダーワールス (vdW) マグネット鉄テルル化物に, 新規な内在性スピン軌道トルクが確立されている。しかし、層間反転対称性の破れに関する明確な証拠がないため、謎のままである。本稿では,第2高調波発生法(SHG)により直接測定されたテルル化鉄鉄の破壊反転対称性の証拠を報告する。結晶対称性は、中心対称P63/mmcから非中心対称極性P3m1空間群へと減少し、3次元SHGパターンが支配的な外面偏光を与えることを示す。さらに、SHG反応は、主にランダムな欠陥から順序付けられたFe空孔への移行によって、Fe欠乏の増加に伴って、等方パターンから鋭い3倍対称性へと進化する。このようなSHG応答は温度に対して堅牢であり、強磁性遷移温度より上および下方の未変化結晶対称性を保証する。これらの発見は、この興味深いvdw金属、鉄ゲルマニウムテルライド:バンドトポロジー、固有スピン軌道トルク、位相的vdw極性状態の理解に重要な新しい情報を与える。 Inversion symmetry breaking is critical for many quantum effects and fundamental for spin-orbit torque, which is crucial for next-generation spintronics. Recently, a novel type of gigantic intrinsic spin-orbit torque has been established in the topological van-der-Waals (vdW) magnet iron germanium telluride. However, it remains a puzzle because no clear evidence exists for interlayer inversion symmetry breaking. Here, we report the definitive evidence of broken inversion symmetry in iron germanium telluride directly measured by the second harmonic generation (SHG) technique. Our data show that the crystal symmetry reduces from centrosymmetric P63/mmc to noncentrosymmetric polar P3m1 space group, giving the three-fold SHG pattern with dominant out-of-plane polarization. Additionally, the SHG response evolves from an isotropic pattern to a sharp three-fold symmetry upon increasing Fe deficiency, mainly due to the transition from random defects to ordered Fe vacancies. Such SHG response is robust against temperature, ensuring unaltered crystalline symmetries above and below the ferromagnetic transition temperature. 
These findings add crucial new information to our understanding of this interesting vdW metal, iron germanium telluride: band topology, intrinsic spin-orbit torque and topological vdW polar metal states.	翻訳日:2023-12-25 16:26:25 公開日:2023-12-22
# 可視性透かし除去のための干渉除去とコンテンツ回収 Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal ( http://arxiv.org/abs/2312.14383v1 ) ライセンス: Link先を確認	Yicheng Leng, Chaowei Fang, Gen Li, Yixiang Fang, Guanbin Li	(参考訳) 可視的な透かしは、画像の著作権を保護するのに役立ちますが、基礎となるコンテンツを歪め、シーンの解釈や画像編集といった作業を複雑にします。目に見える透かし除去は、透かしの干渉をなくし、背景コンテンツを復元することを目的としている。しかし, 従来の手法では, 同一枝内で透かし成分の除去や背景復元を行う場合が多く, 予測や背景が曖昧な場合の無視において, 透かしの残差が生じている。これらの制約に対処するために、Removing Interference and Recovering Content Imaginatively (RIRCI)フレームワークを紹介した。 rirciは2段階のアプローチを具現化している: 最初のフェーズはウォーターマークのコンポーネントの識別と分離に集中し、次のフェーズは背景コンテンツの復元に焦点を当てている。本モデルでは, 半透明な透かしの下の固有背景情報と, 影響のない地域からの周辺環境情報を完全に探索できる2経路ネットワークを用いた。さらに、グローバルおよびローカルなコンテキストインタラクションモジュールは、背景復元フェーズにおける包括的な表現モデリングのための多層パーセプトロンと双方向特徴変換の上に構築される。提案手法の有効性は2つの大規模データセットで実証的に検証され,既存の透かし除去技術よりも顕著に向上した。 Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing. Visible watermark removal aims to eliminate the interference of watermarks and restore the background content. However, existing methods often implement watermark component removal and background restoration tasks within a singular branch, leading to residual watermarks in the predictions and ignoring cases where watermarks heavily obscure the background. To address these limitations, this study introduces the Removing Interference and Recovering Content Imaginatively (RIRCI) framework. RIRCI embodies a two-stage approach: the initial phase centers on discerning and segregating the watermark component, while the subsequent phase focuses on background content restoration. To achieve meticulous background restoration, our proposed model employs a dual-path network capable of fully exploring the intrinsic background information beneath semi-transparent watermarks and peripheral contextual information from unaffected regions. 
Moreover, a Global and Local Context Interaction module is built upon multi-layer perceptrons and bidirectional feature transformation for comprehensive representation modeling in the background restoration phase. The efficacy of our approach is empirically validated across two large-scale datasets, and our findings reveal a marked enhancement over existing watermark removal techniques.	翻訳日:2023-12-25 16:26:02 公開日:2023-12-22
# インドシアニングリーンの共振強化2光子吸収断面積に関する実験的検討 Experimental Upper Bounds for Resonance-Enhanced Entangled Two-Photon Absorption Cross Section of Indocyanine Green ( http://arxiv.org/abs/2312.14382v1 ) ライセンス: Link先を確認	Manni He, Bryce P. Hickam, Nathan Harper, Scott K. Cushing	(参考訳) 共振中間状態は、絡み合った2光子吸収(ETPA)の効率を高めるために提案されている。共鳴増強ETPA(r-ETPA)は、明るいスクイーズド真空を用いて原子系で実証されているが、有機分子では研究されていない。有機分子色素であるインドシアニングリーン (ICG) において, 近赤外の広帯域もつれ光子で励起した場合のr-ETPAを初めて調査した。報告されている多くの仮想状態媒介ETPA(v-ETPA)測定と同様に、r-ETPA信号は測定されず、断面積の実験的上界は $6 \times 10^{-23}$ cm$^2$/molecule である。さらに、ICGの800nmにおける古典的共鳴増強2光子吸収(r-TPA)断面積を初めて $20(\pm13)$ GM と測定し、共鳴中間状態を持つことはICGの2光子過程を著しく向上させるものではないことを示唆した。絡み合った光子によって励起されたICGの分光・時間分解発光シグネチャもこの結論を支持する。 Resonant intermediate states have been proposed to increase the efficiency of entangled two-photon absorption (ETPA). Although resonance-enhanced ETPA (r-ETPA) has been demonstrated in atomic systems using bright squeezed vacuum, it has not been studied in organic molecules. We investigate for the first time r-ETPA in an organic molecular dye, indocyanine green (ICG), when excited by broadband entangled photons in near-IR. Similar to many reported virtual state mediated ETPA (v-ETPA) measurements, no r-ETPA signals are measured, with an experimental upper bound for the cross section placed at $6 \times 10^{-23}$ cm$^2$/molecule. In addition, the classical resonance-enhanced two-photon absorption (r-TPA) cross section of ICG at 800 nm is measured for the first time to be $20(\pm13)$ GM, suggesting that having a resonant intermediate state does not significantly enhance two-photon processes in ICG. The spectrotemporally resolved emission signatures of ICG excited by entangled photons are also presented to support this conclusion.	翻訳日:2023-12-25 16:25:37 公開日:2023-12-22
# 投影軌道正規化による連合学習 Federated Learning with Projected Trajectory Regularization ( http://arxiv.org/abs/2312.14380v1 ) ライセンス: Link先を確認	Tiejin Chen, Yuanpu Cao, Yujia Wang, Cho-Jui Hsieh, Jinghui Chen	(参考訳) フェデレーション学習は、ローカルデータを共有せずに、分散クライアントから機械学習モデルの共同トレーニングを可能にする。フェデレーション学習における1つの重要な課題は、非識別的に分散したデータをクライアント間で処理することで、モデルトレーニングのパフォーマンスが低下する。この一連の研究は、主に最終段階のグローバルモデルパラメータ/勾配や過去のモデルパラメータ/勾配の線形結合の利用に焦点を当てており、モデル訓練軌道からのグローバル情報の可能性を完全に活用していない。本稿では、データ不均一性問題に対処するための予測軌道正規化(FedPTR)を備えた新しいフェデレーション学習フレームワークを提案する。具体的には、ローカルクライアントやサーバが、最近のモデル更新の学習ダイナミクスを模倣した補助(合成)データセットを最適化し、それを、ローカルトレーニングの正規化のために次のステップモデル軌道を投影する。非凸確率的設定下で提案手法の厳密な理論解析を行い,不均質なデータ分布下での収束性を検証する。各種ベンチマークデータセットと非i.d.設定の実験により,提案フレームワークの有効性が検証された。 Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients, which leads to deteriorated model training performances. Prior works in this line of research mainly focus on utilizing last-step global model parameters/gradients or the linear combinations of the past model parameters/gradients, which do not fully exploit the potential of global information from the model training trajectory. In this paper, we propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data heterogeneity issue, which proposes a unique way to better extract the essential global information from the model training trajectory. Specifically, FedPTR allows local clients or the server to optimize an auxiliary (synthetic) dataset that mimics the learning dynamics of the recent model update and utilizes it to project the next-step model trajectory for local training regularization. We conduct rigorous theoretical analysis for our proposed framework under nonconvex stochastic settings to verify its fast convergence under heterogeneous data distributions. 
Experiments on various benchmark datasets and non-i.i.d. settings validate the effectiveness of our proposed framework.	翻訳日:2023-12-25 16:25:14 公開日:2023-12-22
# 音声認識と音声イベント分類の改善を目的としたマルチモーダルアテンションマージ Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification ( http://arxiv.org/abs/2312.14378v1 ) ライセンス: Link先を確認	Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu	(参考訳) ラベルなしデータに対する自己教師付き目標を用いた大規模基礎モデルのトレーニングと下流タスクの微調整が標準手順として登場している。残念ながら、このアプローチの有効性は、制限された微調整計算とラベル付き下流データの不足によって制約されることが多い。マルチモーダル・アテンション・マージング(MAM)は、高リソース・モダリティ・テキスト・画像に根ざしたモデルの注意行列から、ゼロショット・パラダイムを用いたリソース制約領域・音声・音声への直接的な知識伝達を容易にする試みである。 MAMは、自動音声認識(ASR)モデルの相対的な単語誤り率(WER)を最大6.70%削減し、オーディオイベント分類(AEC)モデルの相対的な分類誤差を10.63%削減する。データ/計算が利用可能である場合、注意行列をマージするためのデータ駆動アプローチであるLearnerable-MAMを提示し、その結果、ASRのWERがさらに2.90%減少し、AECの18.42%が微調整に比べて減少する結果となった。 Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer from attention matrices of models rooted in high resource modalities, text and images, to those in resource-constrained domains, speech and audio, employing a zero-shot paradigm. MAM reduces the relative Word Error Rate (WER) of an Automatic Speech Recognition (ASR) model by up to 6.70%, and relative classification error of an Audio Event Classification (AEC) model by 10.63%. In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.	翻訳日:2023-12-25 16:24:52 公開日:2023-12-22
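One simple way to picture "merging attention matrices" across modalities, as the abstract above describes at a high level, is an element-wise interpolation between a target-modality matrix and a source-modality matrix. The sketch below is an assumption for illustration only: the abstract does not state the merging rule, and the names `merge_attention`, `speech_attn`, `text_attn`, and the mixing weight `lam` are hypothetical.

```python
def merge_attention(target, source, lam):
    """Element-wise interpolation: (1 - lam) * target + lam * source.

    ASSUMPTION: linear interpolation is used here purely as an illustration of
    merging two same-shaped attention weight matrices; it is not claimed to be
    the rule used in the paper.
    """
    same_rows = len(target) == len(source)
    same_cols = all(len(t) == len(s) for t, s in zip(target, source))
    if not (same_rows and same_cols):
        raise ValueError("attention matrices must have the same shape")
    return [
        [(1.0 - lam) * t + lam * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(target, source)
    ]

# Toy 2x2 weight matrices standing in for one attention projection (made-up values).
speech_attn = [[1.0, 0.0], [0.0, 1.0]]  # low-resource modality (speech/audio)
text_attn = [[0.0, 2.0], [2.0, 0.0]]    # high-resource modality (text/images)
merged = merge_attention(speech_attn, text_attn, 0.25)
```

With `lam = 0` the target model is left unchanged and with `lam = 1` it is fully replaced by the source weights. Under this reading, a learnable variant would treat `lam` as a trained parameter rather than a fixed constant, which is again an assumption and not a detail given in the abstract.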
# マルチエージェント軌道予測のための社会時間グラフの学習 Learning Socio-Temporal Graphs for Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2312.14373v1 ) ライセンス: Link先を確認	Yuke Li, Lixiong Chen, Guangyi Chen, Ching-Yao Chan, Kun Zhang, Stefano Anzellotti, Donglai Wei	(参考訳) 歩行者の軌道を正確に予測するためには、他の歩行者との社会的・時間的相互作用を常に考慮しなければならない。関連する情報を分離、部分的、または暗黙的に表現する既存の作業とは異なり、それを完全かつ明示的に捉えて分析するための完全な表現を提案する。特に, 社会時間グラフ (stg) と呼ぶ有向非循環グラフ構造を導入し, 空間と時間にまたがる集団間の対方向の社会時間相互作用を明示的に捉えた。我々のモデルは、STGの構造を決定する潜在変数を持つ時間変化生成プロセスに基づいて構築される。軌道予測のためのSTGの構造を学習するためのエンドツーエンドパイプラインを提供するSTGformerというアテンションベースモデルを設計する。提案手法は,2つの大規模ベンチマークデータセットにおいて,最先端の予測精度を実現する。本分析は, 過去の軌跡が, 他人の進路を予測する上で重要であることを示す。我々のモデルは、社会時間的局所性の強い概念でこの関係を学習する。統計学は、この情報を明示的に予測するために利用すると、軌道のみのアプローチに対して顕著な性能向上が得られることを示した。 In order to predict a pedestrian's trajectory in a crowd accurately, one has to take into account her/his underlying socio-temporal interactions with other pedestrians consistently. Unlike existing work that represents the relevant information separately, partially, or implicitly, we propose a complete representation for it to be fully and explicitly captured and analyzed. In particular, we introduce a Directed Acyclic Graph-based structure, which we term Socio-Temporal Graph (STG), to explicitly capture pair-wise socio-temporal interactions among a group of people across both space and time. Our model is built on a time-varying generative process, whose latent variables determine the structure of the STGs. We design an attention-based model named STGformer that affords an end-to-end pipeline to learn the structure of the STGs for trajectory prediction. Our solution achieves overall state-of-the-art prediction accuracy in two large-scale benchmark datasets. Our analysis shows that a person's past trajectory is critical for predicting another person's future path. Our model learns this relationship with a strong notion of socio-temporal localities. 
Statistics show that utilizing this information explicitly for prediction yields a noticeable performance gain with respect to the trajectory-only approaches.	翻訳日:2023-12-25 16:24:24 公開日:2023-12-22
# KamLAND-Zenシミュレーションのための生成モデル Generative Models for Simulation of KamLAND-Zen ( http://arxiv.org/abs/2312.14372v1 ) ライセンス: Link先を確認	Z. Fu, C. Grant, D. M. Krawiec, A. Li, L. Winslow	(参考訳) ニュートリノを伴わない二重ベータ崩壊($0\nu\beta\beta$)の次世代の探索は、ニュートリノの性質と宇宙の物質-反物質非対称性の源についての深い疑問に答えようとしている。これらの探索は、計測対象の同位体1トンあたり年間1事象未満という事象率を対象とする。発見を主張するには、$0\nu\beta\beta$を模倣する検出器事象の正確かつ効率的なシミュレーションが重要である。伝統的なモンテカルロ(MC)シミュレーションは機械学習に基づく生成モデルによって補うことができる。本研究では,KamLANDのようなモノリシック液体シンチレータ検出器向けに設計され、事前に定義された物理モデルなしに高精度なシミュレーションデータを生成する生成モデルの性能について述べる。低レベルの特徴を復元し、補間を行う能力を示す。将来、これらの生成モデルの結果は、高品質で豊富な生成データを提供することで、イベントの分類と背景事象の除去を改善するのに使うことができる。 The next generation of searches for neutrinoless double beta decay ($0\nu\beta\beta$) are poised to answer deep questions on the nature of neutrinos and the source of the Universe's matter-antimatter asymmetry. They will be looking for event rates of less than one event per ton of instrumented isotope per year. To claim discovery, accurate and efficient simulations of detector events that mimic $0\nu\beta\beta$ are critical. Traditional Monte Carlo (MC) simulations can be supplemented by machine-learning-based generative models. In this work, we describe the performance of generative models designed for monolithic liquid scintillator detectors like KamLAND to produce highly accurate simulation data without a predefined physics model. We demonstrate their ability to recover low-level features and perform interpolation. In the future, the results of these generative models can be used to improve event classification and background rejection by providing high-quality abundant generated data.	翻訳日:2023-12-25 16:24:05 公開日:2023-12-22
# 合成データを用いた学習のための品質多様性生成サンプリング Quality-Diversity Generative Sampling for Learning with Synthetic Data ( http://arxiv.org/abs/2312.14369v1 ) ライセンス: Link先を確認	Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis	(参考訳) 生成モデルは、合成トレーニングデータセットを作成することによって、一部の実データソースのサロゲートとして機能することができるが、その際に下流タスクへバイアスを伝達する可能性がある。合成トレーニングデータセットを生成する際の品質と多様性の保護に注力する。バイアスのある生成器から得られるデータであっても、ユーザ定義の測度空間全体で均一にデータをサンプリングするフレームワークである品質多様性生成サンプリング(QDGS)を提案する。QDGSはモデルに依存しないフレームワークで、生成モデルを微調整することなく、プロンプトガイダンスを用いて、合成生成データの多様性の尺度全体にわたって品質目標を最適化する。QDGSが生成するバランスのとれた合成データセットを用いて,まず,カラーバイアス形状データセットで学習した分類器を概念実証としてデバイアスする。顔データ合成にQDGSを適用することで、肌の色調や年齢といった所望の意味概念をプロンプトで指定し、視覚特徴を組み合わせた交差的データセットを作成する。このバランスの取れたデータを分類器のトレーニングに利用することで、顔認識ベンチマークの精度を維持しながら公平性が向上する。コードは https://github.com/Cylumn/qd-generative-sampling で入手できる。 Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: https://github.com/Cylumn/qd-generative-sampling	翻訳日:2023-12-25 16:23:48 公開日:2023-12-22
# GROOD:補間多様体における勾配認識外分布検出 GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds ( http://arxiv.org/abs/2312.14427v1 ) ライセンス: Link先を確認	Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, Liam Paull	(参考訳) ディープニューラルネットワーク(DNN)は、オフ・オブ・ディストリビューション(OOD)サンプルの過信予測でサイレントに失敗することが多く、現実のデプロイメントにおいてリスクを生じさせる。既存の手法は主にDNNパラメータに関して計算される特徴表現空間や勾配ノルムを強調するが、それらは複雑な勾配分布と分類領域のトポロジを見落としている。このギャップに対処するために, 勾配空間の識別力に依存する新しい枠組みである補間多様体 (grood) における勾配認識外分布検出を導入する。このスペースを構築するために、GROODはOODの特徴を特に捉えるプロトタイプとともに、クラスプロトタイプに依存している。この手法はDNNの初期中間層において,IDとOODサンプル間の勾配空間の分離を改良するために,目的とする混合演算を取り入れている。トレーニングセットから最も近い隣接勾配までの距離を用いてOOD検出の有効性を定量化し,より堅牢なOODスコアを得た。実験的評価は、ターゲット入力混合の導入が勾配空間におけるIDとOODの分離を増幅し、多様なデータセット間で印象的な結果をもたらすことを裏付けるものである。特に、ImageNet-1kに対してベンチマークすると、GROODは最先端のベースラインの確立した堅牢性を上回る。本研究により,画像分類におけるDNNのOOD検出を向上するために,勾配空間とクラスプロトタイプを利用する方法を確立した。 Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. 
Experimental evaluations substantiate that the introduction of targeted input mix-up amplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNN in image classification.	翻訳日:2023-12-25 16:17:05 公開日:2023-12-22
# Room Occupancy Prediction: 機械学習のパワーと時間的洞察を探る Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights ( http://arxiv.org/abs/2312.14426v1 ) ライセンス: Link先を確認	Siqi Mao, Yaping Yuan, Yinpu Li, Ziren Wang, Yuanxin Yao, Yixin Kang	(参考訳) 建物の省エネルギーは温室効果ガス排出対策や気候変動対策において最重要課題である。照明制御や気候調整といった行動を伴う部屋の効率の良い管理は、エネルギー消費を削減するための重要な戦略である。監視技術が実現できない状況では、部屋の占有率を推定するために非侵入センサーが使用される。本研究では,ランダムフォレストが連続的に最も高い予測精度を達成し,多様な機械学習モデルを用いた部屋占有率予測フレームワークを提案する。特にこのデータセットは、時間次元と空間次元の両方を包含し、豊富な情報を明らかにする。興味深いことに、我々のフレームワークは明示的な時間的モデリングがなくても堅牢な性能を示す。これらの発見は、従来の機械学習モデルの顕著な予測力を強調している。この成功は、特徴冗長性の存在、線形空間パターンと時間パターンの単純さ、高周波データサンプリングの利点に起因する。これらの結果は説得力があるが、時間次元を明示的にモデル化することで深い洞察を解き放ち、特定のシナリオにおける予測能力をさらに高める可能性があることには、オープンにしておくことが不可欠である。まとめると,本研究は,連続的および分類的タスクに対する予測フレームワークの有効性を検証するだけでなく,時間的側面の包含による改善の可能性も強調する。この研究は、エネルギー効率のよいプラクティスと部屋の占有管理を形作る機械学習の約束を強調している。 Energy conservation in buildings is a paramount concern to combat greenhouse gas emissions and combat climate change. The efficient management of room occupancy, involving actions like lighting control and climate adjustment, is a pivotal strategy to curtail energy consumption. In contexts where surveillance technology isn't viable, non-intrusive sensors are employed to estimate room occupancy. In this study, we present a predictive framework for room occupancy that leverages a diverse set of machine learning models, with Random Forest consistently achieving the highest predictive accuracy. Notably, this dataset encompasses both temporal and spatial dimensions, revealing a wealth of information. Intriguingly, our framework demonstrates robust performance even in the absence of explicit temporal modeling. These findings underscore the remarkable predictive power of traditional machine learning models. The success can be attributed to the presence of feature redundancy, the simplicity of linear spatial and temporal patterns, and the advantages of high-frequency data sampling. 
While these results are compelling, it's essential to remain open to the possibility that explicitly modeling the temporal dimension could unlock deeper insights or further enhance predictive capabilities in specific scenarios. In summary, our research not only validates the effectiveness of our prediction framework for continuous and classification tasks but also underscores the potential for improvements through the inclusion of temporal aspects. The study highlights the promise of machine learning in shaping energy-efficient practices and room occupancy management.	翻訳日:2023-12-25 16:16:38 公開日:2023-12-22
# ロジスティカル・ファンハウスにおける損失--合成メディア企業としての投機的デザイン Lost in the Logistical Funhouse: Speculative Design as Synthetic Media Enterprise ( http://arxiv.org/abs/2312.14424v1 ) ライセンス: Link先を確認	Zoe Horn, Liam Magee, Anna Munster	(参考訳) ウォルマートなどの企業による調達交渉機関としてのチャットボットの展開から、オーバーブックされたフライトを管理するための「差別化されたチャット」を提供する自律エージェントに至るまで、合成メディアはロジスティクスの世界を「自然」な環境にしている。ここでは、商品、部品、労働の協調が問題を設計し、「ソリューション」を合成できるトレーニングセットを作成する。しかし、MidJourneyやOpenAIといったプロトプラットフォームや、Eleven LabsやD:IDといったアプリを通じて、合成メディアはどの程度まで、ロジスティックメディアとして理解されるのか? 本稿では,GPTをベースとしたロジスティクスデザインビジネス開発を支援するボットChatFOSを用いた合成メディア実験について述べる。素早い生成メディア出力を用いて、ロジカルワールド内のAIの出現する機能のシミュレーションとパロディを組み立てる。この過程では,大規模言語モデルがメディアルータやスイッチとなり,画像プロンプト,Webサイトコード,プロモーションコピー,投資家ピッチシナリオの生成を管理する過程が説明される。これらの要素は、企業ウェブサイトやプロモーションビデオなどのメディアアンサンブルにチェーン化され、当社が「設立」した架空の物流視覚化会社を刺激します。 ChatFOSを介して投機的シナリオを創出するプロセスと方法により,合成メディアをロジスティックメディアとして再配置する方法について考察する。我々の実験は、メディアのロジスティクスとメディアのロジスティクスの展開の仕方を探るものである。現代計算と資本の政治と美学について、ロジスティクスと合成メディリティの両面から(実践ベースで)具体的に説明できることは何だろうか? From the deployment of chatbots as procurement negotiators by corporations such as Walmart to autonomous agents providing 'differentiated chat' for managing overbooked flights, synthetic media are making the world of logistics their 'natural' habitat. Here the coordination of commodities, parts and labour design the problems and produce the training sets from which 'solutions' can be synthesised. But to what extent might synthetic media, surfacing via proto-platforms such as MidJourney and OpenAI and apps such as Eleven Labs and D:ID, be understood as logistical media? This paper details synthetic media experiments with 'ChatFOS', a GPT-based bot tasked with developing a logistics design business. Using its prompt-generated media outputs, we assemble a simulation and parody of AI's emerging functionalities within logistical worlds. 
In the process, and with clunky 'human-in-the-loop' stitching, we illustrate how large language models become media routers or switches, governing production of image prompts, website code, promotional copy, and investor pitch scenarios. Together these elements become links chained together in media ensembles such as the corporate website or the promotional video, fuelling the fictive logistics visualisation company we have 'founded'. The processes and methods of producing speculative scenarios via ChatFOS lead us to consider how synthetic media might be re-positioned as logistical media. Our experiments probe the ways in which the media of logistics and the logistics of media are increasingly enfolded. We ask: what can a (practice-based) articulation of this double-becoming of logistics and synthetic mediality tell us about the politics and aesthetics of contemporary computation and capital?	翻訳日:2023-12-25 16:16:18 公開日:2023-12-22
# 機械生成指示の有効性 Efficacy of Machine-Generated Instructions ( http://arxiv.org/abs/2312.14423v1 ) ライセンス: Link先を確認	Samaksh Gulati and Anshit Verma and Manoj Parmar and Palash Chaudhary	(参考訳) 大きな"インストラクションチューニング"言語モデル(命令に応答するために微調整された)は、ゼロショットを新しいタスクに一般化する驚くべき能力を示している。それでも、それらはしばしば量、多様性、創造性に制限される人間による命令データに大きく依存しているため、チューニングされたモデルの一般化を妨げる。我々は,機械生成アノテーションの有効性を定量的に検討し,人間によるアノテーションと機械生成アノテーションのそれぞれで微調整したBERTモデルの結果を比較した。我々の手法をバニラGPT-3モデルに適用すると、機械が生成したアノテーションは78.54%正確であり、微調整されたモデルは、人間のラベル付きアノテーションによる性能と比較して96.01%の性能を達成した。この結果は、機械生成アノテーションが、ダウンストリームモデルを微調整するうえでリソースとコストの両面で効率的な方法であることを示している。 Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We conducted a quantitative study to determine the efficacy of machine-generated annotations, in which we compare the results of a fine-tuned BERT model trained with human vs. machine-generated annotations. Applying our methods to the vanilla GPT-3 model, we saw that machine-generated annotations were 78.54% correct and the fine-tuned model achieved 96.01% of the performance obtained with human-labelled annotations. This result shows that machine-generated annotations are a resource- and cost-effective way to fine-tune downstream models.	翻訳日:2023-12-25 16:15:44 公開日:2023-12-22
# base-equivalent concept-relevance を用いた動作可能な形式的概念識別の強化 Enhancing Actionable Formal Concept Identification with Base-Equivalent Conceptual-Relevance ( http://arxiv.org/abs/2312.14421v1 ) ライセンス: Link先を確認	Ayao Bobi, Rokia Missaoui and Mohamed Hamza Ibrahim	(参考訳) 知識発見アプリケーションでは、データから生成されたパターンは極めて大きく、アナリストによる探索は困難である。形式的概念分析(FCA)フレームワークでは、安定性指標やその他の品質指標を通じて重要な形式的概念を特定する研究が行われている。本稿では,行動可能な概念の識別を改善するための新しい概念関連性指標であるBase-Equivalent Conceptual Relevance(BECR)スコアを紹介する。概念的観点からは、基本属性と等価属性は意味のある情報と見なされ、概念の概念的構造を維持するために非常に不可欠である。したがって、BECRの基本的な考え方は、より基本的で等価な属性と概念意図が持つ最小のジェネレータがより関連性が高いことである。そのため、BECRはこれらの属性と最小限のジェネレータを概念意図ごとに定量化する。合成および実世界のデータセットに関する予備実験は、よく知られた安定性指標と比較してBECRの効率を示す。 In knowledge discovery applications, the pattern set generated from data can be tremendously large and hard to explore by analysts. In the Formal Concept Analysis (FCA) framework, there have been studies to identify important formal concepts through the stability index and other quality measures. In this paper, we introduce the Base-Equivalent Conceptual Relevance (BECR) score, a novel conceptual relevance interestingness measure for improving the identification of actionable concepts. From a conceptual perspective, the base and equivalent attributes are considered meaningful information and are highly essential to maintain the conceptual structure of concepts. Thus, the basic idea of BECR is that the more base and equivalent attributes and minimal generators a concept intent has, the more relevant it is. As such, BECR quantifies these attributes and minimal generators per concept intent. Our preliminary experiments on synthetic and real-world datasets show the efficiency of BECR compared to the well-known stability index.	翻訳日:2023-12-25 16:15:28 公開日:2023-12-22
# 目標測度拡散写像のシャープな誤差推定とコミッタ問題への応用 Sharp error estimates for target measure diffusion maps with applications to the committor problem ( http://arxiv.org/abs/2312.14418v1 ) ライセンス: Link先を確認	Shashank Sule, Luke Evans and Maria Cameron	(参考訳) 重要サンプリングを特徴とする拡散マップの変種である目標測度拡散マップ(tmdmap,banisch et al. 2020)の一貫性エラーに対する漸近的に鋭い誤差推定を行い,任意の密度から入力データを抽出できるようにした。導出誤差推定にはバイアス誤差と分散誤差が含まれる。結果として得られる収束率はグラフラプラシアンの近似理論と一致する。結果の重要新しさは、先行項上のすべての前因子の明示的な定量化にある。また, TMDmapを用いて得られたディリクレBVPの解に対して, 解誤差が整合誤差によって制御されることを示す。これらの結果を用いて,遷移経路理論(tpt)の枠組みを用いた過減衰ランジュバン力学が制御する系における希少事象の解析におけるtmdmapの応用について検討した。 TPTの礎石成分はコミッタ問題の解であり、コルモゴロフ PDE に対する境界値問題である。注目すべきことに、TMDmapアルゴリズムは、プレファクタ式におけるいくつかのエラー項のキャンセルによるコミッタ問題に対するメッシュレス解法として特に適している。さらに, 準均一サンプリング密度を用いた場合, バイアスおよび分散誤差の顕著な改善が生じる。 TMDmapアルゴリズムの空間的均一な入力として$\delta$-netsを使用することで,これらの精度の向上が実現可能であることを示す。 We obtain asymptotically sharp error estimates for the consistency error of the Target Measure Diffusion map (TMDmap) (Banisch et al. 2020), a variant of diffusion maps featuring importance sampling and hence allowing input data drawn from an arbitrary density. The derived error estimates include the bias error and the variance error. The resulting convergence rates are consistent with the approximation theory of graph Laplacians. The key novelty of our results lies in the explicit quantification of all the prefactors on leading-order terms. We also prove an error estimate for solutions of Dirichlet BVPs obtained using TMDmap, showing that the solution error is controlled by consistency error. We use these results to study an important application of TMDmap in the analysis of rare events in systems governed by overdamped Langevin dynamics using the framework of transition path theory (TPT). The cornerstone ingredient of TPT is the solution of the committor problem, a boundary value problem for the backward Kolmogorov PDE. 
Remarkably, we find that the TMDmap algorithm is particularly suited as a meshless solver to the committor problem due to the cancellation of several error terms in the prefactor formula. Furthermore, significant improvements in bias and variance errors occur when using a quasi-uniform sampling density. Our numerical experiments show that these improvements in accuracy are realizable in practice when using $\delta$-nets as spatially uniform inputs to the TMDmap algorithm.	翻訳日:2023-12-25 16:15:13 公開日:2023-12-22
# パラメトリック駆動非線形共振器の臨界量子幾何テンソル Critical quantum geometric tensors of parametrically-driven nonlinear resonators ( http://arxiv.org/abs/2312.14414v1 ) ライセンス: Link先を確認	Hao-Long Zhang, Jia-Hao Lv, Ken Chen, Xue-Jia Yu, Fan Wu, Zhen-Biao Yang, and Shi-Biao Zheng	(参考訳) パラメトリック駆動非線形共振器は、フォールトトレラント量子計算を実現するためのビルディングブロックであり、臨界量子センシングに有用である。基本的な観点からすると、そのような系の最も興味深い特徴はおそらく、他の量子系との相互作用なしに生じる臨界現象である。固有スペクトルの非解析的挙動は実質的に研究されているが、基底状態波動関数に関連するものはほとんど未調査のままである。量子基底状態幾何学的テンソルを指標として、駆動パラメータ $\varepsilon$ と phase $\phi$ を含む位相図を包括的に確立する。その結果、$\varepsilon$の増加に伴い、システムは通常から超ラジアント相への量子相転移を行い、臨界点は$\phi$の影響を受けないことが明らかとなった。さらに, 臨界指数とスケーリング次元は, 従来の作業と一致した, 厳密な数値的手法によって求めた。その結果、位相遷移は量子ラビモデルの普遍性クラスに含まれることがわかった。この研究は、量子計量とベリー曲率が量子相転移の様々な挙動を示すことを示した。 Parametrically driven nonlinear resonators represent a building block for realizing fault-tolerant quantum computation and are useful for critical quantum sensing. From a fundamental viewpoint, the most intriguing feature of such a system is perhaps the critical phenomena, which can occur without interaction with any other quantum system. The non-analytic behaviors of its eigenspectrum have been substantially investigated, but those associated with the ground state wavefunction have largely remained unexplored. Using the quantum ground state geometric tensor as an indicator, we comprehensively establish a phase diagram involving the driving parameter $\varepsilon$ and phase $\phi$. The results reveal that with the increase in $\varepsilon$, the system undergoes a quantum phase transition from the normal to the superradiant phase, with the critical point unaffected by $\phi$. Furthermore, the critical exponent and scaling dimension are obtained by an exact numerical method, which is consistent with previous works. Our numerical results show that the phase transition falls within the universality class of the quantum Rabi model. This work reveals that the quantum metric and Berry curvature display diverging behaviors across the quantum phase transition.	
翻訳日:2023-12-25 16:14:48 公開日:2023-12-22
# マルチモーダル歩行認識のための多段適応型特徴融合ニューラルネットワーク A Multi-Stage Adaptive Feature Fusion Neural Network for Multimodal Gait Recognition ( http://arxiv.org/abs/2312.14410v1 ) ライセンス: Link先を確認	Shinan Zou and Jianbo Xiong and Chao Fan and Shiqi Yu and Jin Tang	(参考訳) 歩行認識は生体計測技術であり、広く注目を集めている。多くの既存の歩行認識アルゴリズムは単調であり、少数のマルチモーダル歩行認識アルゴリズムは一度だけマルチモーダル融合を行う。これらのアルゴリズムは、複数のモダリティの相補的な利点を完全に活用することができない。本稿では,歩行データの時間的・空間的特性を考慮して,特徴抽出過程において異なる段階のマルチモーダル融合を行う多段特徴融合戦略(msffs)を提案する。また,シルエットと骨格のセマンティックな関連性を考慮したAFFM(Adaptive Feature fusion Module)を提案する。融合プロセスは、より関連する骨格関節と異なるシルエット領域を融合する。歩行時間における視覚的外見の変化と時間経過が共起しているため,空間時空間特徴抽出器(MSSTFE)を提案する。特に、MSSTFEは異なる空間スケールで時空間リンク情報を抽出し集約する。上記の戦略とモジュールを組み合わせることで,多段階適応機能融合(MSAFF)ニューラルネットワークを提案する。さらに、MSAFFは特徴次元プーリング(FDプール)を備えており、精度を損なうことなく歩行表現の寸法を大幅に削減することができる。 https://github.com/ShinanZou/MSAFF Gait recognition is a biometric technology that has received extensive attention. Most existing gait recognition algorithms are unimodal, and a few multimodal gait recognition algorithms perform multimodal fusion only once. None of these algorithms may fully exploit the complementary advantages of the multiple modalities. In this paper, by considering the temporal and spatial characteristics of gait data, we propose a multi-stage feature fusion strategy (MSFFS), which performs multimodal fusions at different stages in the feature extraction process. Also, we propose an adaptive feature fusion module (AFFM) that considers the semantic association between silhouettes and skeletons. The fusion process fuses different silhouette areas with their more related skeleton joints. Since visual appearance changes and time passage co-occur in a gait period, we propose a multiscale spatial-temporal feature extractor (MSSTFE) to learn the spatial-temporal linkage features thoroughly. Specifically, MSSTFE extracts and aggregates spatial-temporal linkages information at different spatial scales. 
Combining the strategy and modules mentioned above, we propose a multi-stage adaptive feature fusion (MSAFF) neural network, which shows state-of-the-art performance in many experiments on three datasets. Besides, MSAFF is equipped with feature dimensional pooling (FD Pooling), which can significantly reduce the dimension of the gait representations without hindering the accuracy. https://github.com/ShinanZou/MSAFF	翻訳日:2023-12-25 16:14:26 公開日:2023-12-22
# サービス効率と平等のバランスをとるための拡張p中間問題 Extended p-median problems for balancing service efficiency and equality ( http://arxiv.org/abs/2312.14408v1 ) ライセンス: Link先を確認	Yunfeng Kong, Chenchen Lian, Guangli Zhang, Shiyan Zhai	(参考訳) この記事では、サービス効率と平等のバランスをとるためのロケーション問題を取り上げます。公共サービスシステムでは、他のサービスにアクセスするのに長い旅行距離が必要な場合、うらやましい思いをする人もいます。走行距離をサービス施設としきい値距離と比較することにより、エンビーの強度を測定することができる。サービス効率と等価性の間のトレードオフに関して,全エンビー関数を用いて4つの拡張p中間問題を提案する。新しい問題の5つの分析的性質は数学的に証明されている。新しい問題は、よく設計された3つのインスタンスでテストされた。実験により,旅行コストと空間的エンビーを最小化することにより,標準偏差,平均絶対偏差,旅行距離間のジーニ係数などの等式を著しく改善できることを示した。また, 施設数の観点からサービス供給が提供される場合, 走行距離をわずかに増加させることでサービス平等性を大幅に向上させることができることを示した。施設数でサービス供給量が増えると、サービス効率と空間平等の両方を著しく向上させることができる。 This article deals with the location problem for balancing the service efficiency and equality. In public service systems, some people may feel envy in case that they need longer travel distance to access services than others. The strength of the envy can be measured by comparing one's travel distance to service facility with a threshold distance. Using the total envy function, four extended p-median problems are proposed for trade-off between service efficiency and equality. Five analytical properties of the new problems are mathematically proven. The new problems were tested on three sets of well-designed instances. The experimentation shows that the equality measures, such as the standard deviation, the mean absolute deviation, and the Gini coefficient between travel distances, can be substantially improved by minimizing the travel cost and the spatial envy. The experimentation also shows that, when the service supply is given in terms of the number of facilities, the service equality can be considerably improved by slightly increasing the travel distance. When the service supply is increased in terms of the number of facilities, both the service efficiency and spatial equality can be significantly improved.	翻訳日:2023-12-25 16:14:04 公開日:2023-12-22
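The envy idea in the abstract above (comparing each user's travel distance with a threshold distance) can be sketched in a few lines of Python. This is only an illustrative sketch: the abstract does not give the exact form of the total envy function, so the squared-excess form, the demand weights, and the names `total_envy`, `combined_objective`, and `envy_weight` are all assumptions made here, not the authors' formulation.

```python
def total_envy(distances, demands, threshold):
    """Demand-weighted squared excess of travel distance over a threshold.

    Assumed form for illustration only; users within the threshold
    contribute no envy.
    """
    return sum(w * max(0.0, d - threshold) ** 2
               for d, w in zip(distances, demands))


def combined_objective(distances, demands, threshold, envy_weight):
    """Travel cost plus weighted envy (hypothetical efficiency/equality trade-off)."""
    travel_cost = sum(w * d for d, w in zip(distances, demands))
    return travel_cost + envy_weight * total_envy(distances, demands, threshold)


# Three demand points, each with demand 10, assigned to their nearest facility.
distances = [1.0, 2.0, 5.0]
demands = [10, 10, 10]
print(total_envy(distances, demands, threshold=3.0))
print(combined_objective(distances, demands, threshold=3.0, envy_weight=0.5))
```

Raising `envy_weight` in such a sketch penalizes long trips more heavily, which is one way to see why equality measures can improve at a small cost in total travel distance.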
# advcloak:プライバシー保護のためにカスタマイズされたadversarial cloak AdvCloak: Customized Adversarial Cloak for Privacy Protection ( http://arxiv.org/abs/2312.14407v1 ) ライセンス: Link先を確認	Xuannan Liu and Yaoyao Zhong and Xing Cui and Yuhang Zhang and Peipei Li and Weihong Deng	(参考訳) ソーシャルメディアで広範な顔画像が共有されているため、プライバシーに関する懸念が顕著に高まっている。本稿では,生成モデルを用いたプライバシー保護のための革新的なフレームワークであるAdvCloakを提案する。 AdvCloakは、機能レベルの一般化機能を提供しながら、優れた画像レベルの自然性を維持することができる、クラスワイドの対向マスクを自動でカスタマイズするように設計されている。具体的には、AdvCloakは、2段階のトレーニング戦略を用いて、生成する敵ネットワークを逐次最適化する。この戦略は、最初は、イメージ固有のトレーニングを通じて、個々の顔にマスクを適応させることに焦点を当て、続いて、特徴レベルの一般化能力を、個人固有のトレーニングを通じて、個人の顔の多様なバリエーションに拡張する。限られたトレーニングデータを完全に活用するために,AdvCloakと幾何的モデリング手法を組み合わせることで,情報源の特徴部分空間をより正確に記述する。 AdvCloakが既存の最先端の手法よりも効率と有効性で優れていることを示す。 With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that can maintain superior image-level naturalness while providing enhanced feature-level generalization ability. Specifically, AdvCloak sequentially optimizes the generative adversarial networks by employing a two-stage training strategy. This strategy initially focuses on adapting the masks to the unique individual faces via image-specific training and then enhances their feature-level generalization ability to diverse facial variations of individuals via person-specific training. To fully utilize the limited training data, we combine AdvCloak with several general geometric modeling methods, to better describe the feature subspace of source identities. Extensive quantitative and qualitative evaluations on both common and celebrity datasets demonstrate that AdvCloak outperforms existing state-of-the-art methods in terms of efficiency and effectiveness.	翻訳日:2023-12-25 16:13:47 公開日:2023-12-22
# スケールにおける生成的事前学習: 不正検出のためのトランザクション行動のTransformerベース符号化 Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection ( http://arxiv.org/abs/2312.14406v1 ) ライセンス: Link先を確認	Ze Yu Zhao (1), Zheng Zhu (1), Guilin Li (1), Wenhan Wang (1), Bo Wang (1) ((1) Tencent, WeChat Pay)	(参考訳) 本稿では,支払いシステムにおける不正検出に適したgpt(generative pretrained transformer)アーキテクチャを活用した,革新的な自己回帰モデルを提案する。本手法は,トークン爆発に対して革新的に対処し,行動シーケンスを再構築し,時間的および文脈的分析によるトランザクション動作の微妙な理解を提供する。教師なし事前トレーニングを利用することで,ラベル付きデータを必要とせず,特徴表現に優れる。さらに,異常検出を強化するために差分畳み込みアプローチを統合し,中国最大級のオンライン決済事業者の一つのセキュリティと有効性を高める。我々のモデルのスケーラビリティと適応性は、様々なトランザクションコンテキストにおける幅広い適用性を約束します。 In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems. Our approach innovatively confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior through temporal and contextual analysis. Utilizing unsupervised pretraining, our model excels in feature representation without the need for labeled data. Additionally, we integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China. The scalability and adaptability of our model promise broad applicability in various transactional contexts.	翻訳日:2023-12-25 16:13:28 公開日:2023-12-22
# グラフ注意に基づくアナログ回路の対称性制約抽出 Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits ( http://arxiv.org/abs/2312.14405v1 ) ライセンス: Link先を確認	Qi Xu, Lijie Wang, Jing Wang, Song Chen, Lin Cheng, Yi Kang	(参考訳) 近年、アナログ回路は広く注目され、多くの新興アプリケーションで広く利用されている。アナログ回路の高需要は、より短い回路設計サイクルを必要とする。所望のパフォーマンスと仕様を達成するためには、アナログレイアウトプロセス中に様々な幾何学的対称性の制約を慎重に考慮する必要がある。しかし、経験豊富なアナログエンジニアによるこれらの制約の手動ラベリングは、手間と時間がかかるプロセスである。本稿では,アナログ回路レイアウトにおける対称制約を自動的に抽出するグラフベースの学習フレームワークを提案する。提案フレームワークは,回路の接続特性とデバイス情報を利用して対称制約の一般的な規則を学習し,回路網上のデバイスレベルの制約を効果的に抽出する。実験結果は,最先端の対称制約検出手法と比較して,高い精度と低い偽陽性率を実現することを実証した。 In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices' information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and lower false positive rate.	翻訳日:2023-12-25 16:13:16 公開日:2023-12-22
# クロスコヴァリエートな歩行認識:ベンチマーク Cross-Covariate Gait Recognition: A Benchmark ( http://arxiv.org/abs/2312.14404v1 ) ライセンス: Link先を確認	Shinan Zou and Chao Fan and Jianbo Xiong and Chuanfu Shen and Shiqi Yu and Jin Tang	(参考訳) 歩行データセットは歩行研究に不可欠である。しかし,本研究では,従来の制約付きデータセットや新興実世界のデータセットが,共変量多様性に関して不足していることを示す。このギャップを埋めるため、私たちは、CCGRデータセットの収集に20ヶ月の懸命な努力を払っています。 CCGRデータセットには970人の被験者と約1.6万のシーケンスがあり、ほぼすべての被験者は33のビューと53の異なる共変体を持っている。既存のデータセットと比較すると、CCGRは個体数と個体レベルの多様性の両方を持っている。さらに、ビューとコ変数はよくラベル付けされ、異なる要因の影響を分析することができる。 CCGRは、RGB、パース、シルエット、ポーズなど、さまざまな種類の歩行データを提供し、研究者に探索のための包括的なリソースを提供する。本稿では,新たに提案する解析データを用いて,多変量歩行認識に深く取り組むために,解析に基づく歩行認識(parsinggait)を提案する。我々は広範な実験を行った。私たちの主な結果は以下のとおりです。 1) 歩行認識の実用的応用において, クロスコヴァリエートが重要な課題として出現する。 2)ParsingGaitは,さらなる進歩の可能性を示す。 3)既存のSOTA法はCCGRで43%未満の精度を達成し,クロスコバルト歩行認識の緊急性を強調した。リンク: https://github.com/shinanzou/ccgr。 Gait datasets are essential for gait research. However, this paper observes that present benchmarks, whether conventional constrained or emerging real-world datasets, fall short regarding covariate diversity. To bridge this gap, we undertake an arduous 20-month effort to collect a cross-covariate gait recognition (CCGR) dataset. The CCGR dataset has 970 subjects and about 1.6 million sequences; almost every subject has 33 views and 53 different covariates. Compared to existing datasets, CCGR has both population and individual-level diversity. In addition, the views and covariates are well labeled, enabling the analysis of the effects of different factors. CCGR provides multiple types of gait data, including RGB, parsing, silhouette, and pose, offering researchers a comprehensive resource for exploration. In order to delve deeper into addressing cross-covariate gait recognition, we propose parsing-based gait recognition (ParsingGait) by utilizing the newly proposed parsing data. We have conducted extensive experiments. Our main results show: 1) Cross-covariate emerges as a pivotal challenge for practical applications of gait recognition. 
2) ParsingGait demonstrates remarkable potential for further advancement. 3) Alarmingly, existing SOTA methods achieve less than 43% accuracy on the CCGR, highlighting the urgency of exploring cross-covariate gait recognition. Link: https://github.com/ShinanZou/CCGR.	翻訳日:2023-12-25 16:13:03 公開日:2023-12-22
# フェアネスフェア:人間の認識を集団的意思決定に持ち込む The Fairness Fair: Bringing Human Perception into Collective Decision-Making ( http://arxiv.org/abs/2312.14402v1 ) ライセンス: Link先を確認	Hadi Hosseini	(参考訳) 公正は集団意思決定において最も望ましい社会的原則の1つである。過去数十年間、その公理的性質について広く研究され、アルゴリズム決定における理論的・計算的な側面から、近年、マルチエージェントシステムコミュニティからかなりの注目を集めている。しかし、これらの研究はしばしば、現実世界の問題の曖昧な性質における人間の公正性に対する認識の複雑さを捉えるのに十分ではない。我々は、公正な解決策は、社会的プランナー(設計者)によって望ましいものとみなすだけでなく、人間と社会的認知によって支配され、人間の判断に基づいて認識された結果が検討され、検証可能であるべきであると論じる。この目標を達成するには、コンピューティングやAIから行動経済学、人間とAIの相互作用まで幅広い学際的なアプローチが必要である。その際,現在のフェア・ディビジョン文学の欠点と長期的な課題を特定し,最近の取り組みを解説し,さらに重要なこととして,一連のオープン・リサーチの方向性を強調する。 Fairness is one of the most desirable societal principles in collective decision-making. It has been extensively studied in the past decades for its axiomatic properties and has received substantial attention from the multiagent systems community in recent years for its theoretical and computational aspects in algorithmic decision-making. However, these studies are often not sufficiently rich to capture the intricacies of human perception of fairness in the ambivalent nature of the real-world problems. We argue that not only fair solutions should be deemed desirable by social planners (designers), but they should be governed by human and societal cognition, consider perceived outcomes based on human judgement, and be verifiable. We discuss how achieving this goal requires a broad transdisciplinary approach ranging from computing and AI to behavioral economics and human-AI interaction. In doing so, we identify shortcomings and long-term challenges of the current literature of fair division, describe recent efforts in addressing them, and more importantly, highlight a series of open research directions.	翻訳日:2023-12-25 16:12:39 公開日:2023-12-22
# CLIPのバックボーン効果の解明 : 表現の相乗効果と変異 Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances ( http://arxiv.org/abs/2312.14400v1 ) ライセンス: Link先を確認	Cristian Rodriguez-Opazo and Edison Marrese-Taylor and Ehsan Abbasnejad and Hamed Damirchi and Ignacio M. Jara and Felipe Bravo-Marquez and Anton van den Hengel	(参考訳) コントラスト言語-画像事前学習(CLIP)は画像表現学習において顕著な手法である。ビジョントランスフォーマー(ViT)やResNetsのような畳み込みネットワーク(ConvNet)といったトランスフォーマーベースのモデルにまたがるさまざまなニューラルネットワークは、CLIPでトレーニングされ、さまざまなビジョンタスクにわたって普遍的なバックボーンとして機能する。同じデータとトレーニング目標を活用しているにも関わらず、これらのアーキテクチャによって学習される表現の有効性は重要な疑問を提起する。本研究は,これらのバックボーンアーキテクチャ間のCLIP性能の違いを調査し,その分類の相違を明らかにした。特に、これらの表現の正規化は、かなりの性能変化をもたらす。その結果,適切なバックボーンの選択により20%以上の改善が期待できるバックボーン予測の相乗効果が顕著に示された。さらに,複数のバックボーンからの予測を組み合わせることで,最大6.34%の性能向上が得られる,単純かつ効果的な手法を提案する。結果を再現するためのコードをリリースします。 Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various neural architectures, spanning Transformer-based models like Vision Transformers (ViTs) to Convolutional Networks (ConvNets) like ResNets, are trained with CLIP and serve as universal backbones across diverse vision tasks. Despite utilizing the same data and training objectives, the effectiveness of representations learned by these architectures raises a critical question. Our investigation explores the differences in CLIP performance among these backbone architectures, revealing significant disparities in their classifications. Notably, normalizing these representations results in substantial performance variations. Our findings showcase a remarkable possible synergy between backbone predictions that could reach an improvement of over 20% through informed selection of the appropriate backbone. Moreover, we propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%. We will release the code for reproducing the results.	翻訳日:2023-12-25 16:12:21 publicly:2023-12-22
# 多様化による適応的微分進化:最適化への挑戦 Adaptive Differential Evolution with Diversification: Addressing Optimization Challenges ( http://arxiv.org/abs/2312.14464v1 ) ライセンス: Link先を確認	Sarit Maitra	(参考訳) 微分進化(DE)アルゴリズムの既存の変種は、局所探索の貧弱さや早期収束に対する感受性など、一定の制限がある。本研究では,周辺構造を動的に修飾する手法であるaded(adaptive differential evolution with diversification)を提案する。凸関数と非凸関数の両方を扱うために開発されたADEDは、Rosenbrock、Rastrigin、Ackley、DeVilliers-Glasser02を含む22のベンチマーク関数で検証されている。開発はGoogle CloudでJupyter NotebookとPython v3.10.12を使って行われ、マルチオブジェクトベンチマークのZDTテストスイートで追加のテストが行われた。 ADEDは適応的かつ多様なアプローチで、適応的突然変異とクロスオーバーレート、多様な突然変異戦術、多様化測定、局所探索機構、収束監視を含む。これらの特徴の組み合わせは、複雑で多様な風景をナビゲートするADEDの有効性を強化し、単目的と多目的の両方の最適化シナリオにおける課題に対処するための有望なツールとして位置づけている。 The existing variants of the Differential Evolution (DE) algorithm come with certain limitations, such as poor local search and susceptibility to premature convergence. This study introduces Adaptive Differential Evolution with Diversification (ADED), a method that dynamically modifies the neighborhood structure by evaluating the trial solutions' fitness. Developed to work with both convex and nonconvex objective functions, ADED is validated with 22 benchmark functions, including Rosenbrock, Rastrigin, Ackley, and DeVilliers-Glasser02. The development is carried out in Google Cloud using Jupyter Notebook and Python v3.10.12, with additional testing conducted on the multi-objective benchmark ZDT test suite. ADED distinguishes itself with its adaptive and diverse approach, which includes adaptive mutation and crossover-rates, diverse mutation tactics, diversification measurements, local search mechanisms, and convergence monitoring. The unique combination of these features collectively enhances ADED's effectiveness in navigating complex and diverse landscapes, positioning it as a promising tool for addressing challenges in both single- and multi-objective optimization scenarios.	翻訳日:2023-12-25 16:05:30 公開日:2023-12-22
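For readers unfamiliar with the baseline that the ADED abstract above builds on, the following is a minimal sketch of classic differential evolution (the DE/rand/1/bin scheme) in plain Python. It is not ADED: the adaptive rates, diversification measures, and local search described in the abstract are not implemented, and the parameter values (`pop_size`, `F`, `CR`, `generations`) and the `sphere` test function are illustrative assumptions.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=300, seed=0):
    """Classic DE/rand/1/bin minimizer (baseline sketch, not ADED)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box constraints.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip back into the bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_fitness = f(trial)
            # Greedy selection: keep the trial if it is at least as good.
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

The fixed `F` and `CR` here are exactly the kind of setting an adaptive variant would tune during the run.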
# ビザンチン系ロバスト集団の高次元攻撃 Attacking Byzantine Robust Aggregation in High Dimensions ( http://arxiv.org/abs/2312.14461v1 ) ライセンス: Link先を確認	Sarthak Choudhary, Aashish Kolluri and Prateek Saxena	(参考訳) 現代のニューラルネットワークやモデルのトレーニングには、一般的に高次元ベクトルのサンプルを平均化する必要がある。毒殺攻撃は、モデルトレーニングに使用される平均ベクターを歪めたり偏ったりし、モデルに特定のパターンを学習させたり、有用なものを学ぶのを避けたりする。ビザンチンのロバストアグリゲーションは、そのようなバイアスに対するアルゴリズムによる防御である。ロバストアグリゲータは、たとえ一部の入力が任意に破損したとしても、平均のような中央値統計計算における最大バイアスを制限できる。このようなアグリゲータの設計は、高次元を扱う場合に難しい。しかし、バイアスの強い理論的境界を持つ最初の多項式時間アルゴリズムが最近提案されている。彼らの境界線は数次元とは無関係であり、防衛戦における毒殺の威力に対する概念的な制限を約束している。本稿では,次元非依存バイアスの主張を覆す強力な防御の実現に向けたHIDRAと呼ばれる新たな攻撃を示す。 HIDRAは、それまでの情報理論分析には関心がなかった、新しい計算ボトルネックを強調している。実験結果から,本攻撃はモデル性能をほぼ完全に破壊するが,同じ目標を持つ既存攻撃は大きな効果をもたらさないことが示された。我々の発見は、毒殺と証明可能な防御の間の武器競争を広範囲に開放している。 Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or avoid learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as mean, even when some fraction of inputs are arbitrarily corrupted. Designing such aggregators is challenging when dealing with high dimensions. However, the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we show a new attack called HIDRA on practical realization of strong defenses which subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analysis. 
Our experimental evaluation shows that our attacks almost completely destroy the model performance, whereas existing attacks with the same goal fail to have much effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.	翻訳日:2023-12-25 16:05:08 公開日:2023-12-22
# ヒト脳波とTD3深部強化学習における共有自律性のためのマルチエージェントコパイロットアプローチ Multiagent Copilot Approach for Shared Autonomy between Human EEG and TD3 Deep Reinforcement Learning ( http://arxiv.org/abs/2312.14458v1 ) ライセンス: Link先を確認	Chun-Ren Phang and Akimasa Hirata	(参考訳) 深層強化学習(RL)アルゴリズムは、環境と対話できる完全自律エージェントの開発を可能にする。脳コンピュータインタフェース(BCI)システムは、明示的な環境に関係なく人間の暗黙の脳信号を解読する。本研究では,deep rlとbciを統合し,環境要因を考慮し,自律系における有益なヒューマン介入と脳活動のデコード性能を向上させる。人体の脳波(EEG)からデコードされた作用指令と、与えられた環境に対する双発遅延DDPG(TD3)エージェントから生成された作用との間には、共有自律性が認められた。提案手法は,EEG(EEG-NB)やTD3(TD3制御)よりも有意に優れていた。 co-fbモデルは、eeg-nbモデルよりも高い目標接近スコア、低い故障率、低いヒューマンワークロードを達成した。 Co-FB制御方式はTD3モデルよりも目に見える目標スコアと人間の介入のレベルが高い。また,エージェント決定の矛盾が副操縦士モデルの制御精度と権限に与える影響を評価するために,差分d-インデックスを提案した。我々は,TD3エージェントの制御権限と,d-インデックスに対するヒト脳波分類の性能改善との間に有意な相関が認められた。また,制御権限をtd3エージェントに移行することで,bci復号が最適でない場合の性能が向上した。これらの結果から, コンピロシステムは複雑な環境を効果的に扱えること, 環境要因を考慮したBCI性能の向上が期待できることがわかった。今後の作業は、協調動作の性能を評価するために、連続的な行動空間と異なるマルチエージェントアプローチを採用するべきである。 Deep reinforcement learning (RL) algorithms enable the development of fully autonomous agents that can interact with the environment. Brain-computer interface (BCI) systems decipher human implicit brain signals regardless of the explicit environment. In this study, we integrated deep RL and BCI to improve beneficial human interventions in autonomous systems and the performance in decoding brain activities by considering environmental factors. Shared autonomy was allowed between the action command decoded from the electroencephalography (EEG) of the human agent and the action generated from the twin delayed DDPG (TD3) agent for a given environment. Our proposed copilot control scheme with a full blocker (Co-FB) significantly outperformed the individual EEG (EEG-NB) or TD3 control. The Co-FB model achieved a higher target approaching score, lower failure rate, and lower human workload than the EEG-NB model. The Co-FB control scheme had a higher invisible target score and level of allowed human intervention than the TD3 model. 
We also proposed a disparity d-index to evaluate the effect of contradicting agent decisions on the control accuracy and authority of the copilot model. We found a significant correlation between the control authority of the TD3 agent and the performance improvement of human EEG classification with respect to the d-index. We also observed that shifting control authority to the TD3 agent improved performance when BCI decoding was not optimal. These findings indicate that the copilot system can effectively handle complex environments and that BCI performance can be improved by considering environmental factors. Future work should employ continuous action space and different multi-agent approaches to evaluate copilot performance.	翻訳日:2023-12-25 16:04:47 公開日:2023-12-22
# QUAR-VLA:四足歩行ロボットの視覚言語行動モデル QUAR-VLA: Vision-Language-Action Model for Quadruped Robots ( http://arxiv.org/abs/2312.14457v1 ) ライセンス: Link先を確認	Pengxiang Ding, Han Zhao, Zhitao Wang, Zhenyu Wei, Shangke Lyu, Donglin Wang	(参考訳) ロボット知性の重要な発現は、自然と対話し、自律的に意思決定する能力である。従来のロボット制御のアプローチは、知覚、計画、意思決定を分割し、システム設計を単純化するが、異なる情報ストリーム間のシナジーを制限する。この区画化は、シームレスな自律的推論、意思決定、行動実行を達成する上での課題を提起する。これらの制約に対処するために、Quadruped Robots (QUAR-VLA) のためのビジョン・ランゲージ・アクションタスクと呼ばれる新しいパラダイムが導入された。このアプローチでは、視覚情報と指示を密に統合して実行可能なアクションを生成し、知覚、計画、意思決定を効果的に融合する。中心となるアイデアは、ロボット全体の知性を高めることだ。この枠組みの中で注目すべき課題は、きめ細かい指示を視覚的知覚情報と整合させることである。これは、ロボットが視覚観察と調和して詳細な指示を正しく解釈し行動することを保証するのに必要な複雑さを強調している。そこで本研究では,VLAモデルのファミリーである Quadruped Robotic Transformer (QUART) を提案し,実世界のロボットの入力として様々なモードから視覚情報と指示を統合し,実世界のロボットに対して実行可能なアクションを生成するとともに, QUAdruped Robot Dataset (QUARD) を提示する。評価試験(4000回)により,本手法がロボットの能力向上に寄与し,QUARTが創発的能力の獲得を可能にした。 An important manifestation of robot intelligence is the ability to naturally interact and autonomously make decisions. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasoning, decision-making, and action execution. To address these limitations, a novel paradigm, named Vision-Language-Action tasks for QUAdruped Robots (QUAR-VLA), has been introduced in this paper. This approach tightly integrates visual information and instructions to generate executable actions, effectively merging perception, planning, and decision-making. The central idea is to elevate the overall intelligence of the robot. Within this framework, a notable challenge lies in aligning fine-grained instructions with visual perception information. This emphasizes the complexity involved in ensuring that the robot accurately interprets and acts upon detailed instructions in harmony with its visual observations. 
Consequently, we propose QUAdruped Robotic Transformer (QUART), a family of VLA models that integrates visual information and instructions from diverse modalities as input and generates executable actions for real-world robots, and we present QUAdruped Robot Dataset (QUARD), a large-scale multi-task dataset including navigation, complex terrain locomotion, and whole-body manipulation tasks for training QUART models. Our extensive evaluation (4000 evaluation trials) shows that our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.	翻訳日:2023-12-25 16:04:18 公開日:2023-12-22
# 分散検出のための次元の呪いを克服する方法 How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection? ( http://arxiv.org/abs/2312.14452v1 ) ライセンス: Link先を確認	Soumya Suvra Ghosal, Yiyou Sun, and Yixuan Li	(参考訳) ワイルドにデプロイされた機械学習モデルは、未知のクラスからのout-of-distribution(ood)データに挑戦できる。 OOD検出の最近の進歩は、分布内(ID)データから比較的離れたサンプルを識別するための距離測定に依存している。約束にもかかわらず、距離ベースの手法は、高次元の特徴空間における有効性を制限している次元の呪いに悩まされる。この問題に対処するために,OOD検出のための新しいフレームワーク,Subspace Nearest Neighbor (SNN)を提案する。トレーニングにおいて,本手法は,次元の最も関連性の高い部分集合(部分空間)を活用することにより,モデルとその特徴表現を正規化する。サブスペース学習は、IDとOODデータの間の高度に区別可能な距離測定を行う。我々はSNNの有効性を検証するための総合的な実験と改善を行った。現在の最良の距離ベースの手法と比較して、SNNはCIFAR-100ベンチマークで平均FPR95を15.96%削減している。 Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.	翻訳日:2023-12-25 16:03:48 公開日:2023-12-22
# セッションベースレコメンデーションにおけるアンラーニングの効果について On the Effectiveness of Unlearning in Session-Based Recommendation ( http://arxiv.org/abs/2312.14447v1 ) ライセンス: Link先を確認	Xin Xin, Liu Yang, Ziqi Zhao, Pengjie Ren, Zhumin Chen, Jun Ma, Zhaochun Ren	(参考訳) セッションベースのレコメンデーションは、セッション内の前のインタラクションからユーザの将来の関心を予測する。歴史的なサンプルを記憶しているにも関わらず、特定のトレーニングサンプルの影響を取り除こうとする未学習の要求も、ユーザのプライバシやモデルの忠実性といった理由から発生する。しかし、未学習に関する既存の研究はセッションベースの推薦には適していない。一方、これらの手法は、セッション中の未学習項目と残りの項目との協調的相関や逐次的接続により、未学習効果を満足することができない。一方,セッションベースのレコメンデーションシナリオにおいて,未学習の有効性を検証する研究はほとんど行われていない。本稿では,セッションベースレコメンデーションにおける高い学習効率,正確なレコメンデーション性能,学習効率の向上を実現する,セッションベースのレコメンデーションアンラーニングフレームワークsruを提案する。具体的には、まず、セッション間の類似性に応じてトレーニングセッションを個別のサブモデルに分割し、次に、セッションとサブモデル内のデータのセントロイドの相関関係に応じて隠れた状態を融合させる注意ベースの集約層を利用する。さらに,未学習の有効性を向上させるために,協調追加削除(ced),隣接追加削除(ned),ランダム追加削除(red)という3つの追加データ削除戦略を提案する。さらに,データ削除後に未学習サンプルを推測できるかどうかを測定し,未学習の有効性を検証する評価指標を提案する。 3つの代表的なセッションベースレコメンデーションモデルでSRUを実装し、3つのベンチマークデータセットで実験を行う。実験の結果,本手法の有効性が示された。 Session-based recommendation predicts users' future interests from previous interactions in a session. Despite the memorizing of historical samples, the request of unlearning, i.e., to remove the effect of certain training samples, also occurs for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored for the session-based recommendation. On the one hand, these approaches cannot achieve satisfying unlearning effects due to the collaborative correlations and sequential connections between the unlearning item and the remaining items in the session. On the other hand, seldom work has conducted the research to verify the unlearning effectiveness in the session-based recommendation scenario. In this paper, we propose SRU, a session-based recommendation unlearning framework, which enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness in session-based recommendation. 
Specifically, we first partition the training sessions into separate sub-models according to the similarity across the sessions, then we utilize an attention-based aggregation layer to fuse the hidden states according to the correlations between the session and the centroid of the data in the sub-model. To improve the unlearning effectiveness, we further propose three extra data deletion strategies, including collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). Besides, we propose an evaluation metric that measures whether the unlearning sample can be inferred after the data deletion to verify the unlearning effectiveness. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. Experimental results demonstrate the effectiveness of our methods.	翻訳日:2023-12-25 16:03:33 公開日:2023-12-22
# modality-aware fusion networkと大規模データセットによるクロスモーダルオブジェクト追跡 Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset ( http://arxiv.org/abs/2312.14446v1 ) ライセンス: Link先を確認	Lei Liu, Mengya Zhang, Cheng Li, Chenglong Li, and Jin Tang	(参考訳) ビジュアルトラッキングは、RGB画像シーケンスのみに依存する場合、無効なターゲットや低照度環境でのパフォーマンス低下といった課題に直面することが多い。深度データや赤外線データといった追加のモダリティは有効であることが証明されているが、既存のマルチモーダルイメージングプラットフォームは複雑で、現実の応用性に欠ける。対照的に、監視カメラで一般的に使用される近赤外線(NIR)イメージングは、光強度に基づいてRGBとNIRを切り替えることができる。しかしながら、これらの不均質なモダリティを横断するオブジェクトの追跡は、特に追跡中にモダリティスイッチ信号がないため、大きな課題となる。これらの課題に対処するため,我々はmodality-aware fusion network (mafnet) と呼ばれる適応型クロスモーダルオブジェクトトラッキングアルゴリズムを提案する。 MAFNetは、適応重み付け機構を用いてRGBとNIRの両方のモダリティからの情報を効率的に統合し、外観ギャップを効果的にブリッジし、モダリティ対応ターゲット表現を可能にする。適応重み付けモジュールとモダリティ固有の表現モジュール...の2つのキーコンポーネントで構成されている。 Visual tracking often faces challenges such as invalid targets and decreased performance in low-light conditions when relying solely on RGB image sequences. While incorporating additional modalities like depth and infrared data has proven effective, existing multi-modal imaging platforms are complex and lack real-world applicability. In contrast, near-infrared (NIR) imaging, commonly used in surveillance cameras, can switch between RGB and NIR based on light intensity. However, tracking objects across these heterogeneous modalities poses significant challenges, particularly due to the absence of modality switch signals during tracking. To address these challenges, we propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism, effectively bridging the appearance gap and enabling a modality-aware target representation. It consists of two key components: an adaptive weighting module and a modality-specific representation module......	翻訳日:2023-12-25 16:03:04 公開日:2023-12-22
# 最適制御による時間反転支援量子メトロロジー Time-reversal assisted quantum metrology with an optimal control ( http://arxiv.org/abs/2312.14443v1 ) ライセンス: Link先を確認	Da-Wei Luo, Ting Yu	(参考訳) 本稿では, 量子最適制御と時間反転戦略を用いて, ショットノイズ限界を克服し, パラメータ推定のためのハイゼンベルクスケーリング限界に達するプロトコルを提案する。量子ナビゲーションと測定において重要な役割を果たす位相推定を例に、系の光子数測定から生じる不確実性は、推定される位相とは無関係に、補助的なCramér-Rao境界を飽和させることができることを示す。光子損失の現実的な場合、最適な推定は最適な制御とフォトニックモードに結合したアンシラ2レベル系の射影測定によって達成可能であることを示す。 We propose a protocol to overcome the shot noise limit and reach the Heisenberg scaling limit for parameter estimation by using quantum optimal control and a time-reversal strategy. Exemplified through the phase estimation, which can play an important role in quantum navigation and measurement, we show that the uncertainty arising from a photon number measurement of the system can saturate the assisted Cramér-Rao bound, independent of the phase being estimated. In a realistic case with photon loss, we show that the optimal estimation may still be attainable by optimal control and a projective measurement on an ancilla two-level system coupled to photonic modes.	翻訳日:2023-12-25 16:02:43 公開日:2023-12-22
# DMC4ML: 機械学習のためのデータ移動複雑性 DMC4ML: Data Movement Complexity for Machine Learning ( http://arxiv.org/abs/2312.14441v1 ) ライセンス: Link先を確認	Chen Ding, Christopher Kanan, Dylan McKellips, Toranosuke Ozawa, Arian Shahmirza, Wesley Smith	(参考訳) 今日のコンピューティングの最大の需要は機械学習です。本稿では,変圧器,空間畳み込み,FFTという3つの機械学習アルゴリズムを解析する。その分析は3つの点で新しい。まず、従来の時間や空間の複雑さではなく、抽象的なメモリ階層におけるメモリアクセスのコストを測定する。第2に、解析は漸近的であり、メモリコストの主な源を特定する。最後に、結果はシンボリックであり、任意の次元サイズとヘッド数に対してグループ化されたクエリアテンションにおけるグループサイズや、任意の画像サイズとカーネルサイズに対してバッチ化された畳み込みのためのバッチサイズなどのアルゴリズムパラメータを選択するために使用できる。 The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources of the memory cost. Finally, the result is symbolic, which can be used to select algorithmic parameters such as the group size in grouped query attention for any dimension size and number of heads and the batch size for batched convolution for any image size and kernel size.	翻訳日:2023-12-25 16:02:30 公開日:2023-12-22
# 逆攻撃によるテキスト・画像生成における非対称バイアス Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks ( http://arxiv.org/abs/2312.14440v1 ) ライセンス: Link先を確認	Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong	(参考訳) コンテンツ生成におけるテキスト・ツー・イメージ(T2I)モデルの普及は、敵対的攻撃に対する堅牢性を含む安全性を慎重に検査する必要がある。これに関する広範な研究にもかかわらず、その効果の理由は未解明である。本稿では,攻撃成功率(ASR)に関連する要因の分析に焦点をあて,T2Iモデルに対する敵攻撃に関する実証的研究を行った。敵接尾辞と2つの勾配に基づく攻撃アルゴリズムを用いた新たな攻撃目標であるエンティティスワップを導入する。人間と自動評価は、エンティティスワップ上でのASRの非対称性を明らかにし、例えば、「雨の中で踊る人間」というプロンプトで「人間」を「ロボット」に置き換えるのは容易であるが、逆の逆の接尾辞は極めて困難である。さらに、モデルの信念から敵対的ASRへの示唆的信号を確立するための測度を提案する。我々は、敵攻撃の60%の成功確率と、この確率が5%以下に低下する状況を特定する。 The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research into this, the reasons for their effectiveness are underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated with attack success rates (ASRs). We introduce a new attack objective - entity swapping using adversarial suffixes and two gradient-based attack algorithms. Human and automatic evaluations reveal the asymmetric nature of ASRs on entity swap: for example, it is easier to replace "human" with "robot" in the prompt "a human dancing in the rain." with an adversarial suffix but is significantly harder in reverse. We further propose probing metrics to establish indicative signals from the model's beliefs to the adversarial ASR. We identify conditions resulting in a 60% success probability for adversarial attacks and others where this likelihood drops below 5%.	翻訳日:2023-12-25 16:02:19 公開日:2023-12-22
# PUMA: グラフ凝縮を用いた効率的な連続グラフ学習 PUMA: Efficient Continual Graph Learning with Graph Condensation ( http://arxiv.org/abs/2312.14439v1 ) ライセンス: Link先を確認	Yilun Liu, Ruihong Qiu, Yanran Tang, Hongzhi Yin, Zi Huang	(参考訳) ストリーミンググラフを扱う場合、既存のグラフ表現学習モデルは破滅的な忘れがちな問題に遭遇する。これに対し、連続グラフ学習は、静的グラフからストリーミンググラフへのグラフ表現学習を可能にする新しいパラダイムとして出現する。これまでの作業であるCaTは、連続的な学習手順をバランスよく行うリプレイベースのフレームワークで、入ってくるグラフを凝縮してデータを再生するための、小さいが効果的なメモリバンクを設計する。 CaTは破滅的な記憶問題を緩和するが,(1)CaTから派生したグラフ凝縮アルゴリズムはラベル付きノードにのみ焦点をあてるが,(2)CaTの継続トレーニングスキームは,これまでに学習した知識に重きを置いて,新たに追加された記憶から学習するモデル能力を制限する;(3)CaTの凝縮過程と再生過程はいずれも時間を要する。本稿では,CaT から拡張した Psudo-label guided memory bank (PUMA) CGL フレームワークを提案する。グラフ内の情報をフル活用するために、PUMAはラベル付きノードと非ラベル付きノードの両方でグラフ凝縮時のノードのカバレッジを拡大する。さらに,過去の連続学習スキームを改良し,歴史と新しいグラフのバランスのとれたトレーニングを行うための,scratchからトレーニング戦略を提案する。さらにpumaは、ワンタイムプロジェクションとワイドグラフエンコーダを使用して、トレーニングステージにおけるグラフ凝縮とグラフエンコーディングプロセスを加速し、フレームワーク全体の効率を向上させる。 4つのデータセットに関する広範な実験は、既存のメソッドに対する最先端のパフォーマンスと効率を示している。 When handling streaming graphs, existing graph representation learning models encounter a catastrophic forgetting problem, where previously learned knowledge of these models is easily overwritten when learning with newly incoming graphs. In response, Continual Graph Learning emerges as a novel paradigm enabling graph representation learning from static to streaming graphs. Our prior work, CaT is a replay-based framework with a balanced continual learning procedure, which designs a small yet effective memory bank for replaying data by condensing incoming graphs. Although the CaT alleviates the catastrophic forgetting problem, there exist three issues: (1) The graph condensation algorithm derived in CaT only focuses on labelled nodes while neglecting abundant information carried by unlabelled nodes; (2) The continual training scheme of the CaT overemphasises on the previously learned knowledge, limiting the model capacity to learn from newly added memories; (3) Both the condensation process and replaying process of the CaT are time-consuming. 
In this paper, we propose a pseudo-label guided memory bank (PUMA) CGL framework, extending from the CaT to enhance its efficiency and effectiveness by overcoming the above-mentioned weaknesses and limits. To fully exploit the information in a graph, PUMA expands the coverage of nodes during graph condensation with both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme for a balanced training between the historical and the new graphs. Besides, PUMA uses a one-time propagation and wide graph encoders to accelerate the graph condensation and the graph encoding process in the training stage to improve the efficiency of the whole framework. Extensive experiments on four datasets demonstrate the state-of-the-art performance and efficiency over existing methods.	翻訳日:2023-12-25 16:02:01 公開日:2023-12-22
# PC-Conv:2重フィルタリングによるホモフィリーとヘテロフィリーの統合 PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering ( http://arxiv.org/abs/2312.14438v1 ) ライセンス: Link先を確認	Bingheng Li, Erlin Pan, Zhao Kang	(参考訳) 近年,入念に設計された多くのグラフ表現学習法が,強いヘテロ親和性グラフまたはホモ親和性グラフのいずれかで優れた性能を達成しているが,両方では達成できていない。したがって、それらは異なる相同性のレベルを持つ実世界のグラフをまたいでうまく一般化できない。これは、ヘテロ親和グラフにおけるホモフィリーの無視と、その逆によるものである。本稿では,ヘテロ親和性グラフにおけるホモフィリーとその逆を抽出するための2重フィルタリング機構を提案する。特に、グラフ熱方程式を拡張して、長距離からの大域情報のヘテロ親和的な集約を行う。結果のフィルタは Poisson-Charlier (PC) 多項式によって正確に近似することができる。複数の順序で情報を活用するために,ノード分類タスクのための強力なグラフ畳み込みPC-ConvとそのインスタンスPCNetを導入する。最先端のGNNと比較すると、PCNetはよく知られたホモフィルグラフとヘテロフィルグラフの競合性能を示す。私たちの実装はhttps://github.com/uestclbh/PC-Convで利用可能です。 Recently, many carefully crafted graph representation learning methods have achieved impressive performance on either strong heterophilic or homophilic graphs, but not both. Therefore, they are incapable of generalizing well across real-world graphs with different levels of homophily. This is attributed to their neglect of homophily in heterophilic graphs, and vice versa. In this paper, we propose a two-fold filtering mechanism to extract homophily in heterophilic graphs and vice versa. In particular, we extend the graph heat equation to perform heterophilic aggregation of global information from a long distance. The resultant filter can be exactly approximated by the Poisson-Charlier (PC) polynomials. To further exploit information at multiple orders, we introduce a powerful graph convolution PC-Conv and its instantiation PCNet for the node classification task. Compared with state-of-the-art GNNs, PCNet shows competitive performance on well-known homophilic and heterophilic graphs. Our implementation is available at https://github.com/uestclbh/PC-Conv.	翻訳日:2023-12-25 16:01:27 公開日:2023-12-22
# REBEL:人間のフィードバックによる強化学習におけるリワード過最適化のための正規化に基づく解法 REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback ( http://arxiv.org/abs/2312.14436v1 ) ライセンス: Link先を確認	Souradip Chakraborty, Amisha Bhaskar, Anukriti Singh, Pratap Tokekar, Dinesh Manocha, and Amrit Singh Bedi	(参考訳) 本研究では,人間のフィードバック(RRLHF)からのロボット強化学習を応用した,効率的な報酬正規化アルゴリズムREBELを提案する。連続制御ロボットタスクの強化学習(RL)性能は、基礎となる報酬関数に敏感である。実際には、報酬機能は人間の意図や価値観、社会的規範などと不一致に陥り、現実世界で壊滅的な失敗に繋がることが多い。人間の好みを利用して、正規化された報酬機能を学び、最終的にエージェントを真の意図した行動に合わせる。エージェント選好と呼ばれる既存のRRLHFフレームワークに報酬正規化という新たな概念を導入する。そこで我々は,人間のフィードバックを嗜好の観点から考えるだけでなく,報酬関数を学習しながら,基礎となるRLエージェントの嗜好を考慮することを提案する。このことは,RLにおける報酬関数の設計に伴う過度な最適化の改善に役立つことを示す。 PEBBLEやPEBBLE+SURFのような最先端の手法と比較して,REBELは試料効率を最大70%向上させ,同程度の報酬を得られることを示した。 In this work, we propose REBEL, an algorithm for sample efficient reward regularization based robotic reinforcement learning from human feedback (RRLHF). Reinforcement learning (RL) performance for continuous control robotics tasks is sensitive to the underlying reward function. In practice, the reward function often ends up misaligned with human intent, values, social norms, etc., leading to catastrophic failures in the real world. We leverage human preferences to learn regularized reward functions and eventually align the agents with the true intended behavior. We introduce a novel notion of reward regularization to the existing RRLHF framework, which is termed as agent preferences. So, we not only consider human feedback in terms of preferences, we also propose to take into account the preference of the underlying RL agent while learning the reward function. We show that this helps to improve the over-optimization associated with the design of reward functions in RL. We experimentally show that REBEL exhibits up to 70% improvement in sample efficiency to achieve a similar level of episodic reward returns as compared to the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.	
翻訳日:2023-12-25 16:01:09 公開日:2023-12-22
# オンライン機械学習に基づく単一粒子x線回折画像からのスケーラブルな3次元再構成 Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning ( http://arxiv.org/abs/2312.14432v1 ) ライセンス: Link先を確認	Jay Shenoy, Axel Levy, Frédéric Poitevin, Gordon Wetzstein	(参考訳) X線自由電子レーザー(XFEL)は、生体分子の構造と力学を計測し、生命の基本的な構成要素を理解するのに役立つ。特に、高い繰り返し速度のXFELは、低温または結晶化状態では捕獲できないフリーティング状態にアクセスする機会として、個々の弱い散乱生体分子をほぼ生理的条件下で撮像する単一粒子イメージング(X線SPI)を可能にする。既存のX線SPI再構成アルゴリズムは、各撮像画像中の粒子の未知の向きと共有3次元構造を推定するが、これらの新興XFELによって生成された大量のデータセットを扱うには不十分である。本稿では,大規模なX線SPIデータセットから3次元マクロ分子の構造を推定するオンライン再構成フレームワークであるX-RAIを紹介する。 X-RAIは畳み込みエンコーダ(convolutional encoder)で構成されており、大きなデータセットに対するポーズ推定をアモーティズするとともに、暗黙の神経表現を用いてエンドツーエンドで自己管理的な高品質な3D再構成を可能にする物理ベースのデコーダ(decoder)も備えている。我々は、X-RAIがシミュレーションと挑戦的な実験環境において、数百万の回折画像を含む大規模なデータセットをオンライン形式で処理する前例のない能力を示した。これらの能力は、リアルタイムのキャプチャと再構築に向けたX線SPIのパラダイムシフトを表している。 X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI) where individual, weakly scattering biomolecules are imaged under near-physiological conditions with the opportunity to access fleeting states that cannot be captured in cryogenic or crystallized conditions. Existing X-ray SPI reconstruction algorithms, which estimate the unknown orientation of a particle in each captured image as well as its shared 3D structure, are inadequate in handling the massive datasets generated by these emerging XFELs. Here, we introduce X-RAI, an online reconstruction framework that estimates the structure of a 3D macromolecule from large X-ray SPI datasets. X-RAI consists of a convolutional encoder, which amortizes pose estimation over large datasets, as well as a physics-based decoder, which employs an implicit neural representation to enable high-quality 3D reconstruction in an end-to-end, self-supervised manner. 
We demonstrate that X-RAI achieves state-of-the-art performance for small-scale datasets in simulation and challenging experimental settings and demonstrate its unprecedented ability to process large datasets containing millions of diffraction images in an online fashion. These abilities signify a paradigm shift in X-ray SPI towards real-time capture and reconstruction.	翻訳日:2023-12-25 16:00:52 公開日:2023-12-22
# スマートマニュファクチャリングにおける一元的産業大知識モデルフレームワーク A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing ( http://arxiv.org/abs/2312.14428v1 ) ライセンス: Link先を確認	Jay Lee, Hanqi Su	(参考訳) 近年の大規模言語モデル(LLM)の出現は、人工知能の可能性を示し、業界 4.0 とスマート製造の新しい機会を明らかにしている。しかし、これらのLSMを産業に適用する際、主にドメイン固有の知識ではなく、一般的な知識に関するトレーニングのために顕著なギャップが存在する。このような専門的なドメイン知識は、産業アプリケーションの複雑なニーズに効果的に対処するために不可欠である。このギャップを埋めるために,スマートマニュファクチャリングにおける産業に革命をもたらす可能性を強調する産業大知識モデル(ILKM)フレームワークを提案する。さらに、ILKMとLLMは8つの視点から比較される。最後に、スマート製造におけるilkms開発指針として「6s原則」を提案する。 The recent emergence of large language models (LLMs) shows the potential for artificial general intelligence, revealing new opportunities in industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes an Industrial Large Knowledge Model (ILKM) framework emphasizing their potential to revolutionize the industry in smart manufacturing. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, "6S Principle" is proposed as the guideline for the development of ILKMs in smart manufacturing.	翻訳日:2023-12-25 16:00:24 公開日:2023-12-22
# 等分散に基づく幻覚の理論 Theory of Hallucinations based on Equivariance ( http://arxiv.org/abs/2312.14504v1 ) ライセンス: Link先を確認	Hisaichi Shibata	(参考訳) 等分散は、言語モデルを含む機械学習において重要な特徴である。同じ意味の句列が一貫して解釈されることを保証する。例えば、"There is a cat on the table"という文は、トークンレベルの表現のバリエーションに関係なく、言語モデルによって解釈されるべきである。この知見に基づいて,言語モデルの等分散性の不足が幻覚に繋がる可能性を示唆する新しい理論を提案する。この理論によれば、比較的小さなデータセットで訓練された言語モデルは、入力テキストを誤解釈したり、誤ったテキスト(すなわち幻覚)を生成する傾向がある。この理論をテストするために、私はキャラクターレベルの置換暗号である「dancing men」として知られる玩具モデルを開発した。さらに,T5(Text To Text Transfer Transformer)モデルに基づく新しい手法を提案する。私は、このT5モデルは暗号をほぼ完全に解き、このフレームで同値を得る能力を示した。この方法は、トークンや辞書を使わずに、大きな言語モデルに類似した、単語レベルおよび文レベルの置換暗号にスケールできる。このスケーラビリティは、不適切な同値獲得と幻覚の出現の間の関係を調査するのに適している。 Equivariance is an important feature in machine learning, including language models. It ensures that any sequences of phrases with the same meanings are interpreted consistently. For example, the sentence 'There is a cat on the table' should be interpreted by language models as it is, regardless of variations in its token-level expression. Building on this insight, I propose a new theory suggesting that insufficient equivariance in language models can lead to hallucinations. According to this theory, which is both intuitive and novel, language models trained on relatively small datasets tend to misinterpret input texts and/or generate incorrect texts (i.e., hallucinations). To test this theory, I developed a toy model known as 'dancing men', which is a character-level substitution cipher. Additionally, I propose a novel technique based on the T5 (Text To Text Transfer Transformer) model to efficiently decipher these codes without relying on frequency analysis. I have found that this T5 model can almost completely solve the cipher, demonstrating its ability to acquire equivariance in this frame. This method could be scaled up to word-level and sentence-level substitution ciphers, analogous to large language models without tokenizers or dictionaries. 
This scalability makes it suitable for investigating the proposed link between inadequate equivariance acquisition and the emergence of hallucinations.	翻訳日:2023-12-25 15:55:48 公開日:2023-12-22
# vistripformer:汎用ビデオ復元のためのトークン効率の高いトランスフォーマー ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration ( http://arxiv.org/abs/2312.14502v1 ) ライセンス: Link先を確認	Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, and Chia-Wen Lin	(参考訳) ビデオ復元は、画質の劣化したフレームからクリーンでシャープなビデオを復元する、低レベルの視覚タスクである。隣接するフレームからの時間情報を使ってビデオの復元を成功させる。近年,トランスフォーマーの成功はコンピュータビジョンコミュニティにおいて認知度を高めている。しかし、その自己保持機構は大量のメモリを必要とするため、ビデオ復元のような高解像度の視覚タスクには適さない。本稿では,空間的および時間的情報を抽出するために,フレーム内ストリップ注意 (intra-sa) とフレーム間ストリップ注意 (inter-sa) からなる長距離データ相関を捉えるために時空間的ストリップ注意を利用するvistripformer (video stripformer) を提案する。ビデオフレームを水平方向と垂直方向のストリップ状の特徴に分解し,様々な方向や大きさの劣化パターンに対処する。さらに、ViStripformerはバニラ変圧器よりもメモリ使用量の少ない効率的かつ効率的なトランスアーキテクチャである。広範に実験した結果,提案手法は,ビデオデブラリング,デモレーリング,デレイニングなどの映像復元作業において,高速な推定時間で優れた結果が得られることがわかった。 Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of the Transformer has raised awareness in the computer-vision community. However, its self-attention mechanism requires much memory, which is unsuitable for high-resolution vision tasks like video restoration. In this paper, we propose ViStripformer (Video Stripformer), which utilizes spatio-temporal strip attention to catch long-range data correlations, consisting of intra-frame strip attention (Intra-SA) and inter-frame strip attention (Inter-SA) for extracting spatial and temporal information. It decomposes video frames into strip-shaped features in horizontal and vertical directions for Intra-SA and Inter-SA to address degradation patterns with various orientations and magnitudes. Besides, ViStripformer is an effective and efficient transformer architecture with much lower memory usage than the vanilla transformer. 
Extensive experiments show that the proposed model achieves superior results with fast inference time on video restoration tasks, including video deblurring, demoireing, and deraining.	翻訳日:2023-12-25 15:55:26 公開日:2023-12-22
# 高次元・高次物理インフォームドニューラルネットワークのハッチンソントレース推定 Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks ( http://arxiv.org/abs/2312.14499v1 ) ライセンス: Link先を確認	Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi	(参考訳) 物理情報ニューラルネットワーク(PINNs)は、偏微分方程式(PDEs)の解法として有効であることが証明されており、特に一部のデータが利用可能な場合にはデータと物理をシームレスに融合できる。しかし, PINNを高次元かつ高次のPDEに拡張することは, 残差損失の自動微分に伴う計算コストのために大きな課題となる。本稿では,Hutchinson Trace Estimation (HTE)を導入し,高次元・高次PDE処理におけるPINNの限界に対処する。科学計算においてユビキタスな2階高次元PDEから始め、HTEはヘッセン行列全体の計算をヘッセンベクトル積(HVP)に変換する。このアプローチはテイラーモードの自動微分によって計算ボトルネックを緩和し、メモリ消費をヘッセン行列全体の保持からHVPの分まで大幅に削減する。我々はさらに,HTEの元のPINN損失への収束と,特定の条件下での不偏性を示す。 Stochastic Dimension Gradient Descent (SDGD)との比較は、特に次元間で分散が大きいシナリオにおいて、HTEの明確な利点を強調している。さらにHTEを高次および高次元PDEに拡張し、特に重調和(バイハーモニック)方程式に対処する。テンソルベクトル積(TVP)を用いることで、HTEは、4階高次元重調和方程式に関連する巨大なテンソルを効率的に計算し、メモリを節約し、高速な計算を可能にする。 HTEの有効性は実験的な設定を通じて説明され、メモリと速度制約の下でSDGDと同等の収束率を示す。さらに、HTEは、勾配強化PINN(gPINN)バージョンと重調和方程式の加速に有用である。全体として、HTEは高次および高次元PDEに対処する科学的機械学習の新たな能力を開く。 Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available, by seamlessly blending data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein, we address the limitations of PINNs in handling high-dimensional and high-order PDEs by introducing Hutchinson Trace Estimation (HTE). Starting with the second-order high-dimensional PDEs ubiquitous in scientific computing, HTE transforms the calculation of the entire Hessian matrix into a Hessian-vector product (HVP). This approach alleviates the computational bottleneck via Taylor-mode automatic differentiation and significantly reduces memory consumption from the Hessian matrix to HVP. 
We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions. Comparisons with Stochastic Dimension Gradient Descent (SDGD) highlight the distinct advantages of HTE, particularly in scenarios with significant variance among dimensions. We also extend HTE to higher-order and higher-dimensional PDEs, specifically addressing the biharmonic equation. By employing tensor-vector products (TVP), HTE efficiently computes the colossal tensor associated with the fourth-order high-dimensional biharmonic equation, saving memory and enabling rapid computation. The effectiveness of HTE is illustrated through experimental setups, demonstrating convergence rates comparable to SDGD under memory and speed constraints. Additionally, HTE proves valuable in accelerating the Gradient-Enhanced PINN (gPINN) version as well as the biharmonic equation. Overall, HTE opens up a new capability in scientific machine learning for tackling high-order and high-dimensional PDEs.	翻訳日:2023-12-25 15:54:52 公開日:2023-12-22
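The core identity behind the Hutchinson estimator in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the toy function `u`, its coefficients `a`, and the helper names are hypothetical, and the Hessian-vector product is approximated by a central finite difference of the gradient rather than by the Taylor-mode automatic differentiation the paper describes. The sketch estimates the Laplacian (the trace of the Hessian) from random Rademacher probes without ever forming the full Hessian.

```python
import random


def grad_u(x, a):
    """Gradient of the toy function u(x) = sum_i a_i * x_i**2 + x_0 * x_1 (hypothetical)."""
    g = [2.0 * ai * xi for ai, xi in zip(a, x)]
    g[0] += x[1]
    g[1] += x[0]
    return g


def hvp(x, v, a, eps=1e-4):
    """Hessian-vector product H v, approximated by a central difference of the gradient.

    Assumption: this stands in for an autodiff-based HVP; it never builds H.
    """
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    g_plus, g_minus = grad_u(x_plus, a), grad_u(x_minus, a)
    return [(p - m) / (2.0 * eps) for p, m in zip(g_plus, g_minus)]


def hutchinson_laplacian(x, a, num_probes, rng):
    """Estimate tr(H) as the average of v^T (H v) over Rademacher probes v."""
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in x]
        hv = hvp(x, v, a)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_probes


rng = random.Random(0)
a = [0.5, 1.0, 1.5, 2.0]
x = [0.3, -0.2, 0.1, 0.7]
exact = 2.0 * sum(a)  # trace of the Hessian of the toy function
estimate = hutchinson_laplacian(x, a, 4000, rng)
print(f"exact Laplacian = {exact}, Hutchinson estimate = {estimate:.3f}")
```

Each probe costs one Hessian-vector product, so memory scales with the input dimension rather than its square; the price is sampling noise, which here comes only from the off-diagonal Hessian entries.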
# ビジョンランゲージモデルによるFew-Shot物体検出の再検討 Revisiting Few-Shot Object Detection with Vision-Language Models ( http://arxiv.org/abs/2312.14494v1 ) ライセンス: Link先を確認	Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan	(参考訳) few-shot object detection (fsod)ベンチマークには、制限されたアノテーションで新しいカテゴリを検出するための高度な技術がある。既存のベンチマークでは、COCOのような確立されたデータセットを、それぞれ、事前トレーニングと微調整のためのベースクラスと新しいクラスに分割することで再利用している。しかし、これらのベンチマークは、実際にfsodをデプロイする方法を反映していない。少数のベースカテゴリを事前学習するよりも、ターゲットドメインに対して基礎モデル(例えば、webスケールデータで事前学習された視覚言語モデル(vlm))を微調整することがより実用的であると主張する。驚いたことに、GroundingDINOのようなVLMからのゼロショット推論はCOCO上の最先端(48.3対33.1 AP)よりも著しく優れている。しかし、そのようなゼロショットモデルは、それでも対象とする興味ある概念と一致しない。例えば、web上のトレーラーは、自動運転車の文脈でトレーラーとは異なるかもしれない。本研究では,任意の外部データセット上で事前学習し,ターゲットクラス毎のKショットを微調整した検出器を評価するための新しいベンチマークプロトコルであるFoundational FSODを提案する。さらに、現在のfsodベンチマークは、実際にはデータサブセット上の各カテゴリに対する徹底したアノテーションを含むフェデレーションデータセットである点にも注目する。我々はこの知見を利用して、連合的損失を伴う微調整VLMの簡単な戦略を提案する。我々は LVIS と nu Images に対するアプローチの有効性を実証し,5.9 AP による先行作業よりも改善した。 Few-shot object detection (FSOD) benchmarks have advanced techniques for detecting new categories with limited annotations. Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. However, these benchmarks do not reflect how FSOD is deployed in practice. Rather than only pre-training on a small number of base categories, we argue that it is more practical to fine-tune a foundation model (e.g., a vision-language model (VLM) pre-trained on web-scale data) for a target domain. Surprisingly, we find that zero-shot inference from VLMs like GroundingDINO significantly outperforms the state-of-the-art (48.3 vs. 33.1 AP) on COCO. However, such zero-shot models can still be misaligned to target concepts of interest. For example, trailers on the web may be different from trailers in the context of autonomous vehicles. 
In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class. Further, we note that current FSOD benchmarks are actually federated datasets containing exhaustive annotations for each category on a subset of the data. We leverage this insight to propose simple strategies for fine-tuning VLMs with federated losses. We demonstrate the effectiveness of our approach on LVIS and nuImages, improving over prior work by 5.9 AP.	翻訳日:2023-12-25 15:53:57 公開日:2023-12-22
# 単一画像物体検出のためのコンテキスト拡張Transformer Context Enhanced Transformer for Single Image Object Detection ( http://arxiv.org/abs/2312.14492v1 ) ライセンス: Link先を確認	Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim	(参考訳) 実世界のアプリケーションにおけるビデオデータの重要性が高まっているため、時間情報を利用する効率的なオブジェクト検出手法の必要性が高まっている。既存のビデオオブジェクト検出(VOD)技術では、この課題に対処するための様々な戦略が採用されているが、通常は、局所的に隣接するフレームやクリップ内でランダムにサンプリングされた画像に依存する。近年の Transformer ベースのVOD 法は有望な結果を示しているが,複数の入力への依存と,時間的情報を組み込むための追加的なネットワークの複雑さにより,実用性は制限されている。本稿では,新たに設計されたメモリモジュールを用いてDETRに時間的コンテキストを組み込むことにより,Context Enhanced TRansformer(CETR)と呼ばれる単一画像オブジェクト検出手法を提案する。時間情報を効率的に保存するために,データ間で文脈情報を収集するクラス単位のメモリを構築する。さらに,現在の画像に関連するメモリを選択的に活用するための分類に基づくサンプリング手法を提案する。テスト時には,テスト分布を考慮して個々のメモリ機能を更新するテスト時メモリ適応手法を導入する。 CityCamとImageNet VIDデータセットを用いた実験は、様々なビデオシステムにおけるフレームワークの効率を示す。プロジェクトページとコードは、https://ku-cvlab.github.io/CETR で利用可能になる。 With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), by incorporating temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. 
For testing, we introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.	翻訳日:2023-12-25 15:53:17 公開日:2023-12-22
# 言語モデルは同時機械翻訳のための分岐予測器である Language Model is a Branch Predictor for Simultaneous Machine Translation ( http://arxiv.org/abs/2312.14488v1 ) ライセンス: Link先を確認	Aoxiong Yin, Tianyun Zhong, Haoyuan Li, Siliang Tang, Zhou Zhao	(参考訳) 同時機械翻訳(SiMT)の主な目的は、最終翻訳の品質を維持しながらレイテンシを最小限にすることである。本稿では,CPU分岐予測技術からインスピレーションを得て,SiMTタスクに分岐予測技術を取り入れて翻訳遅延を低減することを提案する。具体的には,言語モデルを分岐予測器として活用し,潜在的な分岐方向,すなわち未来語を予測している。その後、予測されたソース語を用いて事前に出力を復号する。実際のソースワードが予測されたソースワードから逸脱すると、実際のソースワードを使用して再び出力をデコードし、予測された出力を置き換える。計算コストをさらに削減するため,エンコーダと分岐予測器のパラメータを共有し,事前学習した言語モデルを用いて初期化を行う。提案手法は任意のSiMTモデルとシームレスに統合できる。広範な実験結果から,本手法は翻訳品質とレイテンシを同時に向上できることが示された。私たちのコードはhttps://github.com/YinAoXiong/simt_branch_predictorで利用可能です。 The primary objective of simultaneous machine translation (SiMT) is to minimize latency while preserving the quality of the final translation. Drawing inspiration from CPU branch prediction techniques, we propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency. Specifically, we utilize a language model as a branch predictor to predict potential branch directions, namely, future source words. Subsequently, we utilize the predicted source words to decode the output in advance. When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output. To further reduce computational costs, we share the parameters of the encoder and the branch predictor, and utilize a pre-trained language model for initialization. Our proposed method can be seamlessly integrated with any SiMT model. Extensive experimental results demonstrate that our approach can improve translation quality and latency at the same time. Our code is available at https://github.com/YinAoXiong/simt_branch_predictor .	翻訳日:2023-12-25 15:52:24 公開日:2023-12-22
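The predict, decode ahead, and re-decode on misprediction loop described in the abstract above can be sketched as follows. This is a toy illustration under loud assumptions, not the authors' system: `toy_predict_next` and `toy_translate` are hypothetical lookup-table stand-ins for the language-model branch predictor and the SiMT decoder, and the example vocabulary is invented.

```python
def toy_predict_next(prefix):
    """Hypothetical branch predictor: guess the next source word from the prefix."""
    table = {("guten",): "morgen", ("guten", "morgen"): "welt"}
    return table.get(tuple(prefix))


def toy_translate(source_words):
    """Hypothetical stand-in for a SiMT decoder: word-by-word dictionary lookup."""
    lexicon = {"guten": "good", "morgen": "morning", "welt": "world", "abend": "evening"}
    return [lexicon.get(w, w) for w in source_words]


def simultaneous_translate(stream):
    """Consume source words one at a time, decoding ahead on predicted words."""
    prefix, outputs, hits = [], [], 0
    speculative = None  # (predicted next word, output decoded in advance)
    for word in stream:
        prefix.append(word)
        if speculative is not None and speculative[0] == word:
            output = speculative[1]  # prediction was right: reuse the early output
            hits += 1
        else:
            output = toy_translate(prefix)  # misprediction: decode again with the real word
        outputs.append(output)
        guess = toy_predict_next(prefix)
        speculative = (guess, toy_translate(prefix + [guess])) if guess is not None else None
    return outputs, hits


outputs, hits = simultaneous_translate(["guten", "morgen", "abend"])
print(outputs[-1], "correct predictions:", hits)
```

In this sketch a correct prediction means the output was already decoded before the real word arrived, which is where the latency saving would come from; a wrong prediction costs one extra decode but leaves the final output unchanged.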
# 少数核子散乱事象における絡み合い Entanglement in few-nucleon scattering events ( http://arxiv.org/abs/2312.14484v1 ) ライセンス: Link先を確認	Tanja Kirchner, Wael Elkamhawy, Hans-Werner Hammer	(参考訳) 核子と重陽子を含む少数核子散乱過程におけるスピンの絡み合いを調べる。この目的のために、 Beane らが導入した絡み合い力(entanglement power)を考える。強い相互作用の絡み合い力を定義する基礎として異なる絡み合いエントロピーを分析し、陽子-中性子、中性子-重陽子、陽子-重陽子、重陽子-重陽子散乱の絡み合い力を計算する。後者の2つの過程では、クーロン相互作用による修正も考慮に入れる。陽子-中性子散乱とは対照的に、中性子-重陽子、陽子-重陽子、重陽子-重陽子散乱におけるスピンの絡み合いには普遍的な低エネルギーの特徴は見られない。 We investigate the spin entanglement in few-nucleon scattering processes involving nucleons and deuterons. For this purpose, we consider the entanglement power introduced by Beane et al. We analyze different entanglement entropies as a basis to define the entanglement power of the strong interaction and calculate the corresponding entanglement powers for proton-neutron, neutron-deuteron, proton-deuteron, and deuteron-deuteron scattering. For the latter two processes, we also take into account the modification from the Coulomb interaction. In contrast to proton-neutron scattering, no universal low-energy features are evident in the spin entanglement in neutron-deuteron, proton-deuteron, and deuteron-deuteron scattering.	翻訳日:2023-12-25 15:52:05 公開日:2023-12-22
# 部分から全体へ:手術器具セグメンテーションのための協調的プロンプト Part to Whole: Collaborative Prompting for Surgical Instrument Segmentation ( http://arxiv.org/abs/2312.14481v1 ) ライセンス: Link先を確認	Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang	(参考訳) Segment Anything Model (SAM)のような基礎モデルは、汎用的なオブジェクトセグメンテーションにおいて有望な結果を示している。しかし,手術器具のセグメンテーションにSAMを直接適用することには重要な課題がある。まずSAMは、外科医とコンピュータの相互作用を複雑にするフレーム単位のポイントまたはボックスのプロンプトに依存する。また、SAMは、事前学習における手術データの不足や、各種手術器具の複雑な構造と細部の詳細のために、手術器具のセグメンテーションで十分な性能が得られない。これらの課題に対処するため,本論文では,テキストプロンプト可能な手術器具のセグメンテーションについて検討し,手術器具の構造知識とSAMの汎用セグメンテーション知識を統合した,新しい効率的なチューニング手法であるSP-SAM(SurgicalPart-SAM)を提案する。具体的には,(1)器具を細かな部品に分解する "[part name] of [instrument category name]" というテキスト形式の協調的プロンプト,(2)テキストプロンプトを視覚埋め込みと共同で識別的な部品レベル表現に符号化するクロスモーダルプロンプトエンコーダ,(3)部品レベル表現を選択的に全体へ組み上げて正確な器具セグメンテーションを行うPart-to-Whole選択的融合と階層的デコード戦略を提案することで,これを実現する。これらに基づき,SP-SAMは、手術器具の構造を理解し、様々なカテゴリーを区別するより良い能力を得る。 EndoVis2018とEndoVis2017の両方のデータセットに対する大規模な実験は、最小限のチューニング可能なパラメータでSP-SAMの最先端のパフォーマンスを示している。コードは https://github.com/wenxi-yue/SurgicalPart-SAM にある。 Foundation models like the Segment Anything Model (SAM) have demonstrated promise in generic object segmentation. However, directly applying SAM to surgical instrument segmentation presents key challenges. First, SAM relies on per-frame point-or-box prompts which complicate surgeon-computer interaction. Also, SAM yields suboptimal performance on segmenting surgical instruments, owing to insufficient surgical data in its pre-training as well as the complex structure and fine-grained details of various surgical instruments. 
To address these challenges, in this paper, we investigate text promptable surgical instrument segmentation and propose SP-SAM (SurgicalPart-SAM), a novel efficient-tuning approach that integrates surgical instrument structure knowledge with the generic segmentation knowledge of SAM. Specifically, we achieve this by proposing (1) collaborative prompts in the text form "[part name] of [instrument category name]" that decompose instruments into fine-grained parts; (2) a Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) a Part-to-Whole Selective Fusion and a Hierarchical Decoding strategy that selectively assemble the part-level representations into a whole for accurate instrument segmentation. Built upon them, SP-SAM acquires a better capability to comprehend surgical instrument structures and distinguish between various categories. Extensive experiments on both the EndoVis2018 and EndoVis2017 datasets demonstrate SP-SAM's state-of-the-art performance with minimal tunable parameters. Code is at https://github.com/wenxi-yue/SurgicalPart-SAM.	翻訳日:2023-12-25 15:51:34 公開日:2023-12-22
# MetaAID 2.5: 大規模言語モデルによるメタバースアプリケーション開発のためのセキュアフレームワーク MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models ( http://arxiv.org/abs/2312.14480v1 ) ライセンス: Link先を確認	Hongyin Zhu	(参考訳) 大規模言語モデル(LLM)は、動的で現実的なコンテンツを生成し、非プレイヤー文字(NPC)の振る舞いを制御するために、メタバース環境でますます使われている。しかし、LSMに関連するサイバーセキュリティの懸念はますます顕著になっている。これまでの研究は主に、セキュリティを強化するためにシステムの脆弱性にパッチを当てることに重点を置いてきたが、これらのアプローチは、仮想空間がより複雑であるMetaverseには適していない。さらに、メタバースにおけるサイバーセキュリティの範囲は大幅に拡大すると予想されている。本稿では,LLMとのユーザインタラクションシミュレーションによるサイバーセキュリティ向上手法を提案する。我々の目標は、ユーザーを教育し、総合的なシミュレーションシステムに触れることで防衛能力を強化することである。このシステムには広範なMetaverseサイバーセキュリティQ&Aと攻撃シミュレーションシナリオが含まれる。ユーザーはこれらのリスクに関わり、リスクを認識し、耐えられる能力を向上させる。さらに,ユーザ入力の倫理的意味に対処するため,5次元のユーザコンテンツを評価するための評価器としてLLMを提案する。さらに、語彙拡張トレーニングを通じてモデルに適応し、パーソナライズされた入力やエモティコンをよりよく理解する。複数のLLM実験を行い,本手法が有効であることを確認した。 Large language models (LLMs) are increasingly being used in Metaverse environments to generate dynamic and realistic content and to control the behavior of non-player characters (NPCs). However, the cybersecurity concerns associated with LLMs have become increasingly prominent. Previous research has primarily focused on patching system vulnerabilities to enhance cybersecurity, but these approaches are not well-suited to the Metaverse, where the virtual space is more complex, LLMs are vulnerable, and ethical user interaction is critical. Moreover, the scope of cybersecurity in the Metaverse is expected to expand significantly. This paper proposes a method for enhancing cybersecurity through the simulation of user interaction with LLMs. Our goal is to educate users and strengthen their defense capabilities through exposure to a comprehensive simulation system. This system includes extensive Metaverse cybersecurity Q&A and attack simulation scenarios. By engaging with these, users will improve their ability to recognize and withstand risks. 
Additionally, to address the ethical implications of user input, we propose using LLMs as evaluators to assess user content across five dimensions. We further adapt the models through vocabulary expansion training to better understand personalized inputs and emoticons. We conduct experiments on multiple LLMs and find that our approach is effective.	翻訳日:2023-12-25 15:51:08 公開日:2023-12-22
# 入出力協調蒸留による連合学習 Federated Learning via Input-Output Collaborative Distillation ( http://arxiv.org/abs/2312.14478v1 ) ライセンス: Link先を確認	Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann	(参考訳) Federated Learning(FL)は、個別に保持されたプライベートデータを共有せずに、分散ローカルノードが協調的に中央モデルをトレーニングする機械学習パラダイムである。既存のFLメソッドは、ローカルモデルパラメータを反復的に共有するか、共蒸留をデプロイする。しかし、前者はプライベートデータ漏洩の影響を受けやすく、後者の設計はタスク関連実データの前提条件に依存している。代わりに,直接入力と出力空間利用を用いた局所-中央協調蒸留に基づくデータフリーflフレームワークを提案する。我々の設計では、知識を伝達するための再帰的ローカルパラメータ交換や補助タスク関連データの要求を排除し、ローカルユーザーに直接プライバシ制御を行う。特に,ローカルモデル間の固有のデータの不均一性に対処するために,各ローカルモデルが各専門知識を表現するためのコンセンサスかつユニークな結果を生成する入力を蒸留することを学ぶ。提案するFLフレームワークは,自然画像と医用画像の両方において,実世界の異質なフェデレーション学習環境下での画像分類とセグメンテーションタスクに関する広範な実験により,顕著なプライバシー利用トレードオフを実現する。 Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-relevant real data. Instead, we propose a data-free FL framework based on local-to-central collaborative distillation with direct input and output space exploitation. Our design eliminates any requirement of recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users. In particular, to cope with the inherent data heterogeneity across locals, our technique learns to distill input on which each local model produces consensual yet unique results to represent each expertise. Our proposed FL framework achieves notable privacy-utility trade-offs with extensive experiments on image classification and segmentation tasks under various real-world heterogeneous federated learning settings on both natural and medical images.	
翻訳日:2023-12-25 15:50:50 公開日:2023-12-22
# MonoLSS: 単眼3D検出のための学習可能なサンプル選択 MonoLSS: Learnable Sample Selection For Monocular 3D Detection ( http://arxiv.org/abs/2312.14474v1 ) ライセンス: Link先を確認	Zhenjia Li and Jinrang Jia and Yifeng Shi	(参考訳) 自動運転の分野において、単眼3D検出は、1枚のRGB画像中の物体の3次元特性(深さ、寸法、方向)を推定する重要なタスクである。以前の研究では、不適切な特徴が悪影響を及ぼしうることを考慮せずに、ヒューリスティックな方法で特徴を用いて3D特性を学習してきた。本稿では, 3D特性の回帰には適切なサンプルのみを学習に用いるべきであるというサンプル選択を導入する。サンプルを適応的に選択するために,Gumbel-Softmaxと相対距離サンプル分割器をベースとしたLearnable Sample Selection (LSS)モジュールを提案する。 LSSモジュールはウォームアップ戦略の下で動作し、トレーニングの安定性が向上する。さらに、3D特性のサンプル選択専用のLSSモジュールはオブジェクトレベルの特徴に依存しているため、曖昧さを導入せずに撮像原理に適合する形で3D特性のサンプルを豊富にするMixUp3Dというデータ拡張手法をさらに開発する。 2つの直交する手法として、LSSモジュールとMixUp3Dは独立にも併用でも使用できる。十分な実験により、それらの組み合わせが相乗効果をもたらし、個々に適用した場合の合計を超える改善をもたらすことが示されている。 LSSモジュールとMixUp3Dを利用することで、追加データなしで、MonoLSSと名付けた提案手法はKITTIの3Dオブジェクト検出ベンチマークで3つのカテゴリ(Car, Cyclist, Pedestrian)すべてで1位にランクされ、WaymoデータセットとKITTI-nuScenesのクロスデータセット評価の両方で競争力のある結果が得られる。コードは補足資料に含まれており、関連する学術・産業研究を促進するためにリリースされる。 In the field of autonomous driving, monocular 3D detection is a critical task which estimates 3D properties (depth, dimension, and orientation) of objects in a single RGB image. Previous works have used features in a heuristic way to learn 3D properties, without considering that inappropriate features could have adverse effects. In this paper, we introduce sample selection so that only suitable samples are used to train the regression of the 3D properties. To select samples adaptively, we propose a Learnable Sample Selection (LSS) module, which is based on Gumbel-Softmax and a relative-distance sample divider. The LSS module works under a warm-up strategy leading to an improvement in training stability. Additionally, since the LSS module dedicated to 3D property sample selection relies on object-level features, we further develop a data augmentation method named MixUp3D to enrich 3D property samples which conforms to imaging principles without introducing ambiguity. 
As two orthogonal methods, the LSS module and MixUp3D can be utilized independently or in conjunction. Extensive experiments have shown that their combined use can lead to synergistic effects, yielding improvements that exceed the sum of their individual applications. Leveraging the LSS module and MixUp3D, without any extra data, our method named MonoLSS ranks 1st in all three categories (Car, Cyclist, and Pedestrian) on the KITTI 3D object detection benchmark, and achieves competitive results on both the Waymo dataset and KITTI-nuScenes cross-dataset evaluation. The code is included in the supplementary material and will be released to facilitate related academic and industrial studies.	翻訳日:2023-12-25 15:50:31 公開日:2023-12-22
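The abstract above states that the LSS module builds on Gumbel-Softmax. As a hedged illustration of that building block only (it is not the LSS module, and it omits the relative-distance sample divider and the warm-up strategy), the sketch below draws a relaxed one-hot weighting over a handful of candidate samples. The logits, the temperature values, and the function name are assumptions made for the example.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed (soft) one-hot sample over the given logits.

    Adds Gumbel(0, 1) noise to each logit, then applies a temperature-scaled softmax.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 0.0]  # hypothetical per-sample suitability scores
soft = gumbel_softmax(logits, tau=1.0, rng=rng)    # smoother weighting
sharp = gumbel_softmax(logits, tau=0.05, rng=rng)  # closer to a hard selection
print([round(w, 3) for w in soft])
print([round(w, 3) for w in sharp])
```

Lower temperatures push the weights toward a hard choice, while the softmax form keeps the selection differentiable with respect to the logits, which is the usual reason to prefer it over a plain argmax when the selection has to be learned.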
# すべてのタスクが同じくらい難しいわけではない:動的深さルーティングによるマルチタスク強化学習 Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing ( http://arxiv.org/abs/2312.14472v1 ) ライセンス: Link先を確認	Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng	(参考訳) マルチタスク強化学習は、一つのポリシーで異なるタスクセットを達成する。複数のタスクにまたがるパラメータを共有することでデータ効率を向上させるため、一般的なプラクティスでは、ネットワークを異なるモジュールに分割し、これらのモジュールをタスク固有のポリシーに再結合するようにルーティングネットワークを訓練する。しかしながら、既存のルーティングアプローチでは、すべてのタスクに一定数のモジュールを使用するため、さまざまな困難を伴うタスクには通常、さまざまな知識が必要になることを無視する。この研究は動的深度ルーティング(D2R)フレームワークを示し、特定の中間モジュールの戦略的スキップを学習し、各タスクに対して異なる数のモジュールを柔軟に選択する。この枠組みでは,オフ・ポリシー・トレーニング中の行動と対象ポリシーの異なる経路の問題に対処するための再ルーティング手法についても紹介する。さらに,マスタードタスクのルーティングを乱すことなく,未マスタータスクの経路探索を継続させる自動経路バランス機構の設計を行った。メタワールドベンチマークでは,D2Rが最先端性能を実現し,学習効率が大幅に向上した。 Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy. To enhance data efficiency by sharing parameters across multiple tasks, a common practice segments the network into distinct modules and trains a routing network to recombine these modules into task-specific policies. However, existing routing approaches employ a fixed number of modules for all tasks, neglecting that tasks with varying difficulties commonly require varying amounts of knowledge. This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task. Under this framework, we further introduce a ResRouting method to address the issue of disparate routing paths between behavior and target policies during off-policy training. In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones. 
We conduct extensive experiments on various robotics manipulation tasks in the Meta-World benchmark, where D2R achieves state-of-the-art performance with significantly improved learning efficiency.	翻訳日:2023-12-25 15:49:58 公開日:2023-12-22
# プロトタイプを用いたクロスモーダル物体追跡 Prototype-based Cross-Modal Object Tracking ( http://arxiv.org/abs/2312.14471v1 ) ライセンス: Link先を確認	Lei Liu, Chenglong Li, Futian Wang, Longfeng Shen, and Jin Tang	(参考訳) クロスモーダル物体追跡は情報融合分野における重要な研究課題であり、切替可能な可視光と近赤外モードを統合することで、困難なシナリオにおける画像制限に対処することを目的としている。しかし,既存の追跡手法では,モダリティスイッチの存在下での客観性の変化に適応することが困難である。例えば、モデル更新に基づくトラッキング手法は、モダリティ切り替え中に安定したトラッキング結果を維持するのに苦労し、エラーの蓄積とモデルドリフトにつながる。テンプレートベースのトラッキング手法は、最初のフレームおよび/または最後のフレームからのテンプレート情報のみに依存している。この問題に対処するために,prototrackと呼ばれるプロトタイプベースのクロスモーダルオブジェクトトラッカを提案する。特に,対象情報を表すマルチモーダルプロトタイプを,第1フレームからの固定サンプルと異なるモダリティの2つの代表サンプルを含む,多種多様なサンプルで設計する。さらに、2つの新しいモジュールに基づくプロトタイプ生成アルゴリズムを開発し、異なる課題におけるプロトタイプ代表性を保証する。 Cross-modal object tracking is an important research topic in the field of information fusion, and it aims to address imaging limitations in challenging scenarios by integrating switchable visible and near-infrared modalities. However, existing tracking methods face some difficulties in adapting to significant target appearance variations in the presence of modality switch. For instance, model update based tracking methods struggle to maintain stable tracking results during modality switching, leading to error accumulation and model drift. Template based tracking methods solely rely on the template information from first frame and/or last frame, which lacks sufficient representation ability and poses challenges in handling significant target appearance changes. To address this problem, we propose a prototype-based cross-modal object tracker called ProtoTrack, which introduces a novel prototype learning scheme to adapt to significant target appearance variations, for cross-modal object tracking. In particular, we design a multi-modal prototype to represent target information by multi-kind samples, including a fixed sample from the first frame and two representative samples from different modalities. 
Moreover, we develop a prototype generation algorithm based on two new modules to ensure the prototype is representative under different challenges......	翻訳日:2023-12-25 15:49:37 公開日:2023-12-22
# 即時制約による安全強化学習:積極的な探索の役割 Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration ( http://arxiv.org/abs/2312.14470v1 ) ライセンス: Link先を確認	Honghao Wei, Xin Liu, Lei Ying	(参考訳) 本稿では,線形関数近似を用い,各ステップで安全でない行動を回避しなければならない厳密な瞬時制約の下での安全強化学習(safe RL)を検討する。既存の研究では、厳密な瞬時制約を持つ安全なRLが検討されているが、そのアプローチはいくつかの重要な仮定に依存している。 $(i)$ RLエージェントがすべての状態について安全な行動集合を知っているか、あるいはすべての状態-行動-状態の三つ組が安全であるような安全グラフを知っていること、$(ii)$ 制約/コスト関数が線形であること、である。本稿では,仮定$(i)$を置かずに瞬時ハード制約付きの安全なRLを考え,$(ii)$を再生核ヒルベルト空間(RKHS)に一般化する。提案アルゴリズムであるLSVI-AEは,コスト関数が線形の場合に$\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$のリグレットと$\tilde{\mathcal{O}}(H \sqrt{dK})$のハード制約違反を達成し,コスト関数がRKHSに属する場合に$\mathcal{O}(H\gamma_K \sqrt{K})$のハード制約違反を達成する。ここで$K$は学習の地平線、$H$は各エピソードの長さ、$\gamma_K$はコスト関数の近似に使用されるカーネルに関する情報ゲインである。これらの結果は学習地平線$K$への最適な依存性を達成しており,本論文で示す下界と一致し,LSVI-AEの効率性を示している。特に,本手法の設計は積極的な方策探索を促し,一般的なコスト関数を持ち安全な行動に関する事前知識のない安全RLに独自の視点を与えるものであり,それ自体が独立した関心の対象となりうる。 This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions: $(i)$ the RL agent knows a safe action set for every state or knows a safe graph in which all the state-action-state triples are safe, and $(ii)$ the constraint/cost functions are linear. In this paper, we consider safe RL with instantaneous hard constraints without assumption $(i)$ and generalize $(ii)$ to Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves $\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$ regret and $\tilde{\mathcal{O}}(H \sqrt{dK})$ hard constraint violation when the cost function is linear and $\mathcal{O}(H\gamma_K \sqrt{K})$ hard constraint violation when the cost function belongs to RKHS. 
Here $K$ is the learning horizon, $H$ is the length of each episode, and $\gamma_K$ is the information gain with respect to the kernel used to approximate cost functions. Our results achieve the optimal dependency on the learning horizon $K$, matching the lower bound we provide in this paper and demonstrating the efficiency of LSVI-AE. Notably, the design of our approach encourages aggressive policy exploration, providing a unique perspective on safe RL with general cost functions and no prior knowledge of safe actions, which may be of independent interest.	翻訳日:2023-12-25 15:49:16 公開日:2023-12-22
# FM-OV3D:オープン語彙検出のための基礎モデルに基づくクロスモーダル知識ブレンディング FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection ( http://arxiv.org/abs/2312.14465v1 ) ライセンス: Link先を確認	Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang	(参考訳) 様々な視覚タスクにおける事前訓練された基礎モデルの優れた性能は、2Dモデルのオープン語彙能力を高める可能性を示している。既存の方法は3D空間における類似の応用を探索する。しかし、そのほとんどは特異基盤モデルからの知識抽出のみに集中しており、3次元モデルの開語彙能力を制限している。様々な基礎モデルから相補的な事前学習知識を活用することで、2次元事前学習された視覚言語モデルから3次元空間への知識伝達を改善することができると仮定する。本研究では,複数の事前学習基礎モデルの知識をブレンドすることで,3次元モデルのオープンな局所化と認識能力を向上し,本来の3次元データセットの制約に直面することなく真のオープンな語彙を実現する,基礎モデルに基づくクロスモーダル知識ブレンディング法FM-OV3Dを提案する。具体的には, 開語彙3次元定位能力を学ぶために, 接地セグメンツモデルにおける開語彙定位知識を採用する。オープン語彙の3D認識能力には,GPT-3や安定拡散モデルなどの生成基盤モデルの知識とCLIPのような相互識別モデルを活用する。オープンボカブラリ3dオブジェクト検出のための2つの人気のあるベンチマーク実験の結果から,複数のファンデーションモデルから知識を効率的に学習し,オープンボカブラリ3dオブジェクト検出タスクにおいて,オープンボカブラリモデルのオープンボカブラリ能力を高め,最先端のパフォーマンスを達成することができた。コードはhttps://github.com/dmzhang0425/fm-ov3d.gitでリリースされる。 The superior performances of pre-trained foundation models in various visual tasks underscore their potential to enhance the 2D models' open-vocabulary ability. Existing methods explore analogous applications in the 3D space. However, most of them only center around knowledge extraction from singular foundation models, which limits the open-vocabulary ability of 3D models. We hypothesize that leveraging complementary pre-trained knowledge from various foundation models can improve knowledge transfer from 2D pre-trained visual language models to the 3D space. In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary without facing constraints from original 3D datasets. 
Specifically, to learn the open-vocabulary 3D localization ability, we adopt the open-vocabulary localization knowledge of the Grounded-Segment-Anything model. For open-vocabulary 3D recognition ability, we leverage the knowledge of generative foundation models, including GPT-3 and Stable Diffusion models, and cross-modal discriminative models like CLIP. The experimental results on two popular benchmarks for open-vocabulary 3D object detection show that our model efficiently learns knowledge from multiple foundation models to enhance the open-vocabulary ability of the 3D model and successfully achieves state-of-the-art performance in open-vocabulary 3D object detection tasks. Code is released at https://github.com/dmzhang0425/FM-OV3D.git.	翻訳日:2023-12-25 15:48:39 公開日:2023-12-22
# CaptainCook4D: 手続き的アクティビティにおけるエラーを理解するデータセット CaptainCook4D: A dataset for understanding errors in procedural activities ( http://arxiv.org/abs/2312.14556v1 ) ライセンス: Link先を確認	Rohith Peddi, Shivvrat Arya, Bharath Challa, Likhitha Pallapothula, Akshay Vyas, Jikai Wang, Qifan Zhang, Vasundhara Komaragiri, Eric Ragan, Nicholas Ruozzi, Yu Xiang, Vibhav Gogate	(参考訳) ステップバイステップの手順は、日常生活において個人が行う様々な活動に不可欠な要素である。これらの手順は、家具の組み立てやレシピの作成など、目標を効率的に達成するための指針となる。しかし、手続き活動の複雑さと持続性は本質的にエラーを起こす可能性を高める。このような手続き的アクティビティを一連のフレームから理解することは、視覚情報の正確な解釈とアクティビティの構造を推論する能力を必要とする難しいタスクである。そこで,本研究では,キッチン環境でレシピを行う384人の記録(94.5時間)からなる,エゴセントリックな4dデータセットcaptaincook4dを収集した。このデータセットは、2つの異なるタイプのアクティビティで構成されている。1つは参加者が提供されたレシピの指示に従属し、もう1つはエラーを逸脱し誘発する。我々は5.3Kステップアノテーションと10Kきめ細かいアクションアノテーションを提供し、以下のタスクのデータセットをベンチマークする:教師付きエラー認識、マルチステップローカライゼーション、手続き学習。 Following step-by-step procedures is an essential component of various activities carried out by individuals in their daily lives. These procedures serve as a guiding framework that helps to achieve goals efficiently, whether it is assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and the ability to reason about the structure of the activity. To this end, we collect a new egocentric 4D dataset, CaptainCook4D, comprising 384 recordings (94.5 hours) of people performing recipes in real kitchen environments. This dataset consists of two distinct types of activity: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. 
We provide 5.3K step annotations and 10K fine-grained action annotations and benchmark the dataset for the following tasks: supervised error recognition, multistep localization, and procedure learning.	翻訳日:2023-12-25 15:41:29 公開日:2023-12-22
# 構造誘導材料のための機械学習とプロセス設計 Machine learning for structure-guided materials and process design ( http://arxiv.org/abs/2312.14552v1 ) ライセンス: Link先を確認	Lukas Morand, Tarek Iraki, Johannes Dornheim, Stefan Sandfeld, Norbert Link, Dirk Helm	(参考訳) 近年は、研究と産業の両方において、材料革新の加速への関心が高まっている。しかし、新しい先端材料の開発に真に価値を加えるためには、製造工程を考慮し、下流のプロセス設計アプローチをサポートする材料設計アプローチを調整することが不可欠である。この方向への大きなステップとして、材料プロセス-構造-プロパティチェーン全体を網羅する全体最適化アプローチを提案する。本手法では,2つの重要な識別問題に対処するために,機械学習技術を用いる。 1つ目は、望まれるマクロな特性を示す準最適材料構造を識別する材料設計問題の解決である。 2つ目は、これらの材料構造を製造するための最適な処理経路を見つけるプロセス設計問題を解決することである。どちらの識別問題も典型的には不良設定(ill-posed)であり、ソリューションアプローチにおいて重要な課題となる。しかし、これらの問題の非特異性もまた、処理に重要な利点をもたらす: 同様に機能するターゲット構造を複数持つことにより、対応するプロセスは、最適な到達可能な構造を製造するために効率的にガイドすることができる。特に,材料設計のためのマルチタスク学習に基づく最適化手法と組み合わせて,プロセス設計に深層強化学習を適用する。このアプローチの機能は、金属成形プロセスにおいて所望の特性を有する結晶テクスチャを製造するために使用することで実証される。 In recent years, there has been a growing interest in accelerated materials innovation in both research and industry. However, to truly add value to the development of new advanced materials, it is essential to take into account manufacturing processes and thereby tailor materials design approaches to support downstream process design approaches. As a major step in this direction, we present a holistic optimization approach that covers the entire materials process-structure-property chain. Our approach specifically employs machine learning techniques to address two critical identification problems. The first is to solve a materials design problem, which involves identifying near-optimal material structures that exhibit desired macroscopic properties. The second is to solve a process design problem, which is to find an optimal processing path to manufacture these material structures. Both identification problems are typically ill-posed, which presents a significant challenge for solution approaches. 
However, the non-unique nature of these problems also offers an important advantage for processing: By having several target structures that perform similarly well, the corresponding processes can be efficiently guided towards manufacturing the best reachable structure. In particular, we apply deep reinforcement learning for process design in combination with a multi-task learning-based optimization approach for materials design. The functionality of the approach will be demonstrated by using it to manufacture crystallographic textures with desired properties in a metal forming process.	翻訳日:2023-12-25 15:41:13 公開日:2023-12-22
# 因果独立源を持つ絡み合いスワッピング量子ネットワークにおける実量子理論の排除の提案 Proposals for ruling out the real quantum theories in an entanglement-swapping quantum network with causally independent sources ( http://arxiv.org/abs/2312.14547v1 ) ライセンス: Link先を確認	Jian Yao, Hu Chen, Ya-Li Mao, Zheng-Da Li, Jingyun Fan	(参考訳) 量子論における複素数の役割に関する問題は、量子力学の開始以来議論されてきた。近年,ベル非局所性テスト手法に基づく実量子論と複素量子論の区別が実現可能な提案が現れた [Nature 600, 625-629 (2021)]。この方法に基づいて、実量子論はフォトニック量子系と超伝導量子系の両方で実験的に反証されている [Phys. Rev. Lett. 128, 040402 (2022), Phys. Rev. Lett. 128, 040403 (2022)]。因果関係のない複数の独立したソースを持つ量子ネットワークは、非局所性の研究に新たな視点を提供するため、大きな関心を集めている。これらのソースの独立性は、観測可能な共分散にさらなる制約を課し、古典的および量子的相関に対する新しい境界をもたらす。本研究では,2つの源が因果独立であるという、先行研究では置かれていなかったより強い仮定の下で,エンタングルメントスワッピングのシナリオにおける実量子論と複素量子論の区別について検討した。改良されたNavascués-Pironio-Acín法とベイジアン最適化を用いて、相関関数の最適係数を持ち、既存の提案と比較して実量子論と複素量子論をより大きく区別できる提案を求める。この研究は、因果独立なパーティを特徴とする複雑な量子ネットワーク内の実と複素量子理論の識別をさらに探求するための道を開く。 The question of whether complex numbers play a fundamental role in quantum theory has been debated since the inception of quantum mechanics. Recently, a feasible proposal to differentiate between real and complex quantum theories based on the technique of testing Bell nonlocalities has emerged [Nature 600, 625-629 (2021)]. Based on this method, the real quantum theory has been falsified experimentally in both photonic and superconducting quantum systems [Phys. Rev. Lett. 128, 040402 (2022), Phys. Rev. Lett. 128, 040403 (2022)]. The quantum networks with multiple independent sources which are not causally connected have gained significant interest as they offer a new perspective on studying the nonlocalities. The independence of these sources imposes additional constraints on observable covariances and leads to new bounds for classical and quantum correlations. In this study, we examine the discrimination between the real and complex quantum theories with an entanglement swapping scenario under a stronger assumption that the two sources are causally independent, which was not made in previous works. 
Using a revised Navascués-Pironio-Acín method and Bayesian optimization, we find a proposal with optimal coefficients of the correlation function which could give a larger discrimination between the real and complex quantum theories compared with the existing proposals. This work opens up avenues for further exploration of the discrimination between real and complex quantum theories within intricate quantum networks featuring causally independent parties.	翻訳日:2023-12-25 15:40:54 公開日:2023-12-22
# パスポートフォーマットへの顔画像の包括的正規化 Inclusive normalization of face images to passport format ( http://arxiv.org/abs/2312.14544v1 ) ライセンス: Link先を確認	Hongliu Cao, Minh Nhat Do, Alexis Ravanel, Eoin Thomas	(参考訳) 近年、顔認識は現実世界のアプリケーションでますます使われている。しかし、肌の色バイアスと過酷な照明などの個人内変異を組み合わせると、人間の検査中にも顔認識タスクが失敗する可能性が高くなる。顔の正規化手法は、同一性を保ちながら入力画像から個人内変動を取り除き、このような課題に対処しようとする。しかし、ほとんどの顔の正規化法は1つまたは2つのバリエーションだけを取り除き、肌の色バイアスのようなデータセットバイアスを無視することができる。多くの顔正規化法の出力も人間の観察者には現実的ではない。本研究では、ポーズ、悪い照明、低解像度、ぼやけ、表情、サングラスのようなアクセサリーなどの大きな変化を含む、ほとんどの個人内変異を取り除くために、スタイルベースの顔正規化モデル(StyleFNM)を提案する。この論文では、事前訓練されたGANを制御して、パスポートのような画像のバランスの取れたデータセットを生成することにより、データセットバイアスも扱う。実験により、StyleFNMはよりリアルな出力を生成でき、顔認識システムの精度と公平性を大幅に向上できることが示された。 Face recognition has been used more and more in real world applications in recent years. However, when the skin color bias is coupled with intra-personal variations like harsh illumination, the face recognition task is more likely to fail, even during human inspection. Face normalization methods try to deal with such challenges by removing intra-personal variations from an input image while keeping the identity the same. However, most face normalization methods can only remove one or two variations and ignore dataset biases such as skin color bias. The outputs of many face normalization methods are also not realistic to human observers. In this work, a style based face normalization model (StyleFNM) is proposed to remove most intra-personal variations including large changes in pose, bad or harsh illumination, low resolution, blur, facial expressions, and accessories like sunglasses among others. The dataset bias is also dealt with in this paper by controlling a pretrained GAN to generate a balanced dataset of passport-like images. The experimental results show that StyleFNM can generate more realistic outputs and can improve significantly the accuracy and fairness of face recognition systems.	翻訳日:2023-12-25 15:40:26 公開日:2023-12-22
# 言語横断要約のための自動データ検索 Automatic Data Retrieval for Cross Lingual Summarization ( http://arxiv.org/abs/2312.14542v1 ) ライセンス: Link先を確認	Nikhilesh Bhatnagar, Ashok Urlana, Vandan Mujadia, Pruthwik Mishra, Dipti Misra Sharma	(参考訳) 言語間の要約では、ある言語で書かれたテキストを別の言語に要約する。英語から他のヨーロッパ諸言語への言語間要約を扱う研究機関がある。本研究では,英語からヒンディー語への言語間要約を実現することを目的とする。テキストとビデオのフォーマットでニュースにふさわしいイベントのカバレッジをペアリングすることで、クロスリンガル要約のためのデータ取得に役立つことを示す。本稿では,データを分析し,文書と要約のペアとして機能するビデオ記述とをマッチングする手法を提案する。また,要約の正しさを確保するため,合理的なしきい値に対するフィルタリング手法を概説する。さらに、28,583件の単言語および言語横断の記事・要約ペアを https://github.com/tingc9/Cross-Sum-News-Aligned で公開する。また、収集したデータの複数のベースラインを構築し分析し、エラー分析を報告する。 Cross-lingual summarization involves the summarization of text written in one language to a different one. There is a body of research addressing cross-lingual summarization from English to other European languages. In this work, we aim to perform cross-lingual summarization from English to Hindi. We propose that pairing up the coverage of newsworthy events in textual and video formats can prove helpful for data acquisition for cross-lingual summarization. We analyze the data and propose methods to match articles to video descriptions that serve as document and summary pairs. We also outline filtering methods over reasonable thresholds to ensure the correctness of the summaries. Further, we make available 28,583 mono and cross-lingual article-summary pairs at https://github.com/tingc9/Cross-Sum-News-Aligned. We also build and analyze multiple baselines on the collected data and report error analysis.	翻訳日:2023-12-25 15:40:05 公開日:2023-12-22
# 戦略学習による適応的再収束駆動AIG書き換え Adaptive Reconvergence-driven AIG Rewriting via Strategy Learning ( http://arxiv.org/abs/2312.14536v1 ) ライセンス: Link先を確認	Liwei Ni, Zonglin Yang, Jiaxi Zhang, Junfeng Liu, Huawei Li, Biwei Xie, Xinquan Li	(参考訳) リライトは、回路の性能、パワー、面積(PPA)を改善することを目的とした論理合成における一般的な手順である。従来の再収束駆動And-Inverter Graph (AIG) 書き換え法では、ブール代数の最小化による再収束コーンの最適化にのみ焦点が当てられている。しかし、特定のコーンに適した他のノード書き換えアルゴリズムを組み込む機会がある。本稿では,マルチストラテジーに基づくAIG書き換えと戦略学習に基づくアルゴリズム選択という,2つの重要な手法を組み合わせた適応型再収束駆動AIG書き換えアルゴリズムを提案する。マルチストラテジーベースの書き換え手法は、マルチノード書き換えアルゴリズムのサポートを取り入れ、最適化空間を拡大することで従来のアプローチを拡張する。さらに、戦略学習に基づくアルゴリズム選択法は、与えられたコーンに対して最も適切なノード書き換えアルゴリズムを決定する。実験の結果、提案手法はサイズで平均5.567%、深さで平均5.327%の有意な改善を達成した。 Rewriting is a common procedure in logic synthesis aimed at improving the performance, power, and area (PPA) of circuits. The traditional reconvergence-driven And-Inverter Graph (AIG) rewriting method focuses solely on optimizing the reconvergence cone through Boolean algebra minimization. However, there exist opportunities to incorporate other node-rewriting algorithms that are better suited for specific cones. In this paper, we propose an adaptive reconvergence-driven AIG rewriting algorithm that combines two key techniques: multi-strategy-based AIG rewriting and strategy learning-based algorithm selection. The multi-strategy-based rewriting method expands upon the traditional approach by incorporating support for multi-node-rewriting algorithms, thus expanding the optimization space. Additionally, the strategy learning-based algorithm selection method determines the most suitable node-rewriting algorithm for a given cone. Experimental results demonstrate that our proposed method yields a significant average improvement of 5.567% in size and 5.327% in depth.	翻訳日:2023-12-25 15:39:51 公開日:2023-12-22
# ADA-GAD:グラフ異常検出用自動符号化器 ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection ( http://arxiv.org/abs/2312.14535v1 ) ライセンス: Link先を確認	Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Qingming Huang	(参考訳) グラフ異常検出(graph anomaly detection)は、グラフ内の通常の振る舞いから逸脱するノードを特定する上で極めて重要である。既存の再構成に基づく手法はかなり成功したが、グラフの異常パターンによって生じる Anomaly Overfitting と Homophily Trap の問題に直面し、通常のノードが異常なノードよりもよく再構成されるという仮定を破ることがある。我々の観察では、異常の少ないグラフで学習したモデルの方が検出性能が高いことがわかった。この知見に基づき、我々はAnomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD)と呼ばれる新しい2段階のフレームワークを導入する。第1段階では,異常レベルを低減したグラフを生成する学習不要の異常除去拡張法を設計する。これらの拡張グラフ上でグラフオートエンコーダを複数のレベルで事前訓練することにより、グラフオートエンコーダは通常のパターンをキャプチャできる。次の段階では、デコーダは元のグラフで検出するために再訓練され、前段で学んだマルチレベル表現の恩恵を受ける。一方、ノード異常分布正規化を提案し、さらに Anomaly Overfitting を緩和する。本手法の有効性を検証するために,合成データと実世界データの両方について広範な実験を行った。 Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social networks. Although existing reconstruction-based methods have achieved considerable success, they may face the Anomaly Overfitting and Homophily Trap problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. 
In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate Anomaly Overfitting. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.	翻訳日:2023-12-25 15:39:30 公開日:2023-12-22
# 個人情報のないユーザマッチングのための多視点ユーザ表現学習 Multi-view user representation learning for user matching without personal information ( http://arxiv.org/abs/2312.14533v1 ) ライセンス: Link先を確認	Hongliu Cao, Ilias El Baamrani, Eoin Thomas	(参考訳) 旅行産業のデジタル化が加速するにつれ、旅行者の行動の分析と理解がますます重要になる。しかし,旅行者データには,旅行プロバイダとのユーザインタラクションの頻度が比較的低いため,高いデータ間隔が生じることが多い。この効果を複雑にすることで、デバイス、アカウント、プラットフォームをオンラインでブラウジングするときに、データ分散ももたらされる。これらの課題に対処するために、確率的トラベラーマッチングが使用できる。トラベラーのブラウジング履歴は一般的に短く、旅行業界のurlは多くのトークンと非常に異質であるので、既存のユーザマッチングソリューションのほとんどはトラベラーマッチングには適していない。これらの課題に対処するために、類似性に基づく多視点情報融合を提案し、URLを多視点データとして扱うことにより、URLからより良いユーザ表現を学習する。実験の結果,提案した多視点ユーザ表現学習は,異なるビューからの相補的な情報を活用でき,URLのキー情報を強調表示し,ユーザマッチングタスクの他の表現学習ソリューションよりもはるかに優れた性能を発揮することがわかった。 As the digitization of travel industry accelerates, analyzing and understanding travelers' behaviors becomes increasingly important. However, traveler data frequently exhibit high data sparsity due to the relatively low frequency of user interactions with travel providers. Compounding this effect the multiplication of devices, accounts and platforms while browsing travel products online also leads to data dispersion. To deal with these challenges, probabilistic traveler matching can be used. Most existing solutions for user matching are not suitable for traveler matching as a traveler's browsing history is typically short and URLs in the travel industry are very heterogeneous with many tokens. To deal with these challenges, we propose the similarity based multi-view information fusion to learn a better user representation from URLs by treating the URLs as multi-view data. The experimental results show that the proposed multi-view user representation learning can take advantage of the complementary information from different views, highlight the key information in URLs and perform significantly better than other representation learning solutions for the user matching task.	翻訳日:2023-12-25 15:39:07 公開日:2023-12-22
# DuaLight: シナリオ特有かつシナリオ共有知識を活用した交通信号制御の強化 DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge ( http://arxiv.org/abs/2312.14532v1 ) ライセンス: Link先を確認	Jiaming Lu, Jingqing Ruan, Haoyuan Jiang, Ziyue Li, Hangyu Mao and Rui Zhao	(参考訳) 強化学習は従来の交通信号制御タスクに革命をもたらしており、混雑を緩和し効率を向上する有望な力を示している。しかし,既存の手法では,特定のシナリオに固有の動的情報を吸収し,様々なシナリオにまたがる動的情報を普遍的に適用できる効果的な学習機構が欠如している。さらに、それぞれのシナリオにおいて、隣り合う交差点とターゲットの交差点の調整方法に関する本質的な経験を完全に捉えることができず、システム全体の準最適結果をもたらす。これらの問題を考察し、単一のシナリオにおける経験情報と様々なシナリオにわたる一般化可能な情報の両方を活用することを目的としたDuaLightを提案する。具体的には、DuaLightは2つの学習可能な部分を持つシナリオ固有の経験的加重モジュールを紹介している。さらに,シナリオ共有型Co-Trainモジュールを実装し,シナリオ間の動的情報の一般化を容易にする。実世界のシナリオと合成のシナリオの実証結果から、DuaLightはさまざまなメトリクスで競争力のあるパフォーマンスを達成し、交通渋滞を緩和するための有望なソリューションを提供する。コードは https://github.com/lujiaming-12138/DuaLight で入手できる。 Reinforcement learning has been revolutionizing the traditional traffic signal control task, showing promising power to relieve congestion and improve efficiency. However, the existing methods lack effective learning mechanisms capable of absorbing dynamic information inherent to a specific scenario and universally applicable dynamic information across various scenarios. Moreover, within each specific scenario, they fail to fully capture the essential empirical experiences about how to coordinate between neighboring and target intersections, leading to sub-optimal system-wide outcomes. Viewing these issues, we propose DuaLight, which aims to leverage both the experiential information within a single scenario and the generalizable information across various scenarios for enhanced decision-making. Specifically, DuaLight introduces a scenario-specific experiential weight module with two learnable parts: Intersection-wise and Feature-wise, guiding how to adaptively utilize neighbors and input features for each scenario, thus providing a more fine-grained understanding of different intersections. 
Furthermore, we implement a scenario-shared Co-Train module to facilitate the learning of generalizable dynamics information across different scenarios. Empirical results on both real-world and synthetic scenarios show DuaLight achieves competitive performance across various metrics, offering a promising solution to alleviate traffic congestion, with 3-7% improvements. The code is available at: https://github.com/lujiaming-12138/DuaLight.	翻訳日:2023-12-25 15:38:47 公開日:2023-12-22
# ZodiacEdge: インクリメンタルルールセットメンテナンスを備えたデータログエンジン ZodiacEdge: a Datalog Engine With Incremental Rule Set Maintenance ( http://arxiv.org/abs/2312.14530v1 ) ライセンス: Link先を確認	Weiqin Xu and Olivier Curé	(参考訳) 本稿では,ルールセットを更新可能なとき,データログの実体化の漸進的なメンテナンスに取り組む。これはモノのインターネットとエッジコンピューティングの文脈において特に重要であり、スマートデバイスはデータログルールに代表される新しい知識を推論する必要があるかもしれない。提案手法は,データログプログラムにおけるルールセットに対応するノードを依存ハイパーグラフに適用する階層化戦略の適応に基づいている。ネゲーションとアグリゲーションの両方を含む再帰的ルールをサポートしている。本システムの有効性を実データおよび合成データで実証する。 In this paper, we tackle the incremental maintenance of Datalog inference materialisation when the rule set can be updated. This is particularly relevant in the context of the Internet of Things and Edge computing where smart devices may need to reason over newly acquired knowledge represented as Datalog rules. Our solution is based on an adaptation of a stratification strategy applied to a dependency hypergraph whose nodes correspond to rule sets in a Datalog program. Our implementation supports recursive rules containing both negation and aggregation. We demonstrate the effectiveness of our system on real and synthetic data.	翻訳日:2023-12-25 15:38:22 公開日:2023-12-22
# 一層ニューラルネットワークのための効果的かつ効率的なグリーンフェデレーション学習法 An effective and efficient green federated learning method for one-layer neural networks ( http://arxiv.org/abs/2312.14528v1 ) ライセンス: Link先を確認	Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Elena Hernández-Pereira, Beatriz Pérez-Sánchez	(参考訳) 現在、機械学習アルゴリズムは複雑さを増し続けており、かなりの量の計算資源とエネルギーを必要としている。これらの理由から、新しいグリーンアルゴリズムの開発に対する認識が高まっており、分散AIがこれに寄与することができる。フェデレーション学習(federated learning, FL)は、分散的な方法で協調モデルのトレーニングを可能にするため、機械学習で最も活発な研究分野の1つであり、IoT(Internet of Things)など、多くの実環境において興味深い選択肢であり、エッジコンピューティングデバイスでこれらのモデルを使用することを可能にする。本研究では,隠れレイヤのないニューラルネットワークに基づくFL法を提案する。収束に複数のラウンドを必要とする従来のFL法とは異なり,単一のトレーニングラウンドでグローバルな協調モデルを生成することができる。これにより、トレーニングプロセスの管理を簡単にする効果的で効率的なモデルを得ることができる。さらに、この手法は、現在のデータ保護規制において重要な側面である、設計によるデータのプライバシを保持する。大規模データセットと多数のフェデレーションクライアントを用いて実験を行った。隠れたレイヤを持たないネットワークモデルに基づいているにもかかわらず、より複雑な最先端の機械学習モデルと比較して、あらゆるケースで競合する精度が維持される。さらに,本手法は同一および非同一の分散シナリオでも等しく動作することを示す。最後に、これは環境に優しいアルゴリズムであり、中央集権的なアルゴリズムに比べてトレーニングプロセス中にかなりの省エネを可能にする。 Nowadays, machine learning algorithms continue to grow in complexity and require a substantial amount of computational resources and energy. For these reasons, there is a growing awareness of the development of new green algorithms, and distributed AI can contribute to this. Federated learning (FL) is one of the most active research lines in machine learning, as it allows the training of collaborative models in a distributed way, an interesting option in many real-world environments, such as the Internet of Things, allowing the use of these models in edge computing devices. In this work, we present a FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round, unlike traditional FL methods that require multiple rounds for convergence. This allows obtaining an effective and efficient model that simplifies the management of the training process. 
Moreover, this method preserves data privacy by design, a crucial aspect in current data protection regulations. We conducted experiments with large datasets and a large number of federated clients. Despite being based on a network model without hidden layers, it maintains in all cases competitive accuracy results compared to more complex state-of-the-art machine learning models. Furthermore, we show that the method performs equally well in both identically and non-identically distributed scenarios. Finally, it is an environmentally friendly algorithm as it allows significant energy savings during the training process compared to its centralized counterpart.	翻訳日:2023-12-25 15:38:13 公開日:2023-12-22
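To illustrate how a model without hidden layers can be fitted in a single federated round, here is a minimal sketch. It assumes a plain linear model with squared loss, which is a simplification and not necessarily the authors' formulation: each client shares only the summary statistics X^T X and X^T y, and the server sums them and solves one linear system. All function names (`client_stats`, `aggregate`, `solve`) are hypothetical.

```python
# One-round federated least squares (illustrative assumption, not the paper's exact method).
# Clients never share raw rows, only d x d and d-dimensional summaries.

def client_stats(X, y):
    """Compute local summaries A = X^T X and b = X^T y on one client."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return A, b

def aggregate(stats):
    """Server side: element-wise sum of all client summaries."""
    d = len(stats[0][1])
    A = [[sum(s[0][i][j] for s in stats) for j in range(d)] for i in range(d)]
    b = [sum(s[1][i] for s in stats) for i in range(d)]
    return A, b

def solve(A, b, ridge=0.0):
    """Solve (A + ridge*I) w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for i in range(n):
        M[i][i] += ridge
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

# Two clients with non-identically distributed inputs; targets follow y = 1 + 2x.
# The first feature is a constant 1 acting as the bias term.
X1, y1 = [[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0]
X2, y2 = [[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0]

w_federated = solve(*aggregate([client_stats(X1, y1), client_stats(X2, y2)]))
w_central = solve(*client_stats(X1 + X2, y1 + y2))
```

Because the summaries add up exactly, the federated solution matches the one obtained on the pooled data regardless of how rows are split across clients, which is one way to see why such a scheme is insensitive to non-IID partitions.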
# 一般化幾何相:数学的側面 Generalised Geometric Phase: Mathematical Aspects ( http://arxiv.org/abs/2312.14522v1 ) ライセンス: Link先を確認	Vivek M. Vyas	(参考訳) 幾何位相の概念の作用素一般化は、最近、純粋に物理的根拠に基づいて提案されている。ここでは、量子系の新しい幾何学構造を発見しながら、その存在の数学的基礎を提供する。任意の観測量の平均を調べると、量子系は異なる光線空間と関連するファイバー束構造を示すことがわかる。一般化された幾何学的位相は、これらのファイバー束上の接続の(アン)ホロノミーとして理解される。一般に基底となるレイ空間は擬ケーラー多様体であり、そのシンプレクティック構造は一般化された幾何学的位相として現れる。 An operator generalisation of the notion of geometric phase has been recently proposed purely based on physical grounds. Here we provide a mathematical foundation for its existence, while uncovering new geometrical structures in quantum systems. While probing the average of any observable it is found that a quantum system exhibits different ray spaces and associated fibre bundle structures. The generalised geometric phase is understood as (an)holonomy of a connection over these fibre bundles. The underlying ray spaces in general are found to be pseudo-Kähler manifolds, and their symplectic structure manifests as the generalised geometric phase.	翻訳日:2023-12-25 15:37:52 公開日:2023-12-22
# 量子エラー補正による量子コンピューティングプライバシのチューニング Tuning Quantum Computing Privacy through Quantum Error Correction ( http://arxiv.org/abs/2312.14521v1 ) ライセンス: Link先を確認	Hui Zhong, Keyi Ju, Manojna Sistla, Xinyue Zhang, Xiaoqi Qin, Xin Fu, Miao Pan	(参考訳) 量子コンピューティングは、大規模で複雑な問題を解決する上で有望なパラダイムである。量子コンピューティングのプライバシを保護するため、量子コンピューティングにおける差分プライバシ(英語版)(qdp)を再定義し、量子コンピューティングによって生成された固有ノイズを収集してqdpを実装するための研究の先駆者となる。しかしながら、そのような実装アプローチは、qdp機構のプライバシー予算を固定し制御不能にする固有のノイズの量によって制限される。本稿では,量子誤り訂正(quantum error correction, qec)技術を用いて,qdpにおけるプライバシー保護レベルを調整しながら,量子コンピューティングの誤りを減らすことを提案する。要するに、複数の単一キュービットゲート回路において、ゲートにQEC演算を適用するかどうかを決定することにより、量子ノイズエラー率を徐々に減少させる。我々はQEC後の一般誤差率とそれに対応するプライバシー予算の新しい計算式を導出した。そして、マルチレベル連結QEC演算を用いて、さらにノイズ低減を実現する。広範な数値シミュレーションを通じて,量子コンピューティングにおけるプライバシ保護の程度を規定する手段としてQECが実現可能であることを示す。 Quantum computing is a promising paradigm for efficiently solving large and high-complexity problems. To protect quantum computing privacy, pioneering research efforts proposed to redefine differential privacy (DP) in quantum computing, i.e., quantum differential privacy (QDP), and harvest inherent noises generated by quantum computing to implement QDP. However, such an implementation approach is limited by the amount of inherent noises, which makes the privacy budget of the QDP mechanism fixed and uncontrollable. To address this issue, in this paper, we propose to leverage quantum error correction (QEC) techniques to reduce quantum computing errors, while tuning the privacy protection levels in QDP. In short, we gradually decrease the quantum noise error rate by deciding whether to apply QEC operations on the gate in a multiple single qubit gates circuit. We have derived a new calculation formula for the general error rate and corresponding privacy budgets after QEC operation. Then, we expand to achieve further noise reduction using multi-level concatenated QEC operation. 
Through extensive numerical simulations, we demonstrate that QEC is a feasible way to regulate the degree of privacy protection in quantum computing.	翻訳日:2023-12-25 15:37:43 公開日:2023-12-22
# ニューロン分類のための置換不変エンコーダを用いた統合学習神経骨格と脳回路トポロジー Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification ( http://arxiv.org/abs/2312.14518v1 ) ライセンス: Link先を確認	Minghui Liao, Guojia Wan, Bo Du	(参考訳) 神経系内のニューロンの種類を決定することは、脳コネクトミクスの分析や神経疾患の研究において重要な役割を果たす。しかし、ニューロンの解剖学的、生理学的、分子的特性を利用する方法は、効率が比較的低く費用がかかる。脳組織の電子顕微鏡イメージングと解析技術の進歩により、我々は神経細胞の高分解能形態と接続情報からなる全脳コネクトームを得ることができる。しかし、そのようなデータに基づいて自動ニューロン分類を行うモデルはほとんどない。本稿では,スケルトンから得られるニューロンの形態情報と神経回路から得られるニューロン間のトポロジ情報を組み合わせたフレームワークであるNeuNetを提案する。具体的には、NeuNetはSkeleton Encoder、Connectome Encoder、Readout Layerという3つのコンポーネントで構成されている。スケルトンエンコーダは、神経骨格の点データに1次元の畳み込みを伴うボトムアップ方式でニューロンの局所情報を統合し、コネクトームエンコーダはグラフニューラルネットワークを使用して神経回路のトポロジ情報を取得し、読み出し層は上記の2つの情報を融合し、分類結果を出力する。ヒト大脳皮質とショウジョウバエ脳のボリューム電子顕微鏡(VEM)画像から,ニューロン分類タスクのための2つの新しいデータセットを再処理し,公開する。これら2つのデータセットに対する実験により, それぞれ精度0.9169と0.9363でモデルの有効性が示された。コードとデータは https://github.com/WHUminghui/NeuNet で入手できる。 Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, the efficiency of utilizing anatomical, physiological, or molecular characteristics of neurons is relatively low and costly. With the advancements in electron microscopy imaging and analysis techniques for brain tissue, we are able to obtain whole-brain connectome consisting of high-resolution neuronal morphology and connectivity information. However, few models are built based on such data for automated neuron classification. In this paper, we propose NeuNet, a framework that combines morphological information of neurons obtained from skeleton and topological information between neurons obtained from neural circuit. Specifically, NeuNet consists of three components, namely Skeleton Encoder, Connectome Encoder, and Readout Layer. 
Skeleton Encoder integrates the local information of neurons in a bottom-up manner, with a one-dimensional convolution in neural skeleton's point data; Connectome Encoder uses a graph neural network to capture the topological information of neural circuit; finally, Readout Layer fuses these two kinds of information and outputs classification results. We reprocess and release two new datasets for neuron classification task from volume electron microscopy (VEM) images of human brain cortex and Drosophila brain. Experiments on these two datasets demonstrated the effectiveness of our model with accuracy of 0.9169 and 0.9363, respectively. Code and data are available at: https://github.com/WHUminghui/NeuNet.	翻訳日:2023-12-25 15:37:24 公開日:2023-12-22
# 微分可能DSPとスペクトル最適輸送を用いた教師なし高調波パラメータ推定 Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport ( http://arxiv.org/abs/2312.14507v1 ) ライセンス: Link先を確認	Bernardo Torres (S2A, IDS, LTCI), Geoffroy Peeters (S2A, IDS, LTCI), Gaël Richard (S2A, IDS, LTCI)	(参考訳) ニューラルオーディオ信号処理では、ピッチコンディショニングがシンセサイザーの性能向上に使われている。しかし, 音高推定器と合成器の併用は, 標準音高再生損失を用いた場合の課題であり, 外部の音高トラッカーに依存している。そこで本稿では,スペクトルエネルギーの変位を最小化する最適輸送理論に着想を得たスペクトル損失関数を提案する。我々は、調和テンプレートを調和信号に適合させる教師なしの自動符号化タスクを通じて、このアプローチを検証する。軽量エンコーダを用いて高調波の基本周波数と振幅を共同で推定し,可微分高調波合成器を用いて信号を再構成する。提案手法は、ニューラルオーディオアプリケーションにおける教師なしパラメータ推定を改善するための有望な方向を提供する。 In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals. We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.	翻訳日:2023-12-25 15:36:59 公開日:2023-12-22
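The idea of a loss that measures the displacement of spectral energy can be sketched with the one-dimensional Wasserstein-1 distance, which for two distributions on the same grid equals the summed absolute difference of their cumulative sums. The sketch below is an assumption made for illustration (the paper may use a different transport cost or normalization), and the helper names `spectral_w1`, `squared_error`, and `peak` are hypothetical.

```python
# Toy comparison of a bin-wise loss and a transport-based loss on magnitude spectra.

def normalize(spectrum):
    """Scale a non-negative spectrum so that it sums to one."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must contain some energy")
    return [s / total for s in spectrum]

def spectral_w1(spec_a, spec_b, bin_width=1.0):
    """Wasserstein-1 distance between two spectra sharing one frequency grid."""
    pa, pb = normalize(spec_a), normalize(spec_b)
    cdf_a = cdf_b = 0.0
    dist = 0.0
    for a, b in zip(pa, pb):
        cdf_a += a
        cdf_b += b
        dist += abs(cdf_a - cdf_b) * bin_width
    return dist

def squared_error(spec_a, spec_b):
    """Plain bin-wise squared error, for comparison."""
    return sum((a - b) ** 2 for a, b in zip(spec_a, spec_b))

def peak(center, n_bins=64):
    """Idealized spectrum of a single sinusoid: all energy in one bin."""
    return [1.0 if k == center else 0.0 for k in range(n_bins)]

target = peak(10)
near, far = peak(13), peak(40)

w1_near, w1_far = spectral_w1(target, near), spectral_w1(target, far)
se_near, se_far = squared_error(target, near), squared_error(target, far)
```

For non-overlapping peaks the bin-wise error is the same whether the estimate is 3 bins or 30 bins away, whereas the transport distance grows with the frequency offset. This is the property that makes a transport-style loss informative for pitch estimation.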
# SIG: Prompt-based generation を用いた文学における話者識別 SIG: Speaker Identification in Literature via Prompt-Based Generation ( http://arxiv.org/abs/2312.14590v1 ) ライセンス: Link先を確認	Zhenlin Su, Liyan Xu, Jin Xu, Jiangnan Li, Mingdu Huangfu	(参考訳) 物語における引用の話者を特定することは文学的分析において重要な課題であり、未知の話者に対するドメイン外推論や、周囲の文脈に話者の言及がない非議論的なケースなど、難しいシナリオがある。本研究では,設計したプロンプトテンプレートに基づいてタスクと引用入力を口頭で表現し,他の補助タスクと容易に統合し,話者識別性能をさらに高めるための簡易かつ効果的な手法であるSIGを提案する。予測はモデルによる直接生成から生じるか、または各話者候補の最大生成確率によって決定される。我々のアプローチ設計に基づき、SIGはドメイン外評価をサポートし、任意の形式の候補入力を受け入れることができるオープンワールド分類パラダイムを実現する。我々は,このタスクの最大のデータセットであるPDNCにおいて,クロスドメイン評価とドメイン内評価の両方を行い,SIGがそれまでの複雑な設計のベースラインを上回り,特に難易度のないシナリオでは最大17%改善した。別のデータセットWPに関する追加実験は、SIGの有効性をさらに裏付ける。 Identifying speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including the out-of-domain inference for unseen speakers, and non-explicit cases where there are no speaker mentions in surrounding context. In this work, we propose a simple and effective approach SIG, a generation-based method that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster the speaker identification performance. The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate. Based on our approach design, SIG supports out-of-domain evaluation, and achieves open-world classification paradigm that is able to accept any forms of candidate input. We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task, where empirical results suggest that SIG outperforms previous baselines of complicated designs, as well as the zero-shot ChatGPT, especially excelling at those hard non-explicit scenarios by up to 17% improvement. Additional experiments on another dataset WP further corroborate the efficacy of SIG.	翻訳日:2023-12-25 15:32:09 公開日:2023-12-22
# 非デノイジング前方時間拡散 Non-Denoising Forward-Time Diffusions ( http://arxiv.org/abs/2312.14589v1 ) ライセンス: Link先を確認	Stefano Peluchetti	(参考訳) 本論文のスコープは拡散過程による生成的モデリングである。このパラダイムの1つのアプローチはSong et al. (2021) の研究であり、これは所望のデータ分布をターゲットとした拡散プロセスを構築するのに時間反転の議論に依存している。拡散確率モデルの提案に共通する時間反転の議論は不要であることを示す。拡散ブリッジを適切に混合することにより,所望のデータ分布をターゲットとした拡散過程を得る。結果として得られる輸送は、構成によって正確であり、基盤となる拡散のダイナミクスを選択する際の柔軟性が向上し、新しいトレーニング目的によってニューラルネットワークによって近似することができる。我々は,我々のアプローチと時間反転アプローチに対応するドリフト調整の統一的視点を開発し,この表現を用いて拡散に基づく生成モデルの内部動作を検査する。最後に,空間統計学で一般的なスケーラブルなシミュレーションと推論手法を活用し,基礎となる拡散動力学の完全因子分布を超越する。本研究に含まれる方法論の進歩は,拡散過程に基づく生成モデリングの一般的な枠組みの確立に寄与する。 The scope of this paper is generative modeling through diffusion processes. An approach falling within this paradigm is the work of Song et al. (2021), which relies on a time-reversal argument to construct a diffusion process targeting the desired data distribution. We show that the time-reversal argument, common to all denoising diffusion probabilistic modeling proposals, is not necessary. We obtain diffusion processes targeting the desired data distribution by taking appropriate mixtures of diffusion bridges. The resulting transport is exact by construction, allows for greater flexibility in choosing the dynamics of the underlying diffusion, and can be approximated by means of a neural network via novel training objectives. We develop a unifying view of the drift adjustments corresponding to our and to time-reversal approaches and make use of this representation to inspect the inner workings of diffusion-based generative models. Finally, we leverage scalable simulation and inference techniques common in spatial statistics to move beyond fully factorial distributions in the underlying diffusion dynamics. The methodological advances contained in this work contribute toward establishing a general framework for generative modeling based on diffusion processes.	翻訳日:2023-12-25 15:31:48 公開日:2023-12-22
# Sparse Pooled Data 問題に対する準最適 & Efficient アルゴリズムについて On a Near-Optimal & Efficient Algorithm for the Sparse Pooled Data Problem ( http://arxiv.org/abs/2312.14588v1 ) ライセンス: Link先を確認	Max Hahn-Klimroth, Remco van der Hofstad, Noela Müller, Connor Riddlesden	(参考訳) プールデータ問題は、凝縮された測定値から、一連の未知のラベルを識別することを要求する。より正確には、$n$ アイテムが与えられたとき、各アイテムが $\{0,1,\ldots,d\}$ のラベルを持ち、それらが真値 $\sigma$ で符号化されていると仮定する。もし$\sigma$のゼロでないエントリ数が$\theta \in (0,1)$に対して$k \sim n^{\theta}$であるなら、プールデータ問題を sparse と呼ぶ。 $\sigma$に関する情報はプールされた測定値から得られ、それぞれの測定値は各ラベルのアイテムがプールにいくつ含まれているかを示す。最も基本的な問題は、できるだけ少数のプールを使用するプーリングスキームを設計し、高い確率で$\sigma$を再構築することである。問題の変種とその組合せの分岐は少なくとも35年間研究されてきた。しかし、ラベルの効率的な推論に関する現代の問題の研究は、理論上可能な推論と効率的な推論のそれぞれに必要となるプールの最小数の間に、$\log n$ オーダーの統計的-計算的ギャップがあることを示唆している。本稿では、情報理論的しきい値に非常に近い数のプールを用いる新しいプーリング方式に基づく効率的なアルゴリズム \algoname を設計することで、この$\log n$ギャップが人工的なものか本質的なものかという問題を解決する。 The pooled data problem asks to identify the unknown labels of a set of items from condensed measurements. More precisely, given $n$ items, assume that each item has a label in $\{0,1,\ldots,d\}$, encoded via the ground-truth $\sigma$. We call the pooled data problem sparse if the number of non-zero entries of $\sigma$ scales as $k \sim n^{\theta}$ for $\theta \in (0,1)$. The information that is revealed about $\sigma$ comes from pooled measurements, each indicating how many items of each label are contained in the pool. The most basic question is to design a pooling scheme that uses as few pools as possible, while reconstructing $\sigma$ with high probability. Variants of the problem and its combinatorial ramifications have been studied for at least 35 years. However, the study of the modern question of efficient inference of the labels has suggested a statistical-to-computational gap of order $\log n$ in the minimum number of pools needed for theoretically possible versus efficient inference. 
In this article, we resolve the question of whether this $\log n$-gap is artificial or of a fundamental nature by designing an efficient algorithm, called \algoname, based upon a novel pooling scheme on a number of pools very close to the information-theoretic threshold.	Translated: 2023-12-25 15:31:31 Published: 2023-12-22
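The measurement model in the abstract above can be illustrated with a minimal sketch. It only shows what a single pooled measurement reveals; it is not the paper's pooling scheme or reconstruction algorithm. The function names, and the choice to draw non-zero labels uniformly at random, are assumptions made for illustration.

```python
import random


def sparse_ground_truth(n, k, d, rng):
    """Build a label vector of length n with exactly k non-zero labels in {1, ..., d}.

    Drawing the non-zero labels uniformly is an illustrative assumption.
    """
    sigma = [0] * n
    for i in rng.sample(range(n), k):
        sigma[i] = rng.randint(1, d)
    return sigma


def pooled_measurement(sigma, pool, d):
    """Return counts[c] = number of items in the pool that carry label c, for c in 0..d."""
    counts = [0] * (d + 1)
    for i in pool:
        counts[sigma[i]] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = sparse_ground_truth(n=20, k=3, d=2, rng=rng)
    pool = rng.sample(range(20), 8)
    print(pooled_measurement(sigma, pool, d=2))
```

Each measurement reports only a histogram of labels, never which item carries which label, so the difficulty lies in choosing few pools that still pin down the ground truth.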
# Environment-Specific People ( http://arxiv.org/abs/2312.14579v1 ) License: see linked page	Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black, Justus Thies	Despite significant progress in generative image synthesis and full-body generation in particular, state-of-the-art methods are either context-independent, overly reliant on text prompts, or bound to curated training datasets, such as fashion images with monotonous backgrounds. Here, our goal is to generate people in clothing that is semantically appropriate for a given scene. To this end, we present ESP, a novel method for context-aware full-body generation that enables photo-realistic inpainting of people into existing "in-the-wild" photographs. ESP is conditioned on a 2D pose and contextual cues that are extracted from the environment photograph and integrated into the generation process. Our models are trained on a dataset containing a set of in-the-wild photographs of people covering a wide range of different environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state of the art on the task of contextual full-body generation.	Translated: 2023-12-25 15:31:08 Published: 2023-12-22
# PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer ( http://arxiv.org/abs/2312.14577v1 ) License: see linked page	Neha Sengar, Indra Kumari, Jihui Lee, Dongsoo Har	Driver distraction is a principal cause of traffic accidents. According to a study conducted by the National Highway Traffic Safety Administration, activities such as interacting with in-car menus, consuming food or beverages, or holding telephone conversations while operating a vehicle can be significant sources of driver distraction. From this viewpoint, this paper introduces a novel method for detecting driver distraction using multi-view driver action images. The proposed method is a vision transformer-based framework with pose estimation and action inference, namely PoseViNet. The motivation for adding posture information is to enable the transformer to focus more on key features. As a result, the framework is more adept at identifying critical actions. The proposed framework is compared with various state-of-the-art models using the SFD3 dataset, which represents 10 driver behaviors. The comparison shows that PoseViNet outperforms these models. The proposed framework is also evaluated with the SynDD1 dataset, which represents 16 driver behaviors. As a result, PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy on this challenging dataset.	
Translated: 2023-12-25 15:30:49 Published: 2023-12-22
# MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning ( http://arxiv.org/abs/2312.14574v1 ) License: see linked page	Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, and Xiaoxiao Li	Prompt learning has demonstrated impressive efficacy in the fine-tuning of multimodal large models to a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connection network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning process of multimodal large models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Secondly, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and disease-related concepts. 
Moreover, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is used to prompt the pre-trained multimodal large models for diagnosing neurological disorders. Extensive experiments demonstrate that our method achieves superior performance for neurological disorder diagnosis compared with state-of-the-art methods and is validated by clinicians.	Translated: 2023-12-25 15:30:33 Published: 2023-12-22
# Semidefinite Relaxations of the Gromov-Wasserstein Distance ( http://arxiv.org/abs/2312.14572v1 ) License: see linked page	Junyu Chen, Binh T. Nguyen, Yong Sheng Soh	The Gromov-Wasserstein (GW) distance is a variant of the optimal transport problem that allows one to match objects between incomparable spaces. At its core, the GW distance is specified as the solution of a non-convex quadratic program and is not known to be tractable to solve. In particular, existing solvers for the GW distance are only able to find locally optimal solutions. In this work, we propose a semidefinite programming (SDP) relaxation of the GW distance. The relaxation can be viewed as the dual of the GW distance augmented with constraints that relate the linear and quadratic terms of transportation maps. Our relaxation provides a principled way to compute the approximation ratio of any transport map to the global optimal solution. Finally, our numerical experiments suggest that the proposed relaxation is strong in that it frequently computes the global optimal solution, together with a proof of global optimality.	Translated: 2023-12-25 15:30:06 Published: 2023-12-22
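To make the "non-convex quadratic program" in the abstract above concrete, here is a minimal sketch that evaluates a Gromov-Wasserstein objective for a fixed coupling between two finite metric spaces. It assumes the common squared-difference cost and does not implement the SDP relaxation from the paper. The function name and the toy inputs are illustrative assumptions.

```python
def gw_objective(dx, dy, pi):
    """Evaluate sum over i,j,k,l of (dx[i][k] - dy[j][l])**2 * pi[i][j] * pi[k][l].

    dx, dy: square distance matrices of the two spaces (lists of lists).
    pi: coupling matrix with len(dx) rows and len(dy) columns.
    The squared-difference cost is an assumption; other losses are possible.
    """
    n, m = len(dx), len(dy)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if pi[i][j] == 0.0:
                continue  # zero mass contributes nothing
            for k in range(n):
                for l in range(m):
                    total += (dx[i][k] - dy[j][l]) ** 2 * pi[i][j] * pi[k][l]
    return total


if __name__ == "__main__":
    d = [[0.0, 1.0], [1.0, 0.0]]
    identity = [[0.5, 0.0], [0.0, 0.5]]      # matches point 0 to 0 and 1 to 1
    product = [[0.25, 0.25], [0.25, 0.25]]   # spreads mass uniformly
    print(gw_objective(d, d, identity), gw_objective(d, d, product))
```

The objective is quadratic in the coupling `pi` because each term multiplies two of its entries, which is why minimizing it over couplings is a quadratic rather than a linear program.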
# Data is Moody: Discovering Data Modification Rules from Process Event Logs ( http://arxiv.org/abs/2312.14571v1 ) License: see linked page	Marco Bjarne Schuster, Boris Wiegand, Jilles Vreeken	Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log, while ignoring event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules for how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches lack the ability to capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior. Given an event log, we are interested in finding accurate yet succinct and interpretable if-then rules describing how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. 
Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.	Translated: 2023-12-25 15:29:51 Published: 2023-12-22
# BSS-Bench: Towards Reproducible and Effective Band Selection Search ( http://arxiv.org/abs/2312.14570v1 ) License: see linked page	Wenshuai Xu, Zhenbo Xu	The key technology for overcoming the drawbacks of hyperspectral imaging (high cost, high capture delay, and low spatial resolution) and making it widely applicable is to select only a few representative bands from hundreds of bands. However, current band selection (BS) methods are hard to compare fairly because of inconsistent train/validation settings, including the number of bands, dataset splits, and retraining settings. To make BS methods easy to use and reproducible, this paper presents the first band selection search benchmark (BSS-Bench), containing 52k training and evaluation records of numerous band combinations (BC) with different backbones for various hyperspectral analysis tasks. The creation of BSS-Bench required a significant computational effort of 1.26k GPU days. By querying BSS-Bench, BS experiments can be performed easily and reproducibly, and the gap between the searched result and the best achievable performance can be measured. Based on BSS-Bench, we further discuss the impact of various factors on BS, such as the number of bands, unsupervised statistics, and different backbones. 
In addition to BSS-Bench, we present an effective one-shot BS method called Single Combination One Shot (SCOS), which learns the priority of any BC through one-time training, eliminating the need for repetitive retraining on different BCs. Furthermore, the search process of SCOS is flexible and does not require training, making it efficient and effective. Our extensive evaluations demonstrate that SCOS outperforms current BS methods on multiple tasks, even with far fewer bands. Our BSS-Bench and code are available in the supplementary material and will be made publicly available.	Translated: 2023-12-25 15:29:31 Published: 2023-12-22
# Creating New Voices using Normalizing Flows ( http://arxiv.org/abs/2312.14569v1 ) License: see linked page	Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa	Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. First, we create an approach for TTS and VC, and then we comprehensively evaluate our methods and baselines in terms of intelligibility, naturalness, speaker similarity, and ability to create new voices. We use both objective and subjective metrics to benchmark our techniques on two evaluation tasks: zero-shot and new voice speech synthesis. The goal of the former task is to measure the precision of the conversion to an unseen voice. The goal of the latter is to measure the ability to create new voices. Extensive evaluations demonstrate that the proposed approach systematically achieves state-of-the-art performance in zero-shot speech synthesis and creates various new voices unobserved in the training set. 
We consider this work to be the first attempt to synthesize new voices based on mel-spectrograms and normalizing flows, along with a comprehensive analysis and comparison of the TTS and VC modes.	Translated: 2023-12-25 15:28:41 Published: 2023-12-22
# Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise ( http://arxiv.org/abs/2312.14567v1 ) License: see linked page	Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang	Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical properties is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with a step-decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors from the statistical minimax rate. 
This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.	Translated: 2023-12-25 15:28:17 Published: 2023-12-22
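The setting in the abstract above (stochastic heavy ball with a step-decay learning rate on a quadratic objective) can be sketched in a few lines. This is only an illustration of the update rule, not the paper's analysis or its exact schedule. The diagonal Hessian, the halving decay factor, and the noise model (standard deviation proportional to the square root of each curvature) are all assumptions chosen to keep the example small.

```python
import random


def quadratic_loss(h, w, w_star):
    """f(w) = 0.5 * sum_i h[i] * (w[i] - w_star[i])**2 for a diagonal Hessian h."""
    return 0.5 * sum(hi * (wi - si) ** 2 for hi, wi, si in zip(h, w, w_star))


def stochastic_heavy_ball(h, w_star, w0, lr0, beta, num_stages,
                          steps_per_stage, noise_std, rng):
    """Run heavy-ball SGD with a step-decay learning rate and return the final iterate."""
    w = list(w0)
    w_prev = list(w0)
    lr = lr0
    dim = len(w)
    for _ in range(num_stages):
        for _ in range(steps_per_stage):
            # Noisy gradient; noise scaled by sqrt(h[i]) mimics anisotropic noise (assumption).
            grad = [h[i] * (w[i] - w_star[i]) + rng.gauss(0.0, noise_std * h[i] ** 0.5)
                    for i in range(dim)]
            # Heavy-ball update: gradient step plus momentum on the previous displacement.
            w_next = [w[i] - lr * grad[i] + beta * (w[i] - w_prev[i]) for i in range(dim)]
            w_prev, w = w, w_next
        lr *= 0.5  # step decay: halve the learning rate after each stage (assumption)
    return w


if __name__ == "__main__":
    h = [10.0, 0.1]  # ill-conditioned quadratic, condition number 100
    w = stochastic_heavy_ball(h, [0.0, 0.0], [1.0, 1.0], lr0=0.1, beta=0.9,
                              num_stages=5, steps_per_stage=200,
                              noise_std=0.1, rng=random.Random(0))
    print(quadratic_loss(h, w, [0.0, 0.0]))
```

Holding the learning rate constant within a stage and shrinking it between stages lets early stages reduce the bias quickly while later stages damp the gradient noise.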
# The Economics of Human Oversight: How Norms and Incentives Affect Costs and Performance of AI Workers ( http://arxiv.org/abs/2312.14565v1 ) License: see linked page	Johann Laux, Fabian Stephany, Alice Liefgreen	The global surge in AI applications is transforming industries, leading to displacement and complementation of existing jobs, while also giving rise to new employment opportunities. Human oversight of AI is an emerging task in which human workers interact with an AI model to improve its performance, safety, and compliance with normative principles. Data annotation, encompassing the labelling of images or annotating of texts, serves as a critical human oversight process, as the quality of a dataset directly influences the quality of AI models trained on it. Therefore, the efficiency of human oversight work stands as an important competitive advantage for AI developers. This paper delves into the foundational economics of human oversight, with a specific focus on the impact of norm design and monetary incentives on data quality and costs. An experimental study involving 307 data annotators examines six groups with varying task instructions (norms) and monetary incentives. 
Results reveal that annotators provided with clear rules exhibit higher accuracy rates, outperforming those with vague standards by 14%. Similarly, annotators receiving an additional monetary incentive perform significantly better, with the highest accuracy rate recorded in the group working with both clear rules and incentives (87.5% accuracy). However, both groups require more time to complete tasks, with a 31% increase in average task completion time compared to those working with standards and no incentives. These empirical findings underscore the trade-off between data quality and efficiency in data curation, shedding light on the nuanced impact of norm design and incentives on the economics of AI development. The paper contributes experimental insights to discussions on the economic, ethical, and legal considerations of AI technologies.	Translated: 2023-12-25 15:27:46 Published: 2023-12-22
# Online Covering with Multiple Experts ( http://arxiv.org/abs/2312.14564v1 ) License: see linked page	Enikő Kevi and Kim-Thang Nguyen	Designing online algorithms with machine learning predictions is a recent technique that goes beyond the worst-case paradigm for various practically relevant online problems (scheduling, caching, clustering, ski rental, etc.). While most previous learning-augmented algorithm approaches focus on integrating the predictions of a single oracle, we study the design of online algorithms with multiple experts. To go beyond the popular benchmark of a static best expert in hindsight, we propose a new dynamic benchmark (linear combinations of predictions that change over time). We present an algorithm that is competitive against the new dynamic benchmark, with a performance guarantee of $O(\log K)$, where $K$ is the number of experts, for $0$-$1$ online optimization problems. Furthermore, our multiple-expert approach provides a new perspective on how to combine several online algorithms in an online manner, a long-standing central subject in the online algorithm research community.	Translated: 2023-12-25 15:26:55 Published: 2023-12-22
# Traffic Reconstruction and Analysis of Natural Driving Behaviors at Unsignalized Intersections ( http://arxiv.org/abs/2312.14561v1 ) License: see linked page	Supriya Sarker, Bibek Poudel, Michael Villarreal, Weizi Li	This paper explores the intricacies of traffic behavior at unsignalized intersections through the lens of a novel dataset, combining manual video data labeling and advanced traffic simulation in SUMO. This research involved recording traffic at various unsignalized intersections in Memphis, TN, at different times of the day. After manually labeling video data to capture specific variables, we reconstructed traffic scenarios in the SUMO simulation environment. The output data from these simulations offered a comprehensive analysis, including time-space diagrams for vehicle movement, travel time frequency distributions, and speed-position plots to identify bottleneck points. This approach enhances our understanding of traffic dynamics, providing crucial insights for effective traffic management and infrastructure improvements.	Translated: 2023-12-25 15:26:29 Published: 2023-12-22
# Aurora: Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning ( http://arxiv.org/abs/2312.14557v1 ) License: see linked page	Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan	Existing research has demonstrated that refining large language models (LLMs) with machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of the Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to the Mixtral-8x7B sparse Mixture-of-Experts model. 
This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at: https://github.com/WangRongsheng/Aurora	Translated: 2023-12-25 15:25:50 Published: 2023-12-22
# Evaluating the Security and Privacy Risk Postures of Virtual Assistants ( http://arxiv.org/abs/2312.14633v1 ) License: see linked page	Borna Kalhor, Sanchari Das	Virtual assistants (VAs) have seen increased use in recent years due to their ease of use for daily tasks. Despite their growing prevalence, their security and privacy implications are still not well understood. To address this gap, we conducted a study to evaluate the security and privacy postures of eight widely used voice assistants: Alexa, Braina, Cortana, Google Assistant, Kalliope, Mycroft, Hound, and Extreme. We used three vulnerability testing tools, AndroBugs, RiskInDroid, and MobSF, to assess the security and privacy of these VAs. Our analysis focused on five areas: code, access control, tracking, binary analysis, and sensitive data confidentiality. The results revealed that these VAs are vulnerable to a range of security threats, including not validating SSL certificates, executing raw SQL queries, and using a weak mode of the AES algorithm. These vulnerabilities could allow malicious actors to gain unauthorized access to users' personal information. This study is a first step toward understanding the risks associated with these technologies and provides a foundation for future research to develop more secure and privacy-respecting VAs.	Translated: 2023-12-25 15:18:47 Published: 2023-12-22
# A Language-based solution to enable Metaverse Retrieval ( http://arxiv.org/abs/2312.14630v1 ) License: see linked page	Ali Abdari, Alex Falcon, Giuseppe Serra	Recently, the Metaverse has become increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do users find the one Metaverse which best fits their current interests? So far, the search process is mostly done by word of mouth, or by advertisement on technology-oriented websites. However, the lack of search engines similar to those available for other multimedia formats (e.g., YouTube for videos) is showing its limitations, since it is often cumbersome to find a Metaverse based on some specific interests using the available methods, while also making it difficult to discover user-created ones which lack strong advertisement. To address this limitation, we propose to use language to naturally describe the desired contents of the Metaverse a user wishes to find. Second, we highlight that, unlike more conventional 3D scenes, Metaverse scenarios represent a more complex data format since they often contain one or more types of multimedia which influence the relevance of the scenario itself to a user query. 
Therefore, in this work, we create a novel task, called Text-to-Metaverse retrieval, which aims at modeling these aspects while also taking the cross-modal relations with the textual data into account. Since we are the first to tackle this problem, we also collect a dataset of 33000 Metaverses, each of which consists of a 3D scene enriched with multimedia content. Finally, we design and implement a deep learning framework based on contrastive learning, resulting in a thorough experimental setup.	Translated: 2023-12-25 15:18:27 Published: 2023-12-22
# Dipole coupling of a bilayer graphene quantum dot to a high-impedance microwave resonator ( http://arxiv.org/abs/2312.14629v1 ) License: see linked page	Max J. Ruckriegel, Lisa M. Gächter, David Kealhofer, Mohsen Bahrami Panah, Chuyao Tong, Christoph Adam, Michele Masseroni, Hadrien Duprez, Rebekka Garreis, Kenji Watanabe, Takashi Taniguchi, Andreas Wallraff, Thomas Ihn, Klaus Ensslin, and Wei Wister Huang	We implement circuit quantum electrodynamics (cQED) with quantum dots in bilayer graphene, a maturing material platform for semiconductor qubits that can host long-lived spin and valley states. The presented device combines a high-impedance ($Z_\mathrm{r} \approx 1\,\mathrm{k\Omega}$) superconducting microwave resonator with a double quantum dot electrostatically defined in a graphene-based van der Waals heterostructure. Electric dipole coupling between the subsystems allows the resonator to sense the electric susceptibility of the double quantum dot, from which we reconstruct its charge stability diagram. We achieve sensitive and fast detection with a signal-to-noise ratio of 3.5 within $1\,\mu\mathrm{s}$ integration time. The charge-photon interaction is quantified in the dispersive and resonant regimes by comparing the coupling-induced change in the resonator response to input-output theory, yielding a maximal coupling strength of $g/2\pi = 49.7\,\mathrm{MHz}$. 
Our results introduce cQED as a probe for quantum dots in van der Waals materials and indicate a path toward coherent charge-photon coupling with bilayer graphene quantum dots.	Translated: 2023-12-25 15:18:04 Published: 2023-12-22
# Towards more sustainable enterprise data and application management with cross silo Federated Learning and Analytics ( http://arxiv.org/abs/2312.14628v1 ) License: see linked page	Hongliu Cao	To comply with new legal requirements and policies committed to privacy protection, more and more companies are starting to deploy cross-silo Federated Learning at a global scale, where several clients/silos collaboratively train a global model under the coordination of a central server. Instead of sharing and transmitting data, clients train models using their private local data and exchange model updates. However, there is little understanding of the carbon emission impact of cross-silo Federated Learning due to the lack of related work. In this study, we first analyze the sustainability aspect of cross-silo Federated Learning across the AI product life cycle, instead of focusing only on model training, in comparison with the centralized method. A more holistic quantitative cost and CO2 emission estimation method for real-world cross-silo Federated Learning settings is proposed. Secondly, we propose a novel data and application management system using cross-silo Federated Learning and analytics to make IT companies more sustainable and cost effective.	Translated: 2023-12-25 15:17:41 Published: 2023-12-22
# DSAP: Analyzing Bias Through Demographic Comparison of Datasets ( http://arxiv.org/abs/2312.14626v1 ) License: see the linked page	Iris Dominguez-Catena, Daniel Paternain, Mikel Galar	In the last few years, Artificial Intelligence systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our knowledge of these biases, we still lack general tools to detect and quantify them, as well as to compare the biases in different datasets. Thus, in this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets. DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios. An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP.	Translated: 2023-12-25 15:17:22 Published: 2023-12-22
# Hierarchical Multi-Agent Reinforcement Learning for Assessing False-Data Injection Attacks on Transportation Networks ( http://arxiv.org/abs/2312.14625v1 ) License: see the linked page	Taha Eghtesad, Sirui Li, Yevgeniy Vorobeychik, Aron Laszka	The increasing reliance of drivers on navigation applications has made transportation networks more susceptible to data-manipulation attacks by malicious actors. Adversaries may exploit vulnerabilities in the data collection or processing of navigation services to inject false information, and thus interfere with drivers' route selection. Such attacks can significantly increase traffic congestion, resulting in a substantial waste of time and resources, and may even disrupt essential services that rely on road networks. To assess the threat posed by such attacks, we introduce a computational framework to find worst-case data-injection attacks against transportation networks. First, we devise an adversarial model with a threat actor who can manipulate drivers by increasing the travel times that they perceive on certain roads. Then, we employ hierarchical multi-agent reinforcement learning to find an approximately optimal adversarial strategy for data manipulation. We demonstrate the applicability of our approach by simulating attacks on the Sioux Falls, SD network topology.	Translated: 2023-12-25 15:17:01 Published: 2023-12-22
# Towards Loose-Fitting Garment Animation via Generative Model of Deformation Decomposition ( http://arxiv.org/abs/2312.14619v1 ) License: see the linked page	Yifu Liu, Xiaoxia Li, Zhiling Luo, Wei Zhou	Existing data-driven methods for garment animation are usually driven by linear skinning; although effective on tight garments, they do not handle loose-fitting garments with complex deformations well. To address these limitations, we develop a garment generative model based on deformation decomposition to efficiently simulate loose garment deformation without directly using linear skinning. Specifically, we learn a garment generative space with the proposed generative model, where we decouple the latent representation into unposed deformed garments and dynamic offsets during the decoding stage. With explicit decomposition of garment deformations, our generative model is able to generate complex pose-driven deformations on canonical garment shapes. Furthermore, we learn to transfer the body motions and previous state of the garment to the latent space to regenerate dynamic results. In addition, we introduce a detail enhancement module in an adversarial training setup to learn high-frequency wrinkles. We demonstrate through extensive experiments that our method outperforms state-of-the-art data-driven alternatives, and we show qualitative and quantitative analysis of the results.	Translated: 2023-12-25 15:16:43 Published: 2023-12-22
# Arbitrary relaxation rate under non-Hermitian matrix iterations ( http://arxiv.org/abs/2312.14617v1 ) License: see the linked page	Jaš Bensa	We study the exponential relaxation of observables propagated with a non-Hermitian transfer matrix, an example being out-of-time-ordered correlations (OTOC) in brickwall (BW) random quantum circuits. Until a time that scales as the system size, the exponential decay of observables is not usually determined by the second largest eigenvalue of the transfer matrix, as one might naively expect, but is in general slower; this slower decay rate was dubbed the "phantom eigenvalue". Generally, this slower decay is given by the largest value in the pseudospectrum of the transfer matrix; however, we show that the decay rate can be an arbitrary value between the second largest eigenvalue and the largest value in the pseudospectrum. This arbitrary decay can be observed, for example, in the propagation of OTOC in BW circuits with periodic boundary conditions. To explore this phenomenon, we study a 1D biased random walk coupled to two reservoirs at the edges, and prove that this simple system also exhibits phantom eigenvalues.	Translated: 2023-12-25 15:16:24 Published: 2023-12-22
# A fixed-point algorithm for matrix projections with applications in quantum information ( http://arxiv.org/abs/2312.14615v1 ) License: see the linked page	Shrigyan Brahmachari, Roberto Rubboli, and Marco Tomamichel	We develop a simple fixed-point iterative algorithm that computes the matrix projection with respect to the Bures distance on the set of positive definite matrices that are invariant under some symmetry. We prove that the fixed-point iteration algorithm converges exponentially fast to the optimal solution in the number of iterations. Moreover, it numerically shows fast convergence compared with off-the-shelf semidefinite program solvers. Our algorithm, for the specific case of matrix barycenters, recovers the fixed-point iterative algorithm originally introduced in (Álvarez-Esteban et al., 2016). Compared to previous works, our proof is more general and direct as it is based only on simple matrix inequalities. Finally, we discuss several applications of our algorithm in quantum resource theories and quantum Shannon theory.	Translated: 2023-12-25 15:16:05 Published: 2023-12-22
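As an illustration of the barycenter special case mentioned in the entry above, here is a minimal sketch. It is not the paper's general projection algorithm. It assumes the commonly stated barycenter update S <- S^{-1/2} (sum_i w_i (S^{1/2} A_i S^{1/2})^{1/2})^2 S^{-1/2} for positive definite inputs A_i with weights w_i; the function names `psd_sqrt` and `bures_barycenter`, the identity starting point, and the stopping rule are all hypothetical choices.

```python
import numpy as np


def psd_sqrt(m):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def bures_barycenter(mats, weights, iters=100, tol=1e-12):
    """Fixed-point iteration for the Bures barycenter (illustrative sketch)."""
    s = np.eye(mats[0].shape[0])  # assumed starting point: the identity
    for _ in range(iters):
        root = psd_sqrt(s)
        inv_root = np.linalg.inv(root)
        # Weighted sum of (S^{1/2} A_i S^{1/2})^{1/2}
        t = sum(w * psd_sqrt(root @ a @ root) for a, w in zip(mats, weights))
        s_next = inv_root @ t @ t @ inv_root
        s_next = (s_next + s_next.T) / 2  # re-symmetrize against round-off
        if np.linalg.norm(s_next - s) < tol:
            return s_next
        s = s_next
    return s


# Commuting (diagonal) inputs give an easy sanity check.
mats = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
bary = bures_barycenter(mats, [0.5, 0.5])
```

For commuting inputs the barycenter reduces to (sum_i w_i A_i^{1/2})^2, so the diagonal example should return a matrix close to diag(4, 9).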
# Tuning-Free Inversion-Enhanced Control for Consistent Image Editing ( http://arxiv.org/abs/2312.14611v1 ) License: see the linked page	Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang	Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non-rigid edits. Other works are tuning-free, but their performance is weakened by the quality of Denoising Diffusion Implicit Model (DDIM) reconstruction, which often fails in real-world scenarios. In this paper, we present a novel approach called Tuning-free Inversion-enhanced Control (TIC), which directly correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction. Specifically, our method effectively obtains inversion features from the key and value features in the self-attention layers, and enhances the sampling process with these inversion features, thus achieving accurate reconstruction and content-consistent editing.
To extend the applicability of our method to general editing scenarios, we also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes. Experiments show that the proposed method outperforms previous works in reconstruction and consistent editing, and produces impressive results in various settings.	Translated: 2023-12-25 15:15:52 Published: 2023-12-22
# BLSTM-Based Confidence Estimation for End-to-End Speech Recognition ( http://arxiv.org/abs/2312.14609v1 ) License: see the linked page	Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix	Confidence estimation, in which we estimate the reliability of each recognized token (e.g., word, sub-word, and character) in automatic speech recognition (ASR) hypotheses and detect incorrectly recognized tokens, is an important function for developing ASR applications. In this study, we perform confidence estimation for end-to-end (E2E) ASR hypotheses. Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks. In such situations, confidence estimation becomes difficult since we need to detect infrequent incorrect tokens from mostly correct token sequences. To tackle this imbalanced dataset problem, we employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler that is trained with a class balancing objective. We experimentally confirmed that, by utilizing several types of ASR decoding scores as its auxiliary features, the model steadily shows high confidence estimation performance under highly imbalanced settings. We also confirmed that the BLSTM-based model outperforms Transformer-based confidence estimation models, which greatly underestimate incorrect tokens.	Translated: 2023-12-25 15:15:28 Published: 2023-12-22
# Efficient Discrete Physics-informed Neural Networks for Addressing Evolutionary Partial Differential Equations ( http://arxiv.org/abs/2312.14608v1 ) License: see the linked page	Siqi Chen, Bin Shan, Ye Li	Physics-informed neural networks (PINNs) have shown promising potential for solving partial differential equations (PDEs) using deep learning. However, PINNs face training difficulties for evolutionary PDEs, particularly for dynamical systems whose solutions exhibit multi-scale or turbulent behavior over time. The reason is that PINNs may violate the temporal causality property, since all the temporal features in the PINN loss are trained simultaneously. This paper proposes to use implicit time differencing schemes to enforce temporal causality, and to use transfer learning to sequentially update the PINNs in space as surrogates for PDE solutions in different time frames. The evolving PINNs are better able to capture the varying complexities of the evolutionary equations, while only requiring minor updates between adjacent time frames. Our method is theoretically proven to be convergent if the time step is small and each PINN in different time frames is well-trained. In addition, we provide state-of-the-art (SOTA) numerical results for a variety of benchmarks for which existing PINN formulations may fail or be inefficient.
We demonstrate that the proposed method improves the accuracy of PINN approximation for evolutionary PDEs and improves efficiency by a factor of 4-40x.	Translated: 2023-12-25 15:15:03 Published: 2023-12-22
# ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models ( http://arxiv.org/abs/2312.14607v1 ) License: see the linked page	Gaëtan Michelet, Frank Breitinger	Generative AIs, especially Large Language Models (LLMs) such as ChatGPT or Llama, have advanced significantly, positioning them as valuable tools for digital forensics. While initial studies have explored the potential of ChatGPT in the context of investigations, the question of to what extent LLMs can assist the forensic report writing process remains unresolved. To answer the question, this article first examines forensic reports with the goal of generalization (e.g., finding the 'average structure' of a report). We then evaluate the strengths and limitations of LLMs for generating the different parts of the forensic report using a case study. This work thus provides valuable insights into the automation of report writing, a critical facet of digital forensics investigations. We conclude that, combined with thorough proofreading and corrections, LLMs may assist practitioners during the report writing process but at this point cannot replace them.	Translated: 2023-12-25 15:14:38 Published: 2023-12-22
# Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps ( http://arxiv.org/abs/2312.14606v1 ) License: see the linked page	Till Beemelmanns, Wassim Zahr, Lutz Eckstein	Vision Transformers (ViTs) have achieved state-of-the-art results on various computer vision tasks, including 3D object detection. However, their end-to-end implementation also makes ViTs less explainable, which can be a challenge for deploying them in safety-critical applications, such as autonomous driving, where it is important for authorities, developers, and users to understand the model's reasoning behind its predictions. In this paper, we propose a novel method for generating saliency maps for a DetR-like ViT with multiple camera inputs used for 3D object detection. Our method is based on the raw attention and is more efficient than gradient-based methods. We evaluate the proposed method on the nuScenes dataset using extensive perturbation tests and show that it outperforms other explainability methods in terms of visual quality and quantitative metrics. We also demonstrate the importance of aggregating attention across different layers of the transformer. Our work contributes to the development of explainable AI for ViTs, which can help increase trust in AI applications by establishing more transparency regarding the inner workings of AI models.	Translated: 2023-12-25 15:14:20 Published: 2023-12-22
# Reasons to Reject? Aligning Language Models with Judgments ( http://arxiv.org/abs/2312.14591v1 ) License: see the linked page	Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi	As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 pieces of off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval.
The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.	Translated: 2023-12-25 15:14:01 Published: 2023-12-22
# MEAOD: Model Extraction Attack against Object Detectors ( http://arxiv.org/abs/2312.14677v1 ) License: see the linked page	Zeyu Li, Chenghui Shi, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji	The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to the confidentiality and security of MLaaS platforms. While many studies have explored threats of model extraction attacks against classification models in recent years, object detection models, which are more frequently used in real-world scenarios, have received less attention. In this paper, we investigate the challenges and feasibility of query-based model extraction attacks against object detection models and propose an effective attack method called MEAOD. It selects samples from the attacker-possessed dataset to construct an efficient query dataset using active learning and enhances the categories with insufficient objects. We additionally improve the extraction effectiveness by updating the annotations of the query dataset.
In our experiments under gray-box and black-box scenarios, we achieve an extraction performance of over 70% with a 10k query budget.	Translated: 2023-12-25 15:07:43 Published: 2023-12-22
# Zero-shot Causal Graph Extrapolation from Text via LLMs ( http://arxiv.org/abs/2312.14670v1 ) License: see the linked page	Alessandro Antonucci, Gregorio Piqué, Marco Zaffalon	We evaluate the ability of large language models (LLMs) to infer causal relations from natural language. Compared to traditional natural language processing and deep learning techniques, LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples. This motivates us to extend our approach to extrapolating causal graphs through iterated pairwise queries. We perform a preliminary analysis on a benchmark of biomedical abstracts with ground-truth causal graphs validated by experts. The results are promising and support the adoption of LLMs for such a crucial step in causal inference, especially in medical domains, where the amount of scientific text to analyse might be huge, and the causal statements are often implicit.	Translated: 2023-12-25 15:07:25 Published: 2023-12-22
# Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition ( http://arxiv.org/abs/2312.14667v1 ) License: see the linked page	Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao	Multimodal intent recognition aims to leverage diverse modalities such as expressions, body movements and tone of speech to comprehend a user's intent, constituting a critical task for understanding human language and behavior in real-world multimodal scenarios. Nevertheless, the majority of existing methods ignore potential correlations among different modalities and have limitations in effectively learning semantic features from nonverbal modalities. In this paper, we introduce a token-level contrastive learning method with modality-aware prompting (TCL-MAP) to address the above challenges. To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from text, video and audio modalities with similarity-based modality alignment and a cross-modality attention mechanism.
Based on the modality-aware prompt and ground truth labels, the proposed token-level contrastive learning framework (TCL) constructs augmented samples and employs NT-Xent loss on the label token. Specifically, TCL capitalizes on the optimal textual semantic insights derived from intent labels to guide the learning processes of other modalities in return. Extensive experiments show that our method achieves remarkable improvements compared to state-of-the-art methods. Additionally, ablation analyses demonstrate the superiority of the modality-aware prompt over the handcrafted prompt, which holds substantial significance for multimodal prompt learning. The code is released at https://github.com/thuiar/TCL-MAP.	Translated: 2023-12-25 15:07:13 Published: 2023-12-22
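The NT-Xent loss named in the entry above can be sketched generically as follows. This is an illustration of the standard normalized-temperature cross-entropy loss on paired embeddings, not the TCL-MAP implementation: the function name `nt_xent_loss`, the temperature value, and the random toy embeddings are assumptions, and the paper applies the loss to label tokens rather than to arbitrary vectors.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]); all other rows act as negatives."""
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors, so dot product = cosine
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    n = z_a.shape[0]
    # Row i in the first half is paired with row i + n, and vice versa
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax, read off at each row's positive index
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_norm
    return float(-log_prob.mean())


rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
# Nearly identical views should score a lower loss than unrelated vectors
aligned = nt_xent_loss(anchors, anchors + 0.01 * rng.normal(size=(8, 16)))
shuffled = nt_xent_loss(anchors, rng.normal(size=(8, 16)))
```

The comparison between `aligned` and `shuffled` is only a sanity check that the loss rewards agreement between the two views of each sample.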
# On-demand transposition across light-matter interaction regimes in bosonic cQED ( http://arxiv.org/abs/2312.14665v1 ) License: see the linked page	Fernando Valadares, Ni-Ni Huang, Kyle Chu, Aleksandr Dorogov, Weipin Chua, Kong Lingda, Pengtao Song, Yvonne Y. Gao	The diverse applications of light-matter interactions in science and technology stem from the qualitatively distinct ways these interactions manifest, prompting the development of physical platforms that can interchange between regimes on demand. Bosonic cQED employs the light field of high-Q superconducting cavities coupled to non-linear circuit elements, harnessing the rich dynamics of their interaction for quantum information processing. However, implementing fast switching of the interaction regime without deteriorating the cavity coherence is a significant challenge. We present the first experiment to achieve this feat, combining nanosecond-scale frequency tunability of a transmon coupled to a cavity with a lifetime of hundreds of microseconds. Our implementation affords a range of new capabilities for quantum information processing, from fast creation of cavity Fock states using resonant interaction and interchanging tomography techniques at qualitatively distinct interaction regimes on the fly, to the suppression of unwanted cavity-transmon dynamics during idle evolution.
By bringing flux tunability into the bosonic cQED toolkit, our work opens up a new paradigm to probe the full range of light-matter interaction dynamics within a single platform and provides valuable new pathways towards robust and versatile quantum information processing.	Translated: 2023-12-25 15:06:39 Published: 2023-12-22
# Density Uncertainty Quantification with NeRF-Ensembles: Impact of Data and Scene Constraints ( http://arxiv.org/abs/2312.14664v1 ) License: see the linked page	Miriam Jäger, Steven Landgraf, Boris Jutzi	In the fields of computer graphics, computer vision and photogrammetry, Neural Radiance Fields (NeRFs) are a major topic driving current research and development. However, the quality of NeRF-generated 3D scene reconstructions and subsequent surface reconstructions heavily relies on the network output, particularly the density. Regarding this critical aspect, we propose to utilize NeRF-Ensembles that provide a density uncertainty estimate alongside the mean density. We demonstrate that data constraints such as low-quality images and poses lead to a degradation of the training process, increased density uncertainty and decreased predicted density. Even with high-quality input data, the density uncertainty varies based on scene constraints such as acquisition constellations, occlusions and material properties. NeRF-Ensembles not only provide a tool for quantifying the uncertainty but also exhibit two promising advantages: enhanced robustness and artifact removal.
Through the utilization of NeRF-Ensembles instead of single NeRFs, small outliers are removed, yielding a smoother output with improved completeness of structures. Furthermore, applying percentile-based thresholds on density uncertainty outliers proves to be effective for the removal of large (foggy) artifacts in post-processing. We apply our methodology to three different datasets: (i) a synthetic benchmark dataset, (ii) a real benchmark dataset, and (iii) real data under realistic recording conditions and sensors.	Translated: 2023-12-25 15:06:15 Published: 2023-12-22
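The ensemble statistics and percentile-based thresholding described in the entry above can be sketched on plain arrays. This is a toy illustration under stated assumptions, not the authors' pipeline: uncertainty is taken to be the standard deviation across ensemble members, the 95th percentile cut-off is arbitrary, the data are synthetic, and the function names `ensemble_density_stats` and `mask_uncertain` are hypothetical.

```python
import numpy as np


def ensemble_density_stats(densities):
    """Mean and standard deviation across ensemble members (axis 0) for each query point."""
    return densities.mean(axis=0), densities.std(axis=0)


def mask_uncertain(mean_density, uncertainty, percentile=95.0):
    """Zero out points whose uncertainty exceeds the given percentile (assumed rule)."""
    threshold = np.percentile(uncertainty, percentile)
    keep = uncertainty <= threshold
    return np.where(keep, mean_density, 0.0), keep


rng = np.random.default_rng(0)
# Hypothetical data: 5 ensemble members, 100 query points, members mostly agree
densities = 1.0 + 0.01 * rng.normal(size=(5, 100))
# One point where the members disagree strongly, standing in for a foggy artifact
densities[:, 7] = [0.0, 5.0, 0.2, 9.0, 1.0]
mean_density, uncertainty = ensemble_density_stats(densities)
cleaned, keep = mask_uncertain(mean_density, uncertainty)
```

In a real setting the array would hold densities queried from independently trained NeRFs at the same 3D sample positions, and the percentile would need tuning per scene.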
# 深部非パラメトリック時系列予測器 Deep Non-Parametric Time Series Forecaster ( http://arxiv.org/abs/2312.14657v1 ) ライセンス: Link先を確認	Syama Sundar Rangapuram, Jan Gasthaus, Lorenzo Stella, Valentin Flunkert, David Salinas, Yuyang Wang, Tim Januschowski	(参考訳) 本稿では,時系列予測のための非パラメトリックベースラインモデルを提案する。従来の予測モデルとは異なり、提案手法は予測分布のパラメトリック形式を仮定せず、学習可能な戦略に従って経験的分布からサンプリングして予測を生成する。これにより、モデルは常に妥当な予測(すなわち観測されたデータ範囲内での予測)を生成することができ、いくつかのデータ分布の数値安定性に苦しむ古典的なモデルと異なり失敗することはない。さらに,提案手法のグローバルバージョンを開発し,複数の時系列にまたがる情報を活用することで,サンプリング戦略を自動的に学習する。実験的な評価は,提案手法がすべてのデータセットに対して合理的かつ一貫した性能を示し,予測ツールボックスで考慮すべき強いベースラインであることを証明している。 This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail unlike classical models that suffer from numerical stability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.	翻訳日:2023-12-25 15:05:50 公開日:2023-12-22
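The core idea in the abstract above, generating forecasts by sampling from the empirical distribution of past observations, can be sketched as below. This is only an illustration under stated assumptions: the exponential recency weighting, the `alpha` parameter, and the function name `sample_forecast` are hypothetical choices for the example, not details confirmed by the abstract.

```python
import math
import random


def sample_forecast(history, horizon, num_samples, alpha=0.5, seed=None):
    """Draw forecast sample paths by resampling past observations.

    Each forecast value is one of the observed values, so every prediction
    stays inside the observed data range. More recent observations get a
    higher sampling weight: weight = exp(-alpha * lag). (Assumed strategy.)
    """
    rng = random.Random(seed)
    n = len(history)
    # lag is 1 for the most recent observation and n for the oldest one
    weights = [math.exp(-alpha * (n - i)) for i in range(n)]
    return [
        rng.choices(history, weights=weights, k=horizon)
        for _ in range(num_samples)
    ]
```

Setting `alpha=0` in this sketch gives uniform sampling over the whole history, while a large `alpha` concentrates the samples on the latest observations.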
# SAVAE: 変分ベイズオートエンコーダの生存分析への応用 SAVAE: Leveraging the variational Bayes autoencoder for survival analysis ( http://arxiv.org/abs/2312.14651v1 ) ライセンス: Link先を確認	Patricia A. Apell\'aniz and Juan Parras and Santiago Zazo	(参考訳) 多くの医学研究の分野と同様に、生存分析は、複雑な、高次元、異質、不完全、検閲された医療データをモデル化するためのディープラーニング技術の応用への関心が高まっている。現在の手法では、実際には有効でない可能性のあるデータ間の関係を仮定することが多い。そこで本研究では,変分オートエンコーダに基づく新しいアプローチであるsavae(survival analysis variational autoencoder)を提案する。 SAVAEは、生存分析のための調整されたELBO定式化を導入し、共変量と生存時間の様々なパラメトリック分布をサポートすることで、この分野に大きく貢献する。さまざまなメトリクスを一貫して実行し、さまざまな実験を通じて堅牢性と安定性を示す一般的な方法を提供する。提案手法は, 時間とイベント, 検閲, 共変性相互作用, 時間変化リスク関連を効果的に推定する。我々は、ゲノム、臨床、人口統計データを含む多様なデータセットでモデルを検証し、様々なレベルの検閲を行う。このアプローチは、Concordance IndexとIntegrated Brier Scoreで評価されるように、最先端技術と比較して競合性能を示す。 SAVAEはまた、共変量と時間をパラメトリックにモデル化する解釈可能なモデルも提供している。さらに、その生成アーキテクチャは、クラスタリング、データ計算、生存データからの潜時空間推論による合成患者データの生成など、さらなる応用を促進する。 As in many fields of medical research, survival analysis has witnessed a growing interest in the application of deep learning techniques to model complex, high-dimensional, heterogeneous, incomplete, and censored medical data. Current methods often make assumptions about the relations between data that may not be valid in practice. In response, we introduce SAVAE (Survival Analysis Variational Autoencoder), a novel approach based on Variational Autoencoders. SAVAE contributes significantly to the field by introducing a tailored ELBO formulation for survival analysis, supporting various parametric distributions for covariates and survival time (as long as the log-likelihood is differentiable). It offers a general method that consistently performs well on various metrics, demonstrating robustness and stability through different experiments. Our proposal effectively estimates time-to-event, accounting for censoring, covariate interactions, and time-varying risk associations. We validate our model in diverse datasets, including genomic, clinical, and demographic data, with varying levels of censoring. 
This approach demonstrates competitive performance compared to state-of-the-art techniques, as assessed by the Concordance Index and the Integrated Brier Score. SAVAE also offers an interpretable model that parametrically models covariates and time. Moreover, its generative architecture facilitates further applications such as clustering, data imputation, and the generation of synthetic patient data through latent space inference from survival data.	翻訳日:2023-12-25 15:05:37 公開日:2023-12-22
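The Concordance Index mentioned above as an evaluation metric can be illustrated with a small sketch. This follows the commonly used Harrell-style definition for right-censored data and is not taken from the SAVAE code; the function name and the handling of ties are assumptions of this example.

```python
def concordance_index(times, events, risks):
    """Harrell-style concordance index for right-censored survival data.

    times:  observed time for each subject (event or censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    risks:  predicted risk score (higher means earlier expected event)

    A pair (i, j) is comparable when subject i had an observed event
    strictly before time j. Tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier one in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the risk ordering matches the event ordering on every comparable pair, and 0.5 corresponds to random ranking.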
# ロバストステレオマッチングのためのグローバルオクルージョンアウェアトランスフォーマ Global Occlusion-Aware Transformer for Robust Stereo Matching ( http://arxiv.org/abs/2312.14650v1 ) ライセンス: Link先を確認	Zihua Liu, Yizhou Li and Masatoshi Okutomi	(参考訳) 学習に基づくステレオマッチングアルゴリズムによる顕著な進歩にもかかわらず、オクルード領域などの不条件領域のパフォーマンスは依然としてボトルネックとなっている。受容領域が限られているため、既存のCNNベースの手法はこれらの不条件領域を効果的に扱うのに苦労する。この問題に対処するため,本稿では,長距離依存とオクルージョン・アウェアネスのグローバルコンテキストを活用する,GOAT(Global Occlusion-Aware Transformer)と呼ばれる新しいアテンションベースのステレオマッチングネットワークを提案する。ヤギアーキテクチャにおいて, 初期偏差マップと咬合マスクを並列注意機構を用いて推定するために, 並列偏差・咬合推定モジュールpdoが提案されている。閉塞領域における不均一性の推定をさらに高めるため,OGA (Oocclusion-aware Global aggregate module) を提案する。本モジュールは、オクルード領域の焦点範囲内で制限されたグローバル相関を利用して、オクルード領域の格差を洗練することを目的としている。 sceneflow, kitti 2015, middleburyなど,いくつかの公開ベンチマークデータセットで広範な実験が行われた。その結果,提案手法はすべてのベンチマーク,特にオクルード領域において有意な性能を示した。 Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, the performance in the ill-conditioned regions, such as the occluded regions, remains a bottleneck. Due to the limited receptive field, existing CNN-based methods struggle to handle these ill-conditioned regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching network called Global Occlusion-Aware Transformer (GOAT) to exploit long-range dependency and occlusion-awareness global context for disparity estimation. In the GOAT architecture, a parallel disparity and occlusion estimation module PDO is proposed to estimate the initial disparity map and the occlusion mask using a parallel attention mechanism. To further enhance the disparity estimates in the occluded regions, an occlusion-aware global aggregation module (OGA) is proposed. This module aims to refine the disparity in the occluded regions by leveraging restricted global correlation within the focus scope of the occluded areas. Extensive experiments were conducted on several public benchmark datasets including SceneFlow, KITTI 2015, and Middlebury. 
The results show that the proposed GOAT demonstrates outstanding performance among all benchmarks, particularly in the occluded regions.	翻訳日:2023-12-25 15:05:13 公開日:2023-12-22
# genaiのためのpub/subメッセージブローカ Pub/Sub Message Brokers for GenAI ( http://arxiv.org/abs/2312.14647v1 ) ライセンス: Link先を確認	Alaa Saleh, Susanna Pirttikangas and Lauri Lov\'en	(参考訳) 今日のデジタル世界では、Large Language Models(LLMs)のようなジェネレーティブ人工知能(GenAI)がますます普及し、多様なアプリケーションにまたがる範囲を広げている。この採用の増加により、データ中心のGenAIモデルに対する需要が大幅に増加し、堅牢なデータ通信インフラの必要性が浮かび上がっている。このニーズの中心はメッセージブローカで、さまざまなシステムコンポーネント内でデータ転送に必要なチャネルとして機能します。この調査は、従来のメッセージブローカと現代のメッセージブローカを総合的に分析することを目的としており、一般的なプラットフォームの比較研究を提供している。本研究は,オープンソースの可用性,統合監視ツール,メッセージ優先順位付け機構,並列処理機能,信頼性,分散とクラスタリング機能,認証プロセス,データ永続化戦略,耐障害性,スケーラビリティなど,数多くの基準を検討する。さらに、各メッセージブローカの設計と運用が課す固有の制約についても検討し、これらの制限が現実世界の適用性を理解する上で重要であることを認識した。そして、これらの洞察を活用して、高度なメッセージブローカフレームワークを提案します -- GenAIアプリケーションの進化する要求を満たすために必要な適応性と堅牢性を設計します。最後に,genaiコンテキストに特化したメッセージブローカ機構の強化について検討し,汎用的なメッセージブローカフレームワークの開発を重要視する。このようなフレームワークは、近い将来、GenAIの動的かつ増大する要求に対処して、迅速な適応を実現することができるだろう。この二元的アプローチを通じて、我々は、GenAIデータ通信の領域における将来のイノベーションとインフラの進歩を導くための基礎的なコンペディションに貢献するつもりです。 In today's digital world, Generative Artificial Intelligence (GenAI) such as Large Language Models (LLMs) is becoming increasingly prevalent, extending its reach across diverse applications. This surge in adoption has sparked a significant increase in demand for data-centric GenAI models, highlighting the necessity for robust data communication infrastructures. Central to this need are message brokers, which serve as essential channels for data transfer within various system components. This survey aims to delve into a comprehensive analysis of traditional and modern message brokers, offering a comparative study of prevalent platforms. Our study considers numerous criteria including, but not limited to, open-source availability, integrated monitoring tools, message prioritization mechanisms, capabilities for parallel processing, reliability, distribution and clustering functionalities, authentication processes, data persistence strategies, fault tolerance, and scalability. 
Furthermore, we explore the intrinsic constraints that the design and operation of each message broker might impose, recognizing that these limitations are crucial in understanding their real-world applicability. We then leverage these insights to propose a sophisticated message broker framework -- one designed with the adaptability and robustness necessary to meet the evolving requisites of GenAI applications. Finally, this study examines the enhancement of message broker mechanisms specifically for GenAI contexts, emphasizing the criticality of developing a versatile message broker framework. Such a framework would be poised for quick adaptation, catering to the dynamic and growing demands of GenAI in the foreseeable future. Through this dual-pronged approach, we intend to contribute a foundational compendium that can guide future innovations and infrastructural advancements in the realm of GenAI data communication.	翻訳日:2023-12-25 15:04:50 公開日:2023-12-22
# 複数訪問型健康状態推定による患者記録の協調合成 Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference ( http://arxiv.org/abs/2312.14646v1 ) ライセンス: Link先を確認	Hongda Sun, Hongzhan Lin, Rui Yan	(参考訳) 電子健康記録(EHR)は医療における機械学習アプリケーションの基礎となり、実際の患者記録の有用性はプライバシやセキュリティ上の懸念によって制限されることが多い。合成EHR生成は、この制限を補うための追加の視点を提供する。既存のほとんどの手法は、医学的常識に則ったイベントの組み合わせを制御できないEHRデータにおいて、さまざまな種類のイベントを考慮せずに、実際のEHRデータに基づいて新しいレコードを合成する。本稿では,これらの制約に対処するために,協調的EHR合成のためのマルチビジットヘルスステータス推論モデルMSICを提案する。まず、確率的グラフィカルモデルとして合成EHR生成過程を定式化し、潜伏状態のモデル化により様々な種類の事象を密結合する。次に,複数回の訪問シナリオ用に調整された健康状態推定手法を導出し,過去の記録を効果的に活用し,現在および将来の記録を合成する。さらに、各医療イベントにテキスト記述を追加するための医用レポートの作成を提案し、ehrデータを合成するための幅広いアプリケーションを提供する。各訪問で異なる段落を生成するために,複数の生成元のメッセージパッシングを協調して,高品質なレポートを生成するために2相復号戦略を用いるマルチジェネレータ審議フレームワークを組み込んだ。広く使われているベンチマークMIMIC-IIIとMIMIC-IVに関する広範な実験は、MSICがプライバシーリスクを低く保ちながら、合成データの品質に関する最先端の成果を示すものである。 Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, while the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation provides an additional perspective to compensate for this limitation. Most existing methods synthesize new records based on real EHR data, without consideration of different types of events in EHR data, which cannot control the event combinations in line with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis to address these limitations. First, we formulate the synthetic EHR generation process as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records. 
Furthermore, we propose to generate medical reports to add textual descriptions for each medical event, providing broader applications for synthesized EHR data. For generating different paragraphs in each visit, we incorporate a multi-generator deliberation framework to collaborate the message passing of multiple generators and employ a two-phase decoding strategy to generate high-quality reports. Our extensive experiments on the widely used benchmarks, MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.	翻訳日:2023-12-25 15:04:22 公開日:2023-12-22
# 結合複素syk模型の熱力学と動力学 Thermodynamics and dynamics of coupled complex SYK models ( http://arxiv.org/abs/2312.14644v1 ) ライセンス: Link先を確認	Jan C. Louw, Linda M. van Manen, Rishabh Jha	(参考訳) 大きな$qの複素SYKモデルは、様々なブラックホールで共有されるファンデルワールス(平均場)と同じ普遍性クラスに属することが知られている。同時に、マルダセナ=シェンカー=スタンフォード境界(MSS)も飽和し、最大カオスとなる。この研究は、SYK様モデルの共有普遍性クラスと量子カオスのロバスト性を確立し、異なる順序の大きなq$複素SYKモデルの結合系に拡張する。本稿では, 相転移を観察する熱力学的(臨界指数)特性と, 時間外相関器(OTOC)計算による動的(リャプノフ指数)特性の詳細な導出を行う。解析の結果, 相互作用強度比による追加スケーリングパラメータの導入にもかかわらず, 単一SYKモデルと同様, 低温で連続的な位相遷移を行うことがわかった。臨界指数は、ファンデルワールスガスや様々なAdSブラックホールと共有されるランダウ・ギンツブルク(平均場)普遍性クラスと一致している。さらに,結合syk系は低温下では最大q$制限値において最大カオス状態のままであり,マルダセナ・シェンカー・スタンフォード(mss)境界に固着し,これは1つの大きなq$複素sykモデルと一致する特徴である。これらの発見は、複雑な量子系における普遍性とカオスに関するより広範な探求の道を開き、我々の結合したSYK系は、量子カオスのMSS境界を飽和させながら、ファンデルワールスや様々なAdSブラックホールと同じ普遍性クラスに属することを示した。 It has been known that the large-$q$ complex SYK model falls under the same universality class as that of van der Waals (mean-field) which is also shared by a variety of black holes. At the same time, it also saturates the Maldacena-Shenker-Stanford (MSS) bound and is thus maximally chaotic. This work establishes the robustness of shared universality class and quantum chaos for SYK-like models by extending to a system of coupled large-$q$ complex SYK models of different orders. We provide a detailed derivation of thermodynamic (critical exponents) properties observing a phase transition and dynamic (Lyapunov exponent) properties via the out-of-time correlator (OTOC) calculations. Our analysis reveals that, despite the introduction of an additional scaling parameter through interaction strength ratios, the system undergoes a continuous phase transition at low temperatures, similar to that of a single SYK model. The critical exponents align with the Landau-Ginzburg (mean-field) universality class, shared with van der Waals gases and various AdS black holes. 
Furthermore, we demonstrate that the coupled SYK system remains maximally chaotic in the large-$q$ limit at low temperatures, adhering to the Maldacena-Shenker-Stanford (MSS) bound, a feature consistent with a single large-$q$ complex SYK model. These findings open avenues for broader inquiries into universality and chaos in complex quantum systems by showing that our coupled SYK system belongs to the same universality class as van der Waals gases and various AdS black holes while saturating the MSS bound on quantum chaos.	翻訳日:2023-12-25 15:03:54 公開日:2023-12-22
# 測定による圧縮フォック状態の生成 Generation of squeezed Fock states by measurement ( http://arxiv.org/abs/2312.14643v1 ) ライセンス: Link先を確認	S. B. Korolev, E. N. Bashmakova, A. K. Tagantsev, T. Yu. Golubeva	(参考訳) 2モードの絡み合ったガウス状態(TMEG)からの1つ以上の光子サブトラクションによる圧縮フォック状態の生成は理論的に対処される。その結果,任意の順序フォック状態が生成可能であることを示し,tmeg状態のパラメータに課してそのような生成を保証すべき条件を得た。我々はこの条件が満たされる体制を普遍的解決体制と呼んだ。その結果, 任意のTMEG状態からの1光子サブトラクションにより, 第1圧縮Fock状態の生成が引き続き可能となるように, 上記条件は冗長であることがわかった。同時に、最初の圧縮されたフォック状態生成の最大生成確率は、普遍解状態に対応する。本研究では,ビームスプリッタと制御Z演算を用いた圧縮フォック状態の生成に関する記述に,上記の結果を適用した。最大確率でスクイズドフォック状態を得るために必要な,これらの設定パラメータと入力スクイズド状態のパラメータを推定した。 The generation of squeezed Fock states by subtracting one or more photons from a two-mode entangled Gaussian (TMEG) state is addressed theoretically. We show that a Fock state of arbitrary order can be generated this way, and we obtain a condition that must be imposed on the parameters of the TMEG state to guarantee such generation. We call the regime in which this condition is satisfied the universal solution regime. We show that, for the first squeezed Fock state, the above condition is redundant: the first squeezed Fock state can still be generated by single-photon subtraction from an arbitrary TMEG state. At the same time, the maximum probability of generating the first squeezed Fock state corresponds to the universal solution regime. We apply these results to describe the generation of squeezed Fock states using a beam splitter and a Controlled-Z operation. We estimate the parameters of such setups and of the input squeezed states that are necessary to obtain squeezed Fock states with the maximum probability.	翻訳日:2023-12-25 15:03:24 公開日:2023-12-22
# オーバーザ・エアフェデレーション学習におけるエネルギー効率と分布ロバスト性のバランス Balancing Energy Efficiency and Distributional Robustness in Over-the-Air Federated Learning ( http://arxiv.org/abs/2312.14638v1 ) ライセンス: Link先を確認	Mohamed Badi, Chaouki Ben Issaid, Anis Elgabli and Mehdi Bennis	(参考訳) ワイヤレスエッジデバイスの増加により、エネルギー、帯域幅、レイテンシ、データの均一性に関する課題が拡大した。これらの課題は分散学習のボトルネックになっている。これらの問題に対処するため,エアコン(AirComp)を用いた分布的堅牢な連邦学習(FL)におけるエネルギー効率を保証する新しい手法を提案する。本研究では,エネルギー効率とロバスト性を効果的にバランスさせるために,エネルギー効率に配慮した決定論的手法と,分散ロバスト性に配慮した確率論的手法の2つの相補的な洞察を統合する新しいクライアント選択手法を導入する。シミュレーションの結果,提案アルゴリズムの有効性は,ロバスト性とエネルギー効率の両面から,ベースラインよりも優れた性能を示し,ベースラインよりも3倍以上の省エネを実現している。 The growing number of wireless edge devices has magnified challenges concerning energy, bandwidth, latency, and data heterogeneity. These challenges have become bottlenecks for distributed learning. To address these issues, this paper presents a novel approach that ensures energy efficiency for distributionally robust federated learning (FL) with over air computation (AirComp). In this context, to effectively balance robustness with energy efficiency, we introduce a novel client selection method that integrates two complementary insights: a deterministic one that is designed for energy efficiency, and a probabilistic one designed for distributional robustness. Simulation results underscore the efficacy of the proposed algorithm, revealing its superior performance compared to baselines from both robustness and energy efficiency perspectives, achieving more than 3-fold energy savings compared to the considered baselines.	翻訳日:2023-12-25 15:03:09 公開日:2023-12-22
# ニューラルフローマップ上の流体シミュレーション Fluid Simulation on Neural Flow Maps ( http://arxiv.org/abs/2312.14635v1 ) ライセンス: Link先を確認	Yitong Deng, Hong-Xing Yu, Diyang Zhang, Jiajun Wu, and Bo Zhu	(参考訳) 本稿では,流れ図の理論に基づく流体シミュレーションにより,暗黙的ニューラル表現の新たなパラダイムをブリッジする新しいシミュレーション手法であるニューラル・フロー・マップを導入し,流体現象の最先端のシミュレーションを実現する。重なり合う,多解像度,空間的にスパースグリッドのピラミッドで小さなニューラルネットワークを融合させ,長期時空間速度場を高精度にコンパクトに表現する,新しいハイブリッドニューラルネットワーク表現(Spatially Sparse Neural Fields, SSNF)を考案する。このニューラル・ベロシティ・バッファを手元に,長期的な双方向フローマップとそのヤコビアンを機械的に対称的に計算し,既存の解に対する劇的な精度向上を図る。これらの長距離双方向フローマップは、低い散逸で高いアドベクション精度を実現し、複雑な渦構造を示す高忠実な非圧縮性フローシミュレーションを容易にする。本研究は, 跳躍渦, 衝突渦, 渦再接続, 移動障害物からの渦発生, 密度差など, 様々な困難なシミュレーションシナリオにおいて, 神経流体シミュレーションの有効性を実証する。実例では, エネルギー保存, 視覚の複雑さ, 実験観察への順守, 詳細な渦構造保存の観点から, 既存の手法よりも高い性能を示す。 We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena. We devise a novel hybrid neural field representation, Spatially Sparse Neural Fields (SSNF), which fuses small neural networks with a pyramid of overlapping, multi-resolution, and spatially sparse grids, to compactly represent long-term spatiotemporal velocity fields at high accuracy. With this neural velocity buffer in hand, we compute long-term, bidirectional flow maps and their Jacobians in a mechanistically symmetric manner, to facilitate drastic accuracy improvement over existing solutions. These long-range, bidirectional flow maps enable high advection accuracy with low dissipation, which in turn facilitates high-fidelity incompressible flow simulations that manifest intricate vortical structures. We demonstrate the efficacy of our neural fluid simulation in a variety of challenging simulation scenarios, including leapfrogging vortices, colliding vortices, vortex reconnections, as well as vortex generation from moving obstacles and density differences. 
Our examples show increased performance over existing methods in terms of energy conservation, visual complexity, adherence to experimental observations, and preservation of detailed vortical structures.	翻訳日:2023-12-25 15:02:52 公開日:2023-12-22
# 説明可能・説明不能ロボットとのインタラクションにおけるマルチモーダルコミュニケーションパターンのマイニング Mining multi-modal communication patterns in interaction with explainable and non-explainable robots ( http://arxiv.org/abs/2312.14634v1 ) ライセンス: Link先を確認	Suna Bensch and Amanda Eriksson	(参考訳) 説明可能で説明不能なロボットと対話する人間のインタラクションパターンについて検討する。説明不能なロボットは、説明可能なロボットとは対照的に、動作や非動作を説明せず、インタラクション中に他のフィードバックも与えないロボットである。 20人の人間が説明可能なpepperロボットか説明不能なpepperロボットのいずれかにボード上のオブジェクトを移動させるように指示したボードゲーム中に、人間の行動を記録し分析した。ビデオの転写と注釈は、アソシエーションルールマイニングのためのトランザクションに変換された。アソシエーション・ルールは、ロボットと人間の相互作用におけるコミュニケーションパターンを発見し、最も興味深いルールは、通常の2乗テストでもテストされた。統計的に有意な結果は、男性と説明不能なロボットと女性と説明可能なロボットの間に強い相関関係があり、人間がロボットのモダリティの一部を反映しているということである。また,人間のインタラクションパターンの文脈化が重要であり,関連ルールを調査ツールとして活用することが重要であることも示唆した。これらの結果は,人間の行動に適応するロボットの設計において重要である。 We investigate interaction patterns for humans interacting with explainable and non-explainable robots. Non-explainable robots are here robots that do not explain their actions or non-actions, neither do they give any other feedback during interaction, in contrast to explainable robots. We video recorded and analyzed human behavior during a board game, where 20 humans verbally instructed either an explainable or non-explainable Pepper robot to move objects on the board. The transcriptions and annotations of the videos were transformed into transactions for association rule mining. Association rules discovered communication patterns in the interaction between the robots and the humans, and the most interesting rules were also tested with regular chi-square tests. Some statistically significant results are that there is a strong correlation between men and non-explainable robots and women and explainable robots, and that humans mirror some of the robot's modality. Our results also show that it is important to contextualize human interaction patterns, and that this can be easily done using association rules as an investigative tool. 
The presented results are important when designing robots that should adapt their behavior to become understandable for the interacting humans.	翻訳日:2023-12-25 15:02:26 公開日:2023-12-22
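The association rule mining step described above scores a rule such as "antecedent implies consequent" by its support, confidence, and lift over a set of transactions. The sketch below shows those three standard measures; the function names are hypothetical, and any item labels passed in are illustrative placeholders rather than the annotation categories used in the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    if s_a == 0 or s_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = s_ac / s_a  # estimate of P(consequent | antecedent)
    lift = confidence / s_c  # above 1 suggests a positive association
    return s_ac, confidence, lift
```

Each transaction here is a Python `set` of items, for example the annotated modalities observed in one interaction segment. A lift close to 1 indicates that the two sides of the rule occur together about as often as independence would predict.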
# 金融システム設計のためのテキスト-SQL翻訳の強化 Enhancing Text-to-SQL Translation for Financial System Design ( http://arxiv.org/abs/2312.14725v1 ) ライセンス: Link先を確認	Yewei Song, Saad Ezzini, Xunzhu Tang, Cedric Lothritz, Jacques Klein, Tegawend\'e Bissyand\'e, Andrey Boytsov, Ulrick Ble, Anne Goujon	(参考訳) 自然言語質問をSQLクエリに変換するタスクであるText-to-SQLは、さまざまなビジネスプロセスの一部である。その自動化は新たな課題であり、ソフトウェア実践者が自然言語を使ってリレーショナルデータベースとシームレスに対話できるようにし、ビジネスニーズとソフトウェア能力のギャップを埋める。本稿では,様々なNLPタスクの最先端技術を実現したLarge Language Models (LLMs)について考察する。具体的には、テキストからSQLまでのパフォーマンス、評価手法、および入力最適化(プロンプトなど)をベンチマークする。本稿では,SQLクエリ間の類似性を適切に測定するための2つの新しい指標を提案する。全体としては,テキストからsqlへのタスクで適切なllmを選択する方法など,さまざまな調査結果をコミュニティと共有しています。さらに、木ベースの編集距離が、生成したSQLクエリとText2SQLアプローチのベンチマークのオラクルとの類似性を評価するための信頼性の高い指標であることを示す。このメトリクスは、研究者が事前の作業で生成されたクエリを実行するなど、計算コストのかかる実験を行う必要がなくなるため、重要である。本研究は、金融ドメインのユースケースを実装し、text2sqlシステムの進歩と、このドメインでの実用化に寄与する。 Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Models (LLMs), which have achieved state of the art for various NLP tasks. Specifically, we benchmark Text-to-SQL performance, the evaluation methodologies, as well as input optimization (e.g., prompting). In light of the empirical observations that we have made, we propose two novel metrics that were designed to adequately measure the similarity between SQL queries. Overall, we share with the community various findings, notably on how to select the right LLM on Text-to-SQL tasks. We further demonstrate that a tree-based edit distance constitutes a reliable metric for assessing the similarity between generated SQL queries and the oracle for benchmarking Text2SQL approaches. 
This metric is important as it relieves researchers from the need to perform computationally expensive experiments such as executing generated queries as done in prior works. Our work implements financial domain use cases and, therefore contributes to the advancement of Text2SQL systems and their practical adoption in this domain.	翻訳日:2023-12-25 14:55:48 公開日:2023-12-22
# 離散選択モデルにおける画像:多モード入力におけるデータ同型対応 Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs ( http://arxiv.org/abs/2312.14724v1 ) ライセンス: Link先を確認	Brian Sifringer, Alexandre Alahi	(参考訳) 本稿では,dcm(離散選択モデリング)と機械学習の交点について検討し,dcmの実用機能への画像データの統合とそのモデル解釈性への影響について考察する。本稿では,DCMフレームワーク内の従来の表型入力と同型情報を共有する高次元画像データの埋め込み結果について検討する。ニューラルネットワーク(NN)コンポーネントは、共起が存在するときの画像から表層変数表現を学習し、複製することにより、DCMパラメータの解釈可能性を向上させる。我々は,冗長な情報を分離するためのアーキテクチャ設計調整と,ソース情報マスキングとインパインティングによる同型情報緩和の2つの手法を提案する。半合成データセットを用いて行った実験により, 設計上の変更が不決定性を示す一方で, データソースの直接緩和はDCMの解釈可能なパラメータの整合性を維持する上で, より効果的な戦略であることが示された。本稿は,実世界における知見の適用可能性について考察し,複雑なデータモダリティを結合したハイブリッドモデリングにおける今後の研究の意義について考察する。 MITのモラルマシンデータセットを用いて表と画像データの整合性を完全に制御し、Learning Multinomial Logit(L-MNL)フレームワークをデプロイすることにより、両方の入力を選択モデルにマージする。 This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning, focusing on the integration of image data into DCM's utility functions and its impact on model interpretability. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework. Our study reveals that neural network (NN) components learn and replicate tabular variable representations from images when co-occurrences exist, thereby compromising the interpretability of DCM parameters. We propose and benchmark two methodologies to address this challenge: architectural design adjustments to segregate redundant information, and isomorphic information mitigation through source information masking and inpainting. Our experiments, conducted on a semi-synthetic dataset, demonstrate that while architectural modifications prove inconclusive, direct mitigation at the data source shows to be a more effective strategy in maintaining the integrity of DCM's interpretable parameters. 
The paper concludes with insights into the applicability of our findings in real-world settings and discusses the implications for future research in hybrid modeling that combines complex data modalities. Full control of tabular and image data congruence is attained by using the MIT moral machine dataset, and both inputs are merged into a choice model by deploying the Learning Multinomial Logit (L-MNL) framework.	翻訳日:2023-12-25 14:55:27 公開日:2023-12-22
# gerrymandering平面グラフ Gerrymandering Planar Graphs ( http://arxiv.org/abs/2312.14721v1 ) ライセンス: Link先を確認	Jack Dippel, Max Dupré la Tour, April Niu, Adrian Vetta	(参考訳) 地図再帰問題 (gerrymandering) の計算複雑性について検討する。数学的には、選挙地区設計者 (gerrymanderer) は、重み付きグラフを$k$連結成分 (districts) に分割し、その候補 (party) ができるだけ多くの地区で勝利する。先行研究は主に、グラフがパスまたはツリーである特別なケースに関するものである。私たちの焦点は、グラフが平面である現実的なケースに関するものです。我々は、候補数と$\lambda$が定数であり、頂点重み(投票重み)が多項式有界であるとき、ジェリーマンディング問題は$\lambda$-outerplanar graphsの多項式時間で解けることを証明した。対照的に、問題は2つの候補でさえ一般平面グラフにおいてNP完全である。これは、gerrymandering平面グラフの近似アルゴリズムの研究を動機付ける。しかし、候補数が大きければ、ゲリーマンデラーが1つの地区に勝てない場合と、ゲリーマンデラーが少なくとも1つの地区に勝てる場合とを区別することは困難である。これは即時、 P=NP でない限り、再制限問題は平面グラフの多項式時間では適用できないことを意味する。この結論は、優れた近似アルゴリズムの設計のターミナルであるように見えるが、そうではない。ゲリーマンデラーが勝つことができる範囲の最大数が極端に小さい場合にのみ適用されるため、近似可能性の境界は回避できる。実際、固定数の候補に対して、我々の主な結果は、最適値が十分大きな定数であれば、未重み付き平面グラフを再配置するための定数係数近似アルゴリズムが存在することである。 We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our focus concerns the realistic case where the graph is planar. We prove that the gerrymandering problem is solvable in polynomial time in $\lambda$-outerplanar graphs, when the number of candidates and $\lambda$ are constants and the vertex weights (voting weights) are polynomially bounded. In contrast, the problem is NP-complete in general planar graphs even with just two candidates. This motivates the study of approximation algorithms for gerrymandering planar graphs. However, when the number of candidates is large, we prove it is hard to distinguish between instances where the gerrymanderer cannot win a single district and instances where the gerrymanderer can win at least one district. 
This immediately implies that the redistricting problem is inapproximable in polynomial time in planar graphs, unless P=NP. This conclusion appears terminal for the design of good approximation algorithms -- but it is not. The inapproximability bound can be circumvented as it only applies when the maximum number of districts the gerrymanderer can win is extremely small, say one. Indeed, for a fixed number of candidates, our main result is that there is a constant factor approximation algorithm for redistricting unweighted planar graphs, provided the optimal value is a large enough constant.	翻訳日:2023-12-25 14:55:02 公開日:2023-12-22
# 静止ボソニックモードのディジタルホモダインとヘテロダイン検出 Digital homodyne and heterodyne detection for stationary bosonic modes ( http://arxiv.org/abs/2312.14720v1 ) ライセンス: Link先を確認	Ingrid Strandberg, Axel Eriksson, Baptiste Royer, Mikael Kervinen, Simone Gasparinetti	(参考訳) ホモ・ヘテロダイン検出は伝搬電磁場を測定する基本的な技術である。しかし、これらの技法をキャビティに閉じ込められた定常場に適用することは困難である。この課題を克服するために,空洞と相互作用する2段階システムの間接的測定を繰り返すことを提案する。提案手法が単一ショットレベルでのホモ・ヘテロダイン検出の測定統計を忠実に再現できることを数値的に示す。このスキームは、回路量子電磁力学を含む様々な物理アーキテクチャで実装することができる。量子検証プロトコルを含む線形検出を必要とする量子アルゴリズムを定常モードで実装する方法について検討した。 Homo- and heterodyne detection are fundamental techniques for measuring propagating electromagnetic fields. However, applying these techniques to stationary fields confined in cavities poses a challenge. As a way to overcome this challenge, we propose to use repeated indirect measurements of a two-level system interacting with the cavity. We demonstrate numerically that the proposed measurement scheme faithfully reproduces measurement statistics of homo- or heterodyne detection at the single-shot level. The scheme can be implemented in various physical architectures, including circuit quantum electrodynamics. Our results pave the way to the implementation of quantum algorithms requiring linear detection, including quantum verification protocols, in stationary modes.	翻訳日:2023-12-25 14:54:32 公開日:2023-12-22
# Rydbergイオンを捕捉した三部量子ラビモデル Tripartite quantum Rabi model with trapped Rydberg ions ( http://arxiv.org/abs/2312.14718v1 ) ライセンス: Link先を確認	Thomas J. Hamlyn, Chi Zhang, Igor Lesanovsky, and Weibin Li	(参考訳) ボソニックモードがスピン-スピン相互作用を通じて2つのスピン-1/2粒子に同時に結合する三成分量子ラビモデル(tqrm)について検討し、スピン-スピン-ボーソンカップリング--二成分スピン-ボーソンカップリングを特徴とする従来の量子ラビモデルから脱却する。 tqrmの対称性は、スピン状態間のエネルギー差を表すデチューニングパラメータに依存する。ゼロデチューニングにおいて、パリティ対称性はTQRMを量子ラビモデルに還元することができる。 3部結合強度が増加するにつれて、基底状態における超ラジカル遷移が予測される。非ゼロデチューニングでは、トータルスピンはTQRMの唯一の保存量として現れる。 3部結合が非ゼロである限り、基底状態において超放射能が優位であることがわかった。固有スペクトルが得られたTQRMのブラックG関数を解析的に導出する。 TQRMは、TQRM内で必要となる三部結合と単体相互作用が自然に存在する、リドバーグイオン量子シミュレータで実現可能である。 We investigate a tripartite quantum Rabi model (TQRM) wherein a bosonic mode concurrently couples to two spin-1/2 particles through a spin-spin interaction, resulting in a spin-spin-boson coupling--a departure from conventional quantum Rabi models featuring bipartite spin-boson couplings. The symmetries of the TQRM depend on the detuning parameter, representing the energy difference between the spin states. At zero detuning, a parity symmetry renders the TQRM reducible to a quantum Rabi model. A subradiant to superradiant transition in the groundstate is predicted as the tripartite coupling strength increases. For non-zero detuning, the total spin emerges as the sole conserved quantity in the TQRM. It is found that superradiance prevails in the groundstate as long as the tripartite coupling remains non-zero. We derive the Braak G-function of the TQRM analytically, with which the eigenspectra are obtained. The TQRM can be realized in a viable trapped Rydberg ion quantum simulator where the required tripartite couplings and single body interactions in the TQRM are naturally present.	翻訳日:2023-12-25 14:54:22 公開日:2023-12-22
# 逆転送多目的最適化 Inverse Transfer Multiobjective Optimization ( http://arxiv.org/abs/2312.14713v1 ) ライセンス: Link先を確認	Jiao Liu, Abhishek Gupta, and Yew-Soon Ong	(参考訳) 転送最適化により、関連するソースタスクからの経験的事前情報を活用することで、ターゲットタスクのデータ効率の最適化が可能になる。これは、厳密な評価予算の下で一連のトレードオフソリューションを求める多目的最適化設定において特に有用である。本稿では,多目的最適化における逆移動の概念を紹介する。逆伝達は、目的空間のパフォーマンスベクトルをタスク固有の決定空間における集団探索分布にマッピングするために確率的逆モデルを用いることで際立っている。このアイデアに基づいて,InvTrEMO(Inverse Transfer Multiobjective Evolutionary Optimizer)を提案する。 invtremoの重要な特徴は、意思決定空間がタスク間で正確に一致していない場合でも、多くのアプリケーション領域で広く使われている共通の客観的関数を利用する能力である。これにより、invTrEMOは異種ソースタスクからの情報をユニークかつ効果的に利用することができる。さらに、invTrEMOは、高精度の逆モデルを重要な副産物として提供し、ユーザの好みに基づいて、オンデマンドで調整されたソリューションを生成する。多目的および多目的ベンチマーク問題に関する実証研究は、実例研究と同様に、最先端の進化的およびベイズ最適化アルゴリズムと比較して、invTrEMOの高速収束率とモデリング精度を示す。 invTrEMOのソースコードはhttps://github.com/LiuJ-2023/invTrEMOで公開されている。 Transfer optimization enables data-efficient optimization of a target task by leveraging experiential priors from related source tasks. This is especially useful in multiobjective optimization settings where a set of trade-off solutions is sought under tight evaluation budgets. In this paper, we introduce a novel concept of inverse transfer in multiobjective optimization. Inverse transfer stands out by employing probabilistic inverse models to map performance vectors in the objective space to population search distributions in task-specific decision space, facilitating knowledge transfer through objective space unification. Building upon this idea, we introduce the first Inverse Transfer Multiobjective Evolutionary Optimizer (invTrEMO). A key highlight of invTrEMO is its ability to harness the common objective functions prevalent in many application areas, even when decision spaces do not precisely align between tasks. This allows invTrEMO to uniquely and effectively utilize information from heterogeneous source tasks as well. 
Furthermore, invTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand based on user preferences. Empirical studies on multi- and many-objective benchmark problems, as well as a practical case study, showcase the faster convergence rate and modelling accuracy of the invTrEMO relative to state-of-the-art evolutionary and Bayesian optimization algorithms. The source code of the invTrEMO is made available at https://github.com/LiuJ-2023/invTrEMO.	翻訳日:2023-12-25 14:54:01 公開日:2023-12-22
# 機械はロバスト、プライベート、効率的に学習できるのか? Can Machines Learn Robustly, Privately, and Efficiently? ( http://arxiv.org/abs/2312.14712v1 ) ライセンス: Link先を確認	Youssef Allouah, Rachid Guerraoui, and John Stephan	(参考訳) 機械学習(ML)アプリケーションの成功は、膨大なデータセットと分散アーキテクチャに依存し、成長するにつれて、MLの課題が提示される。データがセンシティブな情報を含む実世界のシナリオでは、データ中毒やハードウェア障害といった問題が一般的である。プライバシと堅牢性の確保は、公共生活におけるMLの普及に不可欠である。本稿では,分散アーキテクチャにおけるこれらの目的達成に伴うコストについて検討する。分散MLにおけるプライバシとロバスト性の意味を概説し、それらを分離して効率的に達成する方法を明らかにする。しかし、これらの目的の統合は計算効率において顕著な妥協をもたらすと我々は主張する。この複雑なバランスを掘り下げて、MLアプリケーションにおけるプライバシ、堅牢性、計算効率の課題と解決策を探求します。 The success of machine learning (ML) applications relies on vast datasets and distributed architectures, which, as they grow, present challenges for ML. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the costs associated with achieving these objectives in distributed architectures. We overview the meanings of privacy and robustness in distributed ML, and clarify how they can be achieved efficiently in isolation. However, we contend that the integration of these objectives entails a notable compromise in computational efficiency. We delve into this intricate balance, exploring the challenges and solutions for privacy, robustness, and computational efficiency in ML applications.	翻訳日:2023-12-25 14:53:40 公開日:2023-12-22
# 極性認知デノジングを用いた感覚伝達におけるスタイルコンテンツトレードオフのバランス Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising ( http://arxiv.org/abs/2312.14708v1 ) ライセンス: Link先を確認	Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek	(参考訳) テキストの感情伝達は、感情に依存しないコンテンツを保持しながら、文章の感情の極性を反転させることを目的としている。現在のモデルでは感情の変化は良好であるが, 変換後の文のコンテンツ保存は不十分である。本稿では,生成されたテキストの感情属性を正確に制御し,コンテンツの保存とスタイル・コンテンツのトレードオフのバランスを図る,極性認識に基づく感情伝達モデルを提案する。提案手法は,共有エンコーダを用いた表現学習と感情特異的デコーダを用いた感情制御生成の2つの段階からなる。実験結果から,本手法はコンテンツ保存の面では最先端ベースラインを上回っており,スタイル転送精度と流暢さの面では競争力を維持していることが示された。 Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content. Although current models show good results at changing the sentiment, content preservation in transferred sentences is insufficient. In this paper, we present a sentiment transfer model based on polarity-aware denoising, which accurately controls the sentiment attributes in generated text, preserving the content to a great extent and helping to balance the style-content trade-off. Our proposed model is structured around two key stages in the sentiment transfer process: better representation learning using a shared encoder and sentiment-controlled generation using separate sentiment-specific decoders. Empirical results show that our method outperforms state-of-the-art baselines in terms of content preservation while staying competitive in terms of style transfer accuracy and fluency.	翻訳日:2023-12-25 14:53:25 公開日:2023-12-22
# BonnBeetClouds3D: 実地条件下でのテンサイ植物のポイントクラウドに基づくオルガンレベル表現型化に向けたデータセット BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions ( http://arxiv.org/abs/2312.14706v1 ) ライセンス: Link先を確認	Elias Marks, Jonas Bömer, Federico Magistri, Anurag Sah, Jens Behley, Cyrill Stachniss	(参考訳) 農業生産は今後数十年間、気候変動と持続可能性の必要性によって深刻な課題に直面しており、環境への影響を減らすことが求められている。自律型無人航空機(UAV)による作物の監視と、新規でレジリエントな作物品種の育成を組み合わせることで、ロボットによる非化学除草によるフィールドマネジメントの進歩は、これらの課題に対処するのに役立つ。表現型化と呼ばれる植物形質の分析は、植物の育種に不可欠な活動であるが、大量の手作業が伴う。本稿では,精密な表現型化に必要とされる、器官レベルの細粒度な形状解析を自動化する課題に対処する。この領域における実世界のデータの可利用性は比較的低いため、48品種を含む実育種試験の高精細度画像をUAVで取得し、形態学的および外観の多様性を網羅する新しいデータセットを提案する。これにより、異なる品種にうまく一般化する自律的な表現型化へのアプローチの開発が可能になる。複数視点からの高分解能画像の重ね合わせに基づいて,写真測量による高密度点群を計算し,植物,葉,および先端や基部などの特徴点について詳細な高精度な点ラベルを提供する。さらに,ドイツ連邦植物品種庁の専門家による実際の植物における表現型形質の測定を行い,セグメンテーションやキーポイント検出だけでなく,下流のタスクにも新たなアプローチの評価が可能となった。提供されたラベル付きポイントクラウドは、細粒度植物分析を可能にし、自動表現型化アプローチの開発のさらなる進展を支援するとともに、表面再構成、ポイントクラウド完成、ポイントクラウドの意味解釈に関するさらなる研究を可能にする。 Agricultural production is facing severe challenges in the next decades induced by climate change and the need for sustainability, reducing its impact on the environment. Advancements in field management through non-chemical weeding by robots in combination with monitoring of crops by autonomous unmanned aerial vehicles (UAVs) and breeding of novel and more resilient crop varieties are helpful to address these challenges. The analysis of plant traits, called phenotyping, is an essential activity in plant breeding; however, it involves a great amount of manual labor. With this paper, we address the problem of automatic fine-grained organ-level geometric analysis needed for precision phenotyping.
As the availability of real-world data in this domain is relatively scarce, we propose a novel dataset that was acquired using UAVs capturing high-resolution images of a real breeding trial containing 48 plant varieties and therefore covering great morphological and appearance diversity. This enables the development of approaches for autonomous phenotyping that generalize well to different varieties. Based on overlapping high-resolution images from multiple viewing angles, we compute photogrammetric dense point clouds and provide detailed and accurate point-wise labels for plants, leaves, and salient points as the tip and the base. Additionally, we include measurements of phenotypic traits performed by experts from the German Federal Plant Variety Office on the real plants, allowing the evaluation of new approaches not only on segmentation and keypoint detection but also directly on the downstream tasks. The provided labeled point clouds enable fine-grained plant analysis and support further progress in the development of automatic phenotyping approaches, but also enable further research in surface reconstruction, point cloud completion, and semantic interpretation of point clouds.	翻訳日:2023-12-25 14:53:10 公開日:2023-12-22
# SCUNet++:Swin-UNetとCNN Bottleneckハイブリッドアーキテクチャを併用した肺塞栓CT画像分割の評価 SCUNet++: Assessment of Pulmonary Embolism CT Image Segmentation Leveraging Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection ( http://arxiv.org/abs/2312.14705v1 ) ライセンス: Link先を確認	Yifei Chen, Binfeng Zou, Zhaoxin Guo, Yiyu Huang, Yifan Huang, Feiwei Qin, Qinhai Li, Changmiao Wang	(参考訳) 肺塞栓症 (PE) は重症例では右室肥大や右心不全につながりうる肺疾患であり, 重症度は心筋梗塞と突然死のみに次いで2位である。肺動脈CT血管造影(CTPA)は,PEの診断法として広く用いられている。しかし,PE検出は画像技術の限界により臨床実践の課題を呈する。 CTPAはPEに似たノイズを発生させ、その存在の確認に時間がかかり、過剰診断に陥りやすい。さらに,従来のPEのセグメンテーション法では,PE CT画像の特徴の階層構造,局所的および大域的空間的特徴を十分に考慮できない。本稿では,SCUNet++ (Swin Conv UNet++) と呼ばれる自動PEセグメンテーション手法を提案する。この方法は、エンコーダとデコーダの間の複数の融合密なスキップ接続を内蔵し、Swin Transformerをエンコーダとして利用する。そして、デコーダサブネットワークの様々なスケールの特徴を融合させ、Swin-UNetや他の最先端の手法における必然的なダウンサンプリングによる空間的情報損失を補償し、上記の問題を解決する。本稿では,この手法の理論的解析を行い,公開されているPE CT画像データセットFUMPEおよびCAD-PE上で検証する。実験の結果,提案手法はFUMPEデータセットではDice類似係数(DSC)83.47%,Hausdorff距離95パーセンタイル(HD95)3.83,CAD-PEデータセットではDSC 83.42%,HD95 5.10を達成できた。これらの結果から,本手法はPEセグメンテーションタスクにおいて高い性能を示し,PEの自動セグメンテーションの精度を高め,臨床医に強力な診断ツールを提供する可能性が示唆された。我々のソースコードと新しいFUMPEデータセットは https://github.com/JustlfC03/SCUNet-plusplus で入手できる。 Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noise resembling PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully account for the hierarchical structure of features or the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++).
This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizes the Swin Transformer as the encoder, and fuses features of different scales in the decoder subnetwork to compensate for the spatial information loss caused by the inevitable downsampling in Swin-UNet and other state-of-the-art methods, effectively addressing the above problem. We provide a detailed theoretical analysis of this method and validate it on the publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.	翻訳日:2023-12-25 14:52:38 公開日:2023-12-22
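The SCUNet++ entry above reports its results as a Dice similarity coefficient (DSC). As a reminder of what that metric measures, here is a minimal sketch of the standard definition, DSC = 2|A∩B| / (|A| + |B|), on flattened binary masks. The function name and the empty-mask convention are illustrative assumptions, not taken from the paper or its repository.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks (lists of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Overlap: voxels marked 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    # One shared foreground voxel out of two in each mask.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

A reported DSC of 83.47% corresponds to a return value of about 0.8347 under this definition.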
# 高精度SDEモデリングのための時間変化正規化フロー Time-changed normalizing flows for accurate SDE modeling ( http://arxiv.org/abs/2312.14698v1 ) ライセンス: Link先を確認	Naoufal El Bekri and Lucas Drumetz and Franck Vermet	(参考訳) 生成パラダイムは、機械学習とディープラーニングモデルにおいてますます重要になっている。一般的な生成モデルには正規化フローがあり、これは微分同相変換を通じて基底分布を変換することで正確な精度推定を可能にする。時間分解フローを扱うための正規化フローフレームワークの拡張は、時系列、確率過程、神経確率微分方程式(sdes)をモデル化する強力なツールである動的正規化フローをもたらした。本研究では,ガウス過程の多種多様な族を構成するブラウン運動の時間的変形に基づく,時間変化正規化流れ(tcnf)の新たな変種を提案する。このアプローチにより、よく知られたOrnstein-Uhlenbeckプロセスなど、他の方法ではモデル化できないいくつかのSDEを効果的にモデル化し、事前の方法論を一般化し、結果の改善と推論と予測能力の向上につながる。 The generative paradigm has become increasingly important in machine learning and deep learning models. Among popular generative models are normalizing flows, which enable exact likelihood estimation by transforming a base distribution through diffeomorphic transformations. Extending the normalizing flow framework to handle time-indexed flows gave dynamic normalizing flows, a powerful tool to model time series, stochastic processes, and neural stochastic differential equations (SDEs). In this work, we propose a novel variant of dynamic normalizing flows, a Time Changed Normalizing Flow (TCNF), based on time deformation of a Brownian motion which constitutes a versatile and extensive family of Gaussian processes. This approach enables us to effectively model some SDEs, that cannot be modeled otherwise, including standard ones such as the well-known Ornstein-Uhlenbeck process, and generalizes prior methodologies, leading to improved results and better inference and prediction capability.	翻訳日:2023-12-25 14:52:00 公開日:2023-12-22
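The TCNF entry above says that a time deformation of Brownian motion can represent the Ornstein-Uhlenbeck (OU) process. The sketch below shows only the classical textbook identity behind that statement, not the paper's learned model: for dX = -θX dt + σ dW with X_0 = x0, the solution has the same law as X_t = e^{-θt}(x0 + σ B(τ(t))) with clock τ(t) = (e^{2θt} - 1)/(2θ). Function names and parameter values are illustrative assumptions.

```python
import math
import random


def ou_by_time_change(x0, theta, sigma, t_grid, rng):
    """Sample one OU path on an increasing grid of times via a time-changed Brownian motion.

    Uses X_t = exp(-theta*t) * (x0 + sigma * B(tau(t))) with the deterministic
    clock tau(t) = (exp(2*theta*t) - 1) / (2*theta).
    """
    path = []
    b = 0.0          # Brownian motion evaluated on the deformed clock
    tau_prev = 0.0
    for t in t_grid:
        tau = (math.exp(2.0 * theta * t) - 1.0) / (2.0 * theta)
        # Brownian increments over [tau_prev, tau] are N(0, tau - tau_prev).
        b += rng.gauss(0.0, math.sqrt(tau - tau_prev))
        tau_prev = tau
        path.append(math.exp(-theta * t) * (x0 + sigma * b))
    return path


def ou_variance(theta, sigma, t):
    """Closed-form Var[X_t] of the OU process started from a fixed x0."""
    return sigma ** 2 * (1.0 - math.exp(-2.0 * theta * t)) / (2.0 * theta)


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [0.1 * k for k in range(1, 21)]
    print(ou_by_time_change(1.0, 1.0, 1.0, grid, rng)[-1])
```

Averaging many sampled end points should approach the mean x0·e^{-θt} and the variance returned by `ou_variance`, which is one way to sanity-check the clock.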
# Pola4All:偏光解析のための偏光応用とオープンソースツールキットの調査 Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization ( http://arxiv.org/abs/2312.14697v1 ) ライセンス: Link先を確認	Joaquin Rodriguez, Lew-Fock-Chong Lew-Yan-Voon, Renato Martins, Olivier Morel	(参考訳) 光の偏光情報は、物体の素材の種類、ポーズ、形状など、コンピュータビジョンやシーン理解タスクのための豊富な手がかりを提供することができる。新しい安価な偏光センサーの出現に伴い、この画像モダリティは、ポーズ推定、3D再構成、水中ナビゲーション、深度推定といった問題を解決するために、広く一般に利用されるようになった。しかし、この感性モダリティの使用に関するいくつかの制限や、偏光画像を分析するための標準や公開ツールの欠如について観察する。さらに、偏光カメラメーカーは通常、カメラと通信するための取得ツールを提供しているが、偏光情報を利用する処理アルゴリズムはめったにない。本稿では、偏光イメージングを含む最近の応用の進歩を概観し、視覚の偏光に関する最近の進歩とロボットの知覚タスクに関する包括的調査を含む。また、既存のマイクログリッド偏光カメラのほとんどからの情報と通信し、処理するための共通標準を提供する、完全なソフトウェアツールキットも紹介する。このツールキットは、このモダリティのためにいくつかの画像処理アルゴリズムを実装しており、githubで公開されている。 Polarization information of the light can provide rich cues for computer vision and scene understanding tasks, such as the type of material, pose, and shape of the objects. With the advent of new and cheap polarimetric sensors, this imaging modality is becoming accessible to a wider public for solving problems such as pose estimation, 3D reconstruction, underwater navigation, and depth estimation. However, we observe several limitations regarding the usage of this sensorial modality, as well as a lack of standards and publicly available tools to analyze polarization images. Furthermore, although polarization camera manufacturers usually provide acquisition tools to interface with their cameras, they rarely include processing algorithms that make use of the polarization information. In this paper, we review recent advances in applications that involve polarization imaging, including a comprehensive survey of recent advances on polarization for vision and robotics perception tasks. We also introduce a complete software toolkit that provides common standards to communicate with and process information from most of the existing micro-grid polarization cameras on the market. 
The toolkit also implements several image processing algorithms for this modality, and it is publicly available on GitHub: https://github.com/vibot-lab/Pola4all_JEI_2023.	翻訳日:2023-12-25 14:51:44 公開日:2023-12-22
# オペレーター学習への数学的ガイド A Mathematical Guide to Operator Learning ( http://arxiv.org/abs/2312.14688v1 ) ライセンス: Link先を確認	Nicolas Boullé and Alex Townsend	(参考訳) 演算子学習は、基礎となる力学系や偏微分方程式(PDE)の性質をデータから発見することを目的としている。ここでは、演算子学習のステップバイステップガイドを示す。演算子学習に適した問題の種類とPDEを説明し、様々なニューラルネットワークアーキテクチャについて議論し、数値PDEソルバを効果的に活用する方法を説明する。また、トレーニングデータの作成と管理、最適化の実施方法についてアドバイスします。数値線形代数の視点から動機づけることで,演算子学習における様々なニューラルネットワークアーキテクチャの背景にある直感を提供する。 Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point-of-view of numerical linear algebra.	翻訳日:2023-12-25 14:51:23 公開日:2023-12-22
# カーネルの不均一性は自然画像表現のスパース性を改善する Kernel Heterogeneity Improves Sparseness of Natural Images Representations ( http://arxiv.org/abs/2312.14685v1 ) ライセンス: Link先を確認	Hugo J. Ladret, Christian Casanova, Laurent Udo Perrinet	(参考訳) 生物学的ニューラルネットワークと人工ニューラルネットワークの両方が本質的にその性能と運用コストのバランスをとり、計算能力のバランスをとる。通常、効率的なニューロモルフィックニューラルネットワークは、入力の冗長性と次元性を減少させる表現を学ぶものである。これは例えば、スパースコーディングで達成され、自然画像から派生したスパース表現は、入力特徴のサンプリングとそれらの特徴の分散の両方において、異質な表現をもたらす。そこで本研究では,自然画像の構造,特に指向性特徴と対応するスパース符号の関連性を検討した。その結果,複数レベルの分散に散在する入力特徴の表現により,スパースコードのスパース性やレジリエンスが大幅に向上し,復元性能が向上した。これはモデル入力の構造を反響させ、自然画像の不均質なアレエータ構造を考慮できる。自然画像からの学習核は近似表現と密度表現のバランスをとることによって異種性を生み出し、すべての再構成指標を改善する。畳み込みスパース符号化アルゴリズムで用いられるカーネルの不均質性のパラメータ制御を用いて、不均質性がスパース性を強調し、均質性が表現の粒度を改善することを示した。より広い文脈では、これらの符号化戦略は深層畳み込みニューラルネットワークへの入力として機能する。このような分散符号化されたスパース画像データセットは計算効率を向上し、自然的および変動的な入力構造を利用するカーネルの不均一性の利点を強調し、ニューロモルフィックハードウェアのスループットを向上させることができる。 Both biological and artificial neural networks inherently balance their performance with their operational cost, which balances their computational abilities. Typically, an efficient neuromorphic neural network is one that learns representations that reduce the redundancies and dimensionality of its input. This is for instance achieved in sparse coding, and sparse representations derived from natural images yield representations that are heterogeneous, both in their sampling of input features and in the variance of those features. Here, we investigated the connection between natural images' structure, particularly oriented features, and their corresponding sparse codes. We showed that representations of input features scattered across multiple levels of variance substantially improve the sparseness and resilience of sparse codes, at the cost of reconstruction performance. This echoes the structure of the model's input, allowing to account for the heterogeneously aleatoric structures of natural images. 
We demonstrate that learning kernels from natural images produces heterogeneity by balancing between approximate and dense representations, which improves all reconstruction metrics. Using a parametrized control of the kernels' heterogeneity used by a convolutional sparse coding algorithm, we show that heterogeneity emphasizes sparseness, while homogeneity improves representation granularity. In a broader context, these encoding strategies can serve as inputs to deep convolutional neural networks. We prove that such variance-encoded sparse image datasets enhance computational efficiency, emphasizing the benefits of kernel heterogeneity to leverage naturalistic and variant input structures and possible applications to improve the throughput of neuromorphic hardware.	翻訳日:2023-12-25 14:51:13 公開日:2023-12-22
# 工学的正規微分方程式を分類アルゴリズム(EODECA):徹底的な特徴付けと試験 Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing ( http://arxiv.org/abs/2312.14681v1 ) ライセンス: Link先を確認	Raffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi, Lorenzo Giambagli, Duccio Fanelli	(参考訳) EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) は、機械学習と動的システム理論の共通部分における新しいアプローチであり、分類タスクのためのユニークなフレームワークである[1]。本手法は, 通常の微分方程式 (odes) を用いて, 複雑な分類課題を効率的に処理する力学系構造を特徴とする。論文は、EODECAの動的特性を考察し、ランダムな摂動に対するレジリエンスと、さまざまな分類シナリオにおける堅牢なパフォーマンスを強調した。特に、EODECAの設計には、安定したアトラクタをフェーズ空間に埋め込む機能があり、信頼性を高め、可逆的なダイナミクスを可能にする。本稿では,作業 [1] を拡張し,euler の離散化スキームを用いて包括的解析を行う。特に,EODECAの性能を5つの異なる分類問題で評価し,適応性と効率性を検討した。さらに, mnist と fashion mnist データセットに対する eodeca の有効性を実証し, それぞれ 98.06 %$ と 88.21 %$ という印象的な精度を示した。これらの結果は多層パーセプトロン(MLP)に匹敵するものであり、複雑なデータ処理タスクにおけるEODECAの可能性を示している。我々は、モデルの学習の旅をさらに探求し、前と後の両方のトレーニング環境における進化を評価し、安定した誘引者に向かう能力を強調します。また,eodecaの可逆性についても検討し,意思決定過程と内部作業に光を当てた。本稿では、機械学習アルゴリズムと動的システム方法論のギャップを埋め、より透明で堅牢な機械学習パラダイムに向けた重要なステップを示す。 EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1], and employing a Euler discretization scheme. 
In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre- and post-training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.	翻訳日:2023-12-25 14:50:44 公開日:2023-12-22
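The EODECA entry above combines two ideas: stable attractors planted in the phase space, and an Euler discretization of the dynamics. The toy below illustrates only that general mechanism in one dimension and is not EODECA's actual system of equations. The ODE dx/dt = x - x^3, with stable fixed points at +1 and -1, is an assumption chosen for illustration, as are the step size and the function names.

```python
def euler_flow(x0, dt=0.1, steps=200):
    """Integrate the toy ODE dx/dt = x - x**3 with the forward Euler scheme."""
    x = x0
    for _ in range(steps):
        x = x + dt * (x - x ** 3)   # one explicit Euler step
    return x


def classify(x0):
    """Label an input by the stable attractor (+1 or -1) its trajectory reaches."""
    return 1 if euler_flow(x0) > 0 else -1


if __name__ == "__main__":
    for start in (0.3, -0.2):
        print(start, classify(start), euler_flow(start))
```

Each initial condition is assigned the class of the attractor whose basin it starts in. In this toy the basins are fixed by hand, whereas the abstract describes a model in which the dynamics are trained.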
# 複合パイプラインにおける進化的自動機械学習と構造感度解析の統合 Integration Of Evolutionary Automated Machine Learning With Structural Sensitivity Analysis For Composite Pipelines ( http://arxiv.org/abs/2312.14770v1 ) ライセンス: Link先を確認	Nikolay O. Nikitin, Maiia Pinchuk, Valerii Pokrovskii, Peter Shevchenko, Andrey Getmanov, Yaroslav Aksenkin, Ilia Revin, Andrey Stebenkov, Ekaterina Poslavskaya, Anna V. Kalyuzhnaya	(参考訳) 自動機械学習(AutoML)システムは、所定の機械学習問題に対するエンドツーエンドソリューションを提案し、固定パイプラインか柔軟なパイプラインを生成する。固定パイプラインはタスクに依存しない構造であり、その一般的な構成はデータに関係なく同じである。対照的に、柔軟なパイプラインの構造は入力によって異なり、個々のタスクに適切に調整される。しかし、柔軟なパイプラインは構造的に過度に複雑になり、説明性に乏しい。本稿では,フレキシブルな解のロバスト性と解釈性を高める感度解析を取り入れ,フレキシブルなパイプラインの負の点を補償するevosa手法を提案する。 EVOSAは、パイプライングラフ上のエッジやノードの正および負の影響を定量的に推定し、この情報を進化的AutoMLオプティマイザに供給する。 evosaの正しさと効率性は表式,マルチモーダル,コンピュータビジョンのタスクで検証され,提案手法の一般化が示唆された。 Automated machine learning (AutoML) systems propose an end-to-end solution to a given machine learning problem, creating either fixed or flexible pipelines. Fixed pipelines are task independent constructs: their general composition remains the same, regardless of the data. In contrast, the structure of flexible pipelines varies depending on the input, making them finely tailored to individual tasks. However, flexible pipelines can be structurally overcomplicated and have poor explainability. We propose the EVOSA approach that compensates for the negative points of flexible pipelines by incorporating a sensitivity analysis which increases the robustness and interpretability of the flexible solutions. EVOSA quantitatively estimates positive and negative impact of an edge or a node on a pipeline graph, and feeds this information to the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA was validated in tabular, multimodal and computer vision tasks, suggesting generalizability of the proposed approach across domains.	翻訳日:2023-12-25 14:43:08 公開日:2023-12-22
# Large Language Model (LLM) Bias Index -- LLMBI Large Language Model (LLM) Bias Index -- LLMBI ( http://arxiv.org/abs/2312.14769v1 ) ライセンス: Link先を確認	Abiodun Finbarrs Oketunji, Muhammad Anas, Deepthi Saina	(参考訳) LLMBI(Large Language Model Bias Index)は、GPT-4のような大規模言語モデル(LLM)に固有のバイアスを定量化し、対処するための先駆的なアプローチである。多様な分野におけるLSMの普及と影響を認識している。本研究は,モデル応答を誘発する可能性のあるバイアスを系統的に測定し緩和する新しい計量 LLMBI を導入する。年齢,性別,人種的偏見に限らず,多次元の偏見を取り入れた複合スコアリングシステムを用いたLSMBIの定式化を行った。このメトリクスを運用するには, LLM応答の収集と注釈付け, バイアス検出のための洗練された自然言語処理(NLP)技術の適用, 特殊な数学的公式による LLMBI スコアの計算を含む多段階的なプロセスに携わる。この公式は、様々なバイアス次元の重み付け平均値、データセットの多様性の欠陥に対するペナルティ、感情バイアスに対する補正を統合する。 OpenAIのAPIからの応答を用いた実証分析では,バイアス検出の代表的な方法として,高度な感情分析を採用している。この研究は、LLMがテキスト生成において印象的な能力を示す一方で、異なる次元にまたがる様々なバイアスを示すことを明らかにしている。 LLMBIは、モデルと時間とともにバイアスを比較するための定量尺度を提供し、LLMの公平性と信頼性を高める上で、システムエンジニア、研究者、規制当局にとって重要なツールを提供する。偏見のない人間のような反応を模倣するLLMの可能性を強調している。さらに、社会規範や倫理基準の進化に合わせて、そのようなモデルを継続的に監視し、再検討する必要性を強調している。 The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. 
Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.	翻訳日:2023-12-25 14:42:51 公開日:2023-12-22
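The LLMBI entry above describes a score that combines a weighted average over bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment bias. The abstract does not give the formula, so the additive form below, the scaling factor `lam`, and every name and number are assumptions made purely to show how such a composite could be assembled.

```python
def llmbi_score(bias_scores, weights, diversity_penalty, sentiment_bias, lam):
    """Composite bias index: weighted mean of per-dimension scores plus two adjustment terms.

    Hypothetical form: sum_i(w_i * b_i) / sum_i(w_i) + penalty + lam * sentiment_bias.
    """
    if set(bias_scores) != set(weights):
        raise ValueError("every bias dimension needs exactly one weight")
    total_weight = sum(weights.values())
    weighted_avg = sum(weights[k] * bias_scores[k] for k in bias_scores) / total_weight
    return weighted_avg + diversity_penalty + lam * sentiment_bias


if __name__ == "__main__":
    scores = {"age": 0.2, "gender": 0.5, "race": 0.1}    # made-up per-dimension scores
    weights = {"age": 0.3, "gender": 0.4, "race": 0.3}   # made-up weights
    print(llmbi_score(scores, weights, diversity_penalty=0.05, sentiment_bias=0.4, lam=0.1))
```

With these made-up inputs the weighted average is 0.29, and the two adjustment terms add 0.05 and 0.04.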
# 量子ビオレント緩和条件について On the Conditions for a Quantum Violent Relaxation ( http://arxiv.org/abs/2312.14768v1 ) ライセンス: Link先を確認	Giachetti Guido and Defenu Nicolò	(参考訳) 一般に、古典的な完全連結系は暴力的緩和を受けることが知られている。この現象は、長時間のダイナミクスが熱力学極限において平均場効果に支配されているにもかかわらず、観測量が有限の時間スケールで定常な非熱的値へ緩和することを指す。ここでは,熱力学的極限における2体・全対全相互作用を持つ一般的な多体系の動力学を調べることで,「量子」暴力的緩和を解析する。暴力的緩和が起こるためには,平均場有効ハミルトニアンのスペクトルに関する非常に特殊な条件が満たされなければならないことを示す。これらの条件はほとんど満たされず,「量子」暴力的緩和は古典的な場合に比べてまれにしか観測されない。我々の予測はスピンモデルの研究によって検証され、このモデルは結合の値に応じて、暴力的緩和と一般的な前熱化相の間の遷移を示す。また, 量子ハミルトニアン-平均場模型のスピンバージョンを解析し, 暴力的緩和を示さないことを示した。最後に,古典極限において暴力的緩和の描像がどのように回復するかについて論じる。その結果、平均場領域においても量子効果がダイナミクスにかなり劇的な影響を与えることが示され、光と物質が結合した系のより良い理解への道が開かれる。 In general, classical fully-connected systems are known to undergo violent relaxation. This phenomenon refers to the relaxation of observables to stationary, non-thermal, values on a finite timescale, despite their long-time dynamics being dominated by mean-field effects in the thermodynamic limit. Here, we analyze the "quantum" violent relaxation by studying the dynamics of generic many-body systems with two-body, all-to-all, interactions in the thermodynamic limit. We show that, in order for violent relaxation to occur, very specific conditions on the spectrum of the mean-field effective Hamiltonian have to be met. These conditions are hardly met and "quantum" violent relaxation is observed rarely with respect to its classical counterpart. Our predictions are validated by the study of a spin model which, depending on the value of the coupling, shows a transition between violent-relaxation and a generic prethermal phase. We also analyze a spin version of the quantum Hamiltonian-Mean-Field model, which is shown not to exhibit violent-relaxation. Finally, we discuss how the violent-relaxation picture emerges back in the classical limit. Our results demonstrate how, even in the mean-field regime, quantum effects have a rather dramatic impact on the dynamics, paving the way to a better understanding of light-matter coupled systems.	翻訳日:2023-12-25 14:42:22 公開日:2023-12-22
# 拡張された潜在マルチビューサブスペースクラスタリング Enhanced Latent Multi-view Subspace Clustering ( http://arxiv.org/abs/2312.14763v1 ) ライセンス: Link先を確認	Long Shi, Lei Cao, Jun Wang, Badong Chen	(参考訳) 潜在マルチビューサブスペースクラスタリングは、望ましいクラスタリング性能を持つことが示されている。しかし、元の潜在表現法は、データ行列を複数のビューから次元方向に沿って単一の行列に垂直に結合し、潜在表現行列を復元し、不完全な情報回復をもたらす可能性がある。本稿では,潜在空間表現を完全に回復するために,拡張潜在多視点サブスペースクラスタリング(ELMSC)法を提案する。 ELMSC法は、マルチビューデータの表現を強化する拡張データマトリックスを構築することを含む。具体的には、様々なビューから拡張マトリックスのブロック対角位置へデータ行列を積み重ねて補完情報を利用する。一方、非ブロック対角エントリは、異なるビュー間の類似性に基づいて構成され、一貫した情報をキャプチャする。さらに,拡張自己表現行列の非対角ブロックに対するスパース正規化を適用し,一貫性情報の冗長な計算を回避する。最後に,ADMM(Alternating Direction Method of Multipliers)の枠組みに基づく新しい反復アルゴリズムを開発し,ELMSCの最適化問題を解く。実世界のデータセットに関する広範囲な実験により,提案するELMSCが,最先端のマルチビュークラスタリング手法よりも高いクラスタリング性能を実現することを実証した。 Latent multi-view subspace clustering has been demonstrated to have desirable clustering performance. However, the original latent representation method vertically concatenates the data matrices from multiple views into a single matrix along the direction of dimensionality to recover the latent representation matrix, which may result in an incomplete information recovery. To fully recover the latent space representation, in this paper we propose an Enhanced Latent Multi-view Subspace Clustering (ELMSC) method. The ELMSC method involves constructing an augmented data matrix that enhances the representation of multi-view data. Specifically, we stack the data matrices from various views into the block-diagonal locations of the augmented matrix to exploit the complementary information. Meanwhile, the non-block-diagonal entries are composed based on the similarity between different views to capture the consistent information. In addition, we enforce a sparse regularization for the non-diagonal blocks of the augmented self-representation matrix to avoid redundant calculations of consistency information. 
Finally, a novel iterative algorithm based on the framework of Alternating Direction Method of Multipliers (ADMM) is developed to solve the optimization problem for ELMSC. Extensive experiments on real-world datasets demonstrate that our proposed ELMSC is able to achieve higher clustering performance than some state-of-the-art multi-view clustering methods.	翻訳日:2023-12-25 14:41:58 公開日:2023-12-22
# 自己閉量子軌道からの幾何相に対するアクションフォーマリズム Action formalism for geometric phases from self-closing quantum trajectories ( http://arxiv.org/abs/2312.14760v1 ) ライセンス: Link先を確認	Dominic Shea and Alessandro Romito	(参考訳) 測定を受けると、量子系は確率的量子軌道に沿って進化し、最終的な射影計測においてポスト選択によって観測可能な幾何学的位相を自然に備えることができる。軌道を後選択して閉ループを形成すると、幾何相は測定強度によって駆動されるトポロジカル転移を行う。本稿では,単一量子ビット系の連続ガウス測定によって誘導される自己閉軌道の部分集合の幾何学的位相について検討する。作用法を用いて稀な自己閉事象を解析できる確率経路積分を用いて定式化を開発し,測定誘起幾何位相を組み込む。測定強度パラメータの関数として,最も可能性の高い自己閉軌道の幾何位相がトポロジカル転移を行うことを示す。さらに、最も可能性の高い自己閉軌道近傍におけるガウス補正は、全量子軌道の数値シミュレーションの結果と一致して、遷移点を定量的に変化させる。 When subject to measurements, quantum systems evolve along stochastic quantum trajectories that can be naturally equipped with a geometric phase observable via a post-selection in a final projective measurement. When post-selecting the trajectories to form a closed loop, the geometric phase undergoes a topological transition driven by the measurement strength. Here, we study the geometric phase of a subset of self-closing trajectories induced by a continuous Gaussian measurement of a single qubit system. We utilize a stochastic path integral that enables the analysis of rare self-closing events using action methods and develop the formalism to incorporate the measurement-induced geometric phase therein. We show that the geometric phase of the most likely trajectories undergoes a topological transition for self-closing trajectories as a function of the measurement strength parameter. Moreover, the inclusion of Gaussian corrections in the vicinity of the most probable self-closing trajectory quantitatively changes the transition point in agreement with results from numerical simulations of the full set of quantum trajectories.	翻訳日:2023-12-25 14:41:37 公開日:2023-12-22
# グラフ学習における信号フィルタリングのための拡散マップ Diffusion Maps for Signal Filtering in Graph Learning ( http://arxiv.org/abs/2312.14758v1 ) ライセンス: Link先を確認	Todd Hildebrant	(参考訳) 本稿では,グラフ信号の基礎となる幾何構造を理解するために,グラフシフト演算子としての拡散マップの応用を検討する。本研究は,マルコフ変動最小化問題に対する拡散マップ生成フィルタを用いたグラフ学習の改善を評価する。本稿では,合成データと実世界の温度センサデータを用いた実例を通して,本手法の有効性を示す。これらの例は、拡散マップグラフ信号モデルと他のよく使われるグラフ信号演算子を比較する。その結果、複雑な非ユークリッドデータ構造の分析と理解に新たなアプローチが得られた。 This paper explores the application of diffusion maps as graph shift operators in understanding the underlying geometry of graph signals. The study evaluates the improvements in graph learning when applying diffusion-map-generated filters to the Markov variation minimization problem. The paper showcases the effectiveness of this approach through examples involving synthetically generated and real-world temperature sensor data. These examples also compare the diffusion map graph signal model with other commonly used graph signal operators. The results provide new approaches for the analysis and understanding of complex, non-Euclidean data structures.	翻訳日:2023-12-25 14:41:22 公開日:2023-12-22
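The diffusion-maps entry above treats a diffusion operator as a graph shift operator. A minimal sketch of the standard construction follows, assuming a Gaussian kernel with bandwidth `epsilon` and plain row normalization, P = D^{-1}K. The paper's exact normalization and its filter design are not stated in the abstract, so this illustrates the general idea only, and all names are hypothetical.

```python
import math


def diffusion_operator(points, epsilon):
    """Row-stochastic Markov matrix P = D^{-1} K built from a Gaussian kernel."""
    n = len(points)
    # K[i][j] = exp(-||x_i - x_j||^2 / epsilon)
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / epsilon)
               for j in range(n)] for i in range(n)]
    degrees = [sum(row) for row in kernel]
    return [[kernel[i][j] / degrees[i] for j in range(n)] for i in range(n)]


def shift(operator, signal):
    """Apply the operator to a graph signal (one value per node)."""
    return [sum(p * s for p, s in zip(row, signal)) for row in operator]


if __name__ == "__main__":
    sensors = [(0.0,), (1.0,), (5.0,)]     # made-up 1-D sensor positions
    P = diffusion_operator(sensors, epsilon=1.0)
    print(shift(P, [20.0, 21.0, 30.0]))    # made-up temperature readings
```

Because each row of P sums to one, a shifted value is a weighted average of nearby readings. Constant signals pass through unchanged, and sharp differences between close nodes are smoothed.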
# ジョセフソン接合における散逸性量子相転移の欠如:理論 Absence of a dissipative quantum phase transition in Josephson junctions: Theory ( http://arxiv.org/abs/2312.14754v1 ) ライセンス: Link先を確認	Carles Altimiras, Daniel Esteve, Çağlar Girit, Hélène le Sueur, Philippe Joyez	(参考訳) 抵抗シャントされたジョセフソン接合(RSJ)の縮約密度行列を,ファインマン・ヴァーノン影響汎関数に基づく厳密な数値スキームである虚時間の確率的リウヴィル方程式法を用いて求める。調べたすべてのパラメータにおいて、シャントされた接合は同じシャントされていない接合よりも超伝導性が強いことが分かる。RSJで起こると長く信じられてきたシュミットの超伝導-絶縁体量子相転移の痕跡は見つからなかった。この研究は、実験的な観測に基づいてMuraniらが2020年に導いた同様の結論を理論的に裏付けている。従来の研究における絶縁接合の予測は、紫外カットオフのないオーミック環境を考慮していたことに起因することを明らかにする。 We obtain the reduced density matrix of a resistively shunted Josephson junction (RSJ), using the stochastic Liouville equation method in imaginary time - an exact numerical scheme based on the Feynman-Vernon influence functional. For all parameters examined, we find that a shunted junction is more superconducting than the same unshunted junction. We find no trace of Schmid's superconducting-insulating quantum phase transition long believed to occur in the RSJ. This work confirms theoretically a similar conclusion drawn in 2020 by Murani et al., based on experimental observations. We reveal that predictions of an insulating junction in previous works were due to considering Ohmic environments with no UV cutoff.	翻訳日:2023-12-25 14:41:16 公開日:2023-12-22
# Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models ( http://arxiv.org/abs/2312.14751v1 ) License: see link	Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger	Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman, 2023), enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.	Translated: 2023-12-25 14:41:02 Published: 2023-12-22
# Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis ( http://arxiv.org/abs/2312.14748v1 ) License: see link	Thorsten Wittkopp, Alexander Acker, Odej Kao	The realm of AIOps is transforming IT landscapes with the power of AI and ML. Despite the challenge of limited labeled data, supervised models show promise, emphasizing the importance of leveraging labels for training, especially in deep learning contexts. This study enhances the field by introducing a taxonomy for log anomalies and exploring automated data labeling to mitigate labeling challenges. It goes further by investigating the potential of diverse anomaly detection techniques and their alignment with specific anomaly types. However, the exploration doesn't stop at anomaly detection. The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies. This uncharted territory holds immense potential for revolutionizing IT systems management. In essence, this paper enriches our understanding of anomaly detection and automated labeling, and sets the stage for transformative root cause analysis. Together, these advances promise more resilient IT systems, elevating operational efficiency and user satisfaction in an ever-evolving technological landscape.	Translated: 2023-12-25 14:40:46 Published: 2023-12-22
# A quantum computing concept for 1-D elastic wave simulation ( http://arxiv.org/abs/2312.14747v1 ) License: see link	Malte Schade, Cyrill Boesch, Vaclav Hapla, Andreas Fichtner	Quantum computing has attracted considerable attention in recent years because it promises speed-ups that conventional supercomputers cannot offer, at least for some applications. Though existing quantum computers are, in most cases, still too small to solve significant problems, their future impact on domain sciences is already being explored now. Within this context, we present a quantum computing concept for 1-D elastic wave propagation in heterogeneous media with two components: a theoretical formulation and an implementation on a real quantum computer. The method rests on a finite-difference approximation, followed by a sparsity-preserving transformation of the discrete elastic wave equation to a Schrödinger equation, which can be simulated directly on a gate-based quantum computer. An implementation on an error-free quantum simulator verifies our approach and forms the basis of numerical experiments with small problems on the real quantum computer IBM Brisbane. The latter produce simulation results that qualitatively agree with the error-free version but are contaminated by quantum decoherence and noise effects. Complementing the discrete transformation to the Schrödinger equation by a continuous version allows the replacement of finite differences by other spatial discretisation schemes, such as the spectral-element method. Anticipating the emergence of error-corrected quantum chips, an analogy between our method and analyses of coupled mass-spring systems suggests that our quantum computing approach may lead to wave field simulations that run exponentially faster than simulations on classical computers.	Translated: 2023-12-25 14:40:29 Published: 2023-12-22
# ESBMC v7.4: Harnessing the Power of Intervals ( http://arxiv.org/abs/2312.14746v1 ) License: see link	Rafael Menezes, Mohannad Aldughaim, Bruno Farias, Xianzhiyu Li, Edoardo Manino, Fedor Shmarov, Kunjian Song, Franz Brauße, Mikhail R. Gadelha, Norbert Tihanyi, Konstantin Korovin, Lucas C. Cordeiro	ESBMC implements many state-of-the-art techniques for model checking. We report on new and improved features that allow us to obtain verification results for previously unsupported programs and properties. ESBMC employs a new static interval analysis of expressions in programs to increase verification performance. This includes interval-based reasoning over booleans and integers, forward and backward contractors, and particular optimizations related to singleton intervals because of their ubiquity. Other relevant improvements concern the verification of concurrent programs, as well as several operational models, internal ones, and also those of libraries such as pthread and the C mathematics library. An extended memory safety analysis now allows tracking of memory leaks that are considered still reachable.	Translated: 2023-12-25 14:39:58 Published: 2023-12-22
# Estimation of electrostatic interaction energies on a trapped-ion quantum computer ( http://arxiv.org/abs/2312.14739v1 ) License: see link	Pauline J. Ollitrault, Matthias Loipersberger, Robert M. Parrish, Alexander Erhard, Christine Maier, Christian Sommer, Juris Ulmanis, Thomas Monz, Christian Gogolin, Christofer S. Tautermann, Gian-Luca R. Anselmetti, Matthias Degroote, Nikolaj Moll, Raffaele Santagati, Michael Streif	We present the first hardware implementation of electrostatic interaction energies using a trapped-ion quantum computer. As a test system for our computation, we focus on the reduction of $\mathrm{NO}$ to $\mathrm{N}_2\mathrm{O}$ catalyzed by a nitric oxide reductase (NOR). The quantum computer is used to generate an approximate ground state within the NOR active space. To efficiently measure the necessary one-particle density matrices, we incorporate fermionic basis rotations into the quantum circuit without extending the circuit length, laying the groundwork for further efficient measurement routines using factorizations. Measurements in the computational basis are then used as inputs for computing the electrostatic interaction energies on a classical computer. Our experimental results strongly agree with classical noise-less simulations of the same circuits, finding electrostatic interaction energies within chemical accuracy despite hardware noise. This work shows that algorithms tailored to specific observables of interest, such as interaction energies, may require significantly fewer quantum resources than individual ground state energies would in the straightforward supermolecular approach.	Translated: 2023-12-25 14:39:47 Published: 2023-12-22
# Computational Semantics and Evaluation Benchmark for Interrogative Sentences via Combinatory Categorial Grammar ( http://arxiv.org/abs/2312.14737v1 ) License: see link	Hayate Funakura, Koji Mineshima	We present a compositional semantics for various types of polar questions and wh-questions within the framework of Combinatory Categorial Grammar (CCG). To assess the explanatory power of our proposed analysis, we introduce a question-answering dataset QSEM specifically designed to evaluate the semantics of interrogative sentences. We implement our analysis using existing CCG parsers and conduct evaluations using the dataset. Through the evaluation, we have obtained annotated data with CCG trees and semantic representations for about half of the samples included in QSEM. Furthermore, we discuss the discrepancy between the theoretical capacity of CCG and the capabilities of existing CCG parsers.	Translated: 2023-12-25 14:39:27 Published: 2023-12-22
# Harnessing Diffusion Models for Visual Perception with Meta Prompts ( http://arxiv.org/abs/2312.14733v1 ) License: see link	Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang	The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual inputs, a feat made possible through its pre-training on large-scale image-text pairs. This leads to a natural inquiry: can diffusion models be utilized to tackle visual perception tasks? In this paper, we propose a simple yet effective scheme to harness a diffusion model for visual perception tasks. Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception. The effect of meta prompts is two-fold. First, as a direct replacement of the text embeddings in the T2I models, they can activate task-relevant features during feature extraction. Second, they are used to rearrange the extracted features to ensure that the model focuses on the most pertinent features for the task at hand. Additionally, we design a recurrent refinement training strategy that fully leverages the property of diffusion models, thereby yielding stronger visual features. Extensive experiments across various benchmarks validate the effectiveness of our approach. Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes. Concurrently, the proposed method attains results comparable to the current state-of-the-art in semantic segmentation on ADE20K and pose estimation on COCO datasets, further exemplifying its robustness and versatility.	Translated: 2023-12-25 14:39:16 Published: 2023-12-22
# Effective models for dense vortex lattices in the Kitaev honeycomb model ( http://arxiv.org/abs/2312.14729v1 ) License: see link	David J. Alspaugh, Jean-Noël Fuchs, Anna Ritz-Zwilling and Julien Vidal	We introduce low-energy effective models for dense configurations of vortices in the Kitaev honeycomb model. Specifically, we consider configurations of vortices in which vortex-free plaquettes form triangular lattices against a vortex-full background. Depending on the vortex density, these "dual" configurations belong to either one of two families classified by translation and inversion symmetry. As a function of a time-reversal symmetry breaking term, one family exhibits gapped phases with even Chern numbers separated by extended gapless phases, while the other exhibits gapped phases with even or odd Chern numbers, separated by critical points. We construct an effective model for each family, determine the parameters of these models by fitting the integrated density of states, and reproduce energy spectra and Chern numbers of the Kitaev honeycomb model. We also derive phase diagrams and determine these models' validity.	Translated: 2023-12-25 14:38:47 Published: 2023-12-22
# A strontium quantum-gas microscope ( http://arxiv.org/abs/2312.14818v1 ) License: see link	Sandra Buob, Jonatan Höschele, Vasiliy Makhalov, Antonio Rubio-Abadal, Leticia Tarruell	The development of quantum-gas microscopes has brought novel ways of probing quantum degenerate many-body systems at the single-atom level. Until now, most of these setups have focused on alkali atoms. Expanding quantum-gas microscopy to alkaline-earth elements will provide new tools, such as SU(N)-symmetric fermionic isotopes or ultranarrow optical transitions, to the field of quantum simulation. Here, we demonstrate the site-resolved imaging of a $^{84}$Sr bosonic quantum gas in a Hubbard-regime optical lattice. The quantum gas is confined by a two-dimensional in-plane lattice and a light-sheet potential, which operate at the strontium clock-magic wavelength of 813.4 nm. We realize fluorescence imaging using the broad 461 nm transition, which provides high spatial resolution. Simultaneously, we perform attractive Sisyphus cooling with the narrow 689 nm intercombination line. We reconstruct the atomic occupation from the fluorescence images, obtaining imaging fidelities above 94%. Finally, we realize a $^{84}$Sr superfluid in the Bose-Hubbard regime. We observe its interference pattern upon expansion, a probe of phase coherence, with single-atom resolution. Our strontium quantum-gas microscope provides a new platform to study dissipative Hubbard models, quantum optics in atomic arrays, and SU(N) fermions at the microscopic level.	Translated: 2023-12-25 14:31:08 Published: 2023-12-22
# PARDINUS: Weakly supervised discarding of photo-trapping empty images based on autoencoders ( http://arxiv.org/abs/2312.14812v1 ) License: see link	David de la Rosa, Antonio J Rivera, María J del Jesus, Francisco Charte	Photo-trapping cameras are widely employed for wildlife monitoring. Those cameras take photographs when motion is detected to capture images where animals appear. A significant portion of these images are empty - no wildlife appears in the image. Filtering out those images is not a trivial task since it requires hours of manual work from biologists. Therefore, there is a notable interest in automating this task. Automatic discarding of empty photo-trapping images is still an open field in the area of Machine Learning. Existing solutions often rely on state-of-the-art supervised convolutional neural networks that require the annotation of the images in the training phase. PARDINUS (Weakly suPervised discARDINg of photo-trapping empty images based on aUtoencoderS) is constructed on the foundation of weakly supervised learning and proves that this approach equals or even surpasses other fully supervised methods that require further labeling work.	Translated: 2023-12-25 14:30:49 Published: 2023-12-22
# A Tricycle Model to Accurately Control an Autonomous Racecar with Locked Differential ( http://arxiv.org/abs/2312.14808v1 ) License: see link	Ayoub Raji, Nicola Musiu, Alessandro Toschi, Francesco Prignoli, Eugenio Mascaro, Pietro Musso, Francesco Amerotti, Alexander Liniger, Silvio Sorrentino, Marko Bertogna	In this paper, we present a novel formulation to model the effects of a locked differential on the lateral dynamics of an autonomous open-wheel racecar. The model is used in a Model Predictive Controller in which we included a micro-steps discretization approach to accurately linearize the dynamics and produce a prediction suitable for real-time implementation. The stability analysis of the model is presented, as well as a brief description of the overall planning and control scheme which includes an offline trajectory generation pipeline, an online local speed profile planner, and a low-level longitudinal controller. An improvement of the lateral path tracking is demonstrated in preliminary experimental results that have been produced on a Dallara AV-21 during the first Indy Autonomous Challenge event on the Monza F1 racetrack. Final adjustments and tuning have been performed in a high-fidelity simulator demonstrating the effectiveness of the solution when performing close to the tire limits.	Translated: 2023-12-25 14:30:31 Published: 2023-12-22
# The Geometry of Quantum Computing ( http://arxiv.org/abs/2312.14807v1 ) License: see link	E. Ercolessi, R. Fioresi, T. Weber	In this expository paper we present a brief introduction to the geometrical modeling of some quantum computing problems. After a short overview to establish the terminology, we focus on quantum information geometry and ZX-calculus, establishing a connection between quantum computing questions and quantum groups, i.e. Hopf algebras.	Translated: 2023-12-25 14:30:15 Published: 2023-12-22
# The Effects of Signal-to-Noise Ratio on Generative Adversarial Networks Applied to Marine Bioacoustic Data ( http://arxiv.org/abs/2312.14806v1 ) License: see link	Georgia Atkinson, Nick Wright, A. Stephen McGough and Per Berggren	In recent years, generative adversarial networks (GANs) have been used to supplement datasets within the field of marine bioacoustics. This is driven by factors such as the cost of collecting data, data sparsity, and aiding preprocessing. One notable challenge with marine bioacoustic data is the low signal-to-noise ratio (SNR), which poses difficulty when applying deep learning techniques such as GANs. This work investigates the effect SNR has on audio-based GAN performance and examines three different evaluation methodologies for GAN performance, yielding interesting results on the effects of SNR on GANs, specifically WaveGAN.	Translated: 2023-12-25 14:30:10 Published: 2023-12-22
# Quantum repeater node with free-space coupled trapped ions ( http://arxiv.org/abs/2312.14805v1 ) License: see link	Max Bergerhoff, Omar Elshehy, Stephan Kucera, Matthias Kreis, and Jürgen Eschner	The quantum repeater cell is a basic building block for a quantum network, as it allows one to overcome the distance limitations due to unavoidable fiber loss in direct transmission. We demonstrate the implementation of a quantum repeater cell, based on two free-space coupled $^{40}$Ca$^+$ ions in the same trap that act as quantum memories. We demonstrate the asynchronous generation of atom-photon and photon-photon entanglement by controlled emission of single photons from the individually addressed ions and entanglement swapping. We discuss the fidelity as well as the scaling of the generated rate.	Translated: 2023-12-25 14:29:57 Published: 2023-12-22
# Use large language models to promote equity ( http://arxiv.org/abs/2312.14804v1 ) License: see link	Emma Pierson, Divya Shanmugam, Rajiv Movva, Jon Kleinberg, Monica Agrawal, Mark Dredze, Kadija Ferryman, Judy Wawira Gichoya, Dan Jurafsky, Pang Wei Koh, Karen Levy, Sendhil Mullainathan, Ziad Obermeyer, Harini Suresh, Keyon Vafa	Advances in large language models (LLMs) have driven an explosion of interest in their societal impacts. Much of the discourse around how they will impact social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well-documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. There are many choices which determine the impact of AI, and a fundamental choice very early in the pipeline is the problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases which intrinsically entrench power -- we will miss an important opportunity to guide them to equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.	Translated: 2023-12-25 14:29:49 Published: 2023-12-22
# Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases ( http://arxiv.org/abs/2312.14798v1 ) License: see link	Ben Eyal, Amir Bachar, Ophir Haroche, Michael Elhadad	Large Language Models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions based on a given database schema. Despite the declarative nature of SQL, it continues to be a complex programming language. In this paper, we investigate the potential of an alternative query language with simpler syntax and modular specification of complex queries. The purpose is to create a query language that can be learned more easily by modern neural semantic parsing architectures while also enabling non-programmers to better assess the validity of the query plans produced by an interactive query plan assistant. The proposed alternative query language is called Query Plan Language (QPL). It is designed to be modular and can be translated into a restricted form of SQL Common Table Expressions (CTEs). The aim of QPL is to make complex data retrieval accessible to non-programmers by allowing users to express their questions in natural language while also providing an easier-to-verify target language. The paper demonstrates how neural LLMs can benefit from QPL's modularity to generate complex query plans in a compositional manner. This involves a question decomposition strategy and a planning stage. We conduct experiments on a version of the Spider text-to-SQL dataset that has been converted to QPL. The hierarchical structure of QPL programs enables us to measure query complexity naturally. Based on this assessment, we identify the low accuracy of existing text-to-SQL systems on complex compositional queries. We present ways to address the challenge of complex queries in an iterative, user-controlled manner, using fine-tuned LLMs and a variety of prompting strategies in a compositional manner.	Translated: 2023-12-25 14:29:27 Published: 2023-12-22
# On support vector machines under a multiple-cost scenario ( http://arxiv.org/abs/2312.14795v1 ) License: see link	Sandra Benítez-Peña and Rafael Blanquero and Emilio Carrizosa and Pepa Ramírez-Cobo	Support Vector Machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may be different in the different classes. However, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rate values. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such maximal margin hyperplane is obtained by solving a quadratic convex problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control on the misclassification rates in one class (possibly at the expense of an increase in misclassification rates for the other class) and is feasible in terms of running times.	Translated: 2023-12-25 14:29:04 Published: 2023-12-22
# An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms ( http://arxiv.org/abs/2312.14794v1 ) License: see link	Francesco Sovrano, Michaël Lognoul, Alberto Bacchelli	Compliance with the European Union's Platform-to-Business (P2B) Regulation is challenging for online platforms, and assessing their compliance can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. These tools are evaluated against human judgments, showing promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms.	Translated: 2023-12-25 14:28:44 Published: 2023-12-22
# 速度-歪み-知覚-分類トレードオフ:逆領域GANによる連成音源符号化と変調 The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs ( http://arxiv.org/abs/2312.14792v1 ) ライセンス: Link先を確認	Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong	(参考訳) JSCM(joint source coding and modulation)フレームワークは、データから自動的に学習できるディープラーニングの最近の開発によって実現され、エンドツーエンドで最高の圧縮符号と変調スキームが実現されている。本稿では,JSCMシナリオにおいて,チャネルレート,歪み,知覚,分類精度との間に厳密なトレードオフが存在することを示す。次に,そのトレードオフをナビゲートする2つの画像圧縮手法を提案する。inverse-domain generative adversarial network (ID-GAN)と,ID-GANの性能に関する洞察を提示するよりシンプルでヒューリスティックな手法である。実験の結果は理論的な結果と相関するだけでなく,提案したID-GANアルゴリズムは従来の分離手法や最近の深層JSCMアーキテクチャと比較してシステム性能を著しく向上することを示した。 The joint source coding and modulation (JSCM) framework was enabled by recent developments in deep learning, which allows to automatically learn from data, and in an end-to-end fashion, the best compression codes and modulation schemes. In this paper, we show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy in a JSCM scenario. We then propose two image compression methods to navigate that tradeoff: an inverse-domain generative adversarial network (ID-GAN), which achieves extreme compression, and a simpler, heuristic method that reveals insights about the performance of ID-GAN. Experiment results not only corroborate the theoretical findings, but also demonstrate that the proposed ID-GAN algorithm significantly improves system performance compared to traditional separation-based methods and recent deep JSCM architectures.	翻訳日:2023-12-25 14:28:26 公開日:2023-12-22
# 固有値探索と勾配降下のための改良量子アルゴリズム Improved Quantum Algorithms for Eigenvalues Finding and Gradient Descent ( http://arxiv.org/abs/2312.14786v1 ) ライセンス: Link先を確認	Nhat A. Nghiem and Tzu-Chieh Wei	(参考訳) ブロック符号化は、最近開発された量子アルゴリズムの統一フレームワークを形成する量子信号処理において重要な要素である。探索、振幅推定、ハミルトニアンシミュレーションなどいくつかの問題において、リソース利用の単純化と最適化のために、量子信号処理の能力はこれらを超え、新しい量子アルゴリズムを考案する未解決の可能性を提供する。本稿では,前述した2つの量子アルゴリズムである最大固有値推定と量子勾配降下を実質的に拡張するためにブロック符号化を利用する。高度な手順を含む以前の研究とは異なり、ユニタリブロックエンコーディングを用いて、これらの新しい量子アルゴリズムは、基本操作であっても、元のアルゴリズムに存在する主要なスケーリング要因を排除できることを示しています。これにより、複雑な計算問題に驚くほどの効率で対処できるより効率的な量子アルゴリズムが得られる。さらに,提案手法を,行列反転や複数の固有値推定など,異なる文脈に拡張する方法を示す。 Block encoding is a key ingredient in the recently developed quantum signal processing that forms a unifying framework for quantum algorithms. Initially showcased for simplifying and optimizing resource utilization in several problems, such as searching, amplitude estimation, and Hamiltonian simulation, the capabilities of the quantum signal processing go beyond these and offer untapped potential for devising new quantum algorithms. In this article, we utilize block encoding to substantially enhance two previously proposed quantum algorithms: largest eigenvalue estimation and quantum gradient descent. Unlike previous works that involve sophisticated procedures, our findings, using the unitary block encoding, demonstrate that even with elementary operations, these new quantum algorithms can eliminate major scaling factors present in their original counterparts. This yields much more efficient quantum algorithms capable of tackling complex computational problems with remarkable efficiency. Furthermore, we show how to extend our proposed method to different contexts, including matrix inversion and multiple eigenvalues estimation.	翻訳日:2023-12-25 14:28:08 公開日:2023-12-22
# ロボットソフトウェア開発のためのROSパッケージ検索 : 知識グラフに基づくアプローチ ROS package search for robot software development: a knowledge graph-based approach ( http://arxiv.org/abs/2312.14781v1 ) ライセンス: Link先を確認	Shuo Wang, Xinjun Mao, Shuo Yang, Menghan Wu, Zhang Zhang	(参考訳) ROS(Robot Operating System)パッケージは、ロボットソフトウェア開発で効果的に再利用できるソフトウェアアーティファクトの一種として、ますます人気が高まっている。実際、利用可能な大量のパッケージからソフトウェアの機能要件によく適合する適切なROSパッケージを見つけることは、現在の検索方法を用いた非自明なタスクである。 ROSパッケージの従来の検索手法は、ロボットタスクに関連するキーワードを汎用検索エンジンやコードホスティングプラットフォームに入力して、潜在的に適切なROSパッケージのほぼ全ての結果を得る。しかし,タスク関連キーワードがROSパッケージの機能と正確に一致しないため,これらの検索手法の精度は比較的低い。本稿では, ROSパッケージの検索精度を向上させるために, セマンティックレベルの ROS Package Knowledge Graph (RPKG) を利用した, セマンティックベースの検索手法を提案する。まず、RPKGを構築するために、ROSパッケージのテキスト記述のデータセットから意味概念を抽出するために多次元特徴抽出技術を用いる。このプロセスから抽出されたセマンティックな特徴は、かなりの数のエンティティと関係をもたらします。その後、ロボットドメイン固有の小さなコーパスを作成し、さらに事前訓練された言語モデルBERT-ROSを作成し、抽出した特徴のセマンティクスを効果的に表現する埋め込みを生成する。これらの埋め込みは、RPKG内のROSパッケージ検索プロセスにおいて、意味レベルの理解と比較を促進する上で重要な役割を果たす。次に,従来のキーワード検索法よりも正確なrosパッケージを検索するユーザ検索クエリから,複数の特徴の重み付き類似性を取り入れた,新しい意味マッチングに基づく検索アルゴリズムを提案する。 ROS (Robot Operating System) packages have become increasingly popular as a type of software artifact that can be effectively reused in robotic software development. Indeed, finding suitable ROS packages that closely match the software's functional requirements from the vast number of available packages is a nontrivial task using current search methods. The traditional search methods for ROS packages often involve inputting keywords related to robotic tasks into general-purpose search engines or code hosting platforms to obtain approximate results of all potentially suitable ROS packages. However, the accuracy of these search methods remains relatively low because the task-related keywords may not precisely match the functionalities offered by the ROS packages. 
To improve the search accuracy of ROS packages, this paper presents a novel semantic-based search approach that relies on the semantic-level ROS Package Knowledge Graph (RPKG) to automatically retrieve the most suitable ROS packages. Firstly, to construct the RPKG, we employ multi-dimensional feature extraction techniques to extract semantic concepts from the dataset of ROS package text descriptions. The semantic features extracted from this process result in a substantial number of entities and relationships. Subsequently, we create a robot domain-specific small corpus and further fine-tune a pre-trained language model, BERT-ROS, to generate embeddings that effectively represent the semantics of the extracted features. These embeddings play a crucial role in facilitating semantic-level understanding and comparisons during the ROS package search process within the RPKG. Secondly, we introduce a novel semantic matching-based search algorithm that incorporates the weighted similarities of multiple features from user search queries, which searches out more accurate ROS packages than the traditional keyword search method.	翻訳日:2023-12-25 14:27:51 公開日:2023-12-22
# 局所密度構造を用いた画像から画像への変換GANの圧縮 Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold ( http://arxiv.org/abs/2312.14776v1 ) ライセンス: Link先を確認	Alireza Ganjdanesh, Shangqian Gao, Hirad Alipanah, Heng Huang	(参考訳) generative adversarial networks (gans) は、画像から画像への変換のための複雑なデータ分布のモデリングにおいて顕著な成功を示している。それでも、彼らの高い計算要求は、エッジデバイスのような実践的なシナリオへの展開を禁止している。既存のGAN圧縮法は主に知識蒸留や畳み込み分類器の刈り取り技術に依存している。したがって、彼らは GAN の臨界特性、すなわちその学習多様体上の局所密度構造を無視している。そこで,新たな視点からgan圧縮にアプローチし,prunedモデルに学習多様体上の元のパラメータ重モデルの密度構造を保存するよう明示的に促す。原生成器の学習多様体を生成試料周辺の局所近傍に分割することにより,prunedモデルの目的を達成する。そこで我々は,カーネル密度推定法に類似した各近傍の局所密度構造を保存するために,プルーニングモデルを定式化する新しいプルーニング目標を提案する。また, 判別器と生成器を2つのプルーニング剤でプルーニングする協調プルーニングスキームを開発した。我々は,対応するモデルのアーキテクチャを決定する際に,ピアのフィードバックを交換することで,ジェネレータと識別器の相互作用を捉えるエージェントを設計する。このような設計により, プルーニング法は高性能サブネットワークを効率よく見つけることができ, プルーニング時のベースラインと比較して, ジェネレータと判別器のバランスをより効率的に維持できる。画像変換GANモデルであるPix2PixとCycleGANについて,様々なベンチマークデータセットとアーキテクチャを用いて実験を行った。 Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical characteristic of GANs: their local density structure over their learned manifold. Accordingly, we approach GAN compression from a new perspective by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. We facilitate this objective for the pruned model by partitioning the learned manifold of the original generator into local neighborhoods around its generated samples. Then, we propose a novel pruning objective to regularize the pruned model to preserve the local density structure over each neighborhood, resembling the kernel density estimation method. 
Also, we develop a collaborative pruning scheme in which the discriminator and generator are pruned by two pruning agents. We design the agents to capture interactions between the generator and discriminator by exchanging their peer's feedback when determining corresponding models' architectures. Thanks to such a design, our pruning method can efficiently find performant sub-networks and can maintain the balance between the generator and discriminator more effectively compared to baselines during pruning, thereby showing more stable pruning dynamics. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.	翻訳日:2023-12-25 14:27:22 公開日:2023-12-22
# クロスエイジおよびクロスサイトドメインシフトが新生児および新生児脳の深層学習に基づく白質繊維推定に及ぼす影響 Cross-Age and Cross-Site Domain Shift Impacts on Deep Learning-Based White Matter Fiber Estimation in Newborn and Baby Brains ( http://arxiv.org/abs/2312.14773v1 ) ライセンス: Link先を確認	Rizhong Lin, Ali Gholipour, Jean-Philippe Thiran, Davood Karimi, Hamza Kebiri and Meritxell Bach Cuadra	(参考訳) 深層学習モデルでは拡散磁気共鳴イメージングデータから組織微細構造を推定できることが示されている。しかし、これらのモデルは、異なるスキャナーやプロトコルからのデータや、様々な年齢でスキャンされた幼児や子供の発達した脳など、固有の変異のあるデータに適用された場合、ドメインシフトの課題に直面している。データ調和や成人脳の領域適応など、これらの課題に対処するいくつかの手法が提案されている。しかし、これらの手法は乳幼児の急速に発達する脳における繊維配向分布関数の推定には未解明のままである。本研究では,201人の新生児と165人の赤ちゃんの2つのコホート間の年齢効果とドメインシフトについて,モーメント法と微調整戦略を用いて詳細に検討した。以上の結果から,新生児と比較して乳児の微構造発達の変動が深層学習モデルのクロスエイジング性能に直接影響することが示唆された。また,少数の対象領域サンプルがドメインシフト問題を著しく軽減できることを実証した。 Deep learning models have shown great promise in estimating tissue microstructure from limited diffusion magnetic resonance imaging data. However, these models face domain shift challenges when test and train data are from different scanners and protocols, or when the models are applied to data with inherent variations such as the developing brains of infants and children scanned at various ages. Several techniques have been proposed to address some of these challenges, such as data harmonization or domain adaptation in the adult brain. However, those techniques remain unexplored for the estimation of fiber orientation distribution functions in the rapidly developing brains of infants. In this work, we extensively investigate the age effect and domain shift within and across two different cohorts of 201 newborns and 165 babies using the Method of Moments and fine-tuning strategies. Our results show that reduced variations in the microstructural development of babies in comparison to newborns directly impact the deep learning models' cross-age performance. We also demonstrate that a small number of target domain samples can significantly mitigate domain shift problems.	翻訳日:2023-12-25 14:26:54 公開日:2023-12-22
# 乱流: コードのための命令調整型大規模言語モデルの体系的および自動テスト Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code ( http://arxiv.org/abs/2312.14856v1 ) ライセンス: Link先を確認	Shahin Honarvar, Mark van der Wilk, Alastair Donaldson	(参考訳) 本稿では,新しいベンチマークである乱流を用いて,命令調整型大規模言語モデル(LLM)のコード生成における正確性と堅牢性を体系的に評価する手法を提案する。 turbulence は、自然言語 $\textit{question templates}$ の大規模なセットで構成されており、それぞれがプログラミングの問題であり、様々な形式で問うことができるようにパラメータ化されている。各質問テンプレートには関連する$\textit{test oracle}$があり、llmによって返されるコードソリューションが正しいかどうかを判断する。したがって、単一の質問テンプレートから LLM に $\textit{neighbourhood}$ と非常に似たプログラミング質問を問うことができ、各質問に対して返された結果の正しさを評価することができる。例えば、$\textit{anomalies}$, LLMが近隣で$\textit{almost all}$を正しく解決するが、特定のパラメータのインスタンス化には失敗する。我々は,OpenAI,Cohere,Metaの5つのLLMに対して,それぞれ2つの温度構成で実験を行った。以上の結果から, 乱流はLLM推論能力のギャップを明らかにすることができることがわかった。 LLMが近隣の問題を解決することができるが、近隣全体の問題を解決するために一般化することができないケースを体系的に識別することによって、我々の手法は$\textit{robustness}$問題をハイライトするのに効果的である。我々は、llmが間違ったコード結果を返す際に犯す誤りの種類に光を当てるデータと例を示します。 We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence. Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms. Each question template has an associated $\textit{test oracle}$ that judges whether a code solution returned by an LLM is correct. Thus, from a single question template, it is possible to ask an LLM a $\textit{neighbourhood}$ of very similar programming questions, and assess the correctness of the result returned for each question. This allows gaps in an LLM's code generation abilities to be identified, including $\textit{anomalies}$ where the LLM correctly solves $\textit{almost all}$ questions in a neighbourhood but fails for particular parameter instantiations. 
We present experiments against five LLMs from OpenAI, Cohere and Meta, each at two temperature configurations. Our findings show that, across the board, Turbulence is able to reveal gaps in LLM reasoning ability. This goes beyond merely highlighting that LLMs sometimes produce wrong code (which is no surprise): by systematically identifying cases where LLMs are able to solve some problems in a neighbourhood but do not manage to generalise to solve the whole neighbourhood, our method is effective at highlighting $\textit{robustness}$ issues. We present data and examples that shed light on the kinds of mistakes that LLMs make when they return incorrect code results.	翻訳日:2023-12-25 14:19:37 公開日:2023-12-22
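As a minimal sketch of the question-template idea described in this abstract, the Python snippet below instantiates one parameterised template into a neighbourhood of questions and applies a test oracle to every instance. The template text, the helper names (`make_oracle`, `neighbourhood`, `evaluate`) and the two stand-in "model answers" are hypothetical illustrations under stated assumptions; they are not taken from the Turbulence benchmark, and no LLM is actually called.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical question template, parameterised by two indices a and b.
TEMPLATE = ("Write a function f(xs) that returns the sum of the elements "
            "of the list xs from index {a} to index {b} inclusive.")

Candidate = Callable[[List[int]], int]


def make_oracle(a: int, b: int) -> Callable[[Candidate], bool]:
    """Build a test oracle for one instantiation (a, b) of the template."""
    def oracle(candidate: Candidate) -> bool:
        test_inputs = [list(range(10)), [5] * 10,
                       [3, -1, 4, -1, 5, -9, 2, 6, -5, 3]]
        return all(candidate(xs) == sum(xs[a:b + 1]) for xs in test_inputs)
    return oracle


def neighbourhood(params: List[Tuple[int, int]]):
    """Instantiate the template once per parameter pair."""
    return [(TEMPLATE.format(a=a, b=b), make_oracle(a, b)) for a, b in params]


def correct_answer(a: int, b: int) -> Candidate:
    """Stand-in for a model that answers every instance correctly."""
    return lambda xs: sum(xs[a:b + 1])


def flaky_answer(a: int, b: int) -> Candidate:
    """Stand-in for a model with an off-by-one bug only when a == 0."""
    if a == 0:
        return lambda xs: sum(xs[a:b])
    return lambda xs: sum(xs[a:b + 1])


def evaluate(answer_for, params: List[Tuple[int, int]]) -> Dict[Tuple[int, int], bool]:
    """Run the oracle of every question in the neighbourhood."""
    results = {}
    for (a, b), (_question, oracle) in zip(params, neighbourhood(params)):
        results[(a, b)] = oracle(answer_for(a, b))
    return results


PARAMS = [(0, 3), (1, 4), (2, 5), (3, 7)]
results = evaluate(flaky_answer, PARAMS)
# An "anomaly" in the abstract's sense: most instances pass, a few fail.
anomalies = [p for p, ok in results.items() if not ok]
print(results)
print("anomalies:", anomalies)
```

Grouping results by template rather than by individual question is what lets a robustness gap show up: the flaky stand-in solves three of the four instances and fails only for one particular parameter choice.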
# TACO:アルゴリズムによるCOde生成データセットのトピック TACO: Topics in Algorithmic COde generation dataset ( http://arxiv.org/abs/2312.14852v1 ) ライセンス: Link先を確認	Rongao Li (1 and 2), Jie Fu (1), Bo-Wen Zhang (1), Tao Huang (2), Zhihong Sun (2), Chen Lyu (2), Guang Liu (1), Zhi Jin (3), Ge Li (3) ((1) Beijing Academy of Artificial Intelligence, (2) School of Information Science and Engineering, Shandong Normal University, China, (3) Key Lab of HCST (PKU), MOE, SCS, Peking University, China)	(参考訳) 我々は,オープンソースの大規模コード生成データセットであるtacoを紹介し,アルゴリズムの光学に重点を置いて,コード生成モデルの分野でより困難なトレーニングデータセットと評価ベンチマークを提供する。 TACOには、現実のプログラミングシナリオにおける問題理解と推論能力を向上または評価する、より難しい競合レベルのプログラミング質問が含まれている。トレーニングとテストセットには25433と1000のコーディング問題があり、最大155万の多様な解答がある。さらに、各TACO問題には、タスクトピック、アルゴリズム、プログラミングスキル、難易度といったいくつかのきめ細かいラベルが含まれており、コード生成モデルのトレーニングと評価をより正確に参照している。データセットと評価スクリプトはHugging Face Hub(https://huggingface.co/datasets/BAAI/TACO)とGithub(https://github.com/FlagOpen/TACO)で入手できる。 We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the optics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem understanding and reasoning abilities in real-world programming scenarios. There are 25433 and 1000 coding problems in training and test set, as well as up to 1.55 million diverse solution answers. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and Github (https://github.com/FlagOpen/TACO).	翻訳日:2023-12-25 14:19:06 公開日:2023-12-22
# kemeny定数を用いた最適マルコフ鎖分割のためのグラフニューラルネットワークの大規模トレーディング Large Scale Traning of Graph Neural Networks for Optimal Markov-Chain Partitioning Using the Kemeny Constant ( http://arxiv.org/abs/2312.14847v1 ) ライセンス: Link先を確認	Sam Alexander Martino, Jo\~ao Morado, Chenghao Li, Zhenghao Lu, Edina Rosta	(参考訳) 従来のクラスタリングアルゴリズムは、グラフ内の複雑な関係を捉え、任意のクラスタリング基準に一般化するのに苦労することが多い。グラフデータの表現を学習する強力なフレームワークとしてのグラフニューラルネットワーク(GNN)の出現は、その問題を解決するための新しいアプローチを提供する。これまでの研究は、GNNが様々な基準を用いてパーティショニングを提案できることを示したが、これらのアプローチはまだマルコフ連鎖や運動ネットワークに拡張されていない。これらは分子システムの研究で頻繁に発生し、特に生化学的モデリングのコミュニティに興味を持つ。本稿では,マルコフ連鎖のグラフ分割問題に対処するために,複数のgnnベースのアーキテクチャを提案する。このアプローチは、提案されたパーティショニングがケメニー定数をどの程度変更するかを最小化することを目的としている。本稿では,エンコーダデコーダアーキテクチャを用いて,リニアレイヤを持つGraphSAGEベースのGNNが,このコンテキストにおいてより大きく,より表現力に富んだアテンションベースモデルよりも優れていることを示す。概念実証として,まずランダムに連結されたグラフをクラスタ化する手法を実証する。また、運動ネットワークとして1次元自由エネルギープロファイルに対応する線形鎖構造を用いる。その後,分子動力学から得られたデータセットを用いた実験により,本手法の有効性を示す。本手法の性能をpcca+などの他の分割手法と比較する。本稿では,特徴量選択とハイパーパラメータ選択の重要性を検討し,gnnの大規模並列学習のための汎用的戦略を提案する。 Traditional clustering algorithms often struggle to capture the complex relationships within graphs and generalise to arbitrary clustering criteria. The emergence of graph neural networks (GNNs) as a powerful framework for learning representations of graph data provides new approaches to solving the problem. Previous work has shown GNNs to be capable of proposing partitionings using a variety of criteria, however, these approaches have not yet been extended to work on Markov chains or kinetic networks. These arise frequently in the study of molecular systems and are of particular interest to the biochemical modelling community. In this work, we propose several GNN-based architectures to tackle the graph partitioning problem for Markov Chains described as kinetic networks. This approach aims to minimize how much a proposed partitioning changes the Kemeny constant. 
We propose using an encoder-decoder architecture and show how simple GraphSAGE-based GNNs with linear layers can outperform much larger and more expressive attention-based models in this context. As a proof of concept, we first demonstrate the method's ability to cluster randomly connected graphs. We also use a linear chain architecture corresponding to a 1D free energy profile as our kinetic network. Subsequently, we demonstrate the effectiveness of our method through experiments on a data set derived from molecular dynamics. We compare the performance of our method to other partitioning techniques such as PCCA+. We explore the importance of feature and hyperparameter selection and propose a general strategy for large-scale parallel training of GNNs for discovering optimal graph partitionings.	翻訳日:2023-12-25 14:18:48 公開日:2023-12-22
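The quantity this abstract optimises around, the Kemeny constant, can be computed directly for a small chain. The sketch below is a plain-Python illustration, not the paper's GNN pipeline. It assumes an irreducible row-stochastic transition matrix and uses the convention K = Σ 1/(1 − λ_i) over the non-unit eigenvalues (some authors add 1). It relies on the identity that I − P + J/n, with J the all-ones matrix, has eigenvalues 1 and 1 − λ_i, so that K is the trace of its inverse minus 1. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def inverse(m: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (is the chain irreducible?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kemeny_constant(p: Matrix) -> float:
    """Kemeny constant of an irreducible chain with row-stochastic matrix p."""
    n = len(p)
    # Build I - P + J/n; its inverse has trace 1 + sum_i 1 / (1 - lambda_i).
    shifted = [[(1.0 if i == j else 0.0) - p[i][j] + 1.0 / n for j in range(n)]
               for i in range(n)]
    z = inverse(shifted)
    return sum(z[i][i] for i in range(n)) - 1.0


# Two-state chain with switching probabilities 0.1 and 0.3:
# the only non-unit eigenvalue is 0.6, so K should be close to 1 / 0.4 = 2.5.
two_state = [[0.9, 0.1],
             [0.3, 0.7]]
# Random walk on a triangle: eigenvalues 1, -0.5, -0.5, so K is close to 4/3.
triangle = [[0.0, 0.5, 0.5],
            [0.5, 0.0, 0.5],
            [0.5, 0.5, 0.0]]
print(kemeny_constant(two_state), kemeny_constant(triangle))
```

A partitioning objective of the kind the abstract describes would compare this value before and after lumping states. The exact loss used by the authors is not given in the abstract, so it is not reproduced here.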
# Rydberg tweezerアレイにおける不均衡ホッピングを伴う2成分Bose-Hubbardモデルのシミュレーション Simulating a two component Bose-Hubbard model with imbalanced hopping in a Rydberg tweezer array ( http://arxiv.org/abs/2312.14846v1 ) ライセンス: Link先を確認	Y. Zhang, A. Gaddie, H-V. Do, G. W. Biedermann, R. J. Lewis-Swan	(参考訳) 中性原子の光学トウェザーアレイは、相互作用の範囲とハミルトニアンによって量子シミュレーションのための汎用的なプラットフォームを提供する。本稿では,共振双極子相互作用を特徴とする多層Rydberg原子配列を用いた2成分Bose-Hubbardモデルを提案する。 bose-hubbardモデルの局所ヒルベルト空間を符号化するために使用できる状態の多様性は、各成分の相対ホッピング率の制御とスピンフリップホッピングの実現を可能にする。数値シミュレーションを用いて、多レベルリドバーグ原子がモデルの多様な非平衡クエンチダイナミクスを探求する機会を与えることを示す。例えば、有効スピンの緩和時間スケールと荷電自由度を分離し、ハードコアボソン相互作用に起因する動的制約により、2つの成分の有効ホッピング速度が大きく異なる場合の緩やかな緩和の仕組みを観察する。本稿では,最新のrydberg tweezer配列で提案を実現する技術的詳細について述べる。 Optical tweezer arrays of neutral atoms provide a versatile platform for quantum simulation due to the range of interactions and Hamiltonians that can be realized and explored. We propose to simulate a two-component Bose-Hubbard model with power-law hopping using arrays of multilevel Rydberg atoms featuring resonant dipolar interactions. The diversity of states that can be used to encode the local Hilbert space of the Bose-Hubbard model enables control of the relative hopping rate of each component and even the realization of spin-flip hopping. We use numerical simulations to show how multilevel Rydberg atoms provide an opportunity to explore the diverse non-equilibrium quench dynamics of the model. For example, we demonstrate a separation of the relaxation timescales of effective spin and charge degrees of freedom, and observe regimes of slow relaxation when the effective hopping rates of the two components are vastly different due to dynamical constraints arising from hardcore boson interactions. We discuss the technical details of realizing our proposal in state-of-the-art Rydberg tweezer arrays.	翻訳日:2023-12-25 14:18:28 公開日:2023-12-22
# メタファー翻訳の精神医学への応用について On the Use of Metaphor Translation in Psychiatry ( http://arxiv.org/abs/2312.14845v1 ) ライセンス: Link先を確認	Lois Wong	(参考訳) 英語能力に限界がある個人にメンタルヘルスを提供すること(LEP)は、精神医学において迫る問題である。精神科医療の訓練を受けた人の大多数は英語話者であるため、LEP患者に与えられるメンタルヘルスケアの質は英語話者に提供されるものよりも著しく低い。メンタルヘルスケアの提供は、患者と医療提供者の間のコミュニケーションと理解に焦点を合わせており、物理的な医療の領域よりもはるかに多く、英語話者は、LEPのメタファのような比喩的な言語を理解できないことが多い。したがって、フィギュラティブ言語翻訳は、公平な精神医学的ケアを提供するのに有用である。現在、メタファーは精神的な問題に苦しむ個人を識別し、それらの個人が自分の経験を理解し、伝達するのを手助けすることの両方において最重要であることが示されている。そこで本稿は,精神医学領域における機械翻訳の可能性を調査し,既存の機械の伝達可能性やメタファー翻訳研究のさらなる研究の必要性を明らかにすることを目的とする。 Providing mental healthcare to individuals with limited English proficiency (LEP) remains a pressing problem within psychiatry. Because the majority of individuals trained in providing psychiatric care are English speakers, the quality of mental healthcare given to LEP patients is significantly lower than that provided for English speakers. The provision of mental healthcare is contingent on communication and understanding between the patient and healthcare provider, much more so than in the realm of physical healthcare, and English speakers are often unable to comprehend figurative language such as metaphors used by LEPs. Hence, Figurative Language Translation is invaluable to providing equitable psychiatric care. Now, metaphor has been shown to be paramount in both identifying individuals struggling with mental problems and helping those individuals understand and communicate their experiences. Therefore, this paper aims to survey the potential of Machine Translation for providing equitable psychiatric healthcare and highlights the need for further research on the transferability of existing machine and metaphor translation research in the domain of psychiatry.	翻訳日:2023-12-25 14:18:09 公開日:2023-12-22
# SusDevOps: ソフトウェアエンジニアリングの第一原則に持続可能性を促進する SusDevOps: Promoting Sustainability to a First Principle in Software Engineering ( http://arxiv.org/abs/2312.14843v1 ) ライセンス: Link先を確認	Istvan David	(参考訳) 持続性は現代のソフトウェアシステムの重要な特性になりつつある。持続可能なソフトウェア工学に関する知識は大幅に増えていますが、ソフトウェアデリバリライフサイクル内でサステナビリティ関連の活動を行うエンドツーエンドのフレームワークは欠落しています。この記事では、DevOpsコンテキストにおける第一原則への持続可能性を促進するSusDevOpsフレームワークを提案する。ソフトウェア開発スタートアップ企業を事例として,SusDevOpsのライフサイクルフェーズとテクニックを実演する。 Sustainability is becoming a key property of modern software systems. While there is a substantial and growing body of knowledge on engineering sustainable software, end-to-end frameworks that situate sustainability-related activities within the software delivery lifecycle are missing. In this article, we propose the SusDevOps framework that promotes sustainability to a first principle within a DevOps context. We demonstrate the lifecycle phases and techniques of SusDevOps through the case of a software development startup company.	翻訳日:2023-12-25 14:17:52 公開日:2023-12-22
# 旅行セールスマン問題に対するラグランジアン乗算器の学習 Learning Lagrangian Multipliers for the Travelling Salesman Problem ( http://arxiv.org/abs/2312.14836v1 ) ライセンス: Link先を確認	Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, Louis-Martin Rousseau	(参考訳) ラグランジアン緩和(英: lagrangian relax)は、最適化問題における制約を緩和するために用いられる多目的数学の手法であり、双対境界の生成により、実現可能な解の最適性と、制約プログラミング(重み付き回路制約など)における効率的なプロパゲータの設計を証明できる。しかしながら、ラグランジアン乗法(例えば劣勾配法)を導出する従来の過程はしばしば計算集約的であり、大規模あるいは時間に敏感な問題に対する実用性を制限している。そこで本研究では,グラフニューラルネットワークの能力を活用して問題構造を活用し,精度の高いラグランジアン乗算器を効率的に生成することを目的とした,教師なし学習手法を提案する。この手法を、旅行セールスマン問題に対する有名なヘルド・カルプ・ラグランジアン緩和に適用する。中心となる考え方は、正確なラグランジアン乗算を予測し、ヘルド=カルプ緩和境界を生成するための暖かい出発点としてそれらを用いることである。これらの境界は、分岐とバウンドのアルゴリズムによって実行されるフィルタリングプロセスを強化するために使われる。実現可能な解を見つけることに焦点を当てた既存の文献の多くとは対照的に、我々のアプローチは両面で動作し、学習が最適性の証明を加速できることを示す。我々は,200都市までの事例を考慮し,メートル法トラベルセールスマン問題の様々な分布について実験を行う。その結果、本手法は、重み付き回路のグローバル制約のフィルタリングレベルを改善し、タイムアウトまでの未解決インスタンスに対する係数2による最適性ギャップを減らし、解決インスタンスの実行時間を10%削減できることを示した。 Lagrangian relaxation is a versatile mathematical technique employed to relax constraints in an optimization problem, enabling the generation of dual bounds to prove the optimality of feasible solutions and the design of efficient propagators in constraint programming (such as the weighted circuit constraint). However, the conventional process of deriving Lagrangian multipliers (e.g., using subgradient methods) is often computationally intensive, limiting its practicality for large-scale or time-sensitive problems. To address this challenge, we propose an innovative unsupervised learning approach that harnesses the capabilities of graph neural networks to exploit the problem structure, aiming to generate accurate Lagrangian multipliers efficiently. We apply this technique to the well-known Held-Karp Lagrangian relaxation for the travelling salesman problem. The core idea is to predict accurate Lagrangian multipliers and to employ them as a warm start for generating Held-Karp relaxation bounds. 
These bounds are subsequently utilized to enhance the filtering process carried out by branch-and-bound algorithms. In contrast to much of the existing literature, which primarily focuses on finding feasible solutions, our approach operates on the dual side, demonstrating that learning can also accelerate the proof of optimality. We conduct experiments across various distributions of the metric travelling salesman problem, considering instances with up to 200 cities. The results illustrate that our approach can improve the filtering level of the weighted circuit global constraint, reduce the optimality gap by a factor of two for unsolved instances up to a timeout, and reduce the execution time for solved instances by 10%.	翻訳日:2023-12-25 14:17:44 公開日:2023-12-22
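To make the Held-Karp relaxation mentioned in this abstract concrete, here is a small plain-Python sketch of the 1-tree lower bound for a given vector of Lagrangian multipliers, plus one classical subgradient update. In the paper the multipliers are predicted by a graph neural network and used as a warm start; here they are supplied by hand as an assumption, and the function names and the unit-square instance are hypothetical illustrations.

```python
import math
from typing import List, Tuple


def one_tree_bound(dist: List[List[float]], mult: List[float]) -> Tuple[float, List[int]]:
    """Held-Karp 1-tree bound for a symmetric TSP with at least 3 nodes.

    Edge (i, j) gets the modified cost dist[i][j] + mult[i] + mult[j].
    Returns the bound and the node degrees in the minimum 1-tree.
    """
    n = len(dist)

    def cost(i: int, j: int) -> float:
        return dist[i][j] + mult[i] + mult[j]

    degree = [0] * n
    total = 0.0
    # Prim's algorithm for a minimum spanning tree on nodes 1..n-1.
    best = {v: (cost(1, v), 1) for v in range(2, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        c, parent = best.pop(v)
        total += c
        degree[v] += 1
        degree[parent] += 1
        for u in best:
            if cost(v, u) < best[u][0]:
                best[u] = (cost(v, u), v)
    # Attach node 0 with its two cheapest modified edges.
    for c, v in sorted((cost(0, v), v) for v in range(1, n))[:2]:
        total += c
        degree[0] += 1
        degree[v] += 1
    # Every tour has degree 2 at each node, hence the correction term.
    return total - 2.0 * sum(mult), degree


def subgradient_step(mult: List[float], degree: List[int], step: float) -> List[float]:
    """Raise multipliers of nodes with degree > 2 and lower those with degree < 2."""
    return [m + step * (d - 2) for m, d in zip(mult, degree)]


# Four cities on the corners of a unit square; the optimal tour has length 4.
r2 = math.sqrt(2.0)
square = [[0.0, 1.0, r2, 1.0],
          [1.0, 0.0, 1.0, r2],
          [r2, 1.0, 0.0, 1.0],
          [1.0, r2, 1.0, 0.0]]
bound, degree = one_tree_bound(square, [0.0, 0.0, 0.0, 0.0])
print(bound, degree)
```

For any multipliers the returned value is a valid lower bound on the optimal tour length. When all degrees equal 2, as in this instance, the 1-tree is itself a tour and the bound is tight. Better initial multipliers mean fewer subgradient iterations are needed to reach a strong bound, which is the role the abstract assigns to the learned predictions.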
# リッチ中国語記述に基づくプロトタイプガイドによる人物検索 Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions ( http://arxiv.org/abs/2312.14834v1 ) ライセンス: Link先を確認	Ziqiang Wu, Bingpeng Ma	(参考訳) テキストベース人物検索は,人物検出とテキストベース人物検索の統一課題と見なすことができる,未カットシーン画像からの問合せテキストに基づいて,対象人物のローカライズと識別を同時に行うことを目的としている。本研究では,広く利用されている人物検索データセットPRWに基づく大規模ベンチマークデータセットPRW-TPS-CNを提案する。私たちのデータセットには47,102の文が含まれています。これらのテキストは上から下までの人物像を正確に記述しており、これは自然な記述順序に従っている。また、より包括的な評価のために、私たちのデータセットに中国語と英語の記述も提供します。これらの特徴はデータセットをより適用しやすくします。個人検出とテキストに基づく人物検索の不整合を軽減するために,PRW-TPS-CNデータセットのリッチテキストを活用する。本研究では,複数のテキストをテキストプロトタイプとして集約して,人物の顕著なテキスト特徴を維持することを提案する。全体のプロトタイプは画像アテンションマップを生成し、テキストベースの人物検索の低下を引き起こす検出ミスアライメントを解消する。これにより、人物検出とテキストに基づく人物検索との矛盾が軽減される。 PRW-TPS-CNデータセットについて広範な実験を行った。実験の結果, PRW-TPS-CNデータセットの有効性と, 提案手法の最先端性能が示された。 Text-based person search aims to simultaneously localize and identify the target person based on query text from uncropped scene images, which can be regarded as the unified task of person detection and text-based person retrieval task. In this work, we propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW. Our dataset contains 47,102 sentences, which means there is quite more information than existing dataset. These texts precisely describe the person images from top to bottom, which in line with the natural description order. We also provide both Chinese and English descriptions in our dataset for more comprehensive evaluation. These characteristics make our dataset more applicable. To alleviate the inconsistency between person detection and text-based person retrieval, we take advantage of the rich texts in PRW-TPS-CN dataset. We propose to aggregate multiple texts as text prototypes to maintain the prominent text features of a person, which can better reflect the whole character of a person. 
The overall prototypes are used to generate an image attention map that eliminates the detection misalignment which degrades text-based person retrieval. Thus, the inconsistency between person detection and text-based person retrieval is largely alleviated. We conduct extensive experiments on the PRW-TPS-CN dataset. The experimental results show the PRW-TPS-CN dataset's effectiveness and the state-of-the-art performance of our approach.	翻訳日:2023-12-25 14:17:15 公開日:2023-12-22
# 電界波の夢:拡散モデルを用いた心臓励起波の生成モデル Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models ( http://arxiv.org/abs/2312.14830v1 ) ライセンス: Link先を確認	Tanish Baranwal, Jan Lebert, Jan Christoph	(参考訳) 心臓の電気波は、心房細動や心室細動などの不整脈が持続する間に回転する渦巻波またはスクロール波を形成する。波動力学は通常、励起媒質中の反応拡散ダイナミクスを記述する結合偏微分方程式を用いてモデル化される。最近では、物理的および生物学的システムにおいて時空間パターンを生成する代替として、データ駆動生成モデリングが出現している。本稿では,心筋組織における電磁波パターン生成モデルのための拡散確率モデルについて検討する。我々は、非条件および条件付き生成タスクにおいて、そのような波動パターンを生成できるように、模擬波動パターンを用いた拡散モデルを訓練した。例えば,表面2次元計測から3次元波動を再構成し,パラメータ固有ダイナミクスを進化・生成するなど,インパインティングタスクについて検討した。拡散生成溶液を生体物理モデルを用いて得られた溶液と比較し, 拡散モデルがスパイラル波とスクロール波のダイナミクスを再現することを学び, 心筋組織における励起波のモデリングのためのデータ駆動型アプローチとして機能することを発見した。例えば、心室細動(vf)ダイナミックスを瞬時に開始することが可能であり、ペーシングプロトコルを適用しなくてもウェーブブレイクを誘発できることがわかった。 vfダイナミクスは任意の心室ジオメトリーで生成でき、時間とともに進化することができる。しかし, 拡散モデルでは, 制約が不十分な場合, 波動パターンを「ハロシン化」することがわかった。これらの制限に拘わらず、拡散モデルは心不整脈研究や診断に多くの可能性を持つ興味深い強力なツールである。 Electrical waves in the heart form rotating spiral or scroll waves during life-threatening arrhythmias such as atrial or ventricular fibrillation. The wave dynamics are typically modeled using coupled partial differential equations, which describe reaction-diffusion dynamics in excitable media. More recently, data-driven generative modeling has emerged as an alternative to generate spatio-temporal patterns in physical and biological systems. Here, we explore denoising diffusion probabilistic models for the generative modeling of electrical wave patterns in cardiac tissue. We trained diffusion models with simulated electrical wave patterns to be able to generate such wave patterns in unconditional and conditional generation tasks. For instance, we explored inpainting tasks, such as reconstructing three-dimensional wave dynamics from superficial two-dimensional measurements, and evolving and generating parameter-specific dynamics. 
We characterized and compared the diffusion-generated solutions to solutions obtained with biophysical models and found that diffusion models learn to replicate spiral and scroll wave dynamics so well that they could serve as an alternative data-driven approach for the modeling of excitation waves in cardiac tissue. For instance, we found that it is possible to initiate ventricular fibrillation (VF) dynamics instantaneously without having to apply pacing protocols in order to induce wavebreak. The VF dynamics can be created in arbitrary ventricular geometries and can be evolved over time. However, we also found that diffusion models 'hallucinate' wave patterns when given insufficient constraints. Regardless of these limitations, diffusion models are an interesting and powerful tool with many potential applications in cardiac arrhythmia research and diagnostics.	翻訳日:2023-12-25 14:16:51 公開日:2023-12-22
# Plan, Posture and Go: オープンワールドテキスト・ツー・モーション・ジェネレーションを目指して Plan, Posture and Go: Towards Open-World Text-to-Motion Generation ( http://arxiv.org/abs/2312.14828v1 ) ライセンス: Link先を確認	Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong	(参考訳) 従来のテキストからモーションへの生成法は通常、限られたテキストとモーションのペアで訓練されるため、オープンワールドシナリオへの一般化は困難である。 CLIPモデルを用いて動き空間とテキスト空間を整列し、自然言語の動作記述から動き生成を可能にする研究もある。しかし、それらは依然として限定的で非現実的な動きを発生させることに制限されている。これらの問題に対処するため,動作プランナ,姿勢ディフューザ,go-diffuser の3つのモジュールからなる PRO-Motion という分割型フレームワークを提案する。モーションプランナーは、大きな言語モデル(llm)に目標の動きにおける主要な姿勢を記述する一連のスクリプトを生成するよう指示する。自然言語とは異なり、スクリプトは、非常に単純なテキストテンプレートに従って、あらゆる可能な姿勢を記述できる。これにより、スクリプトを姿勢に変換する姿勢微分器の複雑さが大幅に減少し、オープンワールド生成への道が開ける。最後に、go-diffuserは別の拡散モデルとして実装され、すべての姿勢に対する全体翻訳と回転を推定し、現実的な動きをもたらす。実験により,本手法が他の手法よりも優れていることを示すとともに,複雑なオープンワールドプロンプトから多様で現実的な動作を生成できることを実証した。プロジェクトページはhttps://moonsliu.github.io/pro-motion。 Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP model to align the motion space and the text space, aiming to enable motion generation from natural language motion descriptions. However, they are still constrained to generate limited and unrealistic in-place motions. To address these issues, we present a divide-and-conquer framework named PRO-Motion, which consists of three modules as motion planner, posture-diffuser and go-diffuser. The motion planner instructs Large Language Models (LLMs) to generate a sequence of scripts describing the key postures in the target motion. Differing from natural languages, the scripts can describe all possible postures following very simple text templates. This significantly reduces the complexity of posture-diffuser, which transforms a script to a posture, paving the way for open-world generation. 
Finally, go-diffuser, implemented as another diffusion model, estimates whole-body translations and rotations for all postures, resulting in realistic motions. Experimental results show the superiority of our method over its counterparts and demonstrate its capability to generate diverse and realistic motions from complex open-world prompts such as "Experiencing a profound sense of joy". The project page is available at https://moonsliu.github.io/Pro-Motion.	翻訳日:2023-12-25 14:16:25 公開日:2023-12-22
# 量子実時間発展のためのテンソル正規化群法 Tensor Renormalization Group Methods for Quantum Real-time Evolution ( http://arxiv.org/abs/2312.14825v1 ) ライセンス: Link先を確認	Michael Hite and Yannick Meurice	(参考訳) 格子ゲージ理論における実時間発展のab-initio計算は、非常に興味深い応用であるが、計算の難解な側面を提示している。ユークリッド時間格子場理論の文脈で開発されたテンソル再正規化群法は, トロタライズ展開作用素のリアルタイム計算に応用できることを示す。本稿では,各種観測器の切断手順の最適化について検討する。この数値解法を1次元量子イジングモデルに適用し,順序相の外部横場を用いて計算を行い,$n_{s}=4$および8サイトの普遍量子計算と比較する。 Ab-initio calculations of real-time evolution for lattice gauge theory have very interesting potential applications but present challenging computational aspects. We show that tensor renormalization group methods developed in the context of Euclidean-time lattice field theory can be applied to calculation of Trotterized evolution operators at real time. We discuss the optimization of truncation procedures for various observables. We apply the numerical methods to the 1D Quantum Ising Model with an external transverse field in the ordered phase and compare with universal quantum computing for $N_{s}=4$ and 8 sites.	翻訳日:2023-12-25 14:16:00 公開日:2023-12-22
# 無信仰DRLとMCTSによる検査・保守計画の検討 An investigation of belief-free DRL and MCTS for inspection and maintenance planning ( http://arxiv.org/abs/2312.14824v1 ) ライセンス: Link先を確認	Daniel Koutas, Elizabeth Bismut, Daniel Straub	(参考訳) 本稿では,検査・保守(I&M)計画において発生するような,不確実性の下での逐次決定プロセスのための新しいDeep Reinforcement Learning(DRL)アーキテクチャを提案する。 I&M計画のための他のDRLアルゴリズムとは異なり、提案された+RQNアーキテクチャは信念状態の計算を不要とし、代わりに誤観測を直接処理する。このアルゴリズムは、劣化する一成分系の基本的なI&M計画問題に適用する。さらに,モンテカルロ木を用いたI&M問題探索の性能について検討し,+RQNと比較した。この比較は、2つの方法の結果のポリシーの統計分析と、信念空間におけるそれらの可視化を含む。 We propose a novel Deep Reinforcement Learning (DRL) architecture for sequential decision processes under uncertainty, as encountered in inspection and maintenance (I&M) planning. Unlike other DRL algorithms for (I&M) planning, the proposed +RQN architecture dispenses with computing the belief state and directly handles erroneous observations instead. We apply the algorithm to a basic I&M planning problem for a one-component system subject to deterioration. In addition, we investigate the performance of Monte Carlo tree search for the I&M problem and compare it to the +RQN. The comparison includes a statistical analysis of the two methods' resulting policies, as well as their visualization in the belief space.	翻訳日:2023-12-25 14:15:50 公開日:2023-12-22
# 部分データからの極性双対と量子共分散行列の再構成 Polar Duality and the Reconstruction of Quantum Covariance Matrices from Partial Data ( http://arxiv.org/abs/2312.14823v1 ) ライセンス: Link先を確認	Maurice A. de Gosson	(参考訳) 先行研究で導入されたラグランジアンとシンプレクティック極双対性の概念を用いて,量子共分散行列の再構成の問題に対処する。我々は、パウリの再構成問題を非自明に一般化するガウス量子状態に適用し、そのような状態の簡単なトモグラフィー的特徴を述べる。 We address the problem of the reconstruction of quantum covariance matrices using the notion of Lagrangian and symplectic polar duality introduced in previous work. We apply our constructions to Gaussian quantum states which leads to a non-trivial generalization of Pauli's reconstruction problem and we state a simple tomographic characterization of such states.	翻訳日:2023-12-25 14:15:37 公開日:2023-12-22
# 最適移動による自己注意の規則性理解 Understanding the Regularity of Self-Attention with Optimal Transport ( http://arxiv.org/abs/2312.14820v1 ) ライセンス: Link先を確認	Valérie Castin, Pierre Ablin, Gabriel Peyré	(参考訳) トランスフォーマーとそのマルチヘッドアテンションメカニズムは、幅広いドメインで最先端のモデルを上回ることで、わずか数年でマシンラーニングの状況を完全に変えました。しかし、理論的な観点から彼らの堅牢性についてはほとんど分かっていない。ニューラルネットワークのロバスト性を測定する攻撃非依存的な方法を提供する,自己注意の局所的なリプシッツ定数を研究することで,この問題に対処する。入力をWasserstein距離を備えた確率測度として見ることにより,測度論的な枠組みを採用する。これにより、無限長の入力に対する注意を一般化し、コンパクト集合上の自己アテンションのリプシッツ定数の上界と下界を導出することができる。下限は先行結果を大幅に改善し、コンパクト集合の半径に対して指数関数以上の速さで増大し、入力空間に付加的な制約を伴わずに堅牢性保証を得る可能性を排除する。我々の結果は、高局所リプシッツ定数の測度は典型的にはいくつかのディラックから構成されており、非常に不均衡な質量分布であることも指摘している。最後に,トークン数を変化させる摂動下での自己アテンションの安定性を解析し,測度論的な枠組みにおいて自然な問題と考えられる。特に、いくつかの入力に対して、トークンを摂動前に重複する攻撃は、単にトークンを移動させる攻撃よりも効率的であることを示す。この現象を質量分割と呼ぶ。 Transformers and their multi-head attention mechanism have completely changed the machine learning landscape in just a few years, by outperforming state-of-the-art models in a wide range of domains. Still, little is known about their robustness from a theoretical perspective. We tackle this problem by studying the local Lipschitz constant of self-attention, which provides an attack-agnostic way of measuring the robustness of a neural network. We adopt a measure-theoretic framework, by viewing inputs as probability measures equipped with the Wasserstein distance. This allows us to generalize attention to inputs of infinite length, and to derive an upper bound and a lower bound on the Lipschitz constant of self-attention on compact sets. The lower bound significantly improves prior results, and grows more than exponentially with the radius of the compact set, which rules out the possibility of obtaining robustness guarantees without any additional constraint on the input space. Our results also point out that measures with a high local Lipschitz constant are typically made of a few Diracs, with a very unbalanced distribution of mass. 
Finally, we analyze the stability of self-attention under perturbations that change the number of tokens, which appears to be a natural question in the measure-theoretic framework. In particular, we show that for some inputs, attacks that duplicate tokens before perturbing them are more efficient than attacks that simply move tokens. We call this phenomenon mass splitting.	翻訳日:2023-12-25 14:15:30 公開日:2023-12-22
# 光制御単一分子によるフォノン寿命の増大 Enhanced phonon lifetimes with optically controlled single molecules ( http://arxiv.org/abs/2312.14819v1 ) ライセンス: Link先を確認	Victor Ceban and Mihai A. Macovei	(参考訳) 有機結晶からなるメカニカル共振器に埋め込まれた単一分子のフォノンダイナミクスについて検討した。システム全体が、バッドキャビティ限界内の光共振器に配置される。分子集団の光制御がフォノンダイナミクスに影響を及ぼすことが判明した。長寿命フォノンは遷移周波数の変調によって分子の崩壊ダイナミクスを減速させるときに得られる。実験結果は,他の2レベルエミッタと機械共振器を用いたオプティメカルセットアップにも有効である。 We have investigated the phonon dynamics of a single-molecule embedded in a mechanical resonator made of an organic crystal. The whole system is placed in an optical resonator within the bad cavity limit. We have found that the optical control of the molecular population affects the phonon dynamics. Long-lived phonons are obtained when slowing-down the decay dynamics of the molecule via modulation of the transition frequency. The discussed results are also valid for optomechanical setups based on other types of two-level emitters and mechanical resonators.	翻訳日:2023-12-25 14:15:06 公開日:2023-12-22
# FAST:ブラックボックス生成モデルにおける弱学習のための類似性認識 FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models ( http://arxiv.org/abs/2312.14895v1 ) ライセンス: Link先を確認	Subhodip Panda, Prathosh AP	(参考訳) プライバシーに関する懸念の高まりと規制枠組みの遵守によって推進される、深層生成モデルの規制の強化は、これらのモデルに対する正確な制御メカニズムの必要性を強調する。この緊急性は特に、否定的、攻撃的、または潜在的に有害なコンテンツを含む生成モデルが出力を生成する事例によって強調される。これに対し、機械学習は特定の知識を選択的に忘れたり、事前訓練されたモデルから望ましくないデータサブセットの影響を取り除いたりする。しかし、現代の機械学習のアプローチでは、学習中にモデルパラメータやアーキテクチャの詳細へのアクセスを想定することが多い。下流タスクでは、これらのモデルはブラックボックスシステムとして機能し、アクセシブルな事前訓練パラメータ、アーキテクチャ、トレーニングデータを持つ。このようなシナリオでは、望ましくない出力をフィルタリングする可能性も現実的な代替となる。第一に,フィルタリングと未学習プロセスの関係を明らかにすること,第二に,ブラックボックスシステムとして特徴付けられるモデルから生成された望ましくない出力の表示を緩和する手法を定式化することである。本研究における理論的分析は,ブラックボックスモデルの文脈において,フィルタリングが弱いアンラーニングの一形態であることを示す。提案手法は,潜在空間における不必要な特徴の表現を体系的に符号化することにより,望ましくない出力を効果的に抑制する。 The heightened emphasis on the regulation of deep generative models, propelled by escalating concerns pertaining to privacy and compliance with regulatory frameworks, underscores the imperative need for precise control mechanisms over these models. This urgency is particularly underscored by instances in which generative models generate outputs that encompass objectionable, offensive, or potentially injurious content. In response, machine unlearning has emerged to selectively forget specific knowledge or remove the influence of undesirable data subsets from pre-trained models. However, modern machine unlearning approaches typically assume access to model parameters and architectural details during unlearning, which is not always feasible. In multitude of downstream tasks, these models function as black-box systems, with inaccessible pre-trained parameters, architectures, and training data. In such scenarios, the possibility of filtering undesired outputs becomes a practical alternative. 
The primary goal of this study is twofold: first, to elucidate the relationship between filtering and unlearning processes, and second, to formulate a methodology aimed at mitigating the display of undesirable outputs generated from models characterized as black-box systems. Theoretical analysis in this study demonstrates that, in the context of black-box models, filtering can be seen as a form of weak unlearning. Our proposed Feature Aware Similarity Thresholding (FAST) method effectively suppresses undesired outputs by systematically encoding the representation of unwanted features in the latent space.	翻訳日:2023-12-25 14:09:17 公開日:2023-12-22
# DRStageNet: 基礎画像からの糖尿病網膜症の深層学習 DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images ( http://arxiv.org/abs/2312.14891v1 ) ライセンス: Link先を確認	Yevgeniy Men, Jonathan Fhima, Leo Anthony Celi, Lucas Zago Ribeiro, Luis Filipe Nakayama, Joachim A. Behar	(参考訳) 糖尿病網膜症(英: Diabetic retinopathy, DR)は、糖尿病の合併症である。視覚障害の予防にはタイムリーな識別が不可欠である。近年,デジタルファウンダス画像(DFI)からのDRステージングアルゴリズムが提案されている。しかし、モデルがトレーニングされたソースドメインと、それがデプロイされるターゲットドメインとの間の分散シフトのために、モデルはしばしば一般化できない。ソースドメインとターゲットドメインが完全にオーバーラップしていない場合、一般的な、特に難しいシフトが発生する。本研究では,この課題を軽減するために設計されたディープラーニングモデルDRStageNetを紹介する。我々は, 患者人口, 民族, 地理的起源, コンコービデンスをカバーする合計93,534のDFIを含む7つの公開データセットを使用した。我々は、自己教師型視覚変換器の事前訓練モデルであるDINOv2を微調整し、一般化性能を高めるためにマルチソース領域の微調整戦略を実装した。我々は,最近発表された基盤モデルを含む2つの最先端ベンチマークに対して,本手法の優位性をベンチマークし,実証する。我々は, 分解能の高いヒートマップを提供するために, grad-rollout法を回帰タスクに適用した。誤差解析の結果,主誤差の59\%が不正な参照ラベルであった。 DRStageNetはURL[原稿の受け入れ]でアクセスできます。 Diabetic retinopathy (DR) is a prevalent complication of diabetes associated with a significant risk of vision loss. Timely identification is critical to curb vision impairment. Algorithms for DR staging from digital fundus images (DFIs) have been recently proposed. However, models often fail to generalize due to distribution shifts between the source domain on which the model was trained and the target domain where it is deployed. A common and particularly challenging shift is often encountered when the source- and target-domain supports do not fully overlap. In this research, we introduce DRStageNet, a deep learning model designed to mitigate this challenge. We used seven publicly available datasets, comprising a total of 93,534 DFIs that cover a variety of patient demographics, ethnicities, geographic origins and comorbidities. We fine-tune DINOv2, a pretrained model of self-supervised vision transformer, and implement a multi-source domain fine-tuning strategy to enhance generalization performance. 
We benchmark our method against two state-of-the-art baselines, including a recently published foundation model, and demonstrate its superiority. We adapted the grad-rollout method to our regression task in order to provide high-resolution explainability heatmaps. The error analysis showed that 59% of the main errors had incorrect reference labels. DRStageNet is accessible at URL [upon acceptance of the manuscript].	翻訳日:2023-12-25 14:08:50 公開日:2023-12-22
# NPHardEval: 複雑性クラスによる大規模言語モデルの推論能力の動的ベンチマーク NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes ( http://arxiv.org/abs/2312.14890v1 ) ライセンス: Link先を確認	Lizhou Fan, Wenyue Hua, Lingyao Li, Haoyang Ling, Yongfeng Zhang, Libby Hemphill	(参考訳) 複雑な推論能力は、現在のLLMの最も重要な特徴の1つであり、複雑な意思決定タスクにおいて重要な役割を果たすために利用されてきた。したがって,LLMの推論能力を評価するために,大規模言語モデル (LLM) の推論能力に関する多くのベンチマークが確立されている。しかし、現在のベンチマークはLLMが達成できる推論能力の全範囲を厳格に評価する上で不十分である。これらのベンチマークは公開アクセス可能で静的であるため、モデルが特定のベンチマークメトリクスに対する応答を調整できる可能性があり、その結果、パフォーマンスが増大する。これらの制限に対処するため、我々の研究は NPHardEval という新しいベンチマークを導入した。このベンチマークは、900のアルゴリズム質問の範囲でLLMの推論能力を評価し、NP-Hard複雑性クラスまで拡張するように設計されている。これらの質問は、NPハード複雑性クラス以下の幅広い複雑性クラスを表現するために慎重に選ばれ、LLMの推論能力の厳密な測度を提供する。本研究では,LLMにおける推論の現況に光を当て,複雑なクラス間でのLLMの性能の比較を通して,客観的かつ厳密な視点を提供する。さらに、このベンチマークは動的更新メカニズムで設計されており、データポイントは毎月更新される。このような定期的な更新は、ベンチマークに過剰に適合するllmのリスクを緩和し、より正確で信頼性の高い推論能力の評価を促進する上で、重要な役割を果たす。 NPHardEvalのベンチマークデータセットとコードはhttps://github.com/casmlab/NPHardEvalで公開されている。 Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequate in offering a rigorous evaluation of the full extent of reasoning abilities that LLMs are capable of achieving. They are also prone to the risk of overfitting, as these benchmarks, being publicly accessible and static, allow models to potentially tailor their responses to specific benchmark metrics, thereby inflating their performance. Addressing these limitations, our research introduces a new benchmark, named NPHardEval. This benchmark is designed to evaluate the reasoning abilities of LLMs across a broad spectrum of 900 algorithmic questions, extending up to the NP-Hard complexity class. 
These questions are meticulously chosen to represent a wide range of complexity classes below the NP-hard complexity class, offering a rigorous measure of the reasoning ability of LLMs. Through this study, we shed light on the current state of reasoning in LLMs, providing an objective and rigorous perspective through the comparison of LLMs' performance across complexity classes. Moreover, this benchmark is designed with a dynamic update mechanism, where the datapoints are refreshed on a monthly basis. Such regular updates play a crucial role in mitigating the risk of LLMs overfitting to the benchmark, promoting a more accurate and reliable assessment of their reasoning capabilities. The benchmark dataset and code of NPHardEval are available at https://github.com/casmlab/NPHardEval.	翻訳日:2023-12-25 14:08:30 公開日:2023-12-22
# 非プライベートデータとプライベートデータからのレート最適分類について On rate-optimal classification from non-private and from private data ( http://arxiv.org/abs/2312.14889v1 ) ライセンス: Link先を確認	Balázs Csanád Csáji, László Györfi, Ambrus Tamás	(参考訳) 本稿では,古典的な分類問題を再考するが,プライバシー制約を課す。このような制約下では、生データ$(X_1,Y_1),\ldots,(X_n,Y_n)$を直接観察することはできず、全ての分類器は適切な局所微分プライバシー機構のランダム化結果の関数である。統計学者は、このプライバシーメカニズムの形式を自由に選択でき、ここでは、各特徴ベクトル$X_i$の位置の離散化とそのラベル$Y_i$にラプラス分布ノイズを加える。分類規則は、よく研究された分割分類規則のプライバシー保護版である。標準のリプシッツ条件とマージン条件に加えて、非プライベートデータとプライベートデータの両方に対して、分類誤差確率の正確な収束率を計算する新しい特徴が導入された。 In this paper we revisit the classical problem of classification, but impose privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. The statistician is free to choose the form of this privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of each feature vector $X_i$ and to its label $Y_i$. The classification rule is the privatized version of the well-studied partitioning classification rule. In addition to the standard Lipschitz and margin conditions, a novel characteristic is introduced, by which the exact rate of convergence of the classification error probability is calculated, both for non-private and private data.	翻訳日:2023-12-25 14:07:44 公開日:2023-12-22
# 共分散カーネルからのガウス過程のサンプルパス規則性 Sample Path Regularity of Gaussian Processes from the Covariance Kernel ( http://arxiv.org/abs/2312.14886v1 ) ライセンス: Link先を確認	Nathaël Da Costa, Marvin Pförtner, Lancelot Da Costa, Philipp Hennig	(参考訳) ガウス過程 (GPs) は函数空間上の確率分布を定義するための最も一般的な形式である。 GPの応用は無数であるが、GPサンプルパスの包括的理解、すなわち確率測度を定義する関数空間は不足している。実際には、GPは確率測度によってではなく、平均関数と共分散核によって構成される。本稿では,対応するGPのサンプルパスが与えられた正則性を達成するための,共分散核に関する必要十分条件を与える。定常および等方的 GP の場合にさらに単純化される、特に簡明な条件を与えるため、Hölder 正則性の枠組みを用いる。そして,この結果により,Matérn GP などの機械学習アプリケーションでよく用いられるGPのサンプルパス規則性の,新規かつ異常に厳密な特徴付けが可能であることを示す。 Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces over which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a covariance kernel. In this paper we provide necessary and sufficient conditions on the covariance kernel for the sample paths of the corresponding GP to attain a given regularity. We use the framework of Hölder regularity as it grants us particularly straightforward conditions, which simplify further in the cases of stationary and isotropic GPs. We then demonstrate that our results allow for novel and unusually tight characterisations of the sample path regularities of the GPs commonly used in machine learning applications, such as the Matérn GPs.	翻訳日:2023-12-25 14:07:21 公開日:2023-12-22
# ランジュバン拡散を用いた多様体上のサンプリングと推定 Sampling and estimation on manifolds using the Langevin diffusion ( http://arxiv.org/abs/2312.14882v1 ) ライセンス: Link先を確認	Karthik Bharath, Alexander Lewis, Akash Sharma, Michael V Tretyakov	(参考訳) 誤差境界は、コンパクトリーマン多様体上の不変測度 $d\mu_\phi \propto e^{-\phi} \mathrm{dvol}_g $ で本質的に定義されたランゲヴィン拡散の離散化を用いてサンプリングと推定のために導出される。離散化されたマルコフ過程に基づく$\mu_\phi $の線形汎関数の2つの推定器は、単一の軌跡に基づく時間分解推定器と、複数の独立軌跡に基づくアンサンブル吸収推定器である。離散化ステップサイズにおける$\phi$, first-order error bounds の平滑性という名目上のレベル以上の制限を課すことなく、両方の推定子のバイアスと分散を導出する。誤差の順序はユークリッド空間と平坦空間の最適速度と一致し、不変測度 $\mu_\phi$ と離散化されたマルコフ過程の定常測度の間の距離上の一階境界につながる。 2つの偏微分方程式とランジュバン拡散に対応する作用素の半群との関係を利用した証明技術の一般性は、ランジュバン拡散に関連するより一般的なサンプリングアルゴリズムの研究に役立てることができる。非コンパクト多様体の場合への解析の拡張条件について述べる。正曲率と負曲率の多様体上の分布、対コンケーブ、その他の数値的挿絵は導出境界上で解明され、サンプリングアルゴリズムの実用的有用性を示す。 Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $d\mu_\phi \propto e^{-\phi} \mathrm{dvol}_g $ on a compact Riemannian manifold. Two estimators of linear functionals of $\mu_\phi $ based on the discretized Markov process are considered: a time-averaging estimator based on a single trajectory and an ensemble-averaging estimator based on multiple independent trajectories. Imposing no restrictions beyond a nominal level of smoothness on $\phi$, first-order error bounds, in discretization step size, on the bias and variances of both estimators are derived. The order of error matches the optimal rate in Euclidean and flat spaces, and leads to a first-order bound on distance between the invariant measure $\mu_\phi$ and a stationary measure of the discretized Markov process. Generality of the proof techniques, which exploit links between two partial differential equations and the semigroup of operators corresponding to the Langevin diffusion, renders them amenable for the study of a more general class of sampling algorithms related to the Langevin diffusion. 
Conditions for extending the analysis to the case of non-compact manifolds are discussed. Numerical illustrations with distributions, log-concave and otherwise, on manifolds of positive and negative curvature elucidate the derived bounds and demonstrate the practical utility of the sampling algorithm.	翻訳日:2023-12-25 14:06:31 公開日:2023-12-22
# SutraNets: 長系列確率予測のためのサブシリーズ自己回帰ネットワーク SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting ( http://arxiv.org/abs/2312.14880v1 ) ライセンス: Link先を確認	Shane Bergsma, Timothy Zeyl, Lei Guo	(参考訳) 本稿では,長周期時系列のニューラル確率予測のための新しい手法であるSutraNetsを提案する。SutraNetsは自己回帰生成モデルを用いて、長い列の尤度を条件付き確率の積に分解する。長いシーケンスを生成する場合、ほとんどの自己回帰的アプローチは有害なエラー蓄積と長距離依存関係のモデリングにおける課題に苦しむ。SutraNetsは、長い単変量予測を低周波サブシリーズに対する多変量予測として扱う。自己回帰は時間とサブシリーズをまたいで進行し、コヒーレントな多変量(そして、それゆえ高周波単変量)出力を保証する。サブシリーズは少ないステップで生成できるため、SutraNetsはエラー蓄積や信号経路距離を効果的に削減する。 SutraNetsは、サブシリーズの数を変えたり、基礎となるシーケンスモデルの深さと幅を拡大したりした場合を含め、6つの実世界のデータセットにおいて競合する代替手法よりも予測精度を大幅に向上させることがわかった。 We propose SutraNets, a novel method for neural probabilistic forecasting of long-sequence time series. SutraNets use an autoregressive generative model to factorize the likelihood of long sequences into products of conditional probabilities. When generating long sequences, most autoregressive approaches suffer from harmful error accumulation, as well as challenges in modeling long-distance dependencies. SutraNets treat long, univariate prediction as multivariate prediction over lower-frequency sub-series. Autoregression proceeds across time and across sub-series in order to ensure coherent multivariate (and, hence, high-frequency univariate) outputs. Since sub-series can be generated using fewer steps, SutraNets effectively reduce error accumulation and signal path distances. We find SutraNets to significantly improve forecasting accuracy over competitive alternatives on six real-world datasets, including when we vary the number of sub-series and scale up the depth and width of the underlying sequence models.	翻訳日:2023-12-25 14:05:40 公開日:2023-12-22
# Pangu-Agent:構造化推論による微調整可能なジェネリストエージェント Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning ( http://arxiv.org/abs/2312.14878v1 ) ライセンス: Link先を確認	Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang	(参考訳) 人工知能(AI)エージェントを作成するための重要な方法は強化学習(RL)である。しかし、認識を行動にマッピングするスタンドアロンのRLポリシーの構築は、主に複数のタスクにまたがる汎用性の欠如と、大量のトレーニングデータの必要性など、深刻な問題に直面している。主な原因は、政策を策定する際、事前情報を知覚行動サイクルに効果的に統合できないことである。大規模言語モデル(LLM)は、クロスドメイン知識をAIエージェントに組み込む基本的な方法として登場したが、特定の決定問題に対する重要な学習と適応は欠如している。本稿では、構造化推論をAIエージェントのポリシーに統合し学習するための一般的なフレームワークモデルを提案する。私たちの方法論は、人間の脳にあるモジュラリティによって動機付けられています。このフレームワークは、内在的および外在的関数の構築を利用して、推論構造に関する以前の理解を追加する。また、認知プロセスのモジュール構造と一致して、すべてのモジュールや関数内でモデルを学習する適応能力も提供する。フレームワークの詳細を説明し、他のAIパイプラインや既存のフレームワークと比較する。本稿では,本手法の有効性を示す実験を取り上げ,実用的応用について検討する。この結果から,組織的推論や事前知識が組み込まれている場合,AIエージェントの動作と適応性が向上することが示唆された。これにより、レジリエントで一般的なaiエージェントシステムへのドアが開く。 A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information into the perception-action cycle when devising the policy. Large language models (LLMs) emerged as a fundamental way to incorporate cross-domain knowledge into AI agents but lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. 
The framework utilises the construction of intrinsic and extrinsic functions to add previous understandings of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in-depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.	翻訳日:2023-12-25 14:05:22 公開日:2023-12-22
# 社会的選択理論を用いた大規模言語モデルからのロバスト知識抽出 Robust Knowledge Extraction from Large Language Models using Social Choice Theory ( http://arxiv.org/abs/2312.14877v1 ) ライセンス: Link先を確認	Nico Potyka, Yuqicheng Zhu, Yunjie He, Evgeny Kharlamov, Steffen Staab	(参考訳) 大規模言語モデル(llm)は、会話エージェント、クリエイティブライティング、テキストの改善、一般的なクエリ応答など、幅広いアプリケーションをサポートする可能性がある。しかし、ランダムに答えを生成し、答えは通常堅牢ではないため、医学のような高レベルのドメインでのクエリ応答には適していない。 LLMクエリのロバスト性を改善するために,ランク付けクエリを繰り返し使用し,ソーシャル選択理論の手法を用いてクエリを集約する手法を提案する。医学的診断や障害診断などの診断環境におけるランキングクエリについて検討し、文献からの部分ボルダ選択関数が複数のクエリ結果のマージにどのように適用できるかについて議論する。我々は、我々の設定におけるいくつかの興味深い特性について論じ、我々のアプローチの堅牢性を実証的に評価する。 Large-language models (LLMs) have the potential to support a wide range of applications like conversational agents, creative writing, text improvement, and general query answering. However, they are ill-suited for query answering in high-stake domains like medicine because they generate answers at random and their answers are typically not robust - even the same query can result in different answers when prompted multiple times. In order to improve the robustness of LLM queries, we propose using ranking queries repeatedly and to aggregate the queries using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.	翻訳日:2023-12-25 14:04:58 公開日:2023-12-22
# 進化プログラム合成によるマルチグリッド手法設計の自動化 Automating the Design of Multigrid Methods with Evolutionary Program Synthesis ( http://arxiv.org/abs/2312.14875v1 ) ライセンス: Link先を確認	Jonas Schmitt	(参考訳) 自然の最も基本的な法則の多くは偏微分方程式(PDE)として定式化することができる。したがって、これらの方程式を理解することは近代科学と工学の多くの分野において非常に重要である。しかし、多くのPDEの一般解は不明であるため、これらの方程式の効率的な近似解は人類の最大の課題の一つである。マルチグリッドはPDEを数値的に解く最も効果的な方法の1つであるが、多くの場合、効率的もしくは少なくとも動作するマルチグリッドソルバの設計はオープンな問題である。この論文は、進化プログラム合成手法である文法誘導型遺伝的プログラミングが、高い効率性と一般化を達成する前例のない構造のマルチグリッド法を発見できることを証明している。そこで我々は,同じマルチグリッド型ソルバを内部構造を適応させることなく,異なるサイズの問題に適用することが可能な,記号的に操作可能な形式言語におけるマルチグリッドメソッドの自動生成を実現する,新しい文脈自由文法を開発した。効率的なマルチグリッド手法の自動設計をプログラム合成タスクとして扱うことで、異なる平滑化と粗いグリッド補正ステップの組み合わせを含む、マルチグリッド操作の新しいシーケンスを離散化階層の各レベルで見つけることができる。このアプローチの実現可能性を証明するため,PythonフレームワークであるEvoStencilsの形で実装されている。この実装は、pythonオブジェクトの有向非巡回グラフの形でマルチグリッドメソッドのアルゴリズムシーケンスを表現することから、コード生成フレームワークexastencilsと進化的計算ライブラリdeapの機能を使った自動生成と最適化までの全ステップを含んでいる。 Many of the most fundamental laws of nature can be formulated as partial differential equations (PDEs). Understanding these equations is, therefore, of exceptional importance for many branches of modern science and engineering. However, since the general solution of many PDEs is unknown, the efficient approximate solution of these equations is one of humanity's greatest challenges. While multigrid represents one of the most effective methods for solving PDEs numerically, in many cases, the design of an efficient or at least working multigrid solver is an open problem. This thesis demonstrates that grammar-guided genetic programming, an evolutionary program synthesis technique, can discover multigrid methods of unprecedented structure that achieve a high degree of efficiency and generalization. 
For this purpose, we develop a novel context-free grammar that enables the automated generation of multigrid methods in a symbolically-manipulable formal language, based on which we can apply the same multigrid-based solver to problems of different sizes without having to adapt its internal structure. Treating the automated design of an efficient multigrid method as a program synthesis task allows us to find novel sequences of multigrid operations, including the combination of different smoothing and coarse-grid correction steps on each level of the discretization hierarchy. To prove the feasibility of this approach, we present its implementation in the form of the Python framework EvoStencils, which is freely available as open-source software. This implementation comprises all steps from representing the algorithmic sequence of a multigrid method in the form of a directed acyclic graph of Python objects to its automatic generation and optimization using the capabilities of the code generation framework ExaStencils and the evolutionary computation library DEAP.	翻訳日:2023-12-25 14:04:43 公開日:2023-12-22
# BrainVis:画像再構成による脳と視覚信号の橋渡しを探索する BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction ( http://arxiv.org/abs/2312.14871v1 ) ライセンス: Link先を確認	Honghao Fu, Zhiqi Shen, Jing Jih Chin, Hao Wang	(参考訳) 脳信号からの視覚刺激の分析と再構成は、人間の視覚系の理解を効果的に進める。しかし、脳波信号は複雑であり、大量のノイズを含んでいる。これは、脳波埋め込みを細かな意味情報と整合させることの難しさや、トレーニングのために追加の大規模な自己収集データセットに依存することなど、脳波からの視覚刺激再構成の既存の作業に実質的な制限をもたらす。これらの課題に対処するために、BrainVisと呼ばれる新しいアプローチを提案する。まず,学習の難しさを和らげるため,脳波信号を様々な単位に分割し,脳波の時間領域特性を自己監督的に取得する手法を提案する。さらに,脳波の表現性を高めるために周波数領域機能を利用することも提案する。次に,脳波の時間-周波数埋め込みとCLIP空間の粗いセマンティクスと微粒なセマンティクスの補間を同時に調整し,一次視覚成分の強調と相互アライメントの困難さを低減する。最後に,カスケード拡散モデルを用いて画像の再構成を行う。提案したBrainVisは,意味的忠実度復元と生成品質の両面で,最先端手法を上回る。特に、トレーニングデータスケールを以前の作業の10%に削減しました。 Analyzing and reconstructing visual stimuli from brain signals effectively advances understanding of the human visual system. However, EEG signals are complex and contain a large amount of noise. This leads to substantial limitations in existing works on visual stimuli reconstruction from EEG, such as difficulties in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large self-collected datasets for training. To address these challenges, we propose a novel approach called BrainVis. Firstly, we divide the EEG signals into various units and apply a self-supervised approach on them to obtain EEG time-domain features, in an attempt to ease the training difficulty. Additionally, we also propose to utilize the frequency-domain features to enhance the EEG representations. Then, we simultaneously align EEG time-frequency embeddings with the interpolation of the coarse and fine-grained semantics in the CLIP space, to highlight the primary visual components and reduce the cross-modal alignment difficulty. Finally, we adopt cascaded diffusion models to reconstruct images. Our proposed BrainVis outperforms the state of the art in both semantic fidelity reconstruction and generation quality. 
Notably, we reduce the training data scale to 10% of the previous work.	翻訳日:2023-12-25 14:04:14 公開日:2023-12-22
# 財務報告の数値推論 Numerical Reasoning for Financial Reports ( http://arxiv.org/abs/2312.14870v1 ) ライセンス: Link先を確認	Abhinav Arun and Ashish Dhiman and Mehul Soni and Yibei Hu	(参考訳) 財務報告は、会社の運用に関する重要な洞察を提供するが、一般的に30~40ページに及ぶ広範な報告書は、ダイナミックマーケットにおける迅速な意思決定の課題を提起している。この問題に対処するために、我々は、ユーザーからの質問に基づいてこれらのレポートから重要な指標と運用メトリクスを抽出するために、微調整されたLarge Language Models (LLMs)を活用しました。我々は、重要なデータを見つける方法を考案し、FinQAデータセットを利用してLlama-2 7BとT5モデルの両方を微調整し、質問応答をカスタマイズした。我々は,数値推論と計算における競合精度である最終数値解のベースラインに匹敵する結果を得た。 Financial reports offer critical insights into a company's operations, yet their extensive length, typically spanning 30 to 40 pages, poses challenges for swift decision making in dynamic markets. To address this, we leveraged finetuned Large Language Models (LLMs) to distill key indicators and operational metrics from these reports based on questions from the user. We devised a method to locate critical data, and leveraged the FinQA dataset to fine-tune both Llama-2 7B and T5 models for customized question answering. We achieved results comparable to the baseline on the final numerical answer, a competitive accuracy in numerical reasoning and calculation.	翻訳日:2023-12-25 14:03:52 公開日:2023-12-22
# 時空間線形:普遍的多変量時系列予測に向けて Spatiotemporal-Linear: Towards Universal Multivariate Time Series Forecasting ( http://arxiv.org/abs/2312.14869v1 ) ライセンス: Link先を確認	Aiyinsi Zuo, Haixi Zhang, Zirui Li, Ce Zheng	(参考訳) 複雑な多変量時系列予測(TSF)の分野において、一般的なテクニックは、トランスフォーマーベースの設計からリカレントニューラルネットワークまで、複雑なディープラーニングアーキテクチャに依存することが多い。しかし、近年の知見から、単純な線形モデルは多様なデータセットの洗練された構成を克服できることが示唆されている。これらのモデルは、観測を複数の将来の時間ステップに直接マッピングし、反復的多段階予測における誤差蓄積を最小限にする。しかし、これらのモデルはデータに空間的および時間的情報を組み込むことができず、洞察に富んだ予測を導くパターンや依存関係を捉えるのに重要である。この監視は、特に特定のシーケンス長とデータセット条件下でのパフォーマンスボトルネックを招き、その普遍的な適用を妨げます。これに対して,STL(SpatioTemporal-Linear)フレームワークを提案する。 STLは、Linearベースのアーキテクチャを拡張するために、時間組込みと空間インフォームドのバイパスをシームレスに統合する。これらの余分なルートはデータに対するより堅牢で洗練された回帰を提供し、特に観測量に制限があり、依存関係をキャプチャする単純な線形レイヤの容量が減少する。実証的な証拠は、STLの長所を強調し、さまざまな観測期間と予測期間とデータセットにわたって、線形とトランスフォーマーのベンチマークを上回っている。このような堅牢性は、トラフィックの軌跡やまれな疾患の進行予測などを含む、さまざまな応用分野にまたがる適合性を強調する。この談話を通じて、深層学習技術を用いた多変量時系列予測において、STLの特異な能力がより一般的なパラダイムとなることを検証するだけでなく、普遍的なアプリケーションのためのデータスカース予測シナリオに取り組む必要性も強調する。コードは利用可能になる。 Within the field of complicated multivariate time series forecasting (TSF), popular techniques frequently rely on intricate deep learning architectures, ranging from transformer-based designs to recurrent neural networks. However, recent findings suggest that simple Linear models can surpass sophisticated constructs on diverse datasets. These models directly map observation to multiple future time steps, thereby minimizing error accumulation in iterative multi-step prediction. Yet, these models fail to incorporate spatial and temporal information within the data, which is critical for capturing patterns and dependencies that drive insightful predictions. This oversight often leads to performance bottlenecks, especially under specific sequence lengths and dataset conditions, preventing their universal application. In response, we introduce the SpatioTemporal-Linear (STL) framework. 
STL seamlessly integrates time-embedded and spatially-informed bypasses to augment the Linear-based architecture. These extra routes offer a more robust and refined regression to the data, particularly when the amount of observation is limited and the capacity of simple linear layers to capture dependencies declines. Empirical evidence highlights STL's prowess, outpacing both Linear and Transformer benchmarks across varied observation and prediction durations and datasets. Such robustness accentuates its suitability across a spectrum of applications, including but not limited to, traffic trajectory and rare disease progression forecasting. Through this discourse, we not only validate the STL's distinctive capacities to become a more general paradigm in multivariate time-series prediction using deep-learning techniques but also stress the need to tackle data-scarce prediction scenarios for universal application. Code will be made available.	翻訳日:2023-12-25 14:03:42 公開日:2023-12-22
# VIEScore: 条件付き画像合成評価のための説明可能なメトリクスを目指して VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation ( http://arxiv.org/abs/2312.14867v1 ) ライセンス: Link先を確認	Max Ku and Dongfu Jiang and Cong Wei and Xiang Yue and Wenhu Chen	(参考訳) 条件付き画像生成研究の急速に進歩する分野では、限定的な説明可能性などの課題により、様々なモデルの性能と能力を効果的に評価することが難しい。本稿では、条件付き画像生成タスクを評価するための視覚指示誘導型の説明可能なメトリクスVIESCOREを紹介する。 VIESCOREは、Multimodal Large Language Models(MLLM)の一般的な知識をバックボーンとして活用し、トレーニングや微調整を必要としない。条件付き画像生成の7つの主要なタスクでVIESCOREを評価した結果、次のことがわかった。 (1) VIESCORE(GPT-4v)は人間の評価と0.3という高いスピアマン相関を達成し、人間同士の相関は0.45である。 (2) VIESCORE(オープンソースMLLM)は合成画像の評価においてGPT-4vよりも著しく弱い。 (3) VIESCOREは、生成タスクでは人間の評価と同等の相関を達成するが、編集タスクでは苦戦する。これらの結果から、VIESCOREは画像合成タスクの評価において、人間の判断に取って代わる大きな可能性を秘めていると考えられる。 In the rapidly advancing field of conditional image generation research, challenges such as limited explainability make it difficult to effectively evaluate the performance and capabilities of various models. This paper introduces VIESCORE, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation task. VIESCORE leverages general knowledge from Multimodal Large Language Models (MLLMs) as the backbone and does not require training or fine-tuning. We evaluate VIESCORE on seven prominent conditional image generation tasks and find: (1) VIESCORE (GPT-4v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45. (2) VIESCORE (with open-source MLLM) is significantly weaker than GPT-4v in evaluating synthetic images. (3) VIESCORE achieves a correlation on par with human ratings in the generation tasks but struggles in editing tasks. With these results, we believe VIESCORE shows great potential to replace human judges in evaluating image synthesis tasks.	翻訳日:2023-12-25 14:03:11 公開日:2023-12-22
# YAYI 2: 多言語オープンソース大規模言語モデル YAYI 2: Multilingual Open-Source Large Language Models ( http://arxiv.org/abs/2312.14862v1 ) ライセンス: Link先を確認	Yin Luo, Qingchao Kong, Nan Xu, Jia Cao, Bao Hao, Baoyu Qu, Bo Chen, Chao Zhu, Chenyang Zhao, Donglei Zhang, Fan Feng, Feifei Zhao, Hailong Sun, Hanxuan Yang, Haojun Pan, Hongyu Liu, Jianbin Guo, Jiangtao Du, Jingyi Wang, Junfeng Li, Lei Sun, Liduo Liu, Lifeng Dong, Lili Liu, Lin Wang, Liwen Zhang, Minzheng Wang, Pin Wang, Ping Yu, Qingxiao Li, Rui Yan, Rui Zou, Ruiqun Li, Taiwen Huang, Xiaodong Wang, Xiaofei Wu, Xin Peng, Xina Zhang, Xing Fang, Xinglin Xiao, Yanni Hao, Yao Dong, Yigang Wang, Ying Liu, Yongyu Jiang, Yungan Wang, Yuqi Wang, Zhangsheng Wang, Zhaoxin Yu, Zhen Luo, Wenji Mao, Lei Wang, Dajun Zeng	(参考訳) 自然言語処理の最近の進歩として、大規模言語モデル(llm)は多くの実世界のタスクで人間レベルの言語理解と生成能力を達成し、人工知能への潜在的な道だと見なされている。 LLMの研究をより促進するために、Llama 2 や Falcon など多くのオープンソース LLM が最近提案され、プロプライエタリなモデルに匹敵するパフォーマンスを得た。しかし、これらのモデルは主に英語のシナリオ用に設計されており、中国の文脈ではパフォーマンスが悪い。本稿では,300億のパラメータを持つベースモデルとチャットモデルを含むYAYI 2を提案する。 YAYI 2は、トレーニング済みのデータ処理パイプラインによってフィルタされた2.65兆のトークンを含む多言語コーパス上で、スクラッチから事前トレーニングされる。ベースモデルは、数百万の指示による教師付き微調整と、人間のフィードバックからの強化学習によって、人間の価値と整合する。 MMLUやCMMLUのような複数のベンチマークでの大規模な実験は、提案されたYAYI 2が他の同様のサイズのオープンソースモデルより優れていることを一貫して証明している。 As the latest advancements in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks, and even have been regarded as a potential path to the artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and gained comparable performances to proprietary models. However, these models are primarily designed for English scenarios and exhibit poor performances in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. 
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similar sized open-source models.	翻訳日:2023-12-25 14:02:50 公開日:2023-12-22
# macs: マスコンディショニングされた3dハンドと物体の動き合成 MACS: Mass Conditioned 3D Hand and Object Motion Synthesis ( http://arxiv.org/abs/2312.14929v1 ) ライセンス: Link先を確認	Soshi Shimada, Franziska Mueller, Jan Bednarik, Bardia Doosti, Bernd Bickel, Danhang Tang, Vladislav Golyanik, Jonathan Taylor, Christian Theobalt, Thabo Beeler	(参考訳) 質量のような物体の物理的性質は、我々の手でそれを操作する方法に大きな影響を与えます。驚くべきことに、これまでの3dモーション合成の作業では、この側面は無視されている。本研究は, 合成した3次元手の動きの自然性を改善するために, MACSによる最初のMAss Conditioned 3Dハンドとオブジェクトモーション合成手法を提案する。提案手法はカスケード拡散モデルに基づき,物体質量と相互作用型に基づいて再現可能な相互作用を生成する。 MACSはまた、手動で描画された3Dオブジェクトの軌跡を入力として受け入れ、オブジェクトの質量によって条件付けられた自然な3Dハンドモーションを合成する。この柔軟性により、MLタスク用の合成トレーニングデータの生成、グラフィックワークフロー用のハンドの高速アニメーション、コンピュータゲーム用のキャラクターインタラクションの生成など、さまざまなダウンストリームアプリケーションにMACSを使用することができる。我々は,MACSが訓練中に見つからない補間および外挿された物体の質量を合理的に一般化するのに,小規模データセットが十分であることを示す。さらにmacは,表面接触合成モデルであるconnetが生成するマスコンディショニングコンタクトラベルにより,被写体に対する適度な一般化を示す。総合的なユーザ調査により、合成された3Dハンドオブジェクトの相互作用は、極めて可塑性でリアルであることが確認された。 The physical properties of an object, such as mass, significantly affect how we manipulate it with our hands. Surprisingly, this aspect has so far been neglected in prior work on 3D motion synthesis. To improve the naturalness of the synthesized 3D hand object motions, this work proposes MACS the first MAss Conditioned 3D hand and object motion Synthesis approach. Our approach is based on cascaded diffusion models and generates interactions that plausibly adjust based on the object mass and interaction type. MACS also accepts a manually drawn 3D object trajectory as input and synthesizes the natural 3D hand motions conditioned by the object mass. This flexibility enables MACS to be used for various downstream applications, such as generating synthetic training data for ML tasks, fast animation of hands for graphics workflows, and generating character interactions for computer games. 
We show experimentally that a small-scale dataset is sufficient for MACS to reasonably generalize across interpolated and extrapolated object masses unseen during the training. Furthermore, MACS shows moderate generalization to unseen objects, thanks to the mass-conditioned contact labels generated by our surface contact synthesis model ConNet. Our comprehensive user study confirms that the synthesized 3D hand-object interactions are highly plausible and realistic.	翻訳日:2023-12-25 13:55:54 公開日:2023-12-22
# 人のフィードバックからの強化学習に関する調査 A Survey of Reinforcement Learning from Human Feedback ( http://arxiv.org/abs/2312.14925v1 ) ライセンス: Link先を確認	Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke H\"ullermeier	(参考訳) 人間からのフィードバックからの強化学習(RLHF)は、工学的な報酬関数に頼るのではなく、人間のフィードバックから学習する強化学習(RL)の一種である。プレファレンスベース強化学習(pbrl)の関連設定に関する先行研究に基づき、人工知能と人間とコンピュータの相互作用の交差点に位置する。この位置付けは、知的システムのパフォーマンスと適応性を高めるとともに、目的と人間の価値の整合性を向上させるための有望な道を提供する。 LLM(Large Language Models)のトレーニングは、RLHFが人間の目的に向けたモデルの能力をターゲットにする決定的な役割を担った近年において、この可能性を著しく証明している。本稿では、RLHFの基礎を概観し、機械エージェントと人間の入力の間の複雑なダイナミクスを探求する。近年, LLM の RLHF に焦点が当てられているが,本調査では多種多様な応用, 広範にわたる影響について, より広い視点で検討している。我々は,rlhfを支える基本原理を考察し,アルゴリズムと人間のフィードバックの共生関係を考察し,この分野の主要な研究動向について考察した。本稿は,RLHF研究の現況を合成することによって,この急成長する研究分野の包括的理解を研究者や実践者に提供することを目的とする。 Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of Large Language Models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in targeting the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between machine agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. 
We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.	翻訳日:2023-12-25 13:55:31 公開日:2023-12-22
# Forward-Forwardアルゴリズムによる畳み込みニューラルネットワークの学習 Training Convolutional Neural Networks with the Forward-Forward algorithm ( http://arxiv.org/abs/2312.14924v1 ) ライセンス: Link先を確認	Riccardo Scodellaro, Ajinkya Kulkarni, Frauke Alves, Matthias Schröter	(参考訳) 最近のディープニューラルネットワークによる画像解析の成功は、ほぼすべて畳み込みニューラルネットワーク(CNN)によって達成されている。これらのCNN、そして実際にはすべてのディープニューラルネットワークアーキテクチャのトレーニングは、バックプロパゲーションアルゴリズムを使用しており、ネットワークの出力を望ましい結果と比較し、その差を用いてネットワークの重みを望ましい結果に向けて調整する。 2022年のプレプリントで、Geoffrey Hinton氏は、望ましい結果を画像とともにネットワークの入力に渡す別のトレーニング方法を提案した。このいわゆるForward-Forward(FF)アルゴリズムは、これまで全結合ネットワークでしか使われていない。本稿では、FFパラダイムをCNNに拡張する方法を示す。新たな空間拡張ラベリング手法を特徴とするFF学習CNNは、MNIST手書き数字データセット上で99.0%の分類精度を達成する。提案アルゴリズムの性能に異なるハイパーパラメータがどう影響するかを示し、標準的なバックプロパゲーション手法でトレーニングしたCNNと結果を比較する。さらに、クラスアクティベーションマップを用いて、FFアルゴリズムによってどの種類の特徴が学習されるかを調べる。 The recent successes in analyzing images with deep neural networks are almost exclusively achieved with Convolutional Neural Networks (CNNs). The training of these CNNs, and in fact of all deep neural network architectures, uses the backpropagation algorithm, where the output of the network is compared with the desired result and the difference is then used to tune the weights of the network towards the desired outcome. In a 2022 preprint, Geoffrey Hinton suggested an alternative way of training which passes the desired results together with the images at the input of the network. This so-called Forward-Forward (FF) algorithm has up to now only been used in fully connected networks. In this paper, we show how the FF paradigm can be extended to CNNs. Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.0% on the MNIST hand-written digits dataset. We show how different hyperparameters affect the performance of the proposed algorithm and compare the results with CNNs trained with the standard backpropagation approach. Furthermore, we use Class Activation Maps to investigate which type of features are learnt by the FF algorithm.	翻訳日:2023-12-25 13:55:07 公開日:2023-12-22
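The abstract above does not spell out the Forward-Forward objective, so the following is only a minimal sketch of a layer-local FF loss as it is commonly described (goodness as the sum of squared activations, pushed above a threshold for positive data and below it for negative data). The function names, the threshold value, and the omission of the paper's spatially-extended labels are all assumptions made for illustration.

```python
import numpy as np

def goodness(activations):
    """Goodness of a layer: sum of squared activations per sample (assumed form)."""
    return np.sum(activations ** 2, axis=-1)

def ff_layer_loss(pos_act, neg_act, threshold):
    """Layer-local loss: low when positive samples have goodness above the
    threshold and negative samples have goodness below it.

    np.logaddexp(0, x) is a numerically stable softplus, log(1 + exp(x)).
    """
    g_pos = goodness(pos_act)
    g_neg = goodness(neg_act)
    loss_pos = np.logaddexp(0.0, threshold - g_pos)  # penalise low positive goodness
    loss_neg = np.logaddexp(0.0, g_neg - threshold)  # penalise high negative goodness
    return float(np.mean(loss_pos) + np.mean(loss_neg))
```

Each layer would minimise this loss on its own activations, so no gradient has to flow backwards through the whole network; how the labels are embedded in the input of a CNN is the part this sketch leaves out.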
# Fast-NTK:大規模モデルのためのパラメータ効率の良い未学習 Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models ( http://arxiv.org/abs/2312.14923v1 ) ライセンス: Link先を確認	Guihong Li, Hsiang Hsu, Chun-Fu Chen, and Radu Marculescu	(参考訳) 機械学習の急速な成長により、ユーザはデータの削除を要求できる‘忘れられる権利’のような立法活動が加速した。これに対して ``machine unlearning'' では,スクラッチから再トレーニングを必要とせずに,不要なデータの選択的削除を提案する。 neural-tangent-kernel-based (ntk-based)アンラーニング手法は性能に優れているが、特に大規模モデルやデータセットでは計算の複雑さが著しい。このアルゴリズムは,CNNの細調整バッチ正規化層や視覚変換器の視覚的プロンプトなどのパラメータ効率の高い微調整手法を取り入れることで,計算複雑性を大幅に低減する。実験結果から,より大規模なニューラルネットワークやデータセット(88mパラメータ,5kイメージなど)に対するスケーラビリティが,より小さなケース(例えば8mパラメータ,500イメージ)向けに設計された従来のフルモデルntkベースのアプローチの限界を上回っていることが示された。特に当社のアプローチは,retainセットのみをリトレーニングする従来の方法に匹敵するパフォーマンスを維持しています。これにより、ディープニューラルネットワークにおける実践的でスケーラブルなNTKベースのアンラーニングが可能になる。 The rapid growth of machine learning has spurred legislative initiatives such as ``the Right to be Forgotten,'' allowing users to request data removal. In response, ``machine unlearning'' proposes the selective removal of unwanted data without the need for retraining from scratch. While the Neural-Tangent-Kernel-based (NTK-based) unlearning method excels in performance, it suffers from significant computational complexity, especially for large-scale models and datasets. Our work introduces ``Fast-NTK,'' a novel NTK-based unlearning algorithm that significantly reduces the computational complexity by incorporating parameter-efficient fine-tuning methods, such as fine-tuning batch normalization layers in a CNN or visual prompts in a vision transformer. Our experimental results demonstrate scalability to much larger neural networks and datasets (e.g., 88M parameters; 5k images), surpassing the limitations of previous full-model NTK-based approaches designed for smaller cases (e.g., 8M parameters; 500 images). Notably, our approach maintains a performance comparable to the traditional method of retraining on the retain set alone. 
Fast-NTK can thus enable practical and scalable NTK-based unlearning in deep neural networks.	翻訳日:2023-12-25 13:54:47 公開日:2023-12-22
# 高次統計から効率的に学ぶ:仮説テスト、ランダム特徴、ニューラルネットワーク Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks ( http://arxiv.org/abs/2312.14922v1 ) ライセンス: Link先を確認	Eszter Székely, Lorenzo Bardone, Federica Gerace, Sebastian Goldt	(参考訳) ニューラルネットワークは高次元データセットにおける統計的パターンの発見に優れる。実際、3つ以上の変数間の非ガウス相関を定量化する高次キュムラントは、ニューラルネットワークの性能にとって特に重要である。しかし、ニューラルネットワークは高次キュムラントからどの程度効率的に特徴を抽出できるのか? 我々はこの問題をスパイクキュムラントモデルで研究する。このモデルでは、統計学者は$d$次元入力の$p\ge 4$次キュムラントから特権的な方向、すなわち「スパイク」を復元する必要がある。まず、スパイクキュムラントモデルからの入力と等方的ガウス入力を強く区別するために必要なサンプル数$n$を解析することにより、スパイク復元の基本的な統計的限界と計算的限界を特徴付ける。統計的識別可能性には$n\gtrsim d$サンプルが必要であるのに対し、多項式時間で2つの分布を区別するには、幅広い種類のアルゴリズム、すなわち低次予想でカバーされるアルゴリズムに対して$n \gtrsim d^2$サンプルが必要である。これらの結果は、この問題に大きな統計と計算のギャップが存在することを示唆している。数値実験により、ニューラルネットワークは二次のサンプル複雑性で2つの分布の区別を学習する一方、ランダム特徴のような「lazy」な手法は、この領域ではランダムな推測より優れていないことが示された。我々の結果は、ニューラルネットワークがスパイクキュムラントモデルにおける高次相関から効率的に情報を抽出することを示し、高次キュムラントから学習するためにニューラルネットワークとランダム特徴が必要とするデータ量に大きなギャップがあることを明らかにする。 Neural networks excel at discovering statistical patterns in high-dimensional data sets. In practice, higher-order cumulants, which quantify the non-Gaussian correlations between three or more variables, are particularly important for the performance of neural networks. But how efficient are neural networks at extracting features from higher-order cumulants? We study this question in the spiked cumulant model, where the statistician needs to recover a privileged direction or "spike" from the order-$p\ge 4$ cumulants of $d$-dimensional inputs. We first characterise the fundamental statistical and computational limits of recovering the spike by analysing the number of samples $n$ required to strongly distinguish between inputs from the spiked cumulant model and isotropic Gaussian inputs. We find that statistical distinguishability requires $n\gtrsim d$ samples, while distinguishing the two distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide class of algorithms, i.e., those covered by the low-degree conjecture. 
These results suggest the existence of a wide statistical-to-computational gap in this problem. Numerical experiments show that neural networks learn to distinguish the two distributions with quadratic sample complexity, while "lazy" methods like random features are not better than random guessing in this regime. Our results show that neural networks extract information from higher-order correlations in the spiked cumulant model efficiently, and reveal a large gap in the amount of data required by neural networks and random features to learn from higher-order cumulants.	翻訳日:2023-12-25 13:54:23 公開日:2023-12-22
# イネ表現型データのための新しいサンプルクラスタリングアルゴリズム A Novel Sampled Clustering Algorithm for Rice Phenotypic Data ( http://arxiv.org/abs/2312.14920v1 ) ライセンス: Link先を確認	Mithun Singh, Kapil Ahuja, Milind B. Ratnaparkhe	(参考訳) 植物種の表現型(または物理的)特性は、クラスタリングによく用いられる。我々の最近の研究の一つ(Shastri et al. (2021))では、確率的サンプリング(ピボットサンプリング)とスペクトルクラスタリングを組み合わせたアルゴリズムを用いてダイズ種をグループ化した。これらの手法は、低コストで高精度なクラスタリングを得るために使われた。本研究では、この先行アルゴリズムをイネ種のクラスタリングに拡張する。基本アルゴリズムを3つの点で改善する。第一に、スペクトルクラスタリングにおける類似度行列を構築する新しい関数を提案する。一般に、この目的には自然指数関数が用いられる。スペクトルグラフ理論と、そこに現れるチーガーの不等式に基づき、代わりに底を"a"とする指数関数を用いることを提案する。これはクラスタリングに有利な類似度行列のスペクトルを与え、我々はこれを固有値解析によって裏付ける。第二に、スペクトルクラスタリングで類似度行列を構築するために使われる関数は、以前は固定の係数でスケーリングされていた(グローバルスケーリングと呼ばれる)。 Zelnik-Manor と Perona (2004) のアイデアに基づき、行列要素ごとに変化する係数(ローカルスケーリングと呼ばれる)を用いるようにし、これはよりうまく機能する。第三に、ピボットサンプリングアルゴリズムにおける種の包含確率を計算するために、我々は以前、種の特性値がそれぞれの基準値(すべての種にわたって計算される)からどれだけ離れているかを捉える偏差の概念を用いていた。基準値を求めるために、以前は最大値関数が使われていた。現在は中央値関数を使っており、より直感的である。我々はこの選択を統計解析によって裏付ける。 1865種のイネについての実験により、シルエット値の観点で、我々の新しいサンプリング付きスペクトルクラスタリングは(現在主流の)階層クラスタリングよりも61%優れていることを実証する。また、我々の新しいアルゴリズムは、サンプリングを用いるため階層クラスタリングよりもかなり高速である。 Phenotypic (or physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways. First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based upon spectral graph theory and the associated Cheeger's inequality, we propose the use of a base-"a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Second, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based upon the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better. Third, to compute the inclusion probability of a species in the pivotal sampling algorithm, we had earlier used the notion of deviation, which captured how far a species' characteristic values were from their respective base values (computed over all species). A maximum function was used before to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis. With experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.	翻訳日:2023-12-25 13:53:54 公開日:2023-12-22
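The first two improvements in the entry above (a base-"a" exponential and local scaling) can be illustrated with a short sketch. The abstract does not give the exact formula, so the form below, `base ** (-d_ij**2 / (sigma_i * sigma_j))` with `sigma_i` taken as the distance to the k-th nearest neighbour, is an assumption in the spirit of Zelnik-Manor and Perona's local scaling; the function name and parameters are hypothetical.

```python
import numpy as np

def local_scaling_similarity(points, k=7, base=np.e):
    """Similarity matrix with a per-point (local) scale and a configurable base.

    S[i, j] = base ** (-d(i, j)**2 / (sigma_i * sigma_j)), where sigma_i is the
    distance from point i to its k-th nearest neighbour. Assumes no duplicate
    points, otherwise sigma_i can be zero.
    """
    X = np.asarray(points, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the k-th nearest neighbour.
    sigma = np.sort(dist, axis=1)[:, k]
    return np.power(base, -(dist ** 2) / np.outer(sigma, sigma))

# Four toy points forming two tight pairs; k=1 because the set is tiny.
toy = [[0, 0], [0, 1], [5, 5], [5, 6]]
S = local_scaling_similarity(toy, k=1, base=2.0)
```

With `base=np.e` this reduces to the usual locally scaled Gaussian kernel; changing the base only rescales the exponent, which is one way to read the abstract's claim that the base affects the spectrum of the similarity matrix.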
# Lift-Attend-Splat:変圧器を用いたバードアイビューカメラライダー融合 Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers ( http://arxiv.org/abs/2312.14919v1 ) ライセンス: Link先を確認	James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, and Romain Mueller	(参考訳) 補完的なセンサモダリティの組み合わせは、自律運転(ad)のような安全クリティカルなロボティクスアプリケーションのための堅牢な認識を提供するために不可欠である。近年のAD用カメラとライダーの融合法は,ライダーからの深度情報を直接利用するよりも,単眼深度推定に頼っている。ここでは,本手法が期待通り深度を生かしていないこと,また,過度に深度推定を改良しても物体検出性能は向上せず,また,絶対的に深度推定を除去しても物体検出性能は劣化しないことを示す。これは、単眼深度に依存することは、カメラとライダーの融合において不要なアーキテクチャ上のボトルネックであることを示唆している。そこで本研究では,単眼深度推定を完全にバイパスし,単純な注意機構を用いて鳥眼網のカメラとライダーの機能を選択・融合する新しい融合手法を提案する。提案手法は,lidar機能の利用に基づいてカメラ機能の利用を変調し,単眼深度推定に基づくベースラインよりも,nuscenesデータセット上でより優れた3dオブジェクト検出を実現することを示す。 Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.	翻訳日:2023-12-25 13:53:28 公開日:2023-12-22
# PoseGen: NeRFで3DのPoseデータセットを生成する学習 PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF ( http://arxiv.org/abs/2312.14915v1 ) ライセンス: Link先を確認	Mohsen Gholami, Rabab Ward, Z. Jane Wang	(参考訳) 本稿では,Neural Radiance Fields (NeRF) を用いた3次元ポーズデータセット生成のためのエンドツーエンドフレームワークを提案する。公開データセットは一般的に、人間のポーズやカメラの視点に関して、限られた多様性を持っている。結果として、公開データセットでトレーニングされたポーズ推定器は、未発見の分散サンプルに適用された場合、著しく低下する。以前の研究では、2d-3dのポーズペアを生成したり、大量のランダムデータをレンダリングすることで、パブリックデータセットの強化を提案した。このようなアプローチは、画像レンダリングを見落としたり、事前訓練されたモデルに最適なデータセットをもたらす。本稿では,与えられたポーズ推定器からフィードバック損失を伴うデータセット(人間の3dポーズと画像)を生成する方法を提案する。先行技術とは対照的に、生成されたデータは事前学習したモデルのロバスト性を改善するために最適化されます。 posegenの目的は、与えられた事前学習モデルの予測誤差を最大化するデータの分布を学ぶことである。学習したデータ分布は、事前学習されたモデルのOODサンプルを含むため、事前学習されたモデルをさらに微調整するために、そのような分布からサンプリングしたデータは、モデルの一般化性を向上させる。これは3次元データ生成のためのNeRFを提案する最初の研究である。 NeRFはデータ駆動であり、人間の3Dスキャンを必要としない。したがって、データ生成にNeRFを使うことは、便利なユーザ固有のデータ生成のための新しい方向である。提案したPoseGenは,平均6%の改善率で4つのデータセット上で2つのベースラインモデル(SPINとHybrIK)を改善した。 This paper proposes an end-to-end framework for generating 3D human pose datasets using Neural Radiance Fields (NeRF). Public datasets generally have limited diversity in terms of human poses and camera viewpoints, largely due to the resource-intensive nature of collecting 3D human pose data. As a result, pose estimators trained on public datasets significantly underperform when applied to unseen out-of-distribution samples. Previous works proposed augmenting public datasets by generating 2D-3D pose pairs or rendering a large amount of random data. Such approaches either overlook image rendering or result in suboptimal datasets for pre-trained models. Here we propose PoseGen, which learns to generate a dataset (human 3D poses and images) with a feedback loss from a given pre-trained pose estimator. In contrast to prior art, our generated data is optimized to improve the robustness of the pre-trained model. The objective of PoseGen is to learn a distribution of data that maximizes the prediction error of a given pre-trained model. 
As the learned data distribution contains OOD samples of the pre-trained model, sampling data from such a distribution for further fine-tuning a pre-trained model improves the generalizability of the model. This is the first work that proposes NeRFs for 3D human data generation. NeRFs are data-driven and do not require 3D scans of humans. Therefore, using NeRF for data generation is a new direction for convenient user-specific data generation. Our extensive experiments show that the proposed PoseGen improves two baseline models (SPIN and HybrIK) on four datasets with an average 6% relative improvement.	翻訳日:2023-12-25 13:53:07 公開日:2023-12-22
# 置換不変量子回路 Permutation-invariant quantum circuits ( http://arxiv.org/abs/2312.14909v1 ) ライセンス: Link先を確認	Maximilian Balthasar Mansky, Santiago Londoño Castillo, Victor Ramos Puigvert, Claudia Linnhoff-Popien	(参考訳) 問題記述に物理的対称性を組み込むことで、パラメータと計算複雑性を削減できる。我々は、最も制限的な離散対称性である置換対称性を量子回路に統合する方法を示す。置換対称性は、他のすべての離散群の超群である。我々は、置換をキュービット上の$\operatorname{SWAP}$操作と同一視する。対称性を対応するリー代数へ拡張することに基づき、指数写像による量子回路要素の構成を示す。これにより、置換群対称性を量子回路アンザッツに容易に統合できる。パラメータ数のスケーリングは$\mathcal{O}(n^3)$であり、一般の場合よりもかなり低く、対称性が量子計算の適用性を制限することを示している。また、既存の回路を修正して置換対称性の下で不変にする方法も示す。 The implementation of physical symmetries into problem descriptions allows for the reduction of parameters and computational complexity. We show the integration of the permutation symmetry as the most restrictive discrete symmetry into quantum circuits. The permutation symmetry is the supergroup of all other discrete groups. We identify the permutation with a $\operatorname{SWAP}$ operation on the qubits. Based on the extension of the symmetry into the corresponding Lie algebra, quantum circuit element construction is shown via exponentiation. This allows for ready integration of the permutation group symmetry into quantum circuit ansatzes. The scaling of the number of parameters is found to be $\mathcal{O}(n^3)$, significantly lower than the general case and an indication that symmetry restricts the applicability of quantum computing. We also show how to adapt existing circuits to be invariant under a permutation symmetry by modification.	翻訳日:2023-12-25 13:52:42 公開日:2023-12-22
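The construction summarized in the entry above (symmetrize a generator, then exponentiate) can be checked numerically on two qubits. This is only a sketch under assumptions: the generator, the angle, and the helper names are chosen for illustration and are not taken from the paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
SWAP = np.array(
    [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]],
    dtype=complex,
)

def symmetrize(generator):
    """Average a two-qubit Hermitian generator over the group {I, SWAP}."""
    return 0.5 * (generator + SWAP @ generator @ SWAP)

def exponentiate(generator, theta):
    """Return exp(-i * theta * generator) for a Hermitian generator."""
    eigvals, eigvecs = np.linalg.eigh(generator)
    # V diag(exp(-i theta lambda)) V^dagger
    return (eigvecs * np.exp(-1j * theta * eigvals)) @ eigvecs.conj().T

asymmetric = np.kron(X, I2)          # X acting on the first qubit only
symmetric = symmetrize(asymmetric)   # 0.5 * (X (x) I + I (x) X)
U = exponentiate(symmetric, theta=0.7)

# A gate built from a symmetrized generator is unchanged by swapping the qubits.
print(np.allclose(SWAP @ U @ SWAP, U))
```

Because the symmetrized generator commutes with SWAP, so does its exponential; the unsymmetrized generator `asymmetric` does not have this property.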
# 擬エルミート系の量子化 Quantization of pseudo-hermitian systems ( http://arxiv.org/abs/2312.14906v1 ) ライセンス: Link先を確認	M.C. Baldiotti, R. Fresneda	(参考訳) この研究は、任意の次元のグラスマン代数に対する \cite{baldiotti2021} の一般化である。ここでは、非エルミート量子力学に着目した擬古典理論の共変量子化スキームを提案する。量子化は、正準古典論を任意の次元における等価量子実現に準じる。形式論をハイゼンベルク相互作用を持つ2つの結合スピンの問題に適用する。 This work is a generalization of \cite{baldiotti2021} to Grassmann algebras of arbitrary dimensions. Here we present a covariant quantization scheme for pseudoclassical theories focused on non-hermitian quantum mechanics. The quantization maps canonically related pseudoclassical theories to equivalent quantum realizations in arbitrary dimensions. We apply the formalism to the problem of two coupled spins with Heisenberg interaction.	翻訳日:2023-12-25 13:52:31 公開日:2023-12-22
# 量子アルゴリズムによる科学応用 Quantum algorithms for scientific applications ( http://arxiv.org/abs/2312.14904v1 ) ライセンス: Link先を確認	R. Au-Yeung and B. Camino and O. Rathore and V. Kendon	(参考訳) 量子コンピューティングは、様々なアプリケーション分野の計算能力の次のステップを提供すると約束している。本稿では,実世界の応用において真の量子優位性を達成するために必要な量子ハイプとブレークスルーの背後にある科学を考察する。ハイパフォーマンスコンピューティング(HPC)に最も影響を与える可能性のある分野には、量子システムのシミュレーション、最適化、機械学習などがある。我々は、HPCの現在の科学・工学的利用のかなりの部分を占める材料シミュレーションと計算流体力学の例を引用する。潜在的な課題は、量子デバイスのための古典的なデータのエンコーディングとデコード、および古典プロセッサと量子プロセッサ間のクロック速度のミスマッチである。現在の古典的手法への控えめな量子拡張でさえも、気象予報、工学、航空宇宙、薬物設計、持続可能な開発のための「緑」素材の実現など、広範囲に及ぶ影響をもたらすだろう。これは計算科学、工学、量子コンピューティングのコミュニティの協力による多大な努力を必要とする。 Quantum computing promises to provide the next step up in computational power for diverse application areas. In this review, we examine the science behind the quantum hype and breakthroughs required to achieve true quantum advantage in real world applications. Areas that are likely to have the greatest impact on high performance computing (HPC) include simulation of quantum systems, optimisation, and machine learning. We draw our examples from materials simulations and computational fluid dynamics which account for a large fraction of current scientific and engineering use of HPC. Potential challenges include encoding and decoding classical data for quantum devices, and mismatched clock speeds between classical and quantum processors. Even a modest quantum enhancement to current classical techniques would have far-reaching impacts in areas such as weather forecasting, engineering, aerospace, drug design, and realising ``green'' materials for sustainable development. This requires significant effort from the computational science, engineering and quantum computing communities working together.	翻訳日:2023-12-25 13:52:27 公開日:2023-12-22
# 二部混合分離可能状態を用いたアンシラ支援プロセストモグラフィ Ancilla-Assisted Process Tomography with Bipartiete Mixed Separable States ( http://arxiv.org/abs/2312.14901v1 ) ライセンス: Link先を確認	Zhuoran Bao, Daniel F. V. James	(参考訳) AAPT(ancilla-assisted process tomography)の実施には、システム状態と補助状態との間のエンタングルメントは厳密な要件ではないことが示されている。代わりに、システム・アンシラ状態が忠実(faithful)であることのみが要求され、これは状態を表すある行列の可逆性と同値である。しかし、誤差増幅が小さい忠実な状態と、より大きな誤差増幅を生じる状態とを区別することは困難である。 2量子ビットのシステム・アンシラ状態に限定して、可逆性の問題を、2つの量子ビットの相関を分類するsinisternessの概念と結びつける理論的解析を行う。sinisternessを用いることで、誤差増幅が最小となる2量子ビットの忠実な混合分離可能状態をすべて構成する方法を提供する。最大エンタングル状態は最小の誤差増幅を与える一方、分離可能なWerner状態は、最大エンタングル状態よりも大きく不均一な誤差増幅を生じることを示す。それでも、分離可能なWerner状態または等方的状態の逆変換による誤差増幅は、混合分離可能状態が達成しうる最良のものである。 It has been shown that the entanglement between the system state and the ancillary state is not a strict requirement for performing ancilla-assisted process tomography (AAPT). Instead, it only requires that the system-ancilla state be faithful, which is equivalent to the invertibility of a certain matrix representing the state. However, it is difficult to distinguish between a faithful state that brings small error amplification and one that produces larger error amplification. Restricted to two-qubit system-ancilla states, we present a theoretical analysis to connect the invertibility problem to the concept of sinisterness, which classifies the correlation of two qubits. Using sinisterness, we provide a way of constructing all two-qubit faithful mixed separable states with the smallest error amplification. We show that the maximally entangled states provide the smallest error amplification, while the separable Werner states produce an uneven error amplification larger than that of the maximally entangled states. Nevertheless, the error amplification due to inverting the separable Werner states or isotropic states is the best any mixed separable state can do.	翻訳日:2023-12-25 13:52:12 公開日:2023-12-22
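The faithfulness condition in the entry above (invertibility of a matrix representing the state) can be made concrete with a small numerical sketch. Two assumptions are made here that are not in the abstract: the matrix is taken to be the 4x4 table of Pauli expectation values, and its condition number is used as a rough stand-in for error amplification. The paper's own sinisterness measure is not implemented.

```python
import numpy as np

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def werner_state(p):
    """p * |singlet><singlet| + (1 - p) * I/4; separable for p <= 1/3."""
    singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
    return p * np.outer(singlet, singlet.conj()) + (1 - p) * np.eye(4) / 4

def pauli_correlation_matrix(rho):
    """T[i, j] = Tr(rho * sigma_i (x) sigma_j) over the four Pauli matrices."""
    return np.array(
        [[np.trace(rho @ np.kron(a, b)).real for b in PAULIS] for a in PAULIS]
    )

def condition_number(rho):
    """Ratio of largest to smallest singular value of the correlation matrix."""
    singular_values = np.linalg.svd(pauli_correlation_matrix(rho), compute_uv=False)
    return singular_values[0] / singular_values[-1]

for p in (1.0, 1 / 3):
    print(p, condition_number(werner_state(p)))
```

For a Werner state the correlation matrix is diagonal with entries (1, -p, -p, -p), so it stays invertible for any nonzero p, including the separable range; the smaller p is, the worse conditioned the inversion becomes.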
# キャリブレーションノイズ源を用いた低ノイズ極低温マイクロ波増幅器の特性評価 Low-noise cryogenic microwave amplifier characterization with a calibrated noise source ( http://arxiv.org/abs/2312.14900v1 ) ライセンス: Link先を確認	M. Malnou, T. F. Q. Larson, J. D. Teufel, F. Lecocq and J. Aumentado	(参考訳) パラメトリック増幅器は超伝導量子コンピューティングのワークホースとなっているが、これらの装置の研究と開発は不整合であり、時にはノイズ性能評価手法の誤認によって妨げられている。ノイズ特性の背景にある概念は明らかに単純であり、測定や解釈、分析において誤りを犯すことのできる場所はたくさんある。本稿では,ノイズ特性評価の基礎と,パワーハンドリング能力に制限のあるパラメトリック増幅器の特殊問題について述べる。本稿では,高電子移動型トランジスタ増幅器,ジョセフソン走行波パラメトリック増幅器,ジョセフソンパラメトリック増幅器の3つの具体例を紹介する。我々は,50-$\Omega$ショットノイズトンネル接合(SNTJ)をブロードバンドノイズ源として使用することを強調し,低温増幅増幅器の実用性を実証した。これらの実用的な例は、損失の役割と追加のパラメトリック増幅器 'idler' 入力モードを強調している。 Parametric amplifiers have become a workhorse in superconducting quantum computing, however research and development of these devices has been hampered by inconsistent, and sometimes misleading noise performance characterization methodologies. The concepts behind noise characterization are deceptively simple, and there are many places where one can make mistakes, either in measurement or interpretation and analysis. In this article we cover the basics of noise performance characterization, and the special problems it presents in parametric amplifiers with limited power handling capability. We illustrate the issues with three specific examples: a high-electron mobility transistor amplifier, a Josephson traveling-wave parametric amplifier, and a Josephson parametric amplifier. We emphasize the use of a 50-$\Omega$ shot noise tunnel junction (SNTJ) as a broadband noise source, demonstrating its utility for cryogenic amplifier amplifications. These practical examples highlight the role of loss as well as the additional parametric amplifier `idler' input mode.	翻訳日:2023-12-25 13:51:52 公開日:2023-12-22
# バグレポートから関連するテスト入力を抽出することによる自動テストケース生成の強化 Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports ( http://arxiv.org/abs/2312.14898v1 ) ライセンス: Link先を確認	Wendkûuni C. Ouédraogo, Laura Plein, Kader Kaboré, Andrew Habib, Jacques Klein, David Lo, Tegawendé F. Bissyandé	(参考訳) ソフトウェアの品質は、それに対して実施されるテストの品質に大きく依存する。したがって、バグ検出のためのテストを書くことは不可欠である。しかし、手動で行うと時間がかかる。そのため、テストケース生成の自動化は、ソフトウェアエンジニアリングコミュニティにおける注目の研究領域となっている。ほとんどのアプローチはユニットテストの生成に重点を置いている。残念なことに、現在の取り組みは関連性のある入力を生成できないことが多く、自動生成テストの効率が制限される。テスト入力の関連性を改善するために、バグレポートを探索し、自動テスト生成ツールに与えることができる入力値を特定する手法である \name を提案する。本研究では、\name でバグレポートから抽出した入力を用いて Evosuite でテストケースを生成した場合の性能を調べる。評価はDefects4Jベンチマークで行われる。 Defects4J プロジェクトでは、\name は正規表現を用いた場合に関連する入力の 68.68% を、正規表現を用いない場合に 50.21% を抽出できた。さらに、本研究は、全プロジェクトにおいてラインカバレッジとインストラクションカバレッジを向上させる可能性を示した。全体として、ベースラインでは検出されなかった45のバグの検出につながる関連入力の収集に成功した。 The quality of software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test case generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current efforts often do not lead to the generation of relevant inputs, which limits the efficiency of automatically generated tests. Towards improving the relevance of test inputs, we present \name, a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools. In this work, we investigate the performance of using inputs extracted from bug reports with \name to generate test cases with Evosuite. The evaluation is performed on the Defects4J benchmark. For Defects4J projects, our study has shown that \name successfully extracted 68.68% of relevant inputs when using regular expressions in its approach versus 50.21% of relevant inputs without regular expressions. Further, our study has shown the potential to improve the Line and Instruction Coverage across all projects. Overall, we successfully collected relevant inputs that led to the detection of 45 bugs that were previously undetected by the baseline.	翻訳日:2023-12-25 13:51:34 公開日:2023-12-22
# 強い反ヘビアン可塑性はネットワークアトラクタ景観の凸性を変化させる Strong anti-Hebbian plasticity alters the convexity of network attractor landscapes ( http://arxiv.org/abs/2312.14896v1 ) ライセンス: Link先を確認	Lulu Gong, Xudong Chen, ShiNung Ching	(参考訳) 本稿では,ペアワイズ学習ルールの存在下でのリカレントニューラルネットワークについて検討する。特に,このようなネットワークのアトラクタ景観が,学習の強さと性質(ヘビアンか反ヘビアンか)の関数としてどのように変化するかに関心がある。これは,こうしたルールが大規模最適化問題を媒介する能力に関係しうる。形式的な解析を通して、ヘビアンから反ヘビアン学習への移行は、ネットワークのアトラクタ景観の凸性を破壊するピッチフォーク分岐をもたらすことを示す。より大規模な設定では、これは反ヘビアン可塑性が複数の安定平衡をもたらすことを意味し、そのような効果は相互接続点すなわち「チョーク」点において特に大きくなりうる。さらに、アトラクタ景観は速い学習率よりも遅い学習率に敏感である。これらの結果は、異なるペアワイズ可塑性規則によって符号化できる目的関数の種類に関する洞察を与える。 In this paper, we study recurrent neural networks in the presence of pairwise learning rules. We are specifically interested in how the attractor landscapes of such networks become altered as a function of the strength and nature (Hebbian vs. anti-Hebbian) of learning, which may have a bearing on the ability of such rules to mediate large-scale optimization problems. Through formal analysis, we show that a transition from Hebbian to anti-Hebbian learning brings about a pitchfork bifurcation that destroys convexity in the network attractor landscape. In larger-scale settings, this implies that anti-Hebbian plasticity will bring about multiple stable equilibria, and such effects may be outsized at interconnection or 'choke' points. Furthermore, attractor landscapes are more sensitive to slower learning rates than faster ones. These results provide insight into the types of objective functions that can be encoded via different pairwise plasticity rules.	翻訳日:2023-12-25 13:51:11 公開日:2023-12-22
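
As a worked illustration of how a pitchfork bifurcation can destroy convexity, the textbook one-dimensional normal form is sketched below. This is an assumption-laden toy, not the network model analysed in the paper: the scalar state $x$, the control parameter $r$, and the potential $V$ are all hypothetical stand-ins.

```latex
\dot{x} = -\frac{dV}{dx} = r x - x^{3},
\qquad
V(x) = -\tfrac{r}{2}\,x^{2} + \tfrac{1}{4}\,x^{4},
\qquad
V''(x) = -r + 3x^{2}.
```

For $r \le 0$ the potential is convex ($V'' \ge 0$ everywhere) and $x = 0$ is the only equilibrium. For $r > 0$ convexity is lost near the origin, $x = 0$ becomes unstable, and two stable equilibria appear at $x = \pm\sqrt{r}$.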

# 相対論的力学におけるスピンの解釈

Une interpretation du spin en mecanique relativiste ( http://arxiv.org/abs/2406.15353v1 )

ライセンス: Link先を確認

Stefan Catheline,

(参考訳) 本論文は、スピンを改めて研究することを目的としている。出発点は、量子力学の枠組みにおいてのみ整合的に記述できるシュテルンとゲルラッハの実験結果である。これに対し本論文では、剛体回転に関する前回の論文に続いて、相対論的力学の視点を提案する。実際、慎重に調べると、この相対論的剛体回転の地平線は、スピンの性質と完全に一致して、任意の観測角度から不変であるように見える。

This paper revisits the study of spin. The starting point is the Stern and Gerlach experimental results, which can be described coherently only within the framework of quantum mechanics. Instead, a relativistic mechanics point of view is proposed here, following the work presented in a previous article about rigid body rotation. Indeed, a careful study shows that the horizon of this relativistic rigid body rotation appears to be invariant from any observation angle, in full agreement with the spin property.

翻訳日:2024-07-01 07:21:03 公開日:2023-12-22

# メタヒューリスティックスを用いたニューラルネットワークを用いた炭素繊維強化ポリマーのコンクリートの強度に及ぼす閉じ込め効果予測

Predicting Confinement Effect of Carbon Fiber Reinforced Polymers on Strength of Concrete using Metaheuristics-based Artificial Neural Networks ( http://arxiv.org/abs/2403.13809v1 )

ライセンス: Link先を確認

Sarmed Wahab, Mohamed Suleiman, Faisal Shabbir, Nasim Shakouri Mahmoudabadi, Sarmad Waqas, Nouman Herl, Afaq Ahmad,

(参考訳) 本稿では, メタヒューリスティックスに基づく人工ニューラルネットワークを用いた炭素繊維強化ポリマー(CFRP)のコンクリートシリンダー強度に対する閉じ込め効果の予測について述べる。既発表の研究から708本のCFRP拘束コンクリートシリンダーの詳細なデータベースを作成し, シリンダーの直径 (d) および高さ (h) などの幾何学的パラメータ, 非拘束コンクリートの圧縮強度 (fco'), 厚み (nt), CFRPの弾性率 (Ef), 非拘束コンクリートひずみ, 拘束コンクリートひずみ, 拘束コンクリートの終局圧縮強度 (fcc') を含む8つのパラメータに関する情報を収録した。粒子群最適化(PSO)、グレーウルフ最適化(GWO)、バットアルゴリズム(BA)の3つのメタヒューリスティックモデルが実装されている。これらのアルゴリズムは平均二乗誤差を目的関数としてデータ上で訓練され、その予測結果は実験研究と有限要素解析に対して検証される。 PSOのハイブリッドモデルはCFRP拘束コンクリートシリンダーの強度を最大99.13%の精度で予測し、GWOは98.17%の精度で予測した。軸圧縮強度予測の精度の高さは、これらの予測モデルが経験的手法に代わる信頼性の高い解であることを示した。予測モデルは、時間のかかる実大実験を避けるのに特に適しており、プロセスを迅速かつ経済的にする。

This article studies the prediction of the confinement effect of carbon fiber reinforced polymers (CFRPs) on concrete cylinder strength using metaheuristics-based artificial neural networks. A detailed database of 708 CFRP-confined concrete cylinders is developed from previously published research, with information on 8 parameters, including geometrical parameters like the diameter (d) and height (h) of a cylinder, unconfined compressive strength of concrete (fco'), thickness (nt), the elastic modulus of CFRP (Ef), unconfined concrete strain, confined concrete strain, and the ultimate compressive strength of confined concrete (fcc'). Three metaheuristic models are implemented: particle swarm optimization (PSO), grey wolf optimizer (GWO), and bat algorithm (BA). These algorithms are trained on the data using mean square error as the objective function, and their predicted results are validated against experimental studies and finite element analysis. The study shows that the hybrid model of PSO predicted the strength of CFRP-confined concrete cylinders with a maximum accuracy of 99.13%, and GWO predicted the results with an accuracy of 98.17%. The high accuracy of the axial compressive strength predictions demonstrates that these prediction models are a reliable alternative to empirical methods. The prediction models are especially suitable for avoiding full-scale, time-consuming experimental tests, which makes the process quick and economical.

翻訳日:2024-03-25 07:17:26 公開日:2023-12-22
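
The abstract above describes fitting a model by particle swarm optimization with a mean-square-error objective. A minimal sketch of that idea follows, under loud assumptions: the model is a two-parameter line rather than the paper's neural network, the data are synthetic (not the 708-cylinder database), and the swarm settings `w`, `c1`, `c2` are illustrative defaults rather than values from the paper.

```python
import random

def mse(params, data):
    """Mean square error of a toy linear model y = a*x + b on (x, y) pairs."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

def pso(objective, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best particle swarm optimisation (illustrative settings)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # per-particle best positions
    best_val = [objective(p) for p in pos]       # per-particle best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Synthetic data drawn from y = 2x + 1 (hypothetical, for illustration only).
data = [(x, 2 * x + 1) for x in range(10)]
params, err = pso(lambda p: mse(p, data), dim=2)
```

In a network-training setting the parameter vector would hold the network weights instead of `(a, b)`; the swarm update itself would stay the same.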

# Google Tag Manager: 隠れたデータ漏洩とEUデータ保護法の下での潜在的な違反

Google Tag Manager: Hidden Data Leaks and its Potential Violations under EU Data Protection Law ( http://arxiv.org/abs/2312.08806v2 )

ライセンス: Link先を確認

Gilles Mertens, Nataliia Bielova, Vincent Roca, Cristiana Santos, Michael Toth,

(参考訳) タグ管理システム(TMS)は、ウェブサイトのパブリッシャーが複数のサードパーティ製JavaScriptスクリプト(タグ)をウェブサイトにインストールするのを支援するために開発された。 2012年、GoogleはGoogle Tag Manager(GTM)という独自のTMSを開発し、現在2,800万の稼働中のウェブサイトに導入されている。 2020年、新しい"Server-side" GTMが導入され、パブリッシャはタグを直接サーバに組み込めるようになった。しかしながら、GTMのどちらのバージョンも学術研究コミュニティによってまだ徹底的に評価されていない。本稿では,GTMアーキテクチャの2つのバージョンであるClient-side GTMとServer-side GTMを初めて調査する。 78のクライアントサイドタグ,8つのサーバサイドタグ,2つのConsent Management Platform (CMP) を用いてこれらのシステムを内部から分析することにより,複数の隠れたデータ漏洩,GTMのパーミッションシステムを回避してスクリプトを注入するタグ,デフォルトで有効になっている同意を発見した。我々は法律の専門家とともに、GTMとそのアクターの詳細な法的分析を行い、潜在的な法的違反とその責任を特定する。法的コンプライアンスを容易にするために,推奨事項を示し,GTMに対する多数の改善を提案する。

Tag Management Systems were developed in order to support website publishers in installing multiple third-party JavaScript scripts (Tags) on their websites. In 2012, Google developed its own TMS called "Google Tag Manager" (GTM) that is currently present on 28 million live websites. In 2020, a new "Server-side" GTM was introduced, allowing publishers to include Tags directly on the server. However, neither version of GTM has yet been thoroughly evaluated by the academic research community. In this work, we study, for the first time, the two versions of the Google Tag Management (GTM) architectures: Client- and Server-side GTM. By analyzing these systems with 78 Client-side Tags, 8 Server-side Tags and two Consent Management Platforms (CMPs) from the inside, we discover multiple hidden data leaks, Tags bypassing GTM permission system to inject scripts, and consent enabled by default. With a legal expert, we perform an in-depth legal analysis of GTM and its actors to identify potential legal violations and their liabilities. We provide recommendations and propose numerous improvements for GTM to facilitate legal compliance.

翻訳日:2024-03-18 12:17:07 公開日:2023-12-22

# 回収可能なERC-20RトークンのためのR-Poolと決済市場

R-Pool and Settlement Markets for Recoverable ERC-20R Tokens ( http://arxiv.org/abs/2312.14375v1 )

ライセンス: Link先を確認

Kaili Wang, Qinchen Wang, Calvin Cai, Dan Boneh,

(参考訳) ERC-20RはERC-20を包むラッパーで、資産が移転された後、限られた時間枠内での資産回復をサポートする。ブロックチェーン上の盗難と損失を減らすために、被害者が回復ウィンドウの間に盗まれた資産や失われた資産を回収できるように設計されている。誠実な受信者がERC-20R資産を受け取った場合、回復ウィンドウが経過する(例えば24時間)まで待たなければ、その資産を元のERC-20形式にアンラップできない。多くのDeFiサービスは、通常の運用に支障をきたす可能性があるため、未決済の回収可能資産の受け入れを拒否する可能性が高い、と我々は主張する。そのため、アリスはERC-20Rトークンを受け取ったとき、DeFiサービスで使えるようになるまで24時間待たなければならない。しかし、アリスが手数料を支払って、ラップされたトークンをすぐに使えるアンラップ済みのERC-20トークンと交換したいとしたらどうだろうか? 本稿では,未決済のERC-20R資産を同じ資産のベースERC-20と交換するためのプールの設計方法について検討する。このようなプールの設計はいくつかの難しい問題を提起するが,我々はその解決策を提示する。

ERC-20R is a wrapper around ERC-20 that supports asset recovery within a limited time window after an asset is transferred. It is designed to reduce theft and losses on the blockchain by allowing a victim to recover their stolen or lost assets during the recovery window. When an honest recipient receives an ERC-20R asset, they must wait until the recovery window elapses (say, 24 hours) before they can unwrap the asset back to its base ERC-20 form. We argue that many DeFi services will likely refuse to accept unsettled recoverable assets because they can interfere with their normal operations. Consequently, when Alice receives an ERC-20R token, she must wait 24 hours before she can use it with a DeFi service. But what if Alice is willing to pay a fee to exchange the wrapped token for an unwrapped ERC-20 token that can be used right away? In this paper we explore how to design a pool to exchange an unsettled ERC-20R asset for a base ERC-20 of the same asset. Designing such a pool raises several challenging questions, and we present our solutions.

翻訳日:2024-03-18 11:28:19 公開日:2023-12-22
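
The recovery-window rule described above can be sketched as plain bookkeeping. This is a hypothetical off-chain model, not the ERC-20R contract interface: the `Transfer` record, `unwrappable_balance`, and the flat basis-point fee in `pool_payout` are all assumptions made for illustration, and the paper's actual pool pricing is not given in the abstract.

```python
from dataclasses import dataclass

RECOVERY_WINDOW = 24 * 60 * 60  # seconds; the abstract's "say, 24 hours"

@dataclass
class Transfer:
    sender: str
    recipient: str
    amount: int
    timestamp: int  # seconds since an arbitrary epoch

    def is_settled(self, now: int) -> bool:
        """True once the recovery window has elapsed for this transfer."""
        return now - self.timestamp >= RECOVERY_WINDOW

def unwrappable_balance(transfers, owner, now):
    """Sum of amounts received by `owner` whose recovery window has passed.

    Simplification: outgoing transfers are ignored.
    """
    return sum(t.amount for t in transfers
               if t.recipient == owner and t.is_settled(now))

def pool_payout(amount: int, fee_bps: int) -> int:
    """Base tokens paid for `amount` unsettled tokens, minus a fee in basis points.

    The flat fee is a hypothetical placeholder for whatever pricing a pool uses.
    """
    return amount - amount * fee_bps // 10_000
```

With this model, a recipient who does not want to wait could hand 1000 unsettled tokens to the pool and receive `pool_payout(1000, 50)` base tokens immediately, leaving the pool to carry the recovery risk until settlement.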

# 検索可能な暗号化機能の検討と同型暗号化の評価

A Review on Searchable Encryption Functionality and the Evaluation of Homomorphic Encryption ( http://arxiv.org/abs/2312.14434v1 )

ライセンス: Link先を確認

Brian Kishiyama, Izzat Alsmadi,

(参考訳) Google Cloud Platform、Microsoft Azure、Amazon Web Servicesなどのクラウドサービスプロバイダは、継続的に進化するクラウドサービスを提供する。それは成長する産業です。 NetflixやPayPalのような企業は、データストレージ、コンピューティングパワー、その他のサービスにCloudを頼っている。企業にとって、クラウドはコストを削減し、柔軟性を提供し、成長を可能にする。しかし、クラウドにはセキュリティとプライバシに関する懸念がある。クラウドサービスはインターネットを通じてアクセスされるので、ハッカーや攻撃者はどこからでもサーバーにアクセスすることができる。クラウド内のデータを保護するためには、アップロード前に暗号化されるべきであり、ストレージやトランジットでも保護されるべきである。一方、データ所有者は暗号化されたデータにアクセスする必要があるかもしれない。また、変更、更新、削除、読み込み、検索、共有も必要になる。データがクラウドで復号化されると、機密データが露出し、公開され、誤使用される可能性がある。 1つの解決策は、データを暗号化形式で残し、暗号化されたデータを操作する検索可能暗号化(SE)を使用することである。 SEの機能は、開始以来改善され、研究は、SEを改善する方法を模索し続けている。本稿は、2019年から2023年までのクラウドサービスに関連するサーチブル暗号化の機能についてレビューし、そのスキームの1つであるFully Homomorphic Encryptionを評価する。全体としては、複数の機能が集約され、テストされるにつれて、SE効率が向上する段階にあるように思われる。

Cloud Service Providers, such as Google Cloud Platform, Microsoft Azure, or Amazon Web Services, offer continuously evolving cloud services. It is a growing industry. Businesses, such as Netflix and PayPal, rely on the Cloud for data storage, computing power, and other services. For businesses, the cloud reduces costs, provides flexibility, and allows for growth. However, there are security and privacy concerns regarding the Cloud. Because Cloud services are accessed through the internet, hackers and attackers could possibly access the servers from anywhere. To protect data in the Cloud, it should be encrypted before it is uploaded and protected both in storage and in transit. On the other hand, data owners may need to access their encrypted data. It may also need to be altered, updated, deleted, read, searched, or shared with others. If data is decrypted in the Cloud, sensitive data is exposed and could be misused. One solution is to leave the data in its encrypted form and use Searchable Encryption (SE), which operates on encrypted data. The functionality of SE has improved since its inception and research continues to explore ways to improve SE. This paper reviews the functionality of Searchable Encryption, mostly related to Cloud services, in the years 2019 to 2023, and evaluates one of its schemes, Fully Homomorphic Encryption. Overall, it seems that research is at the point where SE efficiency is increased as multiple functionalities are aggregated and tested.

翻訳日:2024-03-18 11:28:19 公開日:2023-12-22
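
To make the idea of searching without decrypting concrete, here is a toy keyword index built from keyed hashes. It is a sketch under stated assumptions, not one of the schemes reviewed in the paper and not Fully Homomorphic Encryption: the functions `token`, `build_index`, and `search` are hypothetical, encryption of the document bodies is omitted, and a deterministic token like this leaks which searches repeat.

```python
import hashlib
import hmac

def token(key: bytes, keyword: str) -> str:
    """Deterministic search token: HMAC-SHA256 of the lower-cased keyword."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_index(key: bytes, docs: dict) -> dict:
    """Client side: map each keyword token to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(token(key, word), set()).add(doc_id)
    return index

def search(index: dict, search_token: str) -> set:
    """Server side: look up a token; the server never sees the keyword itself."""
    return index.get(search_token, set())
```

The data owner keeps `key`, uploads only the index (and separately encrypted documents), and later sends `token(key, "invoice")` to retrieve matching document ids.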

# 並行性の全体像を探る: 競合状態脆弱性検出器のサーベイ

Navigating the Concurrency Landscape: A Survey of Race Condition Vulnerability Detectors ( http://arxiv.org/abs/2312.14479v1 )

ライセンス: Link先を確認

Aishwarya Upadhyay, Vijay Laxmi, Smita Naval,

(参考訳) 技術が進歩し続け、インダストリー5.0の時代を迎える中で、オペレーティングシステム、ファイルシステム、Web、ネットワークアプリケーションに大きなパラダイムシフトが起きている。マルチプロセッシングとマルチコアシステムの利用が一般化したことで、並行プログラミングはますます広まりつつある。しかし、この変化は並行性バグとして知られる新しい一連の問題をもたらし、並行プログラムに広く存在することから、重大な障害と潜在的なセキュリティエクスプロイトにつながっている。過去20年間、多くの研究者がこれらのバグの発見、検出、緩和、予防に力を注いできており、特に直近の10年間でこの分野の研究が急増した。並行性バグの中では、データレースや競合状態の脆弱性が最も多く、すべての並行性バグの80%を占めている。本サーベイは,競合状態バグ検出器の領域に焦点をあてる。我々はこれらの検出器を,それらが採用する多様な手法に基づいて系統的に分類する。さらに、レース検出に関連する技術やアルゴリズムを掘り下げ、この分野の時間的な発展をたどる。また,競合状態の脆弱性の検出におけるファジング技術の応用についても述べる。これらの検出器とその静的解析をレビューすることにより、結論を導き、競合状態の脆弱性検出における精度、性能、適用性、包括性の向上などの今後の研究の方向性を概説する。

As technology continues to advance and we usher in the era of Industry 5.0, there has been a profound paradigm shift in operating systems, file systems, web, and network applications. The conventional utilization of multiprocessing and multicore systems has made concurrent programming increasingly pervasive. However, this transformation has brought about a new set of issues known as concurrency bugs, which, due to their wide prevalence in concurrent programs, have led to severe failures and potential security exploits. Over the past two decades, numerous researchers have dedicated their efforts to unveiling, detecting, mitigating, and preventing these bugs, with the last decade witnessing a surge in research within this domain. Among the spectrum of concurrency bugs, data races or race condition vulnerabilities stand out as the most prevalent, accounting for a staggering 80% of all concurrency bugs. This survey paper is focused on the realm of race condition bug detectors. We systematically categorize these detectors based on the diverse methodologies they employ. Additionally, we delve into the techniques and algorithms associated with race detection, tracing the evolution of this field over time. Furthermore, we shed light on the application of fuzzing techniques in the detection of race condition vulnerabilities. By reviewing these detectors and their static analyses, we draw conclusions and outline potential future research directions, including enhancing accuracy, performance, applicability, and comprehensiveness in race condition vulnerability detection.

翻訳日:2024-03-18 11:28:18 公開日:2023-12-22
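
One classic family of dynamic race detectors tracks, for every shared variable, the set of locks held on each access; if that set becomes empty while several threads touch the variable, a race is possible. The sketch below illustrates that lockset idea on a recorded event trace. It is a simplified illustration, not a detector taken from the survey: the trace format and the function `lockset_races` are assumptions, and the check can report false positives.

```python
def lockset_races(trace):
    """Return the variables whose candidate lockset becomes empty.

    `trace` is a list of (thread, op, target) events, where op is
    "acquire" or "release" (target is a lock) or "access" (target is a variable).
    """
    held = {}        # thread -> set of locks currently held
    candidates = {}  # variable -> locks held on every access so far
    accessors = {}   # variable -> threads that have accessed it
    racy = set()
    for thread, op, target in trace:
        locks = held.setdefault(thread, set())
        if op == "acquire":
            locks.add(target)
        elif op == "release":
            locks.discard(target)
        elif op == "access":
            accessors.setdefault(target, set()).add(thread)
            if target in candidates:
                candidates[target] &= locks   # keep only locks common to all accesses
            else:
                candidates[target] = set(locks)
            # No common lock and more than one thread: potential race.
            if not candidates[target] and len(accessors[target]) > 1:
                racy.add(target)
    return racy
```

In the usage below, `x` is always accessed under lock `L`, while `y` is accessed by two threads with no lock held, so only `y` is flagged.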

# 期待定数ラウンドにおける並行非同期ビザンチン合意の再検討

Concurrent Asynchronous Byzantine Agreement in Expected-Constant Rounds, Revisited ( http://arxiv.org/abs/2312.14506v1 )

ライセンス: Link先を確認

Ran Cohen, Pouyan Forghani, Juan Garay, Rutvik Patel, Vassilis Zikas,

(参考訳) ランダム化なしでは、Byzantine agreement (BA) は同期設定では線形のラウンド数を必要とし、非同期設定ではそもそも不可能であることがよく知られている。この制限を回避できるプリミティブは、oblivious common coin (OCC) として知られている。OCCにより、パーティはランダムなコインについて一定の確率で合意できるが、この合意は oblivious である。すなわち、プレイヤーは合意が達成されたかどうかを知らない。我々の研究の出発点は、(最終的なメッセージ配信を伴う)非同期設定において最適なレジリエンスを持つ情報理論的多値OCCのプロトコルが知られていないという観察である。多値OCCはいくつかの構成で暗黙的または明示的に使用されているため、文献におけるこの穴は特に問題である。本稿では,最適なレジリエンス,すなわち $t < n/3$ の汚職を許容する,非同期設定における最初の情報理論的多値OCCプロトコルを提案し,この重要なギャップを埋める。さらに,本プロトコルは指数サイズのドメインを持つOCCを効率的に実装する。これは,より単純な同期設定における既知の構成でさえ達成できていない特性である。次に、非同期BAのラウンド数を保存する並列合成の問題に目を向ける。このタスクのプロトコルはBen-OrとEl-Yaniv [Distributed Computing '03]によって提案されたが、その構成にはいくつかの点で欠陥がある。そこで第2の貢献として、このタスクに対するよりシンプルでモジュール化されたプロトコルを提供する。最後に、独立した興味のある貢献として、CanettiのUniversal Composabilityフレームワークでの証明を与える。BAはセキュアなマルチパーティ計算プロトコルの中核的な構成要素であるため重要となる、合成可能性の保証を提供する最初の研究となる。

It is well known that without randomization, Byzantine agreement (BA) requires a linear number of rounds in the synchronous setting, while it is flat out impossible in the asynchronous setting. The primitive that allows one to bypass the above limitation is known as oblivious common coin (OCC). It allows parties to agree with constant probability on a random coin, where agreement is oblivious, i.e., players are not aware whether or not agreement has been achieved. The starting point of our work is the observation that no known protocol exists for information-theoretic multi-valued OCC with optimal resiliency in the asynchronous setting (with eventual message delivery). This apparent hole in the literature is particularly problematic, as multi-valued OCC is implicitly or explicitly used in several constructions. In this paper, we present the first information-theoretic multi-valued OCC protocol in the asynchronous setting with optimal resiliency, i.e., tolerating $t < n/3$ corruptions, thereby filling this important gap. Further, our protocol efficiently implements OCC with an exponential-size domain, a property which is not even achieved by known constructions in the simpler, synchronous setting. We then turn to the problem of round-preserving parallel composition of asynchronous BA. A protocol for this task was proposed by Ben-Or and El-Yaniv [Distributed Computing '03]. Their construction, however, is flawed in several ways. Thus, as a second contribution, we provide a simpler, more modular protocol for the above task. Finally, and as a contribution of independent interest, we provide proofs in Canetti's Universal Composability framework; this makes our work the first one offering composability guarantees, which are important as BA is a core building block of secure multi-party computation protocols.

翻訳日:2024-03-18 11:28:18 公開日:2023-12-22
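
The optimal-resiliency condition $t < n/3$ quoted above is easy to misread at the boundary, so a small arithmetic helper may be useful. The function names `max_corruptions` and `tolerates` are hypothetical and serve only to spell out the bound; they say nothing about how the protocol itself works.

```python
def max_corruptions(n: int) -> int:
    """Largest integer t satisfying t < n/3 (equivalently n >= 3t + 1), for n >= 1."""
    return (n - 1) // 3

def tolerates(n: int, t: int) -> bool:
    """True if n parties can tolerate t corruptions under the t < n/3 bound."""
    return 3 * t < n
```

For example, 4 parties tolerate 1 corruption, while 3 parties tolerate none, because $1 < 3/3$ is false.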

# 移動中のサイバーセキュリティ : CAVの今後の試験施設の課題と要件

Cybersecurity in Motion: A Survey of Challenges and Requirements for Future Test Facilities of CAVs ( http://arxiv.org/abs/2312.14687v1 )

ライセンス: Link先を確認

Ioannis Mavromatis, Theodoros Spyridopoulos, Pietro Carnelli, Woon Hau Chin, Ahmed Khalil, Jennifer Chakravarty, Lucia Cipolina Kun, Robert J. Piechocki, Colin Robbins, Daniel Cunnington, Leigh Chase, Lamogha Chiazor, Chris Preston, Rahul, Aftab Khan,

(参考訳) 旅行のやり方は急速に変化しており、C-ITS(Cooperative Intelligent Transportation Systems)がこの進化の最前線にいる。しかし、C-ITSの採用は新たなリスクと課題をもたらし、サイバーセキュリティを安全性と信頼性を確保するための最優先事項にしている。この前提に基づいて,C-ITSのサイバーセキュリティの研究,試験,評価を促進するために設計されたCSCE(Cybersecurity Centre of Excellence)を提案する。我々は,CSCEの試験施設の設計,機能,課題について検討し,技術,セキュリティ,社会的要求の概要を述べる。本研究は, 今後のC-ITSに適応する柔軟性を強調し, 潜在的な脅威の検出・緩和におけるこれらのシステムの有効性について, 徹底的な調査・分析を通じて評価する。最後に、C-ITSのサイバーセキュリティに関するさらなる研究を動機付けることを目的として、様々なC-ITSドメインにおける現在の未解決課題を特定した。

The way we travel is changing rapidly, and Cooperative Intelligent Transportation Systems (C-ITSs) are at the forefront of this evolution. However, the adoption of C-ITSs introduces new risks and challenges, making cybersecurity a top priority for ensuring safety and reliability. Building on this premise, this paper presents an envisaged Cybersecurity Centre of Excellence (CSCE) designed to bolster research, testing, and evaluation of the cybersecurity of C-ITSs. We explore the design, functionality, and challenges of CSCE's testing facilities, outlining the technological, security, and societal requirements. Through a thorough survey and analysis, we assess the effectiveness of these systems in detecting and mitigating potential threats, highlighting their flexibility to adapt to future C-ITSs. Finally, we identify current unresolved challenges in various C-ITS domains, with the aim of motivating further research into the cybersecurity of C-ITSs.

翻訳日:2024-03-18 11:28:18 公開日:2023-12-22

# コンピュータサイエンスコースにおけるChatGPTの統合 : 学生の知覚と示唆

Integrating ChatGPT in a Computer Science Course: Students Perceptions and Suggestions ( http://arxiv.org/abs/2402.01640v1 )

ライセンス: Link先を確認

Kehinde Aruleba, Ismaila Temitayo Sanusi, George Obaido and Blessing Ogbuokiri

(参考訳) 近年,ChatGPTなどの人工知能ツールの教育システムへの統合が注目されている。本経験報告では,ChatGPTをコンピュータサイエンス科目に統合することに関する学生の認識と提案について考察する。コード補完と分析を含むChatGPT活動に続いて、7人の学生が詳細なインタビューに参加した。書き起こされたインタビューの結果は、ChatGPTがプログラミングを含む学習体験を向上させる可能性を示唆している。学生たちは、クエリに即座に応答し、パーソナライズされた学習をサポートするツールの能力を強調した。しかし、ChatGPTへの過度な依存が学生の批判的思考や問題解決スキルに悪影響を及ぼす恐れがあるという懸念も示した。これらの結果は,コンピュータサイエンス科目においてChatGPTの利用のバランスを慎重にとることの重要性を示している。この研究の成果は、AIツールを教育の文脈に組み込むことを探求する教育者、カリキュラムデザイナー、政策立案者に重要な示唆を与える。

The integration of artificial intelligence tools such as ChatGPT in the education system has gained attention in recent years. This experience report explores students' perceptions and suggestions for integrating ChatGPT in a computer science course. Following a ChatGPT activity which included code completion and analysis, seven students participated in in-depth interviews. Findings from the transcribed interviews suggest that ChatGPT has the potential to enhance the learning experience, including programming. The students highlighted the tool's ability to respond immediately to queries and to support personalised learning. However, they raised concerns that heavy reliance on ChatGPT may adversely affect students' critical thinking and problem-solving skills. These findings show the importance of carefully balancing the use of ChatGPT in computer science courses. The findings of this research have significant implications for educators, curriculum designers and policymakers as they explore integrating AI tools into educational contexts.

翻訳日:2024-02-11 17:14:27 公開日:2023-12-22

# AI-Artificial Intelligenceのグローバルな影響:最近の進歩と今後の方向性,レビュー

The Global Impact of AI-Artificial Intelligence: Recent Advances and Future Directions, A Review ( http://arxiv.org/abs/2401.12223v1 )

ライセンス: Link先を確認

Chandregowda Pachegowda

(参考訳) 人工知能(AI)は、経済、医療、交通など社会の多くの側面を変革する可能性を持つ新興技術である。この記事では、AIのグローバルな影響に関する最近の研究論文を合成し、その潜在的なメリットとリスクを探る。この記事では、経済的、倫理的、社会的、セキュリティとプライバシ、仕事のずれといった、AIの影響を強調している。偏見、セキュリティ、プライバシー侵害などの問題を含む、AI開発に関する倫理的懸念について論じている。 AIの責任ある開発と展開を保証するためには、政府、産業、学界の協力が不可欠である。この記事は、社会全体にAIが及ぼす影響の認識と理解を促進するために、公的なエンゲージメントと教育の重要性を強調して締めくくっている。

Artificial intelligence (AI) is an emerging technology that has the potential to transform many aspects of society, including the economy, healthcare, and transportation. This article synthesizes recent research literature on the global impact of AI, exploring its potential benefits and risks. The article highlights the implications of AI, including its impact on economic, ethical, social, security & privacy, and job displacement aspects. It discusses the ethical concerns surrounding AI development, including issues of bias, security, and privacy violations. To ensure the responsible development and deployment of AI, collaboration between government, industry, and academia is essential. The article concludes by emphasizing the importance of public engagement and education to promote awareness and understanding of AI's impact on society at large.

翻訳日:2024-01-28 15:40:50 公開日:2023-12-22

# 場所別アルゴリズムによるパトロール管理のためのデバイアス手法

A debiasing technique for place-based algorithmic patrol management ( http://arxiv.org/abs/2401.06162v1 )

ライセンス: Link先を確認

Alexander Einarsson (1), Simen Oestmo (2), Lester Wollman (2), Duncan Purves (3), Ryan Jenkins (4) ((1) Northwestern University (2) SoundThinking Inc. (3) University of Florida (4) California Polytechnic State University)

(参考訳) 近年、データ駆動型警察に革命が起こった。これにより、履歴データのバイアスがアルゴリズムによる意思決定にどのように影響するかが調査されるようになった。本稿では,位置対応型アルゴリズムパトロール管理システムのデバイアス化手法を提案する。本手法は, モデルに高い精度を保ちながら, 人種的に偏りのある特徴を効率的に除去することを示す。最後に、この研究が発見した公正性とデータ駆動ポリシングの領域における将来の潜在的な研究の長いリストを提供する。

In recent years, there has been a revolution in data-driven policing. With that has come scrutiny on how bias in historical data affects algorithmic decision making. In this exploratory work, we introduce a debiasing technique for place-based algorithmic patrol management systems. We show that the technique efficiently eliminates racially biased features while retaining high accuracy in the models. Finally, we provide a lengthy list of potential future research in the realm of fairness and data-driven policing which this work uncovered.

翻訳日:2024-01-22 12:51:09 公開日:2023-12-22

# 信頼できる人間中心型自動意思決定システム

Trustworthy human-centric based Automated Decision-Making Systems ( http://arxiv.org/abs/2401.06161v1 )

ライセンス: Link先を確認

Marcelino Cabrera and Carlos Cruz and Pavel Novoa-Hernández and David A. Pelta and José Luis Verdegay

(参考訳) 自動意思決定システム(ADS: Automated Decision-Making Systems)は、性能向上のために様々な分野、活動、職業に普及している。しかし、この普及はADSの誤用を含む潜在的なリスクをもたらす。このような誤用は、ADSが不必要な状況で使用された場合や、必須の要件、条件、規約が見過ごされた場合に現れ、意図しない結果をもたらす。本研究では, デジタル化, デジタルトランスフォーメーション, および現代社会と将来の文脈におけるADSの利用に関連する意味合い, 区別, 倫理的考察について, 徹底的に検討する。 ADSの展開における規制、透明性、倫理的行動の必要性に重点を置いている。

Automated Decision-Making Systems (ADS) have become pervasive across various fields, activities, and occupations, to enhance performance. However, this widespread adoption introduces potential risks, including the misuse of ADS. Such misuse may manifest when ADS is employed in situations where it is unnecessary or when essential requirements, conditions, and terms are overlooked, leading to unintended consequences. This research paper presents a thorough examination of the implications, distinctions, and ethical considerations associated with digitalization, digital transformation, and the utilization of ADS in contemporary society and future contexts. Emphasis is placed on the imperative need for regulation, transparency, and ethical conduct in the deployment of ADS.

翻訳日:2024-01-22 12:51:03 公開日:2023-12-22

# 将来を見据えた教育: 大規模言語モデルを用いた口頭試験シミュレーションのためのプロトタイプ

Future-proofing Education: A Prototype for Simulating Oral Examinations Using Large Language Models ( http://arxiv.org/abs/2401.06160v1 )

ライセンス: Link先を確認

André Nitze

(参考訳) 本研究は,高等教育における大規模言語モデル(LLM)の影響について検討し,プロトタイプを用いた自動口頭試験シミュレーションに焦点をあてる。プロトタイプの設計上の考慮点を述べるとともに, 選ばれた教育者と学生のグループでシステムを評価した。技術的および教育的な観察について考察する。プロトタイプは、口頭試験のシミュレーション、パーソナライズされたフィードバックの提供、教育者の作業負荷の軽減に有効であることが判明した。このプロトタイプの有望な成果は、教育の民主化、多様な学生の包摂、教育の質と効率の向上におけるLLMの可能性を示している。

This study explores the impact of Large Language Models (LLMs) in higher education, focusing on an automated oral examination simulation using a prototype. The design considerations of the prototype are described, and the system is evaluated with a select group of educators and students. Technical and pedagogical observations are discussed. The prototype proved to be effective in simulating oral exams, providing personalized feedback, and streamlining educators' workloads. The promising results of the prototype show the potential for LLMs in democratizing education, inclusion of diverse student populations, and improvement of teaching quality and efficiency.

翻訳日:2024-01-22 12:50:51 公開日:2023-12-22

# FRED: 空中画像オブジェクト検出における全回転等価性を目指して

FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection ( http://arxiv.org/abs/2401.06159v1 )

ライセンス: Link先を確認

Chanho Lee, Jinsu Son, Hyounguk Shon, Yunho Jeon, Junmo Kim

(参考訳) 回転同変性は、指向オブジェクト検出において必須だが挑戦的な性質である。一般的な物体検出器は、従来のCNNの並進同変性による空間シフトへのロバスト性を自然に活用しているが、回転同変性の達成は依然として困難な目標である。現在の検出器は回転不変の特徴を引き出すために様々なアライメント技術を用いているが、それでも高容量モデルと、あらゆる回転を含む重いデータ拡張に頼っている。本稿では,画像から境界ボックス予測までのプロセス全体が厳密に同変である完全回転同変指向物体検出器(FRED)を提案する。具体的には、不変タスク(オブジェクト分類)と同変タスク(オブジェクトローカライゼーション)を分離して、エンドツーエンドの同変性を達成する。境界ボックスを回転同変ベクトルの集合として表現し、回転同変なローカライゼーションを実装する。さらに,これらの回転同変ベクトルを変形可能な畳み込みのオフセットとして利用し,既存の空間適応の利点を高めた。完全な回転同変性を活用することで,FREDは既存手法と比較して画像レベルの回転に対して高いロバスト性を示す。さらに,実験を通じて,FREDが非軸整列学習に一歩近づいたことを示す。最新の手法と比較して,提案手法はDOTA-v1.0で同等の性能を示し,DOTA-v1.5では1.5 mAP上回り,同時にモデルパラメータを16%まで大幅に削減する。

Rotation-equivariance is an essential yet challenging property in oriented object detection. While general object detectors naturally leverage robustness to spatial shifts due to the translation-equivariance of the conventional CNNs, achieving rotation-equivariance remains an elusive goal. Current detectors deploy various alignment techniques to derive rotation-invariant features, but still rely on high capacity models and heavy data augmentation with all possible rotations. In this paper, we introduce a Fully Rotation-Equivariant Oriented Object Detector (FRED), whose entire process from the image to the bounding box prediction is strictly equivariant. Specifically, we decouple the invariant task (object classification) and the equivariant task (object localization) to achieve end-to-end equivariance. We represent the bounding box as a set of rotation-equivariant vectors to implement rotation-equivariant localization. Moreover, we utilized these rotation-equivariant vectors as offsets in the deformable convolution, thereby enhancing the existing advantages of spatial adaptation. Leveraging full rotation-equivariance, our FRED demonstrates higher robustness to image-level rotation compared to existing methods. Furthermore, we show that FRED is one step closer to non-axis aligned learning through our experiments. Compared to state-of-the-art methods, our proposed method delivers comparable performance on DOTA-v1.0 and outperforms by 1.5 mAP on DOTA-v1.5, all while significantly reducing the model parameters to 16%.

翻訳日:2024-01-22 12:50:41 公開日:2023-12-22
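
The split between an equivariant localisation output and an invariant classification output can be checked numerically on a toy example. The sketch below is an assumption-heavy illustration, not the FRED architecture: `box_vectors` stands in for a localiser that emits vectors, `size_score` stands in for a classifier, and both operate on raw 2D points rather than image features.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector counter-clockwise by theta radians about the origin."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def box_vectors(points):
    """Toy 'localiser': vectors from the centroid to each input point."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return [(x - cx, y - cy) for x, y in points]

def size_score(points):
    """Toy 'classifier': mean distance to the centroid, which rotation leaves unchanged."""
    return sum(math.hypot(x, y) for x, y in box_vectors(points)) / len(points)
```

Equivariance here means `box_vectors(rotated_points)` equals the rotated `box_vectors(points)`, while invariance means `size_score` returns the same value before and after rotation.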

# Voila-A: ユーザの視線を意識した視覚言語モデル

Voila-A: Aligning Vision-Language Models with User's Gaze Attention ( http://arxiv.org/abs/2401.09454v1 )

ライセンス: Link先を確認

Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

(参考訳) 近年、視覚と言語理解の統合は、特にビジョン・ランゲージ・モデル(VLM)を通じて、人工知能に大きな進歩をもたらした。しかし、既存のVLMは、複雑なシーンや複数のオブジェクトを含む現実世界のアプリケーションを扱うことや、その焦点を人間のユーザの多様な注意パターンに合わせることに困難を抱えている。本稿では,AR や VR デバイスで収集可能な視線情報を,VLM を導く人間の注意の代理指標として導入するとともに,これらのモデルの現実の応用における解釈性と有効性を高めるために,視線アライメントのための新しいアプローチ Voila-A を提案する。まず、数百分間の視線データを収集し、localized narratives を用いて人間の視線モダリティを模倣できることを実証する。そして、GPT-4を利用した自動データアノテーションパイプラインを設計し、VOILA-COCOデータセットを生成する。さらに,Voila Perceiverモジュールを考案し,事前学習した知識を保ちながら視線情報をVLMに統合する。我々は,ホールドアウト検証セットと,視線追跡装置で実生活シナリオをキャプチャした新たなVOILA-GAZEテストセットを用いて,Voila-Aを評価する。実験の結果,Voila-Aはいくつかのベースラインモデルを大きく上回った。モデルの注意を人間の視線パターンに合わせることで、Voila-Aはより直感的でユーザ中心のVLMへの道を開くと同時に、幅広いアプリケーションにわたる魅力的な人間とAIのインタラクションを促進する。

In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we innovate the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE Testset, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.

翻訳日:2024-01-22 09:29:15 公開日:2023-12-22

# 航空機翼の圧力分布の学習係数に対するリーマン幾何学的特徴の統合

Incorporating Riemannian Geometric Features for Learning Coefficient of Pressure Distributions on Airplane Wings ( http://arxiv.org/abs/2401.09452v1 )

ライセンス: Link先を確認

Liwei Hu, Wenyong Wang, Yu Xiang, Stefan Sommer

(参考訳) 航空機の空力係数は、特に迎角(AoA)が大きい場合、その幾何学形状に大きく影響される。空気力学の分野では、伝統的な多項式ベースのパラメータ化は、できるだけ少数のパラメータで翼型の形状を記述する。しかし、翼の3次元形状は2次元の翼型よりも複雑であるため、多項式ベースのパラメータ化では3次元空間における翼全体の形状を正確に表現することが困難である。既存のディープラーニングベースの手法は、2D翼型や翼の2D断面の形状について大量の潜在的ニューラル表現を抽出できる。最近の研究では、幾何学的特徴を直接ニューラルネットワークへの入力とすることで、予測される空力係数の精度を向上できることが示されている。幾何学理論に動機づけられ, 翼面上の圧力係数(CP)分布の学習にリーマン幾何学的特徴を取り入れることを提案する。提案手法は,幾何学的特徴(リーマン計量,接続,曲率)を計算し,さらに幾何学的特徴,座標,飛行条件を深層学習モデルに入力して,CP分布を予測する。実験の結果,提案手法は最先端のDeep Attention Network (DAN) と比較して, DLR-F11 航空機テストセットにおけるCPの予測平均二乗誤差 (MSE) を平均8.41%削減した。

The aerodynamic coefficients of aircraft are significantly impacted by their geometry, especially when the angle of attack (AoA) is large. In the field of aerodynamics, traditional polynomial-based parameterization uses as few parameters as possible to describe the geometry of an airfoil. However, because the 3D geometry of a wing is more complicated than the 2D airfoil, polynomial-based parameterizations have difficulty in accurately representing the entire shape of a wing in 3D space. Existing deep learning-based methods can extract massive latent neural representations for the shape of 2D airfoils or 2D slices of wings. Recent studies highlight that directly taking geometric features as inputs to the neural networks can improve the accuracy of predicted aerodynamic coefficients. Motivated by geometry theory, we propose to incorporate Riemannian geometric features for learning Coefficient of Pressure (CP) distributions on wing surfaces. Our method calculates geometric features (Riemannian metric, connection, and curvature) and further inputs the geometric features, coordinates and flight conditions into a deep learning model to predict the CP distribution. Experimental results show that our method, compared to the state-of-the-art Deep Attention Network (DAN), reduces the predicted mean square error (MSE) of CP by an average of 8.41% for the DLR-F11 aircraft test set.

翻訳日:2024-01-22 09:28:50 公開日:2023-12-22

# 分子コンフォメーション予測のための拡散駆動生成枠組み

Diffusion-Driven Generative Framework for Molecular Conformation Prediction ( http://arxiv.org/abs/2401.09451v1 )

ライセンス: Link先を確認

Bobin Yang, Zhenghan Chen

(参考訳) 二次元グラフ表現から3次元分子配置を推測するタスクは、計算化学の領域と医薬品の開発において重要な意味を持つ。これは分子機構と相互作用の理解に根本的に寄与する。機械学習の急速な進化、特に深層生成ネットワークの領域では、そのような予測モデリングの精度が飛躍的に向上した。従来の方法論は通常、2段階の戦略をとる。すなわち、最初に原子間距離を推定し、その後、距離幾何学の問題を解くことによって空間的な分子構造を構成する。しかし、この逐次的アプローチは時折、局所原子配列の複雑さを正確に捉えることに失敗し、結果として生じる構造モデルの完全性を損なう。これらの欠陥に対処するため、この研究は古典的非平衡熱力学に見られる拡散原理に基づく先進的な生成フレームワーク \method{} を導入する。 \method{} は原子を離散的な実体として捉え、マルコフ連鎖に似た過程を通じて、確率的ノイズの分布をコヒーレントな分子形状へと戻す拡散の逆過程を導く。この変換は、抽象潜在空間における分子グラフの初期表現から始まり、タスクの特定の要求を尊重するように調整された精巧な二段階(bilevel)最適化スキームを通じて3次元形状の実現へと進む。

The task of inferring three-dimensional molecular configurations from their two-dimensional graph representations is of critical significance in the domains of computational chemistry and the development of pharmaceuticals. It contributes fundamentally to our grasp of molecular mechanisms and interactions. The rapid evolution of machine learning, especially in the realm of deep generative networks, has catalyzed breakthroughs in the precision of such predictive modeling. Traditional methodologies typically employ a bifurcated strategy: initially estimating interatomic distances followed by sculpting the spatial molecular structure via solving a distance geometry problem. This sequential approach, however, occasionally fails to capture the intricacies of local atomic arrangements accurately, thus compromising the integrity of the resultant structural models. Addressing these deficiencies, this work introduces an avant-garde generative framework: \method{}, which is predicated on the diffusion principles found in classical non-equilibrium thermodynamics. \method{} envisages atoms as discrete entities and is adept at guiding the reversal of diffusion morphing a distribution of stochastic noise back into coherent molecular forms through a process akin to a Markov chain. This transformation begins with the initial representation of a molecular graph in an abstract latent space, progressing to the realization of the three-dimensional forms via an elaborate bilevel optimization scheme, tailored to respect the task's specific requirements.

翻訳日:2024-01-22 09:28:26 公開日:2023-12-22

# AI支援による病理診断への協力: EMPAIAイニシアチブ

Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA Initiative ( http://arxiv.org/abs/2401.09450v1 )

ライセンス: Link先を確認

Norman Zerbe, Lars Ole Schwen, Christian Geißler, Katja Wiesemann, Tom Bisson, Peter Boor, Rita Carvalho, Michael Franz, Christoph Jansen, Tim-Rasmus Kiehl, Björn Lindequist, Nora Charlotte Pohlan, Sarah Schmell, Klaus Strohmenger, Falk Zakrzewski, Markus Plass, Michael Takla, Tobias Küster, André Homeyer, Peter Hufnagl

(参考訳) 過去10年間で、病理学における人工知能(AI)の手法は大幅に進歩した。しかし, 臨床診断製品への研究成果の翻訳における技術的, 規制的ハードルや, 標準化されたインターフェースの欠如など, 日常的な臨床実践への統合は遅れている。オープンでベンダ中立のEMPAIAイニシアチブは、これらの課題に対処する。本稿では,EMPAIAの成果と教訓について概説する。 EMPAIAは病理AIエコシステムの様々なステークホルダー、すなわち病理学者、コンピュータ科学者、産業を統合する。緊密なコラボレーションでは、技術的相互運用性標準、AIテストと製品開発のための推奨、説明可能性メソッドを開発しました。モジュール化されたオープンソースのEMPAIAプラットフォームを実装し、6つの異なるベンダーから11のAIベースの画像分析アプリを統合することに成功した。ヨーロッパとアジアで14種類の病理実験室で, 臨床現場におけるAIの活用を優先して検討した。技術開発に加えて、すべてのステークホルダーがデジタル病理とAIに関する情報と経験を共有するためのフォーラムを作りました。商業的、臨床的、学術的なステークホルダーはempaiaの共通のオープンソースインターフェースを採用することができ、大規模な標準化とプロセスの合理化にユニークな機会を提供する。日常的な実験室でのAI支援を効果的かつ広く確立するためには、さらなる努力が必要である。この目的のために、持続可能なインフラである非営利団体EMPAIA Internationalが、標準化を継続し、AI支援デジタル病理の未来に対する幅広い実装と擁護を支援するために設立された。

Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses these challenges. Here, we provide an overview of EMPAIA's achievements and lessons learned. EMPAIA integrates various stakeholders of the pathology AI ecosystem, i.e., pathologists, computer scientists, and industry. In close collaboration, we developed technical interoperability standards, recommendations for AI testing and product development, and explainability methods. We implemented the modular and open-source EMPAIA platform and successfully integrated 11 AI-based image analysis apps from 6 different vendors, demonstrating how different apps can use a single standardized interface. We prioritized requirements and evaluated the use of AI in real clinical settings with 14 different pathology laboratories in Europe and Asia. In addition to technical developments, we created a forum for all stakeholders to share information and experiences on digital pathology and AI. Commercial, clinical, and academic stakeholders can now adopt EMPAIA's common open-source interfaces, providing a unique opportunity for large-scale standardization and streamlining of processes. Further efforts are needed to effectively and broadly establish AI assistance in routine laboratory use. To this end, a sustainable infrastructure, the non-profit association EMPAIA International, has been established to continue standardization and support broad implementation and advocacy for an AI-assisted digital pathology future.

翻訳日:2024-01-22 09:28:05 公開日:2023-12-22

# Tumbug: 絵画的,普遍的な知識表現方法

Tumbug: A pictorial, universal knowledge representation method ( http://arxiv.org/abs/2401.09448v1 )

ライセンス: Link先を確認

Mark A. Atkins

(参考訳) 汎用人工知能(AGI)の鍵は、一般的にコモンセンス推論(CSR)、あるいはほぼ同等に、CSRに特に適した知識表現法(KRM)の発見であると考えられているため、著者はCSR用のカスタムKRMを開発した。Tumbugと呼ばれるこの新しいKRMは、人間の脳が何らかの絵画的なKRMを使用しているという証拠が増えていること、またAGIにおけるよく知られた先行研究でこのKRMの可能性を調べたものがないことから、絵画的な性質を持つよう設計された。 TumbugはRoger SchankのConceptual Dependency (CD) 理論にいくらか似ているが、CD理論がテキストベースであり、主に人間指向の活動に基づく約17のコンポーネント(=6つの原始的な概念カテゴリと11の原始的な行為)を使用するのに対し、Tumbugは絵画的であり、科学と人間の生活の基本的な概念に基づく約30のコンポーネントを使用する。 Tumbugのビルディングブロックはすべて、従来のObject-Attribute-Value表現の3つのコンポーネント {O, A, V} と、ChangeおよびSystemである2つの新しいコンポーネント {C, S} に正確に対応する、わずか5つのベーシックビルディングブロックに一般化できることが判明した。 「SCOVA」と呼ばれるこの5つの構成要素の集合は、すべての知識表現の普遍的な基盤であると考えられる。

Since the key to artificial general intelligence (AGI) is commonly believed to be commonsense reasoning (CSR) or, roughly equivalently, discovery of a knowledge representation method (KRM) that is particularly suitable for CSR, the author developed a custom KRM for CSR. This novel KRM called Tumbug was designed to be pictorial in nature because there exists increasing evidence that the human brain uses some pictorial type of KRM, and no well-known prior research in AGI has researched this KRM possibility. Tumbug is somewhat similar to Roger Schank's Conceptual Dependency (CD) theory, but Tumbug is pictorial and uses about 30 components based on fundamental concepts from the sciences and human life, in contrast to CD theory, which is textual and uses about 17 components (= 6 Primitive Conceptual Categories + 11 Primitive Acts) based mainly on human-oriented activities. All the Building Blocks of Tumbug were found to generalize to only five Basic Building Blocks that exactly correspond to the three components {O, A, V} of traditional Object-Attribute-Value representation plus two new components {C, S}, which are Change and System. Collectively this set of five components, called "SCOVA," seems to be a universal foundation for all knowledge representation.

翻訳日:2024-01-22 09:27:38 公開日:2023-12-22

# シミュレーションに基づく推定による孤立パルサー集団合成

Isolated pulsar population synthesis with simulation-based inference ( http://arxiv.org/abs/2312.14848v1 )

ライセンス: Link先を確認

Vanessa Graber, Michele Ronchi, Celsa Pardo-Araujo, Nanda Rea

(参考訳) 我々は、パルサー集団合成とシミュレーションに基づく推論を組み合わせることで、孤立した銀河系電波パルサーの磁気回転特性を制約する。まず、中性子星の誕生特性と進化をモデル化するための柔軟な枠組みを開発し、その動的、回転的、磁気的特性に焦点を当てた。特に、対数正規分布から初期磁場強度 $B$ とスピン周期 $P$ をサンプリングし、後期の磁場減衰をべき乗則で捉える。各対数正規分布は平均 $\mu_{\log B}, \mu_{\log P}$ と標準偏差 $\sigma_{\log B}, \sigma_{\log P}$ で記述され、べき乗則は指数 $a_{\rm late}$ で特徴づけられるため、自由パラメータは5つとなる。その後、恒星の電波放射と観測バイアスをモデル化して3つの電波サーベイによる検出を模倣し、入力パラメータを変化させて合成 $P$-$\dot{P}$ ダイアグラムの大規模なデータベースを作成する。次に、ニューラル事後推定に焦点を当てたシミュレーションに基づく推論アプローチに従い、このデータベースを用いて、5つのモデルパラメータの事後分布を直接推論する深層ニューラルネットワークを訓練する。シミュレーションデータ上でこれらの個々のニューラル密度推定器の検証に成功した後、ネットワークのアンサンブルを用いて、観測されたパルサー集団の事後分布を推定する。対数正規分布に対して $\mu_{\log B} = 13.10^{+0.08}_{-0.10}$、$\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$、$\mu_{\log P} = -1.00^{+0.26}_{-0.21}$、$\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$、べき乗則に対して $a_{\rm late} = -1.80^{+0.65}_{-0.61}$ を $95\%$ 信用区間で得る。このアプローチは、複雑な集団合成フレームワークに対するロバストな統計推論への重要なステップであり、銀河系パルサーの将来の多波長解析の基礎を形成する。

We combine pulsar population synthesis with simulation-based inference to constrain the magneto-rotational properties of isolated Galactic radio pulsars. We first develop a flexible framework to model neutron-star birth properties and evolution, focusing on their dynamical, rotational and magnetic characteristics. In particular, we sample initial magnetic-field strengths, $B$, and spin periods, $P$, from log-normal distributions and capture the late-time magnetic-field decay with a power law. Each log-normal is described by a mean, $\mu_{\log B}, \mu_{\log P}$, and standard deviation, $\sigma_{\log B}, \sigma_{\log P}$, while the power law is characterized by the index, $a_{\rm late}$, resulting in five free parameters. We subsequently model the stars' radio emission and observational biases to mimic detections with three radio surveys, and produce a large database of synthetic $P$-$\dot{P}$ diagrams by varying our input parameters. We then follow a simulation-based inference approach that focuses on neural posterior estimation and employ this database to train deep neural networks to directly infer the posterior distributions of the five model parameters. After successfully validating these individual neural density estimators on simulated data, we use an ensemble of networks to infer the posterior distributions for the observed pulsar population. We obtain $\mu_{\log B} = 13.10^{+0.08}_{-0.10}$, $\sigma_{\log B} = 0.45^{+0.05}_{-0.05}$ and $\mu_{\log P} = -1.00^{+0.26}_{-0.21}$, $\sigma_{\log P} = 0.38^{+0.33}_{-0.18}$ for the log-normal distributions, and $a_{\rm late} = -1.80^{+0.65}_{-0.61}$ for the power law at $95\%$ credible interval. Our approach represents a crucial step towards robust statistical inference for complex population-synthesis frameworks and forms the basis for future multi-wavelength analyses of Galactic pulsars.
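
The sampling step described above can be sketched in a few lines. This draws birth magnetic fields and spin periods from log-normal distributions using the central values quoted in the abstract. It is only an illustration under stated assumptions: the units (gauss and seconds), the independence of $B$ and $P$, and the function name `sample_birth_properties` are not taken from the paper, and the evolution, emission and survey-selection steps are not modelled.

```python
import math
import random
import statistics

# Central values quoted in the abstract, in log10 of the birth quantities.
MU_LOG_B, SIGMA_LOG_B = 13.10, 0.45
MU_LOG_P, SIGMA_LOG_P = -1.00, 0.38


def sample_birth_properties(n, seed=0):
    """Draw n (B, P) pairs from independent log-normal distributions.

    Assumption of this sketch: B is in gauss and P is in seconds.
    """
    rng = random.Random(seed)
    pulsars = []
    for _ in range(n):
        log_b = rng.gauss(MU_LOG_B, SIGMA_LOG_B)
        log_p = rng.gauss(MU_LOG_P, SIGMA_LOG_P)
        pulsars.append((10.0 ** log_b, 10.0 ** log_p))
    return pulsars


population = sample_birth_properties(20000, seed=1)
mean_log_b = statistics.mean(math.log10(b) for b, _ in population)
mean_log_p = statistics.mean(math.log10(p) for _, p in population)
print(round(mean_log_b, 2), round(mean_log_p, 2))
```

In a simulation-based inference loop, many such populations would be generated for different parameter values, and the resulting synthetic diagrams would serve as training data for the neural posterior estimator.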

翻訳日:2024-01-15 13:15:57 公開日:2023-12-22

# 分離データアソシエーションとスムージングを用いたトランスベースマルチオブジェクトスムージング

Transformer-Based Multi-Object Smoothing with Decoupled Data Association and Smoothing ( http://arxiv.org/abs/2312.17261v1 )

ライセンス: Link先を確認

Juliano Pinto, Georg Hess, Yuxuan Xia, Henk Wymeersch, Lennart Svensson

(参考訳) マルチオブジェクト追跡(Multi-object Tracking、MOT)は、ある時間ウィンドウ上で、未知で時間変化する数のオブジェクトの状態軌跡を推定するタスクである。オブジェクト検出を時間ウィンドウ内のすべての測定値に条件付けできるマルチオブジェクト平滑化タスクに取り組むために、いくつかのアルゴリズムが提案されている。しかし、最高性能の手法は計算量が扱いきれないほど大きく、近似を必要とするため、複雑な設定では準最適な性能にとどまる。深層学習(DL)に基づくアルゴリズムはこの問題に対処する有望な手段であるが、正確なマルチオブジェクトモデルが利用可能で測定が低次元であるような設定では広く適用されていない。本稿では、データアソシエーションタスクを平滑化タスクから切り離す、この設定に特化した新しいDLアーキテクチャを提案する。提案したスムーザの性能を難易度の異なる複数のタスクで最先端手法と比較し、我々の知る限り、平滑化問題設定における従来のベイズトラッカーとDLトラッカーの最初の比較を提供する。

Multi-object tracking (MOT) is the task of estimating the state trajectories of an unknown and time-varying number of objects over a certain time window. Several algorithms have been proposed to tackle the multi-object smoothing task, where object detections can be conditioned on all the measurements in the time window. However, the best-performing methods suffer from intractable computational complexity and require approximations, performing suboptimally in complex settings. Deep learning-based algorithms are a possible avenue for tackling this issue but have not been applied extensively in settings where accurate multi-object models are available and measurements are low-dimensional. We propose a novel DL architecture specifically tailored for this setting that decouples the data association task from the smoothing task. We compare the performance of the proposed smoother to the state-of-the-art in different tasks of varying difficulty and provide, to the best of our knowledge, the first comparison between traditional Bayesian trackers and DL trackers in the smoothing problem setting.

翻訳日:2024-01-15 12:50:25 公開日:2023-12-22

# TimePillars: テンポラリリカレントな3D LiDARオブジェクト検出

TimePillars: Temporally-Recurrent 3D LiDAR Object Detection ( http://arxiv.org/abs/2312.17260v1 )

ライセンス: Link先を確認

Ernesto Lozano Calvo, Bernardo Taveira, Fredrik Kahl, Niklas Gustafsson, Jonathan Larsson, Adam Tonderski

(参考訳) LiDARポイントクラウドに適用される物体検出は、ロボット工学、特に自律運転において重要なタスクである。フィールドで主に使用される単一のフレームメソッドは、個々のセンサースキャンから情報を活用する。最近の手法は比較的低い推論時間で優れた性能を達成する。しかし、LiDARデータに固有の疎度を考えると、これらの手法は、安全な自動化を実現する上で欠かせない長距離検出(例えば200m)に苦慮している。複数のスキャンを集約することは、より密度の高いクラウド表現につながるだけでなく、システムにタイムアウェアネスをもたらし、環境の変化に関する情報を提供する。しかし、この種のソリューションは、しばしば非常に問題固有のものであり、慎重にデータ処理を必要とし、実行時要求を満たさない傾向があります。この文脈では,lidarデータのピラー表現を時間にわたって活用し,ハードウェア統合効率の制約を尊重し,新たなzenseact open dataset (zod) の多様性と長距離情報を活用する時間的リカレントオブジェクト検出パイプラインであるtimepillarsを提案する。実験を通じて、繰り返しの利点を証明し、基礎的なビルディングブロックがいかに堅牢で効率的な結果が得られるかを示す。

Object detection applied to LiDAR point clouds is a relevant task in robotics, and particularly in autonomous driving. Single frame methods, predominant in the field, exploit information from individual sensor scans. Recent approaches achieve good performance, at relatively low inference time. Nevertheless, given the inherent high sparsity of LiDAR data, these methods struggle in long-range detection (e.g. 200m) which we deem to be critical in achieving safe automation. Aggregating multiple scans not only leads to a denser point cloud representation, but it also brings time-awareness to the system, and provides information about how the environment is changing. Solutions of this kind, however, are often highly problem-specific, demand careful data processing, and tend not to fulfil runtime requirements. In this context we propose TimePillars, a temporally-recurrent object detection pipeline which leverages the pillar representation of LiDAR data across time, respecting hardware integration efficiency constraints, and exploiting the diversity and long-range information of the novel Zenseact Open Dataset (ZOD). Through experimentation, we prove the benefits of having recurrency, and show how basic building blocks are enough to achieve robust and efficient results.
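
The pillar representation mentioned above can be illustrated with a short sketch that groups LiDAR points into vertical columns on an x-y grid. This covers only the generic pillarization idea and not the TimePillars pipeline or its recurrent component; the cell size, the function name `pillarize` and the toy scan are assumptions made for the example.

```python
from collections import defaultdict


def pillarize(points, cell_size=0.5):
    """Group LiDAR points (x, y, z) into vertical columns on an x-y grid.

    Hypothetical helper: cell_size is in metres and chosen arbitrarily.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        # Floor division maps each point to the index of its grid cell;
        # the z coordinate is kept inside the pillar, not used for binning.
        key = (int(x // cell_size), int(y // cell_size))
        pillars[key].append((x, y, z))
    return dict(pillars)


# Toy scan: two nearby points and one distant point (made-up values).
scan = [(0.1, 0.2, 1.0), (0.3, 0.4, 1.5), (10.2, -3.1, 0.2)]
pillars = pillarize(scan)
for cell, pts in sorted(pillars.items()):
    print(cell, len(pts))
```

Distant objects tend to fall into pillars with very few points, which is the sparsity problem that aggregating information across several scans is meant to ease.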

翻訳日:2024-01-15 12:50:03 公開日:2023-12-22

# 大規模言語モデルエージェントのためのワーキングメモリの強化

Empowering Working Memory for Large Language Model Agents ( http://arxiv.org/abs/2312.17259v1 )

ライセンス: Link先を確認

Jing Guo, Nan Li, Jianchuan Qi, Hang Yang, Ruiqiao Li, Yuzhen Feng, Si Zhang, Ming Xu

(参考訳) 大規模言語モデル(LLM)は印象的な言語機能を実現している。しかし、鍵となる制限は人間のような記憶能力の欠如である。 LLMは連続的な相互作用に制約のあるメモリ保持を示し、複雑な推論を妨げる。本稿では,認知心理学のワーキングメモリフレームワークを適用し,LLMアーキテクチャを向上する可能性について考察する。従来のLLMメモリ設計の限界は、異なるダイアログエピソードの分離や永続的なメモリリンクの欠如など、分析される。これに対処するため、集中型ワーキングメモリハブとエピソディックバッファアクセスを組み込んだ革新的なモデルが提案されている。このアーキテクチャは、複雑なタスクや協調的なシナリオにおいて、微妙な文脈推論の継続性を高めることを目的としている。有望ではあるが、エピソードメモリエンコーディング、ストレージ、優先順位付け、検索、セキュリティの最適化にはさらなる研究が必要である。本稿では,より高度で人間らしい記憶能力を持つLLMエージェントを開発するための戦略的青写真を提供し,汎用人工知能における重要なフロンティアとしてメモリ機構を強調した。

Large language models (LLMs) have achieved impressive linguistic capabilities. However, a key limitation persists in their lack of human-like memory faculties. LLMs exhibit constrained memory retention across sequential interactions, hindering complex reasoning. This paper explores the potential of applying cognitive psychology's working memory frameworks to enhance LLM architecture. The limitations of traditional LLM memory designs are analyzed, including their isolation of distinct dialog episodes and lack of persistent memory links. To address this, an innovative model is proposed that incorporates a centralized Working Memory Hub and Episodic Buffer access to retain memories across episodes. This architecture aims to provide greater continuity for nuanced contextual reasoning during intricate tasks and collaborative scenarios. While promising, further research is required into optimizing episodic memory encoding, storage, prioritization, retrieval, and security. Overall, this paper provides a strategic blueprint for developing LLM agents with more sophisticated, human-like memory capabilities, highlighting memory mechanisms as a vital frontier in artificial general intelligence.

翻訳日:2024-01-15 12:49:42 公開日:2023-12-22

# MLによるフライング -- Affine 変換の CNN インバージョン

Flying By ML -- CNN Inversion of Affine Transforms ( http://arxiv.org/abs/2312.17258v1 )

ライセンス: Link先を確認

L. Van Warren

(参考訳) 本稿では,cnnを用いてアフィン変換を反転させ,計器画像から航空機の状態を推定し,コックピットゲージの読解を自動化する機械学習手法について述べる。本研究は,ターン・アンド・バンクインジケータの合成画像を用いて検証し,単一画像からのデータセット生成,ノイズフリートレーニングのための「クリーントレーニング原理」,カテゴリデータからの連続値予測のためのcnn補間といった手法を導入する。ハイパーパラメータ最適化やMLシステムエンジニアリングに関する洞察も提供する。

This paper describes a machine learning method to automate reading of cockpit gauges, using a CNN to invert affine transformations and deduce aircraft states from instrument images. Validated with synthetic images of a turn-and-bank indicator, this research introduces methods such as generating datasets from a single image, the 'Clean Training Principle' for optimal noise-free training, and CNN interpolation for continuous value predictions from categorical data. It also offers insights into hyperparameter optimization and ML system software engineering.

翻訳日:2024-01-15 12:49:26 公開日:2023-12-22

# 長期条件記憶を持つ大規模言語モデルアシスタントの進化

Evolving Large Language Model Assistant with Long-Term Conditional Memory ( http://arxiv.org/abs/2312.17257v1 )

ライセンス: Link先を確認

Ruifeng Yuan, Shichao Sun, Zili Wang, Ziqiang Cao, Wenjie Li

(参考訳) 大規模言語モデルの急速な発展に伴い、ChatGPTのようなAIアシスタントは人々の作品や生活に広く浸透してきた。本稿では,言語長期記憶を利用した大規模言語モデルアシスタントについて述べる。ユーザーとaiアシスタントの間の履歴対話から知識と経験を保存し、より良い反応を生み出すための将来の対話に適用することに焦点を当てている。モデルは、完了した対話ごとに一連のレコードを生成し、それらをメモリに格納する。後の使用例では、新しいユーザ入力が与えられ、モデルがそれを使って関連するメモリを取得し、応答の質を改善する。メモリの最良の形態を見つけるために,メモリ構築のさまざまな方法を探り,条件記憶と呼ばれる新しい記憶機構を提案し,従来の手法の問題を解決する。また,生成過程におけるメモリの検索と利用について検討する。アシスタントはGPT-4をバックボーンとして使用し、長期記憶を持つAIアシスタントが必要とするさまざまな能力に着目した3つの構築されたテストデータセットで評価する。

With the rapid development of large language models, AI assistants like ChatGPT have widely entered people's work and lives. In this paper, we present an evolving large language model assistant that utilizes verbal long-term memory. It focuses on preserving the knowledge and experience from the dialogue history between the user and the AI assistant, which can be applied to future dialogues to generate better responses. The model generates a set of records for each finished dialogue and stores them in the memory. In later usage, given a new user input, the model uses it to retrieve the related memory and improve the quality of the response. To find the best form of memory, we explore different ways of constructing the memory and propose a new memorizing mechanism called conditional memory to solve the problems in previous methods. We also investigate the retrieval and usage of memory in the generation process. The assistant uses GPT-4 as the backbone, and we evaluate it on three constructed test datasets focusing on different abilities required by an AI assistant with long-term memory.
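
The store-then-retrieve loop described above can be sketched as follows. This is a deliberately simplified stand-in: in the paper the records are generated by the language model itself, whereas here they are hard-coded strings, and the word-overlap (Jaccard) score, the `MemoryStore` class and all example records are assumptions introduced only for illustration.

```python
def tokenize(text):
    """Lower-case whitespace tokenization into a set of words."""
    return set(text.lower().split())


class MemoryStore:
    """Hypothetical long-term memory: a flat list of text records."""

    def __init__(self):
        self.records = []

    def add_dialogue(self, records):
        # In the described assistant these records would be written by
        # the model after each finished dialogue (not modelled here).
        self.records.extend(records)

    def retrieve(self, user_input, k=1):
        query = tokenize(user_input)

        def score(record):
            words = tokenize(record)
            union = query | words
            # Jaccard similarity as a simple placeholder for a retriever.
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.records, key=score, reverse=True)[:k]


memory = MemoryStore()
memory.add_dialogue(["user prefers vegetarian recipes", "user lives in Lisbon"])
memory.add_dialogue(["user is learning Rust"])
top = memory.retrieve("suggest some vegetarian recipes for dinner")
print(top)
```

The retrieved records would then be placed in the prompt alongside the new user input; a practical system would more likely use embedding-based retrieval than word overlap.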

翻訳日:2024-01-15 12:49:17 公開日:2023-12-22

# Voronoi分割の自動微分法

A Method for Auto-Differentiation of the Voronoi Tessellation ( http://arxiv.org/abs/2312.16192v1 )

ライセンス: Link先を確認

Sergei Shumilin, Alexander Ryabov, Evgeny Burnaev, Vladimir Vanovskii

(参考訳) ボロノイ分割(Voronoi tessellation)、別名ボロノイ図(Voronoi diagram)は、様々な科学分野に応用できる重要な計算幾何学の手法である。これは、与えられた空間を、点の集合への近さに基づいて領域に分割するものである。自動微分は最適化タスクを解くための強力なツールであり、バックプロパゲーションアルゴリズムによる勾配計算を可能にする計算グラフの構築を前提としている。しかし、ボロノイ分割がパイプラインの中で唯一の微分不可能な部分として残り、エンドツーエンドの微分を妨げていることが多い。本稿では、2次元ボロノイ分割の自動微分法を提案する。この方法により、ボロノイ分割を構築して勾配を伝播でき、構築全体がエンドツーエンドで微分可能になる。実装の詳細といくつかの重要な応用について述べる。我々の知る限り、これはボロノイ分割の幾何学的パラメータの完全な集合を微分可能な形で提供する、最初の自動微分可能なボロノイ分割の実現である。

Voronoi tessellation, also known as the Voronoi diagram, is an important computational geometry technique that has applications in various scientific disciplines. It involves dividing a given space into regions based on proximity to a set of points. Autodifferentiation is a powerful tool for solving optimization tasks. It relies on constructing a computational graph that allows gradients to be computed with the backpropagation algorithm. However, the Voronoi tessellation often remains the only non-differentiable part of a pipeline, prohibiting end-to-end differentiation. We present a method for autodifferentiation of the 2D Voronoi tessellation. The method allows one to construct the Voronoi tessellation and pass gradients, making the construction end-to-end differentiable. We provide the implementation details and present several important applications. To the best of our knowledge, this is the first autodifferentiable realization of the Voronoi tessellation that provides the full set of Voronoi geometrical parameters in a differentiable way.
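
One way to see why a Voronoi construction can carry gradients: each Voronoi vertex is the circumcenter of three neighbouring sites, which is a smooth closed-form function of their coordinates wherever the sites are not collinear. The sketch below is not the paper's method. It uses central finite differences instead of an autodifferentiation framework, and the function names and example sites are assumptions.

```python
def voronoi_vertex(a, b, c):
    """Voronoi vertex shared by sites a, b, c (circumcenter of the triangle)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        raise ValueError("collinear sites have no common Voronoi vertex")
    a2, b2, c2 = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy


def d_vertex_d_bx(a, b, c, h=1e-6):
    """Central-difference derivative of the vertex w.r.t. the x-coordinate of b."""
    plus = voronoi_vertex(a, (b[0] + h, b[1]), c)
    minus = voronoi_vertex(a, (b[0] - h, b[1]), c)
    return ((plus[0] - minus[0]) / (2 * h), (plus[1] - minus[1]) / (2 * h))


# Three example sites forming a right triangle (made-up values).
sites = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
vertex = voronoi_vertex(*sites)
gradient = d_vertex_d_bx(*sites)
print(vertex, gradient)
```

The same closed-form expression could be written with tensors in an autodifferentiation library so that exact gradients flow to the site positions. The combinatorial structure (which sites are neighbours) is piecewise constant and is not differentiated in this sketch.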

翻訳日:2024-01-15 12:47:34 公開日:2023-12-22

# 乳癌におけるリソース制限自動Ki67指数の推定

Resource-Limited Automated Ki67 Index Estimation in Breast Cancer ( http://arxiv.org/abs/2401.00014v1 )

ライセンス: Link先を確認

J. Gliozzo, G. Marinò, A. Bonometti, M. Frasca and D. Malchiodi

(参考訳) 腫瘍進展と化学療法反応の予測は、最近、腫瘍浸潤性リンパ球(TIL)と核タンパク質Ki67を予後因子として用いている。近年,深層ニューラルネットワーク (dnns) が乳癌細胞においてki67の発現を推定し,腫瘍内tilsスコアを同時決定する結果が得られた。しかし、この10年間で、深層モデルによって引き起こされた異常な進歩は、少なくとも資源需要と同じくらいに増大した。深層モデルのクエリに必要な計算コストは、IoTベースのアプリケーションのように、リソース制限の強い制限を表している(場合によっては保存する)。そこで本研究では,乳がん検診においてki67陽性細胞の割合を効果的に推定するための資源消費対応dnnを提案する。提案手法では, メモリ使用量の75%と89%を削減し, エネルギー消費量を1.5倍に削減し, ベンチマーク・オブ・ザ・アート・ソリューションの総合的精度を向上した。このようなポジティブな結果に刺激されて,我々は,その汎用利用を可能にするために採用したフレームワークと,その利用をサポートするパブリックソフトウェアリポジトリを開発,構成した。

The prediction of tumor progression and chemotherapy response has recently been tackled by exploiting Tumor Infiltrating Lymphocytes (TILs) and the nuclear protein Ki67 as prognostic factors. Deep neural networks (DNNs) have been shown to achieve top results in estimating Ki67 expression and simultaneously determining the intratumoral TILs score in breast cancer cells. However, over the last ten years, the extraordinary progress brought about by deep models has been accompanied by at least as large a growth in their resource demand. The exorbitant computational costs required to query (and in some cases also to store) a deep model represent a strong limitation in resource-limited contexts, such as IoT-based applications that support healthcare personnel. To this end, we propose a resource consumption-aware DNN for the effective estimation of the percentage of Ki67-positive cells in breast cancer screenings. Our approach reduced memory and disk usage by up to 75% and 89%, respectively, and energy consumption by up to 1.5x, while preserving or improving the overall accuracy of a state-of-the-art benchmark solution. Encouraged by these positive results, we developed and structured the adopted framework to allow general-purpose use, along with a public software repository to support it.

翻訳日:2024-01-15 12:26:22 公開日:2023-12-22

# 継承表現を訓練したニューラルネットワークに基づくマルチモーダル認知マップ

Multi-Modal Cognitive Maps based on Neural Networks trained on Successor Representations ( http://arxiv.org/abs/2401.01364v1 )

ライセンス: Link先を確認

Paul Stoewer, Achim Schilling, Andreas Maier and Patrick Krauss

(参考訳) 認知地図(Cognitive map)は、脳が記憶を効率的に整理し、そこからコンテキストを取り出す方法に関する概念である。 Entorhinal-hippocampal complexは、エピソードやリレーショナルメモリ処理、空間ナビゲーションに深く関わっており、場所や格子細胞を介して認知地図を構築すると考えられている。認知地図の有望な特性を利用するため,我々は,細胞動態と認知地図表現をモデル化可能な後継表現を用いたマルチモーダルニューラルネットワークを構築した。ここでは、画像と単語埋め込みからなるマルチモーダル入力を用いる。ネットワークは、新規入力とトレーニングデータベースとの類似性を学習し、認知地図の表現を成功させる。その後、ネットワークの予測は、1つのモダリティから別のモダリティへの推測に90\%以上の精度で使用できる。したがって、提案手法は、現在のAIシステムを改善するためのビルディングブロックであり、オブジェクトが現れる環境と異なるモダリティをよりよく理解することができる。したがって、特定のモダリティと特定の遭遇との関連性は、類似した情報が少なく、学習された認知地図から追加情報が推測されるような、新たな状況における文脈認識につながる可能性がある。脳のentorhinal-hippocampal complex(entorhinal-hippocampal complex)で表される認知地図は、記憶からコンテキストを整理し取り出すもので、chatgptのような大規模言語モデル(llm)が類似したアーキテクチャを利用して高レベルの処理中心として機能することを示唆している。最後に、マルチモーダル入力を利用することで、LLMは、さまざまな形式のデータ(画像や単語など)間のギャップを埋め、学習された関連を通じてコンテキスト認識と抽象概念の基盤を築き、AIの基盤問題に対処することができる。

Cognitive maps are a proposed concept of how the brain efficiently organizes memories and retrieves context from them. The entorhinal-hippocampal complex is heavily involved in episodic and relational memory processing, as well as spatial navigation, and is thought to build cognitive maps via place and grid cells. To make use of the promising properties of cognitive maps, we set up a multi-modal neural network using successor representations, which is able to model place cell dynamics and cognitive map representations. Here, we use multi-modal inputs consisting of images and word embeddings. The network successfully learns the similarities between novel inputs and the training database, and therefore the representation of the cognitive map. Subsequently, the prediction of the network can be used to infer from one modality to another with over $90\%$ accuracy. The proposed method could therefore be a building block to improve current AI systems for better understanding of the environment and the different modalities in which objects appear. The association of specific modalities with certain encounters can therefore lead to context awareness in novel situations when similar encounters with less information occur and additional information can be inferred from the learned cognitive map. Cognitive maps, as represented by the entorhinal-hippocampal complex in the brain, organize and retrieve context from memories, suggesting that large language models (LLMs) like ChatGPT could harness similar architectures to function as a high-level processing center, akin to how the hippocampus operates within the cortex hierarchy. Finally, by utilizing multi-modal inputs, LLMs can potentially bridge the gap between different forms of data (like images and words), paving the way for context-awareness and grounding of abstract concepts through learned associations, addressing the grounding problem in AI.
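
For readers unfamiliar with successor representations, the following sketch computes the standard tabular form, $M = \sum_{t \ge 0} \gamma^t T^t$, for a small transition matrix $T$. The paper trains a neural network on such representations; this example only shows the target quantity, and the two-state environment, the discount factor and the function name are assumptions chosen for illustration.

```python
def successor_representation(transition, gamma=0.9, tol=1e-12, max_iter=10000):
    """Return M = sum_t gamma^t T^t for a row-stochastic transition matrix T."""
    n = len(transition)
    # The t = 0 term of the series is the identity matrix.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in m]
    for _ in range(max_iter):
        # Next series term: gamma * (previous term) @ T.
        term = [[gamma * sum(term[i][k] * transition[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                m[i][j] += term[i][j]
        # Stop once the newest term no longer changes the sum noticeably.
        if max(abs(v) for row in term for v in row) < tol:
            break
    return m


# Toy environment: two states that always transition to each other.
two_rooms = [[0.0, 1.0], [1.0, 0.0]]
M = successor_representation(two_rooms, gamma=0.5)
print(M)
```

Each row of `M` gives the discounted expected future occupancy of every state from a given start state, which is the kind of predictive map that place-cell models based on successor representations build on.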

翻訳日:2024-01-15 10:09:50 公開日:2023-12-22

# SoK: 三角形を手なずける - 機械学習における公平性、解釈可能性、プライバシの相互作用について

SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning ( http://arxiv.org/abs/2312.16191v1 )

ライセンス: Link先を確認

Julien Ferry (LAAS-ROC), Ulrich Aïvodji (ETS), Sébastien Gambs (UQAM), Marie-José Huguet (LAAS-ROC), Mohamed Siala (LAAS-ROC)

(参考訳) 機械学習技術は、大学入学、融資の決定、再犯予測などの重大な意思決定にますます使われている。したがって、学習したモデルが人間によって監査または理解でき、差別や偏見を生成または再現せず、トレーニングデータに関する機密情報を漏洩しないようにすることが重要である。実際、解釈可能性、公平性、プライバシは、責任ある機械学習を開発する上で重要な要件であり、これら3つ全てが過去10年間に広く研究されてきた。しかし、それらは主に個別に検討されてきた一方で、実際には肯定的にも否定的にも互いに相互作用する。本SoK(Systematization of Knowledge)論文では、これら3つの要件間の相互作用に関する文献を調査する。より正確には、各ペアの相互作用について、特定された相乗効果と緊張関係を要約する。これらの知見は、いくつかの基本的な理論的および経験的対立を浮き彫りにするとともに、高レベルの実用性を維持しようとする場合、これらの異なる要件を共同で考慮することが困難であることを示す。この問題を解決するために、可能な調停メカニズムについても議論し、注意深い設計によってこれらの異なる関心事を実際にうまく扱えることを示す。

Machine learning techniques are increasingly used for high-stakes decision-making, such as college admissions, loan attribution or recidivism prediction. Thus, it is crucial to ensure that the learned models can be audited or understood by human users, do not create or reproduce discrimination or bias, and do not leak sensitive information regarding their training data. Indeed, interpretability, fairness and privacy are key requirements for the development of responsible machine learning, and all three have been studied extensively during the last decade. However, they have mainly been considered in isolation, while in practice they interact with each other, either positively or negatively. In this Systematization of Knowledge (SoK) paper, we survey the literature on the interactions between these three desiderata. More precisely, for each pairwise interaction, we summarize the identified synergies and tensions. These findings highlight several fundamental theoretical and empirical conflicts, while also demonstrating that jointly considering these different requirements is challenging when one aims to preserve a high level of utility. To address this issue, we also discuss possible conciliation mechanisms, showing that careful design makes it possible to handle these different concerns successfully in practice.

翻訳日:2023-12-31 03:01:10 公開日:2023-12-22

# Hessian-based generalization Guaranteesを用いたディープニューラルネットワークのロバスト微調整

Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees ( http://arxiv.org/abs/2206.02659v6 )

ライセンス: Link先を確認

Haotian Ju, Dongyue Li, Hongyang R. Zhang

(参考訳) 対象タスクにおける事前訓練されたディープニューラルネットワークの微調整を検討する。我々は、しばしば観測される過剰フィッティングの問題(例えば、ターゲットデータセットが小さい場合や、トレーニングラベルが騒がしい場合など)を理解するために、微調整の一般化特性について検討する。深層ネットワークに対する既存の一般化手法は、微調整モデルの初期化(即ち事前訓練されたネットワーク)からの距離や、深層ネットワークの雑音安定性などの概念に依存する。本稿では,PAC-Bayesian解析によるヘッセン系距離測定を同定し,微調整モデルの一般化ギャップとよく相関することを示した。理論的には、微調整モデルに対するヘッセン距離に基づく一般化境界を証明できる。また,オーバーフィッティングが重要な問題であるラベルノイズに対する微調整に関する拡張研究についても述べる。本稿では,このアルゴリズムについて,クラス条件付き独立ノイズモデルに基づくアルゴリズムと一般化誤差保証を提案する。経験的に、ヘッセン距離測度は、実際に微調整されたモデルの観測された一般化ギャップのスケールと一致する。また,ノイズの多いトレーニングラベルを用いた画像分類タスクでもアルゴリズムをテストし,先行手法の利得と微調整モデルのヘッセン距離測定値の低下を示した。

We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., the pretrained network) of the fine-tuned model and noise stability properties of deep networks. This paper identifies a Hessian-based distance measure through PAC-Bayesian analysis, which is shown to correlate well with observed generalization gaps of fine-tuned models. Theoretically, we prove Hessian distance-based generalization bounds for fine-tuned models. We also describe an extended study of fine-tuning against label noise, where overfitting remains a critical problem. We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model. Empirically, we observe that the Hessian-based distance measure can match the scale of the observed generalization gap of fine-tuned models in practice. We also test our algorithm on several image classification tasks with noisy training labels, showing gains over prior methods and decreases in the Hessian distance measure of the fine-tuned model.

翻訳日:2023-12-27 23:29:31 公開日:2023-12-22

# DynGFN:GFlowNetを用いた遺伝子制御ネットワークのベイズ推定に向けて

DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets ( http://arxiv.org/abs/2302.04178v4 )

ライセンス: Link先を確認

Lazar Atanackovic, Alexander Tong, Bo Wang, Leo J. Lee, Yoshua Bengio, Jason Hartford

(参考訳) 細胞生物学における大きな課題の1つは、遺伝子発現と細胞機能を制御する遺伝子とその産物間の相互作用を記述する遺伝子制御ネットワーク(GRN)を推論することである。これは因果発見問題として扱えるが、2つの非標準的な課題がある。(1) 制御ネットワークは本質的に循環的であるため、GRNを有向非巡回グラフ(DAG)としてモデル化すべきではない。(2) 観測には大きな測定ノイズがあるため、典型的なサンプルサイズでは、データの下でもっともらしいグラフの大きな同値類が常に存在し、この不確実性を捉える手法が求められる。既存の方法は、課題(1)、すなわちダイナミクスから循環構造を識別すること、あるいは課題(2)、すなわちDAG上の複雑なベイズ事後分布を学習することのいずれかに焦点を当てており、両方には対応していない。本稿では、RNA速度(RNA velocity)技術を用いて遺伝子発現の「速度」を推定できるという事実を活用し、両方の課題に対処するアプローチを開発する。速度情報を利用できるため、ベイズ構造学習問題を力学系のスパース同定問題として扱うことができ、循環フィードバックループを時間を通じて捉えることができる。本研究の目的は離散構造上の不確実性をモデル化することであるため、生成フローネットワーク(GFlowNets)を用いて、可能なスパース依存関係の組合せ空間上の事後分布を推定する。結果は、提案手法が、対応する最先端のベイズ構造学習法と比較して、循環構造の分布をよりよく捉えた事後分布を学習することを示している。

One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.

翻訳日:2023-12-27 23:21:14 公開日:2023-12-22

# PAC-Optimal Hyper-PosteriorによるスケーラブルなPAC-Bayesianメタラーニング:理論から実践へ

Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice ( http://arxiv.org/abs/2211.07206v3 )

ライセンス: Link先を確認

Jonas Rothfuss, Martin Josifoski, Vincent Fortuin, Andreas Krause

(参考訳) Meta-Learningは、関連する学習タスクのデータセットから有用な帰納バイアスを取得することで、新しいタスクの学習プロセスを高速化することを目的としている。実際には、利用可能な関連するタスクの数は少ないことが多いが、既存のアプローチのほとんどは、多くのタスクを前提としており、非現実的で過度に適合する傾向がある。メタラーニング文学における中心的な疑問は、未発見のタスクへの一般化を確実にするための規則化の方法である。本研究では,pac-ベイズ理論を用いた理論的解析を行い,rothfuss et al. (2021a) によって初めて導かれたメタラーニングの一般化を提案する。重要なことに、この境界はPACOHと呼ばれる最適超後光の閉形式を導出することができ、最高の性能保証をもたらす。 PAC-Bayesian per-task 学習境界におけるメタラーニングの条件と程度について,理論的解析および実証事例研究を行った。閉形式PACOHは、二段階最適化に依存しない実践的なメタラーニングアプローチを刺激し、うまくスケールする標準的な変分法に対処可能な確率的最適化問題を引き起こす。実験の結果,PACOHをガウス過程とベイジアンニューラルネットワークモデルでインスタンス化する場合,提案手法はよりスケーラブルで,予測精度と不確実性評価の両面において最先端性能が得られることがわかった。

Meta-learning aims to speed up the learning process on new tasks by acquiring useful inductive biases from datasets of related learning tasks. While, in practice, the number of related tasks available is often small, most of the existing approaches assume an abundance of tasks, making them unrealistic and prone to overfitting. A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks. In this work, we provide a theoretical analysis using the PAC-Bayesian theory and present a generalization bound for meta-learning, which was first derived by Rothfuss et al. (2021a). Crucially, the bound allows us to derive the closed form of the optimal hyper-posterior, referred to as PACOH, which leads to the best performance guarantees. We provide a theoretical analysis and empirical case study of the conditions under which, and the extent to which, these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds. The closed-form PACOH inspires a practical meta-learning approach that avoids the reliance on bi-level optimization, giving rise to a stochastic optimization problem that is amenable to standard variational methods that scale well. Our experiments show that, when instantiating the PACOH with Gaussian processes and Bayesian neural network models, the resulting methods are more scalable and yield state-of-the-art performance, both in terms of predictive accuracy and the quality of uncertainty estimates.

翻訳日:2023-12-27 23:16:39 公開日:2023-12-22

# 説明制約による学習

Learning with Explanation Constraints ( http://arxiv.org/abs/2303.14496v3 )

ライセンス: Link先を確認

Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan, Pradeep Ravikumar

(参考訳) 大規模なディープラーニングモデルは解釈が難しいため、最近はブラックボックスモデルの説明に焦点が当てられている。対照的に、モデルがどのように振る舞うべきかという apriori の説明があるかもしれない。本稿では,説明制約からの学習としてこの概念を定式化し,その説明がモデル学習をいかに改善できるかを分析するための学習論的枠組みを提案する。これらの説明はいつ役に立つのか? 私たちの最初の重要な貢献は、新しいデータに対する期待でこれらの説明制約を満たす一連のモデルを通じてこの問題に対処します。線形モデルと2層ニューラルネットワークの両方の設定における勾配情報から得られる説明の標準クラスに対して、これらのモデルの利点(Rademacher複雑性の低減の観点から)を特徴づける。さらに,より単純な拡張ラグランジアン法と比較して,より優れた性能を実現し,より頻繁な制約を満たす変分近似によって,我々のフレームワークのアルゴリズム的解を提供する。我々は,大規模な合成および実世界の実験に対するアプローチの利点を実証する。

As larger deep learning models are hard to interpret, there has been a recent focus on generating explanations of these black-box models. In contrast, we may have a priori explanations of how models should behave. In this paper, we formalize this notion as learning from explanation constraints and provide a learning-theoretic framework to analyze how such explanations can improve the learning of our models. One may naturally ask, "When would these explanations be helpful?" Our first key contribution addresses this question via a class of models that satisfies these explanation constraints in expectation over new data. We provide a characterization of the benefits of these models (in terms of the reduction of their Rademacher complexities) for a canonical class of explanations given by gradient information in the settings of both linear models and two-layer neural networks. In addition, we provide an algorithmic solution for our framework, via a variational approximation that achieves better performance and satisfies these constraints more frequently than simpler augmented Lagrangian methods for incorporating these explanations. We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.

翻訳日:2023-12-27 23:08:19 公開日:2023-12-22

# DeblurSR:スパイク表現の下のイベントベースの動き

DeblurSR: Event-Based Motion Deblurring Under the Spiking Representation ( http://arxiv.org/abs/2303.08977v3 )

ライセンス: Link先を確認

Chen Song, Chandrajit Bajaj, Qixing Huang

(参考訳) 本稿では,ぼやけた映像をシャープな映像に変換する新しい動きデブラリング手法であるdeblursrを提案する。 DeblurSRはイベントデータを利用して動きのあいまいさを補償し、スパイキング表現を利用してシャープな出力ビデオを時間から強度へのマッピングとしてパラメータ化する。私たちの重要な貢献であるスパイキング表現(SR)は、生物において生物学的ニューロンがどのように相互に通信するかを決定する神経型原理にインスパイアされています。スパイクが鋭いエッジを表現できる理由と、スパイクパラメータがニューロモルフィックな視点からどのように解釈されるかについて議論する。 DeblurSRは出力品質が高く、最先端のイベントベースのモーションデブロア法よりも少ない計算資源を必要とする。さらに,我々のアプローチは,暗黙的神経表現の最近の進歩と相まって,ビデオの超解像まで容易に拡張できることを示した。 DeblurSRの実装と視覚化はhttps://github.com/chensong1995/DeblurSRで公開されている。

We present DeblurSR, a novel motion deblurring approach that converts a blurry image into a sharp video. DeblurSR utilizes event data to compensate for motion ambiguities and exploits the spiking representation to parameterize the sharp output video as a mapping from time to intensity. Our key contribution, the Spiking Representation (SR), is inspired by the neuromorphic principles determining how biological neurons communicate with each other in living organisms. We discuss why the spikes can represent sharp edges and how the spiking parameters are interpreted from the neuromorphic perspective. DeblurSR has higher output quality and requires fewer computing resources than state-of-the-art event-based motion deblurring methods. We additionally show that our approach easily extends to video super-resolution when combined with recent advances in implicit neural representation. The implementation and animated visualization of DeblurSR are available at https://github.com/chensong1995/DeblurSR.

翻訳日:2023-12-27 23:06:51 公開日:2023-12-22

# 次世代外科ナビゲーション : マーカレスマルチビュー6dofによる手術器具の姿勢推定

Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments ( http://arxiv.org/abs/2305.03535v2 )

ライセンス: Link先を確認

Jonas Hein, Nicola Cavalcanti, Daniel Suter, Lukas Zingg, Fabio Carrillo, Lilian Calvet, Mazda Farshad, Marc Pollefeys, Nassir Navab, Philipp Fürnstahl

(参考訳) 従来のコンピュータビジョンの最先端の研究は、外科領域でますます活用されている。コンピュータ支援手術において特に注目されるのは、器具位置決めのためのマーカーベースのトラッキングシステムを、深層学習を用いた純画像ベースの6DoFポーズ推定に置き換えることである。しかし、最先端の単一視点ポーズ推定法はまだ手術ナビゲーションに必要な精度を満たさない。そこで本研究では,手術器具の高精度かつオクルージョンに頑健な6DoFポーズ推定のためのマルチビュー設定の利点を考察し,手術室の課題に対処する理想的なカメラシステムに関する推奨事項を導出する。本研究の貢献は3つある。まず,スタティックカメラとヘッドマウントカメラからなるマルチカメラキャプチャセットアップを提案し,様々なカメラ構成におけるポーズ推定手法の性能について検討する。第2に,手術用ウェットラボと実際の手術室で撮影したex-vivo脊椎手術の多視点RGB-Dビデオデータセットを公開する。このデータセットは,外科医,器具,患者解剖の豊富なアノテーションを含む。第3に,手術器具の6DoFポーズ推定作業における3つの最先端シングルビューおよびマルチビュー法を評価し,カメラ構成,トレーニングデータ,オクルージョンが姿勢精度および一般化能力に及ぼす影響を分析した。最適な方法は5台のカメラを多視点ポーズ最適化に利用し、最適な条件下で、手術用ドリルでは平均位置誤差1.01mm・平均方位誤差0.89°、スクリュードライバーでは2.79mm・3.33°を達成する。我々の結果は、手術器具のマーカーレストラッキングが既存のマーカーベースシステムの実現可能な代替手段になりつつあることを示している。

State-of-the-art research in traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with pure image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. The contributions of this work are threefold. First, we present a multi-camera capture setup consisting of static and head-mounted cameras, which allows us to study the performance of pose estimation methods under various camera configurations. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre and including rich annotations for surgeon, instrument, and patient anatomy. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments and analyze the influence of camera configurations, training data, and occlusions on the pose accuracy and generalization ability. The best method utilizes five cameras in a multi-view pose optimization and achieves an average position and orientation error of 1.01 mm and 0.89° for a surgical drill as well as 2.79 mm and 3.33° for a screwdriver under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.

翻訳日:2023-12-27 22:57:31 公開日:2023-12-22

# CAMEL: デバイス上での効率的な学習のためのAIモデルと組み込みDRAMの共同設計

CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning ( http://arxiv.org/abs/2305.03148v3 )

ライセンス: Link先を確認

Sai Qian Zhang, Thierry Tambe, Nestor Cuevas, Gu-Yeon Wei, David Brooks

(参考訳) オンデバイス学習は、AIモデルがユーザデータに適応できるようにし、エッジプラットフォームにおけるサービス品質を向上させる。しかし、リソース制限されたデバイスでのAIのトレーニングは、コンピューティングワークロードの要求と、ディープニューラルネットワーク(DNN)が必要とするメモリ消費とデータアクセスが大きな課題となっている。そこで本研究では,過渡訓練データの主要記憶媒体として組込み動的ランダムアクセスメモリ(eDRAM)の利用を提案する。静的ランダムアクセスメモリ(SRAM)と比較して、eDRAMはより高いストレージ密度と低いリーク電力を提供し、アクセスコストと電力リークを低減させる。それでも、保存されたデータの整合性を維持するために、周期的なパワーハングリーリフレッシュ操作はシステム性能を低下させる可能性がある。高価なeDRAMリフレッシュ操作の発生を最小限に抑えるため、トレーニングプロセス中に保存されたデータの寿命を短縮することが有用である。これを実現するために、我々はアルゴリズムとハードウェアの共同設計の原則を採用し、トレーニングを通してデータ寿命とストレージコストを効果的に削減する可逆的なDNNアーキテクチャのファミリーを導入した。さらに,eDRAMをプライマリオンチップメモリとして活用した,高効率なオンデバイストレーニングエンジン「CAMEL」を提案する。このエンジンは、優れたトレーニング精度を維持しつつ、メモリ使用量とチップ外DRAMトラフィックを大幅に削減したデバイス上での効率的なトレーニングを可能にする。我々は、異なるデータセットを持つ複数のDNN上でCAMELシステムを評価し、他のベースラインハードウェアプラットフォームと比較して、トレーニングプロセスの$2.5\times$の高速化と$2.8\times$のトレーニングエネルギー削減を実証した。

On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. In comparison to static random-access memory (SRAM), eDRAM provides higher storage density and lower leakage power, resulting in reduced access cost and power leakage. Nevertheless, to maintain the integrity of the stored data, periodic power-hungry refresh operations could potentially degrade system performance. To minimize the occurrence of expensive eDRAM refresh operations, it is beneficial to shorten the lifetime of stored data during the training process. To achieve this, we adopt the principles of algorithm and hardware co-design, introducing a family of reversible DNN architectures that effectively decrease data lifetime and storage costs throughout training. Additionally, we present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory. This engine enables efficient on-device training with significantly reduced memory usage and off-chip DRAM traffic while maintaining superior training accuracy. We evaluate our CAMEL system on multiple DNNs with different datasets, demonstrating a $2.5\times$ speedup of the training process and $2.8\times$ training energy savings compared with the other baseline hardware platforms.
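
To illustrate why reversible architectures shorten the lifetime of stored training data, the sketch below uses a generic additive-coupling reversible block: the block's inputs can be recomputed from its outputs, so they need not be kept until the backward pass. The coupling form and the functions `example_f` and `example_g` are assumptions chosen for illustration; the abstract does not specify the exact architecture used in CAMEL.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def reversible_forward(x1: Vector, x2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = add(x1, f(x2))
    y2 = add(x2, g(y1))
    return y1, y2


def reversible_inverse(y1: Vector, y2: Vector, f: Fn, g: Fn) -> Tuple[Vector, Vector]:
    """Recover the inputs from the outputs, so activations need not be stored."""
    x2 = sub(y2, g(y1))
    x1 = sub(y1, f(x2))
    return x1, x2


# Hypothetical sub-networks; any deterministic functions work here.
def example_f(v: Vector) -> Vector:
    return [0.5 * x * x for x in v]


def example_g(v: Vector) -> Vector:
    return [2.0 * x + 1.0 for x in v]


if __name__ == "__main__":
    y1, y2 = reversible_forward([1.0, -2.0], [3.0, 0.5], example_f, example_g)
    print(reversible_inverse(y1, y2, example_f, example_g))
```

Note that `f` and `g` themselves do not have to be invertible; only the coupling structure is.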

翻訳日:2023-12-27 22:57:00 公開日:2023-12-22

# SPIRES(Structured prompt interrogation and Recursive extract of semantics: SPIRES):ゼロショット学習を用いた知識ベース獲得手法

Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning ( http://arxiv.org/abs/2304.02711v2 )

ライセンス: Link先を確認

J. Harry Caufield, Harshad Hegde, Vincent Emonet, Nomi L. Harris, Marcin P. Joachimiak, Nicolas Matentzoglu, HyeongSik Kim, Sierra A.T. Moxon, Justin T. Reese, Melissa A. Haendel, Peter N. Robinson, and Christopher J. Mungall

(参考訳) 知識ベースとオントロジーの作成は、手動のキュレーションに依存する時間のかかる作業である。 ai/nlpアプローチは、これらの知識ベースを投入する専門家キュレーターを支援するが、現在のアプローチは広範なトレーニングデータに依存しており、任意の複雑なネストされた知識スキーマを投入できない。本稿では,SPIRES(Structured Prompt Interrogation and Recursive extract of Semantics)を提案する。Large Language Models (LLMs) によるゼロショット学習(ZSL) と,フレキシブルプロンプトからの汎用クエリ応答を,特定のスキーマに準拠した情報から行うことによる知識抽出手法である。詳細なユーザ定義の知識スキーマと入力テキストが与えられた場合、SPIRESはGPT-3+に対して即時尋問を行い、提供されたスキーマと一致する応答の集合を得る。 SPIRESは既存のオントロジーと語彙を使って、一致するすべての要素の識別子を提供する。本稿では,食品レシピの抽出,多種の細胞シグナル伝達経路,疾患治療,多段階薬物機構,化学・疾患因果グラフなど,さまざまな領域におけるSPIRESの使用例を紹介する。現在のSPIRES精度は、既存のリレーショナル抽出(RE)メソッドの中間範囲に匹敵するが、簡単にカスタマイズでき、柔軟性があり、重要なことに、トレーニングデータがない場合に新しいタスクを実行する能力がある。本手法は,LLMの言語解釈機能を活用して知識ベースを組み立て,手作業による知識のキュレーションと取得を支援するとともに,LLM以外のデータベースやオントロジーによる検証を支援する一般的な戦略を支援する。 SPIRESはオープンソースのOntoGPTパッケージの一部として利用可能である。

Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrary complex nested knowledge schemas. Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning (ZSL) and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against GPT-3+ to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for all matched elements. We present examples of use of SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical-to-disease causation graphs. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction (RE) methods, but has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly available databases and ontologies external to the LLM. SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
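
The recursive part of the idea can be sketched in a few lines. This is not the OntoGPT API: the `ask` callable, the schema layout, and the canned answers are all hypothetical stand-ins, and the ontology-grounding step described in the abstract is omitted. The sketch only shows how a nested schema can be filled by querying each field and recursing into nested fields with the returned text.

```python
from typing import Any, Callable, Dict

# A schema maps field names either to a type label (leaf) or to a nested schema.
Schema = Dict[str, Any]


def extract(text: str, schema: Schema, ask: Callable[[str, str], str]) -> Dict[str, Any]:
    """Fill `schema` from `text`, recursing into nested sub-schemas.

    `ask(field, text)` is a hypothetical stand-in for a prompt sent to a
    language model; it returns the text span answering that field.
    """
    result: Dict[str, Any] = {}
    for field, spec in schema.items():
        answer = ask(field, text)
        if isinstance(spec, dict):
            # Nested schema: interrogate the returned span again.
            result[field] = extract(answer, spec, ask)
        else:
            result[field] = answer
    return result


# Canned answers that replace a real model call in this sketch.
CANNED = {
    "title": "Pancakes",
    "ingredient": "2 cups flour",
    "food": "flour",
    "amount": "2 cups",
}


def fake_ask(field: str, text: str) -> str:
    return CANNED[field]


RECIPE_SCHEMA: Schema = {
    "title": "string",
    "ingredient": {"food": "string", "amount": "string"},
}

if __name__ == "__main__":
    print(extract("Pancakes: mix 2 cups flour ...", RECIPE_SCHEMA, fake_ask))
```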

翻訳日:2023-12-27 22:54:08 公開日:2023-12-22

# ポリタプレット損失を考慮した理解・論理推論タスクの深層マニフォールド学習

Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss ( http://arxiv.org/abs/2304.01046v4 )

ライセンス: Link先を確認

Jeffrey Lu, Ivan Rodriguez

(参考訳) 理解と論理的推論タスクを読む機械学習モデルの開発における現在のトレンドは、論理的ルールを理解し活用するモデルの能力を改善することに焦点を当てている。本研究は、人間が理解や論理的推論タスクを与えられたときに使用する共通の戦略を表現することにより、他のモデルよりも解釈可能なコンポーネントを持つ、新しい損失関数と付随するモデルアーキテクチャを提供することに焦点を当てている。我々の戦略は、絶対的精度よりも相対的精度を強調し、理論的には不完全な知識で正しい答えを生成できる。本稿では,この戦略の有効性を考察し,読解の理解と論理的推論の問題を解き明かす。モデルは、難読性理解と論理的推論ベンチマークであるreclorデータセットで評価された。本稿では,各選択の真の精度を学習するよりも,回答選択の相対的正しさを優先的に学習するポリタップレット損失関数を提案する。以上の結果から,ポリtuplet損失モデルが既存のベースラインモデルよりも優れていることが示唆されたが,その効果を定量化するためにはさらなる研究が必要である。

The current trend in developing machine learning models for reading comprehension and logical reasoning tasks is focused on improving the models' abilities to understand and utilize logical rules. This work focuses on providing a novel loss function and accompanying model architecture that has more interpretable components than some other models by representing a common strategy employed by humans when given reading comprehension and logical reasoning tasks. Our strategy involves emphasizing relative accuracy over absolute accuracy and can theoretically produce the correct answer with incomplete knowledge. We examine the effectiveness of this strategy to solve reading comprehension and logical reasoning questions. The models were evaluated on the ReClor dataset, a challenging reading comprehension and logical reasoning benchmark. We propose the polytuplet loss function, which forces prioritization of learning the relative correctness of answer choices over learning the true accuracy of each choice. Our results indicate that models employing polytuplet loss outperform existing baseline models, though further research is required to quantify the benefits it may present.
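
As a rough illustration of "relative correctness over absolute accuracy", the sketch below generalizes the familiar triplet margin loss to one correct answer and several incorrect ones. The exact form of the polytuplet loss is not given in the abstract, so the function name, the Euclidean distance, and the summed hinge terms are assumptions rather than the authors' definition.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def polytuplet_style_loss(
    anchor: Vector, positive: Vector, negatives: List[Vector], margin: float = 1.0
) -> float:
    """Sum of hinge terms pushing the correct answer closer to the anchor
    (e.g. a context/question embedding) than every incorrect answer."""
    d_pos = euclidean(anchor, positive)
    return sum(max(0.0, d_pos - euclidean(anchor, neg) + margin) for neg in negatives)


def predict(anchor: Vector, choices: List[Vector]) -> int:
    """Pick the answer whose embedding is closest to the anchor."""
    distances = [euclidean(anchor, choice) for choice in choices]
    return distances.index(min(distances))


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    correct = [0.0, 1.0]
    wrong = [[3.0, 4.0], [0.0, 1.5]]
    print(polytuplet_style_loss(anchor, correct, wrong))
    print(predict(anchor, [correct] + wrong))
```

Only the ordering of distances matters for prediction, which is why such a loss can produce the right answer even when no absolute score for each choice is learned.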

翻訳日:2023-12-27 22:53:24 公開日:2023-12-22

# ソーシャルメディアにおけるエンゲージメント,ユーザ満足度,分断コンテンツの増幅

Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media ( http://arxiv.org/abs/2305.16941v5 )

ライセンス: Link先を確認

Smitha Milli, Micah Carroll, Yike Wang, Sashrika Pandey, Sebastian Zhao, Anca D. Dragan

(参考訳) 事前登録されたランダム化実験で、twitterのエンゲージメントベースのランキングアルゴリズムは、感情的にチャージされ、グループ外で敵対的なコンテンツを増幅し、ユーザーが自分の政治的アウトグループについてより悪くなると感じていることがわかった。さらに,ユーザが選択した政治的つぶやきを好まないことを見出し,エンゲージメントに基づくアルゴリズムがユーザの好みを満たさないことを示唆する。最後に,ユーザの指定した嗜好に基づいてコンテンツのランク付けを行い,怒りやパルチザン,グループ外の敵対的コンテンツの削減に加えて,エコーチェンバーの強化の可能性も探究する。この証拠は、エンゲージメント、ユーザの選好、社会政治的な結果のバランスをとる、より微妙なコンテンツランキングアプローチの必要性を強調している。

In a pre-registered randomized experiment, we found that, relative to a reverse-chronological baseline, Twitter's engagement-based ranking algorithm amplifies emotionally charged, out-group hostile content that users say makes them feel worse about their political out-group. Furthermore, we find that users do not prefer the political tweets selected by the algorithm, suggesting that the engagement-based algorithm underperforms in satisfying users' stated preferences. Finally, we explore the implications of an alternative approach that ranks content based on users' stated preferences and find a reduction in angry, partisan, and out-group hostile content but also a potential reinforcement of echo chambers. The evidence underscores the necessity for a more nuanced approach to content ranking that balances engagement, users' stated preferences, and sociopolitical outcomes.

翻訳日:2023-12-27 22:43:08 公開日:2023-12-22

# DeltaNN:画像認識モデルの性能に及ぼす計算環境パラメータの影響の評価

DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models ( http://arxiv.org/abs/2306.06208v4 )

ライセンス: Link先を確認

Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan

(参考訳) 画像認識タスクは一般的にディープラーニングを使用し、膨大な処理能力を必要とするため、高速でタイムリーな処理にはGPUやTPUなどのハードウェアアクセラレータに依存する。リアルタイム画像認識タスクの失敗は、モデル展開中にハードウェアアクセラレーターのサブ最適マッピングによって起こり、タイミングの不確実性と誤動作を引き起こす可能性がある。ハードウェアアクセラレータのマッピングは、ディープラーニングフレームワークやコンパイラ、デバイスライブラリなど、複数のソフトウェアコンポーネントを使用して行われます。自律運転や医用画像などの安全クリティカルなアプリケーションにおける画像認識タスクの利用の増加により、ディープラーニングフレームワークやコンパイラ最適化、ハードウェアデバイスなどのパラメータがモデル性能や正確性に与える影響が十分に理解されていないため、計算環境の変化に対する彼らの堅牢性を評価することが不可欠である。本稿では,差分テストフレームワーク DeltaNN を提案する。これによって,異なる計算環境パラメータが,展開中の画像認識モデルの性能,ポストトレーニングに与える影響を評価することができる。 DeltaNNは、ディープラーニングフレームワーク、コンパイラ最適化、ハードウェアデバイスなど、環境パラメータの変化に対する所定の画像認識モデルの異なる実装を生成し、結果としてモデルパフォーマンスの違いを分析する。 deltannを用いて,imagenetデータセットを用いた3つの人気のある画像認識モデルのロバスト性解析を行う。異なる設定における誤分類や推論時間の違いによる影響を報告する。合計で、ディープラーニングフレームワーク全体で最大72%のアウトプットラベルの差異を観測し、コンパイラの最適化を適用する場合、推論時間に関して予想外のパフォーマンス低下を最大81%観察した。

Image recognition tasks typically use deep learning and require enormous processing power, thus relying on hardware accelerators like GPUs and TPUs for fast, timely processing. Failure in real-time image recognition tasks can occur due to sub-optimal mapping on hardware accelerators during model deployment, which may lead to timing uncertainty and erroneous behavior. Mapping on hardware accelerators is done using multiple software components, like deep learning frameworks, compilers, and device libraries, which we refer to as the computational environment. Owing to the increased use of image recognition tasks in safety-critical applications like autonomous driving and medical imaging, it is imperative to assess their robustness to changes in the computational environment, as the impact of parameters like deep learning frameworks, compiler optimizations, and hardware devices on model performance and correctness is not yet well understood. In this paper we present a differential testing framework, DeltaNN, that allows us to assess the impact of different computational environment parameters on the performance of image recognition models during deployment, post-training. DeltaNN generates different implementations of a given image recognition model for variations in environment parameters, namely deep learning frameworks, compiler optimizations, and hardware devices, and analyzes the resulting differences in model performance. Using DeltaNN, we conduct an empirical study of robustness analysis of three popular image recognition models using the ImageNet dataset. We report the impact in terms of misclassifications and inference time differences across different settings. In total, we observed up to 72% output label differences across deep learning frameworks, and up to 81% unexpected performance degradation in terms of inference time, when applying compiler optimizations.
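
The core comparison in differential testing of this kind can be sketched as follows: run the same inputs through model variants built in different environments and report the percentage of output labels that differ from a baseline. The function names, environment names, and toy labels are hypothetical; this is not DeltaNN's interface.

```python
from typing import Dict, List


def label_discrepancy(reference: List[str], candidate: List[str]) -> float:
    """Percentage of inputs whose predicted label differs from the reference."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same inputs")
    if not reference:
        return 0.0
    differing = sum(1 for ref, cand in zip(reference, candidate) if ref != cand)
    return 100.0 * differing / len(reference)


def compare_environments(predictions: Dict[str, List[str]], baseline: str) -> Dict[str, float]:
    """Compare every environment's labels against the chosen baseline."""
    reference = predictions[baseline]
    return {
        name: label_discrepancy(reference, labels)
        for name, labels in predictions.items()
        if name != baseline
    }


if __name__ == "__main__":
    # Toy labels for four inputs under three hypothetical environments.
    runs = {
        "env_a": ["cat", "dog", "car", "cat"],
        "env_b": ["cat", "dog", "bus", "cat"],
        "env_c": ["dog", "dog", "bus", "cat"],
    }
    print(compare_environments(runs, baseline="env_a"))
```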

翻訳日:2023-12-27 22:31:39 公開日:2023-12-22

# 画像認識におけるBuggy Deep Learning Framework変換のためのフォールトローカライゼーション

Fault Localization for Buggy Deep Learning Framework Conversions in Image Recognition ( http://arxiv.org/abs/2306.06157v4 )

ライセンス: Link先を確認

Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan

(参考訳) ディープニューラルネットワーク(dnn)をデプロイする場合、開発者はモデルをディープラーニングフレームワークから別のもの(tensorflowからpytorchなど)に変換することが多い。しかし、このプロセスはエラーを起こしやすく、ターゲットモデルの精度に影響を及ぼす可能性がある。画像認識に広く用いられている3つのDNN(MobileNetV2,ResNet101,InceptionV3)に対して,その影響の程度を明らかにするために,よく知られた4つのディープラーニングフレームワーク(PyTorch,Keras,TensorFlow(TF),TFLite)に変換された差分解析を行い,最大72%のモデルクラッシュと出力ラベルの差異を明らかにした。このような誤りを軽減するため,本研究では,事前学習された画像認識モデルに着目した,バギー深層学習フレームワーク変換のフォールトローカライズと修復への新しいアプローチを提案する。我々の手法は4段階の分析から成り立っている。 1)変換ツール、 2)モデルパラメータ。 3)モデルハイパーパラメータ、及び 4)グラフ表現。さらに,検出された障害の障害修復に関する様々な戦略を提案する。我々は,Apache TVMディープラーニングコンパイラ上で,InceptionV3のTFからTFLiteへの変換のための予備的なフォールトローカライズ解析を行うことにより,本手法を実装した。提案手法は,重みの精度誤差を導入し,モデルの精度を低下させる共通DNNコンバータツールの欠陥を検出する。障害ローカライズ後、私たちは問題を修復し、コンバージョンエラーをゼロにしました。

When deploying Deep Neural Networks (DNNs), developers often convert models from one deep learning framework to another (e.g., TensorFlow to PyTorch). However, this process is error-prone and can impact target model accuracy. To identify the extent of such impact, we perform and briefly present a differential analysis against three DNNs widely used for image recognition (MobileNetV2, ResNet101, and InceptionV3) converted across four well-known deep learning frameworks (PyTorch, Keras, TensorFlow (TF), and TFLite), which revealed numerous model crashes and output label discrepancies of up to 72%. To mitigate such errors, we present a novel approach towards fault localization and repair of buggy deep learning framework conversions, focusing on pre-trained image recognition models. Our technique consists of four stages of analysis: 1) conversion tools, 2) model parameters, 3) model hyperparameters, and 4) graph representation. In addition, we propose various strategies towards fault repair of the faults detected. We implement our technique on top of the Apache TVM deep learning compiler, and we test it by conducting a preliminary fault localization analysis for the conversion of InceptionV3 from TF to TFLite. Our approach detected a fault in a common DNN converter tool, which introduced precision errors in weights, reducing model accuracy. After our fault localization, we repaired the issue, reducing our conversion error to zero.
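
The "model parameters" stage of such an analysis can be pictured as a layer-by-layer comparison of weights before and after conversion. The sketch below is an assumption about what that comparison might look like, with hypothetical layer names and a flat list of floats per layer; it is not the authors' tool and does not use Apache TVM.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def localize_weight_faults(
    source: Weights, converted: Weights, tol: float = 1e-6
) -> List[Tuple[str, float]]:
    """Return (layer, max absolute difference) for layers that drifted by more
    than `tol`, sorted so the most suspicious layer comes first."""
    suspects: List[Tuple[str, float]] = []
    for layer, w_src in source.items():
        w_conv = converted.get(layer)
        if w_conv is None or len(w_conv) != len(w_src):
            # Missing or reshaped layer: flag it with an infinite score.
            suspects.append((layer, float("inf")))
            continue
        max_diff = max((abs(a - b) for a, b in zip(w_src, w_conv)), default=0.0)
        if max_diff > tol:
            suspects.append((layer, max_diff))
    return sorted(suspects, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    src = {"conv1": [0.5, -0.25], "fc": [1.0, 2.0]}
    conv = {"conv1": [0.5, -0.25], "fc": [1.0, 2.125]}
    print(localize_weight_faults(src, conv))
```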

翻訳日:2023-12-27 22:31:11 公開日:2023-12-22

# transformerg2g:transformerを用いた時間グラフ埋め込み学習のための適応時間ステップ

TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers ( http://arxiv.org/abs/2307.02588v2 )

ライセンス: Link先を確認

Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis

(参考訳) 動的グラフ埋め込みは、様々なアプリケーションにおける多様な時間グラフ解析タスク(リンク予測、ノード分類、レコメンダシステム、異常検出、グラフ生成など)に対処するための非常に効果的な手法として登場した。このような時間グラフは異質な過渡的ダイナミクス、時間間隔の変化、その進化を通して高度に進化するノードの特徴を示す。したがって、歴史的グラフコンテキストからの長距離依存関係を組み込むことは、時間的ダイナミクスを正確に学習する上で重要な役割を果たす。本稿では,不確かさを定量化したグラフ埋め込みモデルTransformerG2Gを開発した。これは,先進的なトランスフォーマーエンコーダを利用して,現在の状態 ($t$) と以前のコンテキスト (タイムスタンプ $[t-l, t-1]$、$l$ はコンテキスト長) から中間ノード表現を学習する。さらに、2つの射影層を用いて低次元多変量ガウス分布を生成し、タイムスタンプ $t$ における各ノードの潜在埋め込みとする。我々は,TEA(Temporal Edge Appearance)プロットによって測定された,「新規性(novelty)」のレベルが異なる多様なベンチマークを検討する。提案したTransformerG2Gモデルは, リンク予測精度と計算効率の両面から, 従来の多段階法と先行研究(DynG2G)より優れていることを示す。さらに、複数のグラフスナップショットにまたがる学習時間依存の注意重みは、変換器によって実現された自動適応時間ステップの開発を明らかにする。注意重みを調べることで、時間的依存関係を解明し、影響力のある要素を特定し、グラフ構造内の複雑な相互作用についての洞察を得ることができる。例えば,グラフトポロジー進化の様々な段階において,注意重みとノード次数との間に強い相関関係を見出した。

Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (e.g., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from each node's current state ($t$) and previous context (over timestamps $[t-l, t-1]$, where $l$ is the length of the context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of "novelty" as measured by the TEA (Temporal Edge Appearance) plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for a high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
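
When nodes are embedded as Gaussian distributions rather than points, a common way to compare two embeddings is the KL divergence. The sketch below computes it for diagonal covariances; whether TransformerG2G scores node pairs exactly this way is an assumption, since the abstract does not say, and the diagonal-covariance restriction is a simplification chosen here.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu_p: Sequence[float],
    var_p: Sequence[float],
    mu_q: Sequence[float],
    var_q: Sequence[float],
) -> float:
    """KL(p || q) for two Gaussians with diagonal covariances (variances > 0)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


if __name__ == "__main__":
    # Two hypothetical 2-D node embeddings: (mean, variance) pairs.
    node_a = ([0.0, 0.0], [1.0, 1.0])
    node_b = ([1.0, 0.0], [1.0, 4.0])
    print(kl_diag_gaussians(*node_a, *node_b))
    print(kl_diag_gaussians(*node_b, *node_a))  # KL is not symmetric
```

The variances carry the uncertainty information: a node with a broad distribution is "close" to many others in one direction of the divergence but not the other.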

翻訳日:2023-12-27 22:19:15 公開日:2023-12-22

# ピック・プレイスにおける対称性の活用

Leveraging Symmetries in Pick and Place ( http://arxiv.org/abs/2308.07948v2 )

ライセンス: Link先を確認

Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt

(参考訳) ロボットピックと配置タスクは、選択対象と所望の場所ポーズの両方の翻訳と回転の下で対称である。例えば、ピックオブジェクトが回転または変換された場合、最適なピックアクションも回転または変換されるべきである。同じことが、場所のポーズにも当てはまります。所望の場所のポーズが変わった場合、所望の場所のアクションもそれに応じて変化するべきです。 transporter netとして知られる最近提案されたpick and placeフレームワークは、これらの対称性の一部をキャプチャするが、すべてではない。本稿では,平面式ロボットピック・アンド・プレイスに存在する対称性を解析的に研究し,すべての対称性を捉える方法でトランスポーターネットに同変ニューラルモデルを組み込む方法を提案する。 Equivariant Transporter Net と呼ばれる新しいモデルは、ピック・アンド・プレイス・対称性に同値であり、ピック・アンド・プレイス・ポーズに即座に知識を一般化することができる。実験結果から,非対称型モデルよりもサンプル効率が良好であることを示し,様々な模倣学習タスクにおいて,人間によるごく少数のデモンストレーションを用いて,実演されたピック・アンド・プレース動作を模倣できるシステムを開発した。

Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
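
The symmetry described above can be stated as a simple check: rotating the observation and then choosing a pick location should give the same result as choosing first and then rotating the chosen location. The sketch below verifies this for a toy `pick` function that takes the argmax of a square heatmap under 90-degree rotations. The toy policy and grid are assumptions for illustration only; the paper builds this property into the neural network itself.

```python
from typing import List, Tuple

Grid = List[List[float]]
Cell = Tuple[int, int]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def rotate_cell(cell: Cell, n: int) -> Cell:
    """Where a (row, col) cell lands after the same counter-clockwise rotation."""
    r, c = cell
    return (n - 1 - c, r)


def pick(heatmap: Grid) -> Cell:
    """Toy pick policy: choose the cell with the highest score."""
    n = len(heatmap)
    cells = [(r, c) for r in range(n) for c in range(n)]
    return max(cells, key=lambda rc: heatmap[rc[0]][rc[1]])


def is_equivariant(heatmap: Grid) -> bool:
    """Check pick(rotate(x)) == rotate(pick(x)) for one 90-degree rotation."""
    n = len(heatmap)
    return pick(rotate90(heatmap)) == rotate_cell(pick(heatmap), n)


if __name__ == "__main__":
    scores = [
        [0.1, 0.9, 0.2],
        [0.0, 0.3, 0.4],
        [0.5, 0.2, 0.1],
    ]
    print(pick(scores), pick(rotate90(scores)), is_equivariant(scores))
```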

翻訳日:2023-12-27 22:07:38 公開日:2023-12-22

# 計画による計画のための反復的オプション発見

Iterative Option Discovery for Planning, by Planning ( http://arxiv.org/abs/2310.01569v2 )

ライセンス: Link先を確認

Kenny Young, Richard S. Sutton

(参考訳) オプションという形で有用な時間的抽象化を見つけることは、ますます複雑なドメインに強化学習と計画を適用する上で鍵となると広く考えられている。 alphazeroで使用されるポリシ学習に対するエキスパートイテレーションアプローチの実証的成功に基づいて,オプション発見の類似的なアプローチであるoption iterationを提案する。任意の場所で検索結果にマッチするように訓練された単一の強力なポリシーを学ぶのではなく、オプションイテレーションは、各状態が遭遇するたびに、セット内の少なくとも1つのポリシーが、将来に向けて検索結果にマッチするように訓練された一連のオプションポリシーを学ぶ。直感的には、現在の状態の詳細に複雑な依存関係を持つ単一のグローバルな強いポリシーを学ぶよりも、アルゴリズムが賭けをヘッジできるため、これはかなり簡単かもしれない。このようなローカルな強力なポリシーの集合を学習することで、より優れた選択肢がより良い検索結果に結びつき、より良い選択肢のトレーニングを可能にする、希少なサイクルをもたらす検索アルゴリズムをガイドすることができる。実験により,オプションイテレーションで学習したオプションを用いたプランニングは,プリミティブアクションの空間で動作する類似の計画アルゴリズムと,エキスパートイテレーションによる単一ロールアウトポリシーの学習と比較して,計画環境に挑戦する上で大きなメリットをもたらすことが示された。

Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm, resulting in a virtuous cycle where better options lead to better search results, which in turn allows for training of better options. We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration.
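
The "at least one policy in the set matches the search results" idea can be pictured as a minimum over per-policy losses, so that only the best-matching option is penalized at each state. The sketch below is a single-step simplification under that assumption; it ignores the multi-step horizon mentioned in the abstract, and the function names are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence


def cross_entropy(target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy between the search's action distribution and a policy."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def option_set_loss(search_policy: Sequence[float], option_policies: List[Sequence[float]]) -> float:
    """Loss of the best-matching option: small as soon as one option agrees with search."""
    return min(cross_entropy(search_policy, policy) for policy in option_policies)


if __name__ == "__main__":
    search = [1.0, 0.0]                   # search prefers action 0
    options = [[0.5, 0.5], [0.9, 0.1]]    # two hypothetical option policies
    print(option_set_loss(search, options))
```

Because the minimum selects one option per state, different options are free to specialize on different states, which is one way to read the "hedging its bets" intuition.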

翻訳日:2023-12-27 21:58:09 公開日:2023-12-22

# See-Through Visuotactile Sensorを用いたマルチモーダルおよびフォースマッチ型模倣学習

Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor ( http://arxiv.org/abs/2311.01248v2 )

ライセンス: Link先を確認

Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek

(参考訳) Kinesthetic Teachingは、模倣学習(IL)のための接触豊富なタスクの専門的なロボットデモを集めるための一般的なアプローチであるが、通常、ロボットによって環境に置かれる力を無視して、動きを計測するだけである。さらに、接触に富んだタスクは、接触と接触の両方を正確に検知する必要があるため、従来の感覚モダリティの提供は困難である。両センサを用いたSee-Through-Your-Skin (STS) Visuotactile Sensorを用いてこれらの課題に対処する。 (i)審美的指導を改善するための測定ツール、及び (ii)接触式ドア操作タスクにおけるポリシー入力として。 stsセンサは、半透明な表面と制御可能な照明を利用して、視覚モードと触覚モードを切り替えることができ、単一のセンサで、接触前の視覚センシングと接触時の触覚センシングの両方を可能にする。まず,触覚信号を用いた審美的指導の際,ロボットが読み取る力とマッチングできる触覚力マッチング手法を提案する。第2に、STSモードスイッチングを制御するポリシーを開発し、STSを視覚から触覚モードに切り替えるための適切なタイミングを学習できるようにする。最後に,手首装着眼球カメラの視覚データとSTSの視覚的・触覚的データの価値を比較し比較するため,複数の観察構成について検討した。実世界の実験実験から3000回以上のテストエピソードが得られた結果、力のマッチングは平均的な政策成功率を62.5%、STSモードの切り替えは30.3%、STSデータは42.5%向上することが判明した。この結果から, IL のルックスルー触覚センシング, 力のマッチングを可能にするデータ収集, 正確なタスクフィードバックを可能にするポリシー実行の両面での有用性を強調した。

Kinesthetic Teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), but it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a See-Through-your-Skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks. An STS sensor can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. First, we propose tactile force matching, a methodology that enables a robot to match forces read during kinesthetic teaching using tactile signals. Second, we develop a policy that controls STS mode switching, allowing a policy to learn the appropriate moment to switch an STS from its visual to its tactile mode. Finally, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera. With over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

翻訳日:2023-12-27 21:30:05 公開日:2023-12-22

# インテリジェントな製造アプリケーションのための大規模基盤モデル:調査

Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey ( http://arxiv.org/abs/2312.06718v3 )

ライセンス: Link先を確認

Haotian Zhang, Semujju Stuart Dereck, Zhicheng Wang, Xianwei Lv, Kang Xu, Liang Wu, Ye Jia, Jing Wu, Zhuo Long, Wensheng Liang, X.G. Ma, and Ruiyan Zhuang

(参考訳) 人工知能の応用、特に深層学習は知的製造の様々な側面を大幅に改善したが、一般化能力の貧弱さ、高品質なトレーニングデータセットの確立の困難、ディープラーニング手法の不満足な性能など、幅広い雇用の課題に直面した。大規模な基礎モデル(LSFM)の出現は、人工知能の分野で波を巻き起こし、ディープラーニングモデルをシングルタスク、シングルモーダル、限定データパターンから、多様なタスクを含むパラダイム、マルチモーダル、大規模データセットの事前トレーニングへとシフトさせた。 LSFMは、強力な一般化能力、自動高品質のトレーニングデータセット生成、様々な領域での優れた性能を示したが、LSFMの知能製造への応用はまだ初期段階にあった。このトピックの体系的な概要は欠如しており、特に深層学習の課題がLSFMによってどのように対処され、これらの課題が体系的に取り組まれるかについてである。このギャップを埋めるため,本稿では,現在のlsfm像とその知的製造における利点を体系的に提示した。そして、さまざまなインテリジェントな製造アプリケーションにおいて、現在のディープラーニングモデルが直面する課題と包括的に比較する。 LSFMを利用してこれらの課題に対処するためのロードマップも概説した。最後に、LSFMを実世界のインテリジェントな製造シナリオに適用する事例研究を行い、LSFMが産業にどのように貢献し、その効率を向上するかを示した。

Although applications of artificial intelligence, especially deep learning, have greatly improved various aspects of intelligent manufacturing, they still face challenges for wide adoption due to poor generalization ability, the difficulty of establishing high-quality training datasets, and the unsatisfactory performance of deep learning methods. The emergence of large scale foundational models (LSFMs) has triggered a wave in the field of artificial intelligence, shifting deep learning models from single-task, single-modal, limited-data patterns to a paradigm encompassing diverse tasks, multiple modalities, and pre-training on massive datasets. Although LSFMs have demonstrated powerful generalization capabilities, automatic high-quality training dataset generation, and superior performance across various domains, applications of LSFMs to intelligent manufacturing are still in their nascent stage. A systematic overview of this topic is lacking, especially regarding which challenges of deep learning can be addressed by LSFMs and how these challenges can be systematically tackled. To fill this gap, this paper systematically expounds the current status of LSFMs and their advantages in the context of intelligent manufacturing, and compares them comprehensively with the challenges faced by current deep learning models in various intelligent manufacturing applications. We also outline roadmaps for utilizing LSFMs to address these challenges. Finally, case studies of applications of LSFMs in real-world intelligent manufacturing scenarios are presented to illustrate how LSFMs can help industries improve their efficiency.

翻訳日:2023-12-27 21:11:28 公開日:2023-12-22

# エージェント注意:ソフトマックスと線形注意の統合について

Agent Attention: On the Integration of Softmax and Linear Attention ( http://arxiv.org/abs/2312.08874v2 )

ライセンス: Link先を確認

Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

(参考訳) attentionモジュールはTransformersの重要なコンポーネントである。グローバルアテンションメカニズムは高い表現性を提供するが、その過剰な計算コストは様々なシナリオで適用性を制限する。本稿では,計算効率と表現力のバランスをとるために,新しい注意パラダイムであるエージェント注意(Agent Attention)を提案する。具体的には、エージェントアテンションは4倍の$(Q, A, K, V)$と表現され、従来のアテンションモジュールに追加のエージェントトークンセット$A$を導入する。エージェントトークンは最初、クエリトークンのエージェントとして機能し、$k$と$v$から情報を集約し、その後、情報を$q$にブロードキャストする。エージェントトークンの数をクエリトークンの数よりもはるかに小さく設計できるため、グローバルコンテキストモデリング能力を維持しつつ、広く採用されているsoftmaxの注意よりもエージェントの注意ははるかに効率的である。興味深いことに,提案するエージェントアテンションは線形アテンションの一般化形式と等価である。したがって,エージェント・アテンションはソフトマックス・アテンションと高効率線形アテンションをシームレスに統合する。広範な実験により、様々な視覚トランスフォーマーや、画像分類、物体検出、意味セグメンテーション、画像生成など、様々な視覚タスクにおけるエージェントの注意の有効性が実証された。特に、エージェントの注意は高解像度シナリオにおいて顕著な性能を示しており、その線形の注意の性質に依拠している。例えば、安定拡散に適用した場合、エージェントアテンションは生成を加速し、追加のトレーニングなしで画像生成品質を大幅に向上させる。コードはhttps://github.com/LeapLabTHU/Agent-Attentionで入手できる。

The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as the agent for the query tokens $Q$ to aggregate information from $K$ and $V$, and then broadcast the information back to $Q$. Given that the number of agent tokens can be designed to be much smaller than the number of query tokens, the agent attention is significantly more efficient than the widely adopted Softmax attention, while preserving global context modelling capability. Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.
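
The two-step computation on $(Q, A, K, V)$ described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the $1/\sqrt{d}$ scaling, the single-head layout, and the function names are assumptions, and the abstract does not say how the agent tokens $A$ are produced, so they are simply passed in.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, A, K, V):
    """Q: (n, d) queries, A: (m, d) agent tokens with m << n,
    K: (n, d) keys, V: (n, d_v) values. Returns an (n, d_v) array."""
    scale = 1.0 / np.sqrt(Q.shape[-1])  # assumed scaling factor
    # Step 1: agents act as queries and aggregate information from K and V.
    agent_values = softmax(A @ K.T * scale) @ V        # (m, d_v)
    # Step 2: the aggregated information is broadcast back to the queries.
    return softmax(Q @ A.T * scale) @ agent_values     # (n, d_v)
```

With `n` query tokens and `m` agent tokens, both matrix products cost on the order of `n * m * d` operations instead of the `n * n * d` of full Softmax attention, which is where the efficiency claim comes from when `m` is much smaller than `n`.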

翻訳日:2023-12-27 20:58:14 公開日:2023-12-22

# ICD-LM:言語モデリングによる視覚言語インテクスト記述の構成

ICD-LM: Configuring Vision-Language In-Context Demonstrations by Language Modeling ( http://arxiv.org/abs/2312.10104v2 )

ライセンス: Link先を確認

Yingzhe Peng, Xu Yang, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

(参考訳) 本稿では,LVLM(Large Vision-Language Model)のための強力なIn-Context Demonstration (ICD) シーケンスをどのように構成し,In-Context Learning (ICL) による視覚-Languageタスクを解決するかを検討する。 icdシーケンスの構成は、文を構成するミラープロセスである、すなわち、言語モデルを介して文を単語単位で構成できるように観察した後、icdシーケンスを1つずつ構成することもできる。その結果、有効なICDシーケンスを生成するために設計されたICD言語モデル(ICD-LM)を導入する。これには、さまざまなクエリサンプルのために手作りのICDシーケンスのデータセットを作成し、それをICD-LMのトレーニングに使用することが含まれる。提案手法は,ICDを別々に選択・注文する従来の方法と異なり,同時にICDを選択・注文する方法を学習し,シーケンスの効果を高める。さらに、データ構築中に、ICL実装を意図したLVLMを使用して、各ICDシーケンスの強度を検証することにより、モデル固有のデータセットと、このデータセットによってトレーニングされたICD-LMもモデル固有である。 ICD設定のための言語モデルを用いて,視覚的質問応答と画像キャプションの実験により,我々の方法論を検証した。本研究は,各種データセット構築およびICD-LM開発環境が結果に及ぼす影響について検討する。コードはhttps://github.com/ForJadeForest/ICD-LMで公開されている。

This paper studies how to configure powerful In-Context Demonstration (ICD) sequences for a Large Vision-Language Model (LVLM) to solve Vision-Language tasks through In-Context Learning (ICL). We observe that configuring an ICD sequence mirrors composing a sentence: just as a sentence can be composed word by word via a Language Model, an ICD sequence can be configured one demonstration at a time. Consequently, we introduce an ICD Language Model (ICD-LM) specifically designed to generate effective ICD sequences. This involves creating a dataset of hand-crafted ICD sequences for various query samples and using it to train the ICD-LM. Our approach, diverging from traditional methods in NLP that select and order ICDs separately, learns to select and order ICDs simultaneously, enhancing the effect of the sequences. Moreover, during data construction, we use the LVLM intended for ICL implementation to validate the strength of each ICD sequence, resulting in a model-specific dataset; the ICD-LM trained on this dataset is therefore also model-specific. We validate our methodology through experiments in Visual Question Answering and Image Captioning, confirming the viability of using a Language Model for ICD configuration. Our comprehensive ablation studies further explore the impact of various dataset construction and ICD-LM development settings on the outcomes. The code is available at https://github.com/ForJadeForest/ICD-LM.

翻訳日:2023-12-27 20:44:28 公開日:2023-12-22

# 教師付き自己組み立て型インコンテキスト学習によるタスク性能とモデル校正について

On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning ( http://arxiv.org/abs/2312.13772v2 )

ライセンス: Link先を確認

Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić

(参考訳) 標準教師付き微調整(SFT)パラダイムに従って、インコンテキスト学習(ICL)は、最近の大規模言語モデル(LLM)の進歩によって推進される効率的なアプローチとなり、数発のデータセットで様々なタスクにわたって有望なパフォーマンスが得られる。しかし、両方のパラダイムは、特にそのような限られたデータ設定において、過信(すなわち誤校正)の致命的な問題に悩まされがちである。本研究では,学習方法の異なる選択に対して,パフォーマンスとキャリブレーションと相互作用の両方の観点から,行動の詳細な分析を行う。広範に制御された実験により,タスク性能とキャリブレーションの同時獲得は困難であり,低リソースシナリオにおけるすべての学習手法に誤校正の問題が存在することがわかった。この性能とキャリブレーションの難しいトレードオフに対処するために、異なるモデリング段階(例えば、インコンテキストの例のバリエーションやプロンプトのバリエーション、異なるアンサンブル戦略など)で適用される自己認識技術の可能性を検討する。 ICLに加えて、SFT上での自己理解の可能性も正当化し、予測を校正し、比較や性能の向上を図る。我々の研究は、選択する学習パラダイムと、タスクパフォーマンスとllmのキャリブレーションの両方を強化する方法に光を当てている。

Following the standard supervised fine-tuning (SFT) paradigm, in-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs), yielding promising performance across various tasks in few-shot data setups. However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration), especially in such limited data setups. In this work, we deliver an in-depth analysis of the behavior across different choices of learning methods from the perspective of both performance and calibration, as well as their interplay. Through extensive controlled experiments, we find that simultaneous gains for both task performance and calibration are difficult to achieve, and the problem of miscalibration exists across all learning methods in low-resource scenarios. To address this challenging trade-off between performance and calibration, we then investigate the potential of self-ensembling techniques applied at different modeling stages (e.g., variations of in-context examples or variations in prompts or different ensembling strategies). We justify the feasibility of self-ensembling on SFT in addition to ICL, to make the predictions more calibrated and have comparable or even better performance. Our work sheds light on which learning paradigm to choose and how to enhance both task performance and calibration of LLMs.
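
To make the notions of miscalibration and self-ensembling concrete, the sketch below computes the expected calibration error (ECE) and averages class probabilities over several prompt variants. Both choices are assumptions for illustration: the abstract names neither the calibration metric nor the exact ensembling rule, and the function names are hypothetical.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b + 1]]; interior edges give indices 0..n_bins-1.
    idx = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the share of samples in the bin
    return float(ece)

def self_ensemble(prob_variants):
    """Average class probabilities predicted under different prompt variants.

    prob_variants: (n_variants, n_classes) array for a single input.
    """
    return np.asarray(prob_variants, dtype=float).mean(axis=0)
```

An overconfident model has high confidence but lower accuracy within a bin, which shows up as a large ECE. Averaging over variants tends to soften extreme probabilities, which is one plausible reason ensembling can help calibration.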

翻訳日:2023-12-27 20:36:46 公開日:2023-12-22

# RGB-only NeRF-SLAMのための3次元型オパシティとハイブリッドオドメトリー

Ternary-type Opacity and Hybrid Odometry for RGB-only NeRF-SLAM ( http://arxiv.org/abs/2312.13332v2 )

ライセンス: Link先を確認

Junru Lin, Asen Nachkov, Songyou Peng, Luc Van Gool, Danda Pani Paudel

(参考訳) 不透明な表面を持つ立体的な3dシーンの不透明性はバイナリタイプであると考えられている。しかし,この特性は既存のRGBのみのNeRF-SLAMに従わないことがわかった。そのため,RGBのみのNeRF-SLAMパイプラインに導入する動機がある。残念なことに、ボリュームトリップレンダリング機能による最適化は、望ましい事前の統合を容易化しない。その代わり, 3次型 (TT) の不透明度は良好に支持されている。本研究では,三元型不透明性が手作業に適している理由について検討する。特に、ボリュームレンダリングプロセスを通じて放射率と不透明度を共同最適化する過程に関する理論的知見を提供する。ベンチマークデータセットに関する徹底的な実験を通じて、我々の主張を検証し、最適化プロセスに関する洞察を提供する。そこで本研究では,ボリュームとワーピングを併用した画像レンダリングを併用した,シンプルながら斬新なビジュアルオドメトリー手法を提案する。より具体的には、提案されたハイブリッドオドメトリ(ho)は、イメージウォーピングベースの粗オドメトリも使用し、最終的なスピードアップを桁違いに導く。さらに,提案するttとhoが相互に補完し,速度と精度の両面でベンチマークデータセットに最先端の結果を提供することを示した。

The opacity of rigid 3D scenes with opaque surfaces is considered to be of a binary type. However, we observed that this property is not followed by the existing RGB-only NeRF-SLAM. Therefore, we are motivated to introduce this prior into the RGB-only NeRF-SLAM pipeline. Unfortunately, the optimization through the volumetric rendering function does not facilitate easy integration of the desired prior. Instead, we observed that the opacity of ternary-type (TT) is well supported. In this work, we study why ternary-type opacity is well-suited and desired for the task at hand. In particular, we provide theoretical insights into the process of jointly optimizing radiance and opacity through the volumetric rendering process. Through exhaustive experiments on benchmark datasets, we validate our claim and provide insights into the optimization process, which we believe will unleash the potential of RGB-only NeRF-SLAM. To foster this line of research, we also propose a simple yet novel visual odometry scheme that uses a hybrid combination of volumetric and warping-based image renderings. More specifically, the proposed hybrid odometry (HO) additionally uses image warping-based coarse odometry, leading to a final speed-up of up to an order of magnitude. Furthermore, we show that the proposed TT and HO complement each other well, offering state-of-the-art results on benchmark datasets in terms of both speed and accuracy.
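
For readers unfamiliar with how opacity enters the volumetric rendering function, the sketch below computes the per-sample rendering weights along one ray using the standard alpha-compositing rule $w_i = \alpha_i \prod_{j<i} (1 - \alpha_j)$. This formula is the common NeRF-style formulation and is assumed here; the abstract does not spell it out, and the ternary-type prior itself is not implemented.

```python
import numpy as np

def rendering_weights(alphas):
    """Front-to-back compositing weights for the samples along one ray.

    alphas: per-sample opacities in [0, 1], ordered from near to far.
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance: fraction of light that reaches sample i unobstructed.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    return alphas * transmittance
```

With strictly binary opacities the weights collapse to a one-hot vector at the first opaque sample, for example `rendering_weights([0, 1, 1])` puts all weight on the second sample, whereas intermediate opacities spread the weight over several samples.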

翻訳日:2023-12-27 20:35:11 公開日:2023-12-22

# 説明可能性保証付きアンサンブルの学習性能最大化

Learning Performance Maximizing Ensembles with Explainability Guarantees ( http://arxiv.org/abs/2312.12715v2 )

ライセンス: Link先を確認

Vincent Pisztora, Jia Li

(参考訳) 本稿では,本質的な説明可能なガラス箱モデルとブラックボックスモデルとの観測を最適に割り当てる手法を提案する。任意の説明可能性レベル(すなわち、説明可能なモデルが予測関数である観察の割合)に対して最適な割り当てが定義され、基礎となるタスク上でのアンサンブルの性能を最大化し、最大アンサンブル性能条件の下で割り当てられた観測に対する説明可能なモデルの性能を最大化する。提案手法は,様々な説明可能およびブラックボックスモデルタイプにわたる表型データセットのベンチマークスイート上で,説明可能性の最適割当を生成する。これらの学習された割り当ては、非常に高い説明可能性レベルでアンサンブルのパフォーマンスを一貫して維持することが判明し(平均で74\%の観察値を示す)、説明可能性を改善しながら、コンポーネント説明可能モデルとブラックボックスモデルの両方を上回ることさえある。

In this paper we propose a method for the optimal allocation of observations between an intrinsically explainable glass box model and a black box model. An optimal allocation is defined as one which, for any given explainability level (i.e. the proportion of observations for which the explainable model is the prediction function), maximizes the performance of the ensemble on the underlying task, and maximizes the performance of the explainable model on the observations allocated to it, subject to the maximal ensemble performance condition. The proposed method is shown to produce such explainability optimal allocations on a benchmark suite of tabular datasets across a variety of explainable and black box model types. These learned allocations are found to consistently maintain ensemble performance at very high explainability levels (explaining $74\%$ of observations on average), and in some cases even outperform both the component explainable and black box models while improving explainability.
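
The definition of the explainability level can be illustrated with a small sketch that routes each observation to one of the two models and reports the resulting proportion. It only illustrates the definition: how the allocation is learned is the paper's contribution and is not shown, and the function name and boolean-mask interface are hypothetical.

```python
import numpy as np

def ensemble_predict(glass_pred, black_pred, to_glass):
    """Combine two models' predictions under a given allocation.

    to_glass: boolean mask, True where the explainable (glass box) model
    is the prediction function for that observation.
    Returns (predictions, explainability_level).
    """
    to_glass = np.asarray(to_glass, dtype=bool)
    pred = np.where(to_glass, glass_pred, black_pred)
    # Explainability level: share of observations served by the glass box.
    return pred, float(to_glass.mean())
```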

翻訳日:2023-12-27 20:34:29 公開日:2023-12-22

# 完全および部分入力依存対称性の自己監視検出

Self-Supervised Detection of Perfect and Partial Input-Dependent Symmetries ( http://arxiv.org/abs/2312.12223v2 )

ライセンス: Link先を確認

Alonso Urbano, David W. Romero

(参考訳) 群同分散は入力の群変換に対する一貫した応答を保証し、より堅牢なモデルと拡張された一般化能力をもたらす。しかし、この性質は、群で見なされる対称性がデータで観察されたものと異なる場合、過度に制約されたモデルをもたらす可能性がある。一般的な手法では、データセットレベルで適切な対称性のレベルを決定することでこの問題に対処するが、同じデータセットに複数の対称性が共存するシナリオは、教師付き設定と無視に限られる。例えば、車と飛行機の写真は異なるレベルの回転を示すが、どちらもCIFAR-10データセットに含まれている。本稿では,ラベルを使わずに各入力の対称性のレベルを検出する手法を提案する。この目的のために、データ内の対称性の分布を学ぶのに十分かつ必要な条件を導出する。学習した分布を用いて擬似ラベルを生成し,各入力の対称性のレベルを自己教師ありで学習する。本研究では, クラスごとに異なる対称性を持つ合成データセット, 例えば mnistmultiple に対して, 数値がクラスに依存して一様回転する手法の有効性を検証する。本手法は,対称性が存在しない標準データセットの生成や,推論中の分布外対称性の検出など,実用的な用途に応用できることを実証する。これにより、非同変モデルの一般化と堅牢性の両方を改善することができる。私たちのコードはhttps://github.com/aurban0/ssl-symで公開されています。

Group equivariance ensures consistent responses to group transformations of the input, leading to more robust models and enhanced generalization capabilities. However, this property can lead to overly constrained models if the symmetries considered in the group differ from those observed in data. While common methods address this by determining the appropriate level of symmetry at the dataset level, they are limited to supervised settings and ignore scenarios in which multiple levels of symmetry co-exist in the same dataset. For instance, pictures of cars and planes exhibit different levels of rotation, yet both are included in the CIFAR-10 dataset. In this paper, we propose a method able to detect the level of symmetry of each input without the need for labels. To this end, we derive a sufficient and necessary condition to learn the distribution of symmetries in the data. Using the learned distribution, we generate pseudo-labels that allow us to learn the levels of symmetry of each input in a self-supervised manner. We validate the effectiveness of our approach on synthetic datasets with different per-class levels of symmetries e.g. MNISTMultiple, in which digits are uniformly rotated within a class-dependent interval. We demonstrate that our method can be used for practical applications such as the generation of standardized datasets in which the symmetries are not present, as well as the detection of out-of-distribution symmetries during inference. By doing so, both the generalization and robustness of non-equivariant models can be improved. Our code is publicly available at https://github.com/aurban0/ssl-sym.

翻訳日:2023-12-27 20:33:54 公開日:2023-12-22

# C2FAR: 高精度確率予測のための粗大な自己回帰ネットワーク

C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting ( http://arxiv.org/abs/2312.15002v1 )

ライセンス: Link先を確認

Shane Bergsma, Timothy Zeyl, Javad Rahimipour Anaraki, Lei Guo

(参考訳) 本稿では,不定値の数値確率変数の確率分布をモデル化する手法であるc2farを提案する。 c2farは、各分布が予め生成された粗い間隔で条件づけされた複数のバイナリ分布から、段階的により細かい支持間隔を生成する。以前の(平坦な)双対分布とは異なり、C2FARは複雑性の線形増加のため、指数的に高い精度で値を表現することができる。我々はC2FARを用いて、繰り返しニューラルネットワークによる確率予測を行い、空間と時間の両方で時系列を自動回帰的にモデル化する。 C2FARは任意のスケールと分布形状の離散連続列を同時に扱う最初の方法である。この柔軟性は、異常検出、補間、圧縮など、さまざまな時系列ユースケースを可能にする。 C2FARは、いくつかのベンチマーク予測データセットの最先端よりも改善されている。

We present coarse-to-fine autoregressive networks (C2FAR), a method for modeling the probability distribution of univariate, numeric random variables. C2FAR generates a hierarchical, coarse-to-fine discretization of a variable autoregressively; progressively finer intervals of support are generated from a sequence of binned distributions, where each distribution is conditioned on previously-generated coarser intervals. Unlike prior (flat) binned distributions, C2FAR can represent values with exponentially higher precision, for only a linear increase in complexity. We use C2FAR for probabilistic forecasting via a recurrent neural network, thus modeling time series autoregressively in both space and time. C2FAR is the first method to simultaneously handle discrete and continuous series of arbitrary scale and distribution shape. This flexibility enables a variety of time series use cases, including anomaly detection, interpolation, and compression. C2FAR achieves improvements over the state-of-the-art on several benchmark forecasting datasets.
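
The claim of exponentially higher precision for a linear increase in complexity follows from the hierarchical discretization: `n_levels` levels of `n_bins` bins each address `n_bins ** n_levels` intervals. The sketch below shows only this coarse-to-fine encoding and decoding of a single value; the conditional distributions that C2FAR learns at each level are not modeled, and the function names are hypothetical.

```python
def encode(x, lo, hi, n_bins, n_levels):
    """Encode x in [lo, hi) as one bin index per level, coarse to fine."""
    indices = []
    for _ in range(n_levels):
        width = (hi - lo) / n_bins
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the upper edge
        indices.append(idx)
        # Zoom into the chosen bin before the next, finer level.
        lo = lo + idx * width
        hi = lo + width
    return indices

def decode(indices, lo, hi, n_bins):
    """Return the (lo, hi) interval addressed by a sequence of bin indices."""
    for idx in indices:
        width = (hi - lo) / n_bins
        lo = lo + idx * width
        hi = lo + width
    return lo, hi
```

For instance, three levels of four bins on `[0, 1)` resolve a value to an interval of width `1/64` while each level only ever chooses among four bins.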

翻訳日:2023-12-27 20:26:21 公開日:2023-12-22

# 構成を一般化するモジュラー解の発見

Discovering modular solutions that generalize compositionally ( http://arxiv.org/abs/2312.15001v1 )

ライセンス: Link先を確認

Simon Schug, Seijin Kobayashi, Yassir Akram, Maciej Wołczyk, Alexandra Proca, Johannes von Oswald, Razvan Pascanu, João Sacramento, Angelika Steger

(参考訳) 多くの複雑なタスクや環境は、単純で独立した部分に分解できる。このような構成構造の発見は、適応を迅速化し、構成の一般化を可能にする可能性を秘めている。進歩にもかかわらず、我々の最も強力なシステムは柔軟に組み立てるのに苦労している。これらのシステムのほとんどはモノリシックだが、モジュール性によって多くのタスクの構成的性質をキャプチャできる。しかし、モジュラーシステムがこの隠れた構成構造を発見する状況は不明である。そこで,本研究では,地中真理モジュールの構成を完全に制御できるモジュール型教師を用いた教師学生設定について検討する。これにより、構成的一般化の問題と基盤となるモジュールの識別の問題とを関連付けることができる。実演から純粋に線形変換への同定は,指数関数的な加群の組み合わせを学習することなく,ハイパーネットで可能であることを示す。我々の理論は無限のデータ限界を前提としているが、有限データからのメタラーニングが、構成をモジュラーに一般化するがモノリシックなアーキテクチャではないモジュラーソリューションをいかに発見できるかを実証する。さらに,我々の洞察が教師の学習環境の外側に翻訳され,構成的選好と構成的目標を持つタスクにおいて,ハイパーネットワークが構成的に一般化するモジュラーポリシーを発見できることを実証する。

Many complex tasks and environments can be decomposed into simpler, independent parts. Discovering such underlying compositional structure has the potential to expedite adaptation and enable compositional generalization. Despite progress, our most powerful systems struggle to compose flexibly. While most of these systems are monolithic, modularity promises to allow capturing the compositional nature of many tasks. However, it is unclear under which circumstances modular systems discover this hidden compositional structure. To shed light on this question, we study a teacher-student setting with a modular teacher where we have full control over the composition of ground truth modules. This allows us to relate the problem of compositional generalization to that of identification of the underlying modules. We show theoretically that identification up to linear transformation purely from demonstrations is possible in hypernetworks without having to learn an exponential number of module combinations. While our theory assumes the infinite data limit, in an extensive empirical study we demonstrate how meta-learning from finite data can discover modular solutions that generalize compositionally in modular but not monolithic architectures. We further show that our insights translate outside the teacher-student setting and demonstrate that in tasks with compositional preferences and tasks with compositional goals hypernetworks can discover modular policies that compositionally generalize.
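
One simple way to picture a hypernetwork that composes modules is to let a task embedding mix a fixed set of module weight matrices. The sketch below assumes a linear mixing rule and a single linear layer per task; both are illustrative assumptions, since the abstract only states that hypernetworks are used, and the names are hypothetical.

```python
import numpy as np

def compose_weights(modules, z):
    """Mix module parameters with a task embedding.

    modules: (M, out_dim, in_dim) array, one weight matrix per module.
    z: (M,) task embedding selecting or mixing the modules.
    Returns the (out_dim, in_dim) weights of the task-specific network.
    """
    return np.tensordot(z, modules, axes=1)

def task_network(x, modules, z):
    """Apply the composed linear map to an input vector x of shape (in_dim,)."""
    return compose_weights(modules, z) @ np.asarray(x, dtype=float)
```

Under this picture, a new task corresponds to a new embedding `z` over the same `M` modules, so the number of parameters grows with `M` rather than with the number of module combinations.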

翻訳日:2023-12-27 20:26:03 公開日:2023-12-22

# デジタルフットプリントのクローズがユーザのプライバシーとパーソナライゼーションに及ぼす影響

The Impact of Cloaking Digital Footprints on User Privacy and Personalization ( http://arxiv.org/abs/2312.15000v1 )

ライセンス: Link先を確認

Sofie Goethals, Sandra Matz, Foster Provost, Yanou Ramon, David Martens

(参考訳) 私たちのオンライン生活は、技術プラットフォームによって蓄積され活用される、豊富な行動記録('デジタルフットプリント')を生み出します。このデータは、サービスをパーソナライズすることで、ユーザにとっての価値を生み出すために使用できる。しかし同時に、個人の特性(例えば、彼らの個性、政治的イデオロギー、性的指向)に非常に親密な窓を提供することで、人々のプライバシーを脅かす。以前の研究は、ユーザのフットプリントのクローキングという潜在的な修正を提案している。つまり、ユーザーは予測アルゴリズムからデジタルフットプリントの一部を隠して、望ましくない推論を避けることができる。このようなアプローチは、現時点ではプライバシー保護を提供することが示されているが、2つのオープンな疑問がある。第一に、クローキングが時間とともにどれだけうまく機能するかは不明だ。人々が常に新しいデジタルフットプリントを離れるにつれて、アルゴリズムは以前のクロークされた特性を予測する能力を取り戻すかもしれない。第二に、望ましくない推論を避けるためにデジタルフットプリントをクローズすることは、他の望ましい推論(例えば、望ましいパーソナライズされたコンテンツを駆動しているもの)に対するモデルの性能を低下させる可能性がある。これらの研究ギャップに照らして、私たちの貢献は2つあります。 1)メタフィーチャー(自動生成高レベルカテゴリ)を隠蔽する新しいクローキング戦略を提案し,その効果を既存のクローキングアプローチと比較する。 2) 一つの形質が他の形質に対する推論の正確性に及ぼす影響を検証した。重要な発見は、クローキングの有効性は時間とともに低下するが、その低下率は、個々のフットプリントよりもメタフィーチャーをクロークする場合にかなり小さいことである。さらに、われわれの発見はプライバシーとパーソナライゼーションのトレードオフが期待されていることを明らかにしている: 望ましくない特徴を隠すことも、他の望ましい特徴を部分的に隠している。

Our online lives generate a wealth of behavioral records ('digital footprints') which are stored and leveraged by technology platforms. This data can be used to create value for users by personalizing services. At the same time, however, it also poses a threat to people's privacy by offering a highly intimate window into their private traits (e.g., their personality, political ideology, sexual orientation). Prior work has proposed a potential remedy: the cloaking of users' footprints. That is, platforms could allow users to hide portions of their digital footprints from predictive algorithms to avoid undesired inferences. While such an approach has been shown to offer privacy protection in the moment, there are two open questions. First, it remains unclear how well cloaking performs over time. As people constantly leave new digital footprints, the algorithm might regain the ability to predict previously cloaked traits. Second, cloaking digital footprints to avoid one undesirable inference may degrade the performance of models for other, desirable inferences (e.g., those driving desired personalized content). In light of these research gaps, our contributions are twofold: 1) we propose a novel cloaking strategy that conceals 'metafeatures' (automatically generated higher-level categories) and compare its effectiveness against existing cloaking approaches, and 2) we test the spill-over effects of cloaking one trait on the accuracy of inferences on other traits. A key finding is that the effectiveness of cloaking degrades over time, but the rate at which it degrades is significantly smaller when cloaking metafeatures rather than individual footprints. In addition, our findings reveal the expected trade-off between privacy and personalization: cloaking an undesired trait also partially conceals other desirable traits.
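
The idea of hiding portions of a footprint to avoid an undesired inference can be sketched for a very simple predictor. The example below assumes a linear model over binary footprint features with a decision threshold at zero, and a greedy rule that hides the most incriminating features first; none of these details come from the abstract, and the metafeature strategy proposed in the paper is not shown.

```python
import numpy as np

def greedy_cloak(x, w, b=0.0):
    """Hide active features until the linear score w.x + b drops below 0.

    x: binary feature vector (1 = footprint present), w: model weights.
    Returns (cloaked copy of x, indices of the hidden features).
    """
    x = np.asarray(x, dtype=float).copy()
    w = np.asarray(w, dtype=float)
    hidden = []
    # Visit features from the largest to the smallest contribution.
    for i in np.argsort(-(w * x)):
        if w @ x + b < 0 or w[i] * x[i] <= 0:
            break  # inference already avoided, or nothing helpful left to hide
        x[i] = 0.0
        hidden.append(int(i))
    return x, hidden
```

The spill-over question raised in the abstract corresponds to scoring the cloaked vector with a second model for a different trait and checking how much its accuracy changes.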

翻訳日:2023-12-27 20:25:40 公開日:2023-12-22

# きめ細かい鳥の識別のためのハビタット情報の活用

Leveraging Habitat Information for Fine-grained Bird Identification ( http://arxiv.org/abs/2312.14999v1 )

ライセンス: Link先を確認

Tin Nguyen, Anh Nguyen

(参考訳) 従来の鳥分類器は、主に鳥の視覚特性に依存している。以前の作品の中には、背景に不変な分類器を訓練し、鳥類の生活環境を完全に破棄するものもある。その代わり、私たちは鳥類学者によって鳥類を識別する4つの主要な方法の1つである生息地情報を現代の鳥類分類器に統合する研究を初めて行った。 1)下流の鳥のデータセットに基づいて訓練されたCNNとViT,(2)オリジナルでマルチモーダルなCLIPである。 CNNとViTを生息地データでトレーニングすると、NABirdsとCUB-200で最大0.83点、+0.23点が改善される。同様に、CLIPのプロンプトに生息地記述子を追加すると、NABirdsとCUB-200で最大0.99と+1.1ポイントの精度が向上する。画像拡張プロセスと視覚言語CLIP分類器のテキスト記述子に環境特徴を統合することにより,一貫した精度の向上が得られた。コードは、https://anonymous.4open.science/r/reasoning-8B7E/で入手できる。

Traditional bird classifiers mostly rely on the visual characteristics of birds. Some prior works even train classifiers to be invariant to the background, completely discarding the living environment of birds. Instead, we are the first to explore integrating habitat information, one of the four major cues for identifying birds by ornithologists, into modern bird classifiers. We focus on two leading model types: (1) CNNs and ViTs trained on the downstream bird datasets; and (2) original, multi-modal CLIP. Training CNNs and ViTs with habitat-augmented data results in an improvement of up to +0.83 and +0.23 points on NABirds and CUB-200, respectively. Similarly, adding habitat descriptors to the prompts for CLIP yields a substantial accuracy boost of up to +0.99 and +1.1 points on NABirds and CUB-200, respectively. We find consistent accuracy improvement after integrating habitat features into the image augmentation process and into the textual descriptors of vision-language CLIP classifiers. Code is available at: https://anonymous.4open.science/r/reasoning-8B7E/.
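
Adding habitat descriptors to CLIP prompts amounts to extending the text description of each class before it is passed to the text encoder. The sketch below only builds the prompt strings; the template wording, the function name, and the example habitat are hypothetical, and the CLIP encoding and classification steps are not shown.

```python
def build_prompts(species, habitats=None):
    """Return one text prompt per species, optionally with a habitat descriptor.

    species: iterable of class names.
    habitats: optional dict mapping a class name to a habitat description.
    """
    habitats = habitats or {}
    prompts = {}
    for name in species:
        prompt = f"a photo of a {name}"
        habitat = habitats.get(name)
        if habitat:
            # Hypothetical template; the paper's exact wording may differ.
            prompt += f", a bird found in {habitat}"
        prompts[name] = prompt
    return prompts
```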

翻訳日:2023-12-27 20:25:10 公開日:2023-12-22

# 合成画像は人造アート偽造者の認識を助ける

Synthetic images aid the recognition of human-made art forgeries ( http://arxiv.org/abs/2312.14998v1 )

ライセンス: Link先を確認

Johann Ostmeyer, Ludovica Schaerf, Pavel Buividovich, Tessa Charles, Eric Postma, Carina Popovici

(参考訳) これまでの研究によると、人工知能は特定のアーティストによる本物の絵画と、驚くほどの精度で人造の偽造品を区別できるという。しかし, 既知偽造の数が限られているため, 偽造検出のための増補法が望まれる。本研究では, 合成アートワークをトレーニングデータセットに組み込むことにより, 偽造検出性能を向上させる可能性を検討する。我々はVincent van Gogh氏による絵画に焦点を当て、偽造検出に特化した最初のデータセットをリリースしました。結果を強化するため、Amedeo Modigliani と Raphael で同様の分析を行った。原画と偽物とを区別するために分類器を訓練する。このために、有名なアーティストのスタイルで人造の偽造品や模倣品を使用し、Stable DiffusionとStyleGANが生成した同様のスタイルのイメージでトレーニングセットを拡張する。追加の合成偽造物は、一貫して人造偽造物の検出を改善している。さらに, 従来の研究と並行して, トレーニングに合成偽造物を含めることで, 特に類似の発電機を用いて生成したAI生成偽造物の検出が可能となった。

Previous research has shown that Artificial Intelligence is capable of distinguishing between authentic paintings by a given artist and human-made forgeries with remarkable accuracy, provided sufficient training. However, with the limited amount of existing known forgeries, augmentation methods for forgery detection are highly desirable. In this work, we examine the potential of incorporating synthetic artworks into training datasets to enhance the performance of forgery detection. Our investigation focuses on paintings by Vincent van Gogh, for which we release the first dataset specialized for forgery detection. To reinforce our results, we conduct the same analyses on the artists Amedeo Modigliani and Raphael. We train a classifier to distinguish original artworks from forgeries. For this, we use human-made forgeries and imitations in the style of well-known artists and augment our training sets with images in a similar style generated by Stable Diffusion and StyleGAN. We find that the additional synthetic forgeries consistently improve the detection of human-made forgeries. In addition, we find that, in line with previous research, the inclusion of synthetic forgeries in the training also enables the detection of AI-generated forgeries, especially if created using a similar generator.

翻訳日:2023-12-27 20:24:49 公開日:2023-12-22

# ブリッジングAIと臨床実践: 自動睡眠スコアアルゴリズムと不確かさガイドの医師レビューの統合

Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring Algorithm with Uncertainty-Guided Physician Review ( http://arxiv.org/abs/2312.14996v1 )

ライセンス: Link先を確認

Michal Bechny (1 and 2), Giuliana Monachino (1 and 2), Luigi Fiorillo (2), Julia van der Meer (3), Markus H. Schmidt (3 and 4), Claudio L. A. Bassetti (3), Athina Tzovara (1 and 5), Francesca D. Faraci (2) ((1) Institute of Computer Science, University of Bern, Bern, Switzerland (2) Institute of Digital Technologies for Personalized Healthcare (MeDiTech), University of Applied Sciences and Arts of Southern Switzerland, Lugano, Switzerland (3) Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland (4) Ohio Sleep Medicine Institute, Dublin, United States (5) Center for Experimental Neurology, Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland)

(参考訳) 目的: 本研究の目的は, 予測催眠図のマニュアルレビューにおいて, 臨床医を効率的に支援するための不確実性推定手法を組み込むことにより, 自動睡眠コーリングアルゴリズムの臨床的利用を促進することである。本研究は,事前定義された合意レベルを達成するために必要なレビュー範囲を目標とし,ドメイン内データとドメイン外データの両方を調べ,対象者の診断を検討する。患者と方法:13のオープンアクセスデータベースから合計19578のPSGを使用して、最先端の睡眠スコアアルゴリズムであるU-Sleepをトレーニングした。我々は、年齢と睡眠障害の全スペクトルをカバーする8832psgの総合的な臨床データベースを利用して、u-sleepを洗練し、新しい信頼ネットワークを含む異なる不確実性定量化アプローチを評価する。 idデータは50名以上の医師が獲得したpsgからなり、2つのoodセットはそれぞれユニークな上級医師が記録した。結果: U-Sleepは堅牢な性能を示し、CohenのKappaはIDが76.2%、OODデータが73.8-78.8%だった。信頼ネットワークは不確実な予測の特定に優れており、AUROCはIDが85.7%、OODデータが82.5-85.6%だった。睡眠障害状態とは関係なく, 統計的評価では, 整合性と不協和性予測の信頼スコアの有意差がみられた。医師の介入で少なくとも90%のKを達成するためには、不確実なエポックの29.0%未満を検査し、医師の負担を大幅に減らし、ほぼ完全な合意を容易にした。

Purpose: This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating an uncertainty estimation approach to efficiently assist clinicians in the manual review of predicted hypnograms, a necessity due to the notable inter-scorer variability inherent in polysomnography (PSG) databases. Our efforts target the extent of review required to achieve predefined agreement levels, examining both in-domain (ID) and out-of-domain (OOD) data, and considering subjects' diagnoses. Patients and methods: A total of 19578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm. We leveraged a comprehensive clinical database of an additional 8832 PSGs, covering a full spectrum of ages and sleep disorders, to refine U-Sleep and to evaluate different uncertainty-quantification approaches, including our novel confidence network. The ID data consisted of PSGs scored by over 50 physicians, and the two OOD sets comprised recordings each scored by a unique senior physician. Results: U-Sleep demonstrated robust performance, with Cohen's kappa (K) at 76.2% on ID and 73.8-78.8% on OOD data. The confidence network excelled at identifying uncertain predictions, achieving AUROC scores of 85.7% on ID and 82.5-85.6% on OOD data. Independently of sleep-disorder status, statistical evaluations revealed significant differences in confidence scores between aligning and discording predictions, and significant correlations of confidence scores with classification performance metrics. To achieve K of at least 90% with physician intervention, examining less than 29.0% of uncertain epochs was required, substantially reducing physicians' workload and facilitating near-perfect agreement.
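
The review procedure can be illustrated by computing Cohen's kappa before and after the least confident epochs are replaced with a physician's labels. This is a simplified sketch: the kappa formula is the standard one, but the selection rule (review a fixed fraction of epochs ordered by confidence) and the function names are assumptions, and the confidence scores would come from a model such as the confidence network described above.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)  # observed agreement
    labels = np.union1d(a, b)
    # Agreement expected by chance from the two marginal label distributions.
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in labels)
    return 1.0 if p_e == 1.0 else float((p_o - p_e) / (1.0 - p_e))

def review_least_confident(pred, confidence, physician, fraction):
    """Replace the least confident share of predictions with physician labels."""
    pred = np.asarray(pred).copy()
    n_review = int(round(fraction * len(pred)))
    idx = np.argsort(confidence)[:n_review]  # lowest confidence first
    pred[idx] = np.asarray(physician)[idx]
    return pred
```

Sweeping `fraction` from 0 to 1 and recording kappa against the physician's scoring gives the kind of curve from which a threshold such as "review less than 29.0% of epochs to reach K of at least 90%" can be read off.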

翻訳日:2023-12-27 20:24:29 公開日:2023-12-22

# 大規模マルチモーダルモデルを用いた多機能食品アシスタントFoodLMM

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model ( http://arxiv.org/abs/2312.14991v1 )

ライセンス: Link先を確認

Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

(参考訳) 大規模マルチモーダルモデル(LMM)は多くの視覚言語タスクにおいて顕著な進歩を遂げている。しかし、特定の領域における一般LMMの性能は、まだ十分ではない。本稿では,食品認識,食材認識,レシピ生成,栄養推定,食品セグメンテーション,多ラウンド会話など,多機能なLMMに基づく多目的食品アシスタントであるFoodLMMを提案する。純粋なテキスト出力以外のタスクの処理を容易にするために,一連のタスク固有のトークンとヘッドを導入し,食品栄養値と複数のセグメンテーションマスクの予測を可能にした。 2段階のトレーニング戦略を採用しています。第1段階では,インストラクション・フォロー・パラダイムを活用し,マルチタスク学習に複数の公開食品ベンチマークを利用する。第2段階では,マルチラウンド会話と推論セグメンテーションデータセットを構築し,モデルを微調整し,食事領域における複雑な推論に基づく専門的な対話やセグメンテーションマスクの生成を可能にする。微調整したFoodLMMは、いくつかの食品ベンチマークで最先端の結果が得られる。コード、モデル、データセットを一般公開します。

Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks. Nevertheless, the performance of general LMMs in specific domains is still far from satisfactory. This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities, including food recognition, ingredient recognition, recipe generation, nutrition estimation, food segmentation and multi-round conversation. To enable FoodLMM to handle tasks beyond pure text output, we introduce a series of novel task-specific tokens and heads, enabling the model to predict food nutritional values and multiple segmentation masks. We adopt a two-stage training strategy. In the first stage, we utilize multiple public food benchmarks for multi-task learning by leveraging the instruction-following paradigm. In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain. Our fine-tuned FoodLMM achieves state-of-the-art results across several food benchmarks. We will make our code, models and datasets publicly available.

翻訳日:2023-12-27 20:23:52 公開日:2023-12-22

# オープンワールド連続学習のための知識伝達促進のための学習

Learning to Prompt Knowledge Transfer for Open-World Continual Learning ( http://arxiv.org/abs/2312.14990v1 )

ライセンス: Link先を確認

Yujie Li, Xin Yang, Hao Wang, Xiangkun Wang and Tianrui Li

(参考訳) 本稿では,open-world continual learning (owcl) と呼ばれるオープンワールドシナリオにおける連続学習の問題について述べる。 OwCLは増加傾向にあり、2倍に非常に挑戦的です。一過去の知識を忘れることなく、一連のタスクを学習すること。二将来の未知物(未知物又はクラス)を識別すること。既存のowclメソッドは、既知のものと未知の間のタスク認識境界の適応性に苦しみ、知識伝達のメカニズムを考慮しない。本稿では,OwCLの知識伝達モデルであるPro-KTを提案する。 Pro-KTは、(1)タスクジェネリックな知識とタスク固有の知識の両方をエンコードし、転送するプロンプトバンク、(2)タスクアウェアなオープンセット境界により、新しいタスクの未知を識別する。 2つの実世界のデータセットを用いた実験の結果、提案したPro-KTは未知の発見と既知の分類の両方において最先端のデータセットよりも優れていた。

This paper studies the problem of continual learning in an open-world scenario, referred to as Open-world Continual Learning (OwCL). OwCL is attracting growing attention, yet it is highly challenging in two respects: i) learning a sequence of tasks without forgetting knowns from the past, and ii) identifying unknowns (novel objects/classes) in the future. Existing OwCL methods struggle with the adaptability of task-aware boundaries between knowns and unknowns, and do not consider the mechanism of knowledge transfer. In this work, we propose Pro-KT, a novel prompt-enhanced knowledge transfer model for OwCL. Pro-KT includes two key components: (1) a prompt bank to encode and transfer both task-generic and task-specific knowledge, and (2) a task-aware open-set boundary to identify unknowns in the new tasks. Experimental results using two real-world datasets demonstrate that the proposed Pro-KT markedly outperforms the state-of-the-art counterparts in both the detection of unknowns and the classification of knowns.

翻訳日:2023-12-27 20:23:35 公開日:2023-12-22

# Emage:非自己回帰型テキスト画像生成

Emage: Non-Autoregressive Text-to-Image Generation ( http://arxiv.org/abs/2312.14988v1 )

ライセンス: Link先を確認

Zhangyin Feng, Runyi Hu, Liangxin Liu, Fan Zhang, Duyu Tang, Yong Dai, Xiaocheng Feng, Jiwei Li, Bing Qin, Shuming Shi

(参考訳) 自己回帰モデルと拡散モデルは、テキストから画像への生成における最近のブレークスルーを駆動する。自動回帰モデルは画像トークンを生成するために数千回以上連続して実行され、拡散モデルはガウスノイズを数百のデノゲーションステップでイメージに変換する。本研究では,何百もの画像トークンを並列に効率的に生成する非自己回帰的テキスト・画像モデルについて検討する。学習戦略や推論戦略,初期化テキストエンコーダなど,さまざまなモデルバリエーションを開発しています。 1000回実行する必要がある自己回帰ベースラインと比較すると、私たちのモデルは16回しか動作せず、非常に低い推論レイテンシで競合品質のイメージを生成します。 346Mパラメータを持つ我々の非自己回帰モデルは、256$\times$256の画像を1つのV100 GPU上で約1秒生成する。

Autoregressive and diffusion models drive the recent breakthroughs in text-to-image generation. Despite their huge success in generating highly realistic images, a common shortcoming of these models is their high inference latency: autoregressive models run more than a thousand times successively to produce image tokens, and diffusion models convert Gaussian noise into images with many hundreds of denoising steps. In this work, we explore non-autoregressive text-to-image models that efficiently generate hundreds of image tokens in parallel. We develop many model variations with different learning and inference strategies, initialized text encoders, etc. Compared with autoregressive baselines that need to run one thousand times, our model runs only 16 times to generate images of competitive quality with an order of magnitude lower inference latency. Our non-autoregressive model with 346M parameters generates a 256$\times$256 image in about one second on one V100 GPU.

翻訳日:2023-12-27 20:23:17 公開日:2023-12-22

# 立体規則化生体力学平衡による変形性画像登録

Deformable Image Registration with Stochastically Regularized Biomechanical Equilibrium ( http://arxiv.org/abs/2312.14987v1 )

ライセンス: Link先を確認

Pablo Alvarez (MIMESIS), St\'ephane Cotin (MIMESIS)

(参考訳) 変形可能な画像登録のための多数の正規化手法は、スムーズな変換を強制することを目的としているが、事前調整が困難であり、明確な物理的基盤が欠如している。物理的にインスピレーションを受けた戦略が出現し、健全な理論的基礎を提供するが、それでも複雑な離散化と解決のスキームを必要とする。本研究は, 医用画像登録の物理的動機付けによる正規化のメリットを維持しつつ, 離散化を必要としない正規化戦略を導入し, 現行の登録フレームワークと互換性を持たせた。提案手法は合成データと実データの両方において好適に動作し,現在の最先端手法に匹敵する精度を示す。

Numerous regularization methods for deformable image registration aim at enforcing smooth transformations, but are difficult to tune a priori and lack a clear physical basis. Physically inspired strategies have emerged, offering a sound theoretical basis, but still necessitating complex discretization and resolution schemes. This study introduces a regularization strategy that does not require discretization, making it compatible with current registration frameworks, while retaining the benefits of physically motivated regularization for medical image registration. The proposed method performs favorably on both synthetic and real datasets, exhibiting an accuracy comparable to current state-of-the-art methods.

翻訳日:2023-12-27 20:23:06 公開日:2023-12-22

# unihuman:野生で人間の画像を編集するための統一モデル

UniHuman: A Unified Model for Editing Human Images in the Wild ( http://arxiv.org/abs/2312.14985v1 )

ライセンス: Link先を確認

Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin

(参考訳) 人間の画像編集には、人のポーズや服装を変えたり、テキストのプロンプトに従って画像を編集したりするタスクが含まれる。しかし、先行研究はしばしばこれらの課題に別々に取り組み、共同学習による相互強化の利益を見落としている。本論文では,実際の環境下での人間の画像編集の複数の側面を扱う統一モデルUniHumanを提案する。モデルの生成品質と一般化能力を高めるために、人間の視覚エンコーダからのガイダンスを活用して、異なるポーズ表現を活用できる軽量なポーズウォーピングモジュールを導入し、目に見えないテクスチャやパターンに適応する。さらに,既存の人体編集ベンチマークと実世界のデータとの格差を埋めるために,400Kの高品質な人体画像テキストペアをトレーニングし,ドメイン外テストのために2Kの人体画像を収集した。ドメイン内テストセットとドメイン外テストセットの両方の実験では、UniHumanがタスク固有のモデルよりも大きなマージンで優れていることが示されている。ユーザスタディでは、UniHumanは平均して77%のケースでユーザに好まれる。

Human image editing includes tasks like changing a person's pose, their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders and introduce a lightweight pose-warping module that can exploit different pose representations, accommodating unseen textures and patterns. Furthermore, to bridge the disparity between existing human editing benchmarks and real-world data, we curated 400K high-quality human image-text pairs for training and collected 2K human images for out-of-domain testing, both encompassing diverse clothing styles, backgrounds, and age groups. Experiments on both in-domain and out-of-domain test sets demonstrate that UniHuman outperforms task-specific models by a significant margin. In user studies, UniHuman is preferred by the users in an average of 77% of cases.

翻訳日:2023-12-27 20:22:52 公開日:2023-12-22

# TPTNet:乱流電位温度に基づくデータ駆動温度予測モデル

TPTNet: A Data-Driven Temperature Prediction Model Based on Turbulent Potential Temperature ( http://arxiv.org/abs/2312.14980v1 )

ライセンス: Link先を確認

Jun Park and Changhoon Lee

(参考訳) 数値気象予測(NWP)の計算負担を軽減するため,ニューラルネットワークを用いた表面温度予測のためのデータ駆動モデルを提案した。 TPTNetと命名された我々のモデルは, 気象観測所で観測された2mの温度のみを用いて, 限られた予報時間における局部温度の予測を行う。年間および毎日の変動を考慮した気候成分を分離し, 観測値から温度の乱流変動成分を抽出した。ステーション高度の影響は、潜在的な温度を導入することで補償された。その結果得られた不規則分布局の乱流電位温度データは、畳み込みニューラルネットワーク(cnn)、スウィントランス、グラフィックニューラルネットワーク(gnn)に基づいて、3つの訓練されたネットワークを通して予測時間における乱流電位温度を予測する入力として用いられた。ネットワークの予測性能はpersistenceとnwpと比較され、モデルが最大12時間nwpを上回ったことを確認した。

A data-driven model for predicting the surface temperature using neural networks was proposed to alleviate the computational burden of numerical weather prediction (NWP). Our model, named TPTNet, uses only the 2 m temperature measured at the weather stations of the South Korean Peninsula as input to predict the local temperature at finite forecast hours. The turbulent fluctuation component of the temperature was extracted from the station measurements by separating out the climatology component accounting for the yearly and daily variations. The effect of station altitude was then compensated for by introducing a potential temperature. The resulting turbulent potential temperature data at irregularly distributed stations were used as input for predicting the turbulent potential temperature at forecast hours through three trained networks based on a convolutional neural network (CNN), a Swin Transformer, and a graph neural network (GNN). The prediction performance of our network was compared with that of persistence and NWP, confirming that our model outperformed NWP for up to 12 forecast hours.
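
As a rough illustration of the preprocessing the abstract describes, the sketch below uses the standard meteorological definition of potential temperature, $\theta = T\,(p_0/p)^{R_d/c_p}$, and subtracts a climatological value to obtain a fluctuation. The abstract does not give the exact formula the authors use for altitude compensation or climatology separation, so the function names, the 1000 hPa reference pressure, and the sample values are assumptions.

```python
R_D = 287.0   # gas constant of dry air, J/(kg K)
C_P = 1004.0  # specific heat of dry air at constant pressure, J/(kg K)

def potential_temperature(temp_k, pressure_hpa, ref_pressure_hpa=1000.0):
    """Standard potential temperature (Poisson's equation), in kelvin."""
    return temp_k * (ref_pressure_hpa / pressure_hpa) ** (R_D / C_P)

def turbulent_component(theta, climatology):
    """Fluctuation about a climatological value (hypothetical decomposition)."""
    return theta - climatology

# A station at 900 hPa reading 280 K, with an assumed climatological theta of 288 K.
theta = potential_temperature(280.0, 900.0)
fluctuation = turbulent_component(theta, 288.0)
```

Working in potential temperature removes most of the pressure-driven offset between stations at different elevations, which is presumably why the abstract introduces it before feeding station data to the networks.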

翻訳日:2023-12-27 20:22:32 公開日:2023-12-22

# 予測自由エネルギー最小化による情報探索多項式narxモデル予測制御

Information-seeking polynomial NARX model-predictive control through expected free energy minimization ( http://arxiv.org/abs/2312.15046v1 )

ライセンス: Link先を確認

Wouter M. Kouw

(参考訳) 本稿では,システムの目標状態への運転と,非線形自己回帰的外因性モデルのパラメータに関する情報的システム観測を求める適応型モデル予測制御器を提案する。コントローラの目的関数は期待される自由エネルギー関数から派生し、モデルパラメータや出力予測に対する不確実性を表す情報理論用語を含む。パラメータの不確かさが制御対象にどのように影響するかを実験で示し、振り子スイングアップタスクのための提案したコントローラを評価する。

We propose an adaptive model-predictive controller that balances driving the system to a goal state and seeking system observations that are informative with respect to the parameters of a nonlinear autoregressive exogenous model. The controller's objective function is derived from an expected free energy functional and contains information-theoretic terms expressing uncertainty over model parameters and output predictions. Experiments illustrate how parameter uncertainty affects the control objective and evaluate the proposed controller for a pendulum swing-up task.

翻訳日:2023-12-27 20:15:53 公開日:2023-12-22

# 連続時間における集合列の確率的モデリング

Probabilistic Modeling for Sequences of Sets in Continuous-Time ( http://arxiv.org/abs/2312.15045v1 )

ライセンス: Link先を確認

Yuxin Chang, Alex Boyd, Padhraic Smyth

(参考訳) ニューラルマーク付き時間的ポイントプロセスは、連続時間イベントデータのための統計パラメトリックモデルの既存のツールボックスに価値ある追加である。これらのモデルは、各イベントが1つのアイテム(単一のイベントタイプまたは"マーク")に関連付けられるシーケンスに役立ちますが、これらのモデルは、各イベントが一連のアイテムに関連付けられる実用的な状況には適していません。本研究では,インテンシティに基づくリカレントニューラルポイントプロセスモデルと互換性のある,連続時間にセット値データをモデリングするための汎用フレームワークを開発した。さらに,このようなモデルを用いて,シーケンス履歴を条件とした「アイテム $b$ 前に観測されるアイテム $a$ の確率」のような確率的クエリに答える推論手法を開発した。このようなクエリの正確な答えの計算は、問題設定の連続時間の性質と、各イベントの潜在的な結果の組合せ的に大きな空間の両方によって、神経モデルでは一般的には役に立たない。そこで,本研究では,実世界の4つのデータセットを用いた体系的な実験を通して,直接サンプリングよりも桁違いに効率が向上することを示す。また、このフレームワークを用いて1段階の予測を伴わない確率を用いてモデル選択を行う方法について説明する。

Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.
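
To make the query "item $A$ is observed before item $B$" concrete, here is a minimal sketch of the direct-sampling baseline that the abstract compares against. It is deliberately simplified: the two items arrive as independent homogeneous Poisson processes with fixed rates, a toy stand-in for a neural point process, which gives the closed-form answer $\lambda_A / (\lambda_A + \lambda_B)$ as a check. The function name and the rates are hypothetical, and the paper's importance sampling estimator is not shown.

```python
import random

def prob_a_before_b(rate_a, rate_b, n_samples=20000, seed=0):
    """Direct Monte Carlo estimate of P(first A arrival precedes first B arrival)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        t_a = rng.expovariate(rate_a)  # first arrival time of item A
        t_b = rng.expovariate(rate_b)  # first arrival time of item B
        if t_a < t_b:
            hits += 1
    return hits / n_samples

estimate = prob_a_before_b(2.0, 1.0)
exact = 2.0 / (2.0 + 1.0)
```

For a neural model no closed form is available and every sample requires simulating the sequence forward, which is why a lower-variance estimator matters in that setting.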

翻訳日:2023-12-27 20:15:45 公開日:2023-12-22

# GroundVLP:視覚言語事前学習とオープン語彙オブジェクト検出によるゼロショット視覚グラウンドのハーネス化

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection ( http://arxiv.org/abs/2312.15043v1 )

ライセンス: Link先を確認

Haozhan Shen, Tiancheng Zhao, Mingwei Zhu, Jianwei Yin

(参考訳) ビジュアルグラウンド(Visual Grounding)は、クエリ表現に基づく視覚的コンテキストの理解を含む重要な視覚言語タスクであり、オブジェクト間の相互作用をキャプチャするモデルと、様々な空間的および属性情報を必要とする。しかし、視覚的接地作業のアノテーションデータは、その時間と労働集約的なアノテーションプロセスによって制限され、訓練されたモデルは、その能力をより広い領域に一般化することから制約される。この課題に対処するために,画像テキストペアと純粋なオブジェクト検出データから学習した既存のモデルから視覚的接地能力を活用する,シンプルで効果的なゼロショット手法であるGroundVLPを提案する。 GroundVLPはGradCAMのヒートマップとオープン語彙検出器のオブジェクト提案を組み合わせた融合機構を提案する。提案手法は,RefCOCOとRefCOCO+のテスト分割において,従来のゼロショット・オブ・ザ・アートを約28倍上回り,RefCOCO//gデータセット上の他のゼロショット・メソッドを著しく上回ることを示す。さらに、GroundVLPはFlickr30kエンティティデータセット上のいくつかの非VLPベースの教師付きモデルと互換性があるか、それ以上に機能する。私たちのコードはhttps://github.com/om-ai-lab/GroundVLPで利用可能です。

Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, requires the model to capture the interactions between objects, as well as various spatial and attribute information. However, the annotation data for the visual grounding task is limited due to its time-consuming and labor-intensive annotation process, which constrains the trained models from generalizing their capability to a broader domain. To address this challenge, we propose GroundVLP, a simple yet effective zero-shot method that harnesses visual grounding ability from existing models trained on image-text pairs and pure object detection data, both of which are more conveniently obtainable and offer a broader domain compared to visual grounding annotation data. GroundVLP proposes a fusion mechanism that combines the heatmap from GradCAM and the object proposals of open-vocabulary detectors. We demonstrate that the proposed method significantly outperforms other zero-shot methods on the RefCOCO/+/g datasets, surpassing the prior zero-shot state of the art by approximately 28% on the test split of RefCOCO and RefCOCO+. Furthermore, GroundVLP performs comparably to or even better than some non-VLP-based supervised models on the Flickr30k entities dataset. Our code is available at https://github.com/om-ai-lab/GroundVLP.
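
One plausible way to combine a heatmap with object proposals is sketched below: each candidate box is scored by the heatmap mass it contains, divided by a power of its area so that large boxes are not favored automatically, and the highest-scoring box is returned. The abstract does not specify the fusion rule, so the scoring function, the exponent `alpha`, and the toy inputs are assumptions rather than the GroundVLP implementation.

```python
def box_score(heatmap, box, alpha=0.5):
    """Heatmap mass inside a box, normalized by area**alpha.

    `box` is (x0, y0, x1, y1) with half-open pixel bounds.
    """
    x0, y0, x1, y1 = box
    total = sum(heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1))
    area = (x1 - x0) * (y1 - y0)
    return total / (area ** alpha)

def ground(heatmap, proposals, alpha=0.5):
    """Return the proposal with the highest fused score."""
    return max(proposals, key=lambda b: box_score(heatmap, b, alpha))

# Toy 4x4 heatmap with activation concentrated in the central 2x2 region.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
proposals = [(0, 0, 4, 4), (1, 1, 3, 3), (0, 0, 2, 2)]
best = ground(heatmap, proposals)
```

With `alpha=0` the full-image box ties with the tight box, and with `alpha=1` the score becomes mean activation; an intermediate value trades off coverage against tightness.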

翻訳日:2023-12-27 20:15:05 公開日:2023-12-22

# 学習分析ダッシュボードはハイプに耐えただろうか? 学生の達成、モチベーション、参加、態度への影響に関する体系的考察

Have Learning Analytics Dashboards Lived Up to the Hype? A Systematic Review of Impact on Students' Achievement, Motivation, Participation and Attitude ( http://arxiv.org/abs/2312.15042v1 )

ライセンス: Link先を確認

Rogers Kaliisa, Kamila Misiejuk, Sonsoles L\'opez-Pernas, Mohammad Khalil, Mohammed Saqr

(参考訳) 学習分析ダッシュボード(LAD)はLA介入の最も一般的な形態であるが、学生の学習結果への影響については限定的な証拠がある。本研究は,学生の学習成果,達成,参加,モチベーション,態度にLADが与える影響を総合的に調査するために,38件の研究成果を総合するものである。私たちが現在立っているように、LADが学術的業績を改善するという約束を果たすまで生きてきたという結論を支持する証拠はない。ほとんどの研究は無視または小さな効果を報告し、十分に制御された実験の限られた証拠を報告した。多くの研究は、ladのユーザと非ユーザを比較し、ダッシュボード効果を学生のエンゲージメントレベルと組み合わせている。同様に、LADがモチベーションや態度に与える影響は、わずかに例外的に顕著な効果を示した。これらの研究の小さなサンプルサイズは、これらの発見を検証するための大規模な調査の必要性を強調している。特に、LADは学生参加に比較的大きな影響を及ぼした。いくつかの研究は中～大きな効果の大きさを報告し、LADがオンライン学習環境におけるエンゲージメントと相互作用を促進することを示唆している。しかし, 従来の評価手法への依存, 自己選択バイアス, 利用に等しいという仮定, 標準化された評価ツールの欠如など, 方法論上の欠点が繰り返し発生する。 ladの研究ラインを前進させるために、研究者は厳密な評価手法を使い、学習構成を評価するための明確な基準を確立する必要がある。このような取り組みは、LADの可能性の理解を深め、学習成果を高め、教育者や研究者にも貴重な洞察を提供する。

While learning analytics dashboards (LADs) are the most common form of LA intervention, there is limited evidence regarding their impact on students' learning outcomes. This systematic review synthesizes the findings of 38 research studies to investigate the impact of LADs on students' learning outcomes, encompassing achievement, participation, motivation, and attitudes. As it currently stands, there is no evidence to support the conclusion that LADs have lived up to the promise of improving academic achievement. Most studies reported negligible or small effects, with limited evidence from well-powered controlled experiments. Many studies merely compared users and non-users of LADs, confounding the dashboard effect with student engagement levels. Similarly, the impact of LADs on motivation and attitudes appeared modest, with only a few exceptions demonstrating significant effects. Small sample sizes in these studies highlight the need for larger-scale investigations to validate these findings. Notably, LADs showed a relatively substantial impact on student participation. Several studies reported medium to large effect sizes, suggesting that LADs can promote engagement and interaction in online learning environments. However, methodological shortcomings, such as reliance on traditional evaluation methods, self-selection bias, the assumption that access equates to usage, and a lack of standardized assessment tools, emerged as recurring issues. To advance the research line for LADs, researchers should use rigorous assessment methods and establish clear standards for evaluating learning constructs. Such efforts will advance our understanding of the potential of LADs to enhance learning outcomes and provide valuable insights for educators and researchers alike.

翻訳日:2023-12-27 20:14:24 公開日:2023-12-22

# twitter上のバイアスド・メディカル・クレームのカスケード検出に向けて

Towards Detecting Cascades of Biased Medical Claims on Twitter ( http://arxiv.org/abs/2312.15040v1 )

ライセンス: Link先を確認

Libby Tiderman, Juan Sanchez Mercedes, Fiona Romanoschi, Fabricio Murai

(参考訳) ソーシャルメディアは、社会的識別子と病気の間の誤解を招く相関関係を強調する医療的主張を広める可能性がある。われわれの研究は、Twitter上の偏りのある医療クレームを特定し、その拡散を測定することを目的としている。本稿では,医学的クレームを検出するRoBERTaとバイアスを分類するDistilBERTという2つのモデルを用いた機械学習フレームワークを提案する。偏りのある医療クレームを特定した後、リツイートカスケード分析を行い、個々のリーチと拡散率を計算した。偏りのあるクレームを含むツイートは、偏りのないクレームよりも速く、さらに拡散することが判明した。

Social media may disseminate medical claims that highlight misleading correlations between social identifiers and diseases due to not accounting for structural determinants of health. Our research aims to identify biased medical claims on Twitter and measure their spread. We propose a machine learning framework that uses two models in tandem: RoBERTa to detect medical claims and DistilBERT to classify bias. After identifying original biased medical claims, we conducted a retweet cascade analysis, computing their individual reach and rate of spread. Tweets containing biased claims were found to circulate faster and further than unbiased claims.
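
The cascade measurements mentioned in the abstract can be sketched as follows. The abstract does not define "reach" or "rate of spread", so this example assumes reach is the number of retweets and rate is retweets per hour between the original post and the last retweet. The function name and timestamps are hypothetical.

```python
def cascade_stats(original_time, retweet_times):
    """Return (reach, rate) for one retweet cascade; times are in hours.

    Assumed definitions: reach = number of retweets,
    rate = reach / (time of last retweet - time of original tweet).
    """
    reach = len(retweet_times)
    if reach == 0:
        return 0, 0.0
    duration = max(retweet_times) - original_time
    rate = reach / duration if duration > 0 else float("inf")
    return reach, rate

# A tweet posted at t=0 h and retweeted at 1 h, 2 h and 4 h.
reach, rate = cascade_stats(0.0, [1.0, 2.0, 4.0])
```

Comparing the distributions of these two quantities between biased and unbiased claims is the kind of analysis the abstract summarizes; follower-weighted reach would be a natural alternative definition.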

翻訳日:2023-12-27 20:13:37 公開日:2023-12-22

# latents2semantics: 顔画像の局所的なスタイル操作に生成モデルの潜在空間を利用する

Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images ( http://arxiv.org/abs/2312.15037v1 )

ライセンス: Link先を確認

Snehal Singh Tomar, A.N. Rajagopalan

(参考訳) メタバースが徐々に現実のものとなり、デジタル人間の創造に向けた急速な発展のペースを考えると、人間の顔のための原理化されたスタイルの編集パイプラインの必要性は多様体を増加させることに縛られる。顔画像中の複数の領域(ROI)のスタイル属性の高度に局所化された編集を容易にする生成オートエンコーダモデルであるLatents2Semantics Autoencoder (L2SAE)を導入することで、このニーズに応える。 L2SAEは、符号化された画像の構造とスタイル情報に対する別個の潜在表現を学習する。これにより、選択したroisの構造保存スタイル編集が可能になる。符号化された構造表現は空間次元を小さくしたマルチチャネル2次元テンソルであり、局所構造特性と大域構造特性の両方をキャプチャする。スタイル表現はグローバルなスタイル属性をキャプチャする1Dテンソルである。フレームワークでは、構造表現をスライスして、異なるROIの強い不整合対応を構築する。選択されたROIのスタイル編集は、単純な組み合わせに相当します。 (a)スライスされた構造表現から生じるROIマスク及び (b)グローバルスタイル(ガウスノイズを使用)と不変構造テンソルから生成されたグローバルスタイル変更によるデコード画像。スタイル編集は、スタイル編集に意味的意味をもたらすために、既存の作品の多くは追加の人的努力(スーパービジョン)を必要とするため、SOTAスタイルの編集パイプラインよりも人的監督が優れている。また、反復最適化に基づく反転や、計算コストのかかる演算を必要とする訓練後の潜在方向の制御を廃止する。複数のデータセットからサンプリングされたテスト画像を用いて、選択的なスタイル編集やスワップなど、複数のアプリケーションに対して、定性的かつ定量的な結果を提供する。

With the metaverse slowly becoming a reality and given the rapid pace of developments toward the creation of digital humans, the need for a principled style editing pipeline for human faces is bound to increase manifold. We cater to this need by introducing the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates highly localized editing of style attributes of several Regions of Interest (ROIs) in face images. The L2SAE learns separate latent representations for the structure and style information of encoded images, thus allowing for structure-preserving style editing of the chosen ROIs. The encoded structure representation is a multichannel 2D tensor with reduced spatial dimensions, which captures both local and global structure properties. The style representation is a 1D tensor that captures global style attributes. In our framework, we slice the structure representation to build strong and disentangled correspondences with different ROIs. Consequently, style editing of the chosen ROIs amounts to a simple combination of (a) the ROI mask generated from the sliced structure representation and (b) the decoded image with global style changes, generated from the manipulated (using Gaussian noise) global style and the unchanged structure tensor. Style editing without additional human supervision is a significant win over SOTA style editing pipelines because most existing works require additional human effort (supervision) post-training for attributing semantic meaning to style edits. We also do away with iterative-optimization-based inversion and with determining controllable latent directions post-training, which requires additional computationally expensive operations. We provide qualitative and quantitative results across multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.

翻訳日:2023-12-27 20:13:00 公開日:2023-12-22

# SODA:オンデバイス機械学習モデルにおけるプライオリティ情報保護

SODA: Protecting Proprietary Information in On-Device Machine Learning Models ( http://arxiv.org/abs/2312.15036v1 )

ライセンス: Link先を確認

Akanksha Atrey, Ritwik Sinha, Saayan Mitra, Prashant Shenoy

(参考訳) ローエンドハードウェアの成長は、エッジアプリケーションにおける機械学習ベースのサービスの増加につながった。これらのアプリケーションはユーザに関するコンテキスト情報を収集し、マシンラーニング(ML)モデルを通じてパーソナライズされたオファーなどのサービスを提供する。このようなMLモデルをユーザのデバイスにデプロイすることで、レイテンシの低減、ユーザのプライバシの維持、集中的なソースへの継続的依存の最小化を実現している。しかし、ユーザのエッジデバイスにMLモデルをデプロイすると、サービスプロバイダに関するプロプライエタリな情報が漏洩する可能性がある。本研究では,モバイルサービス提供に使用されるオンデバイスMLモデルについて検討し,簡単な攻撃がサービスプロバイダのプロプライエタリな情報を漏洩させる可能性を実証する。異なる敵が容易にこのようなモデルを利用して利益を最大化し、コンテンツ盗難を達成できることを示す。このような攻撃を阻止する必要性に感銘を受け、敵の攻撃を防ぎながらエッジデバイス上でのデプロイとサービスを行うためのエンドツーエンドフレームワークであるSODAを提示する。以上の結果から,サービス性能,レイテンシ,ストレージへの影響を最小限に抑えつつ,50クエリ未満で89%の精度で敵使用を検出できることが示唆された。

The growth of low-end hardware has led to a proliferation of machine learning-based services in edge applications. These applications gather contextual information about users and provide some services, such as personalized offers, through a machine learning (ML) model. A growing practice has been to deploy such ML models on the user's device to reduce latency, maintain user privacy, and minimize continuous reliance on a centralized source. However, deploying ML models on the user's edge device can leak proprietary information about the service provider. In this work, we investigate on-device ML models that are used to provide mobile services and demonstrate how simple attacks can leak proprietary information of the service provider. We show that different adversaries can easily exploit such models to maximize their profit and accomplish content theft. Motivated by the need to thwart such attacks, we present an end-to-end framework, SODA, for deploying and serving on edge devices while defending against adversarial usage. Our results demonstrate that SODA can detect adversarial usage with 89% accuracy in less than 50 queries with minimal impact on service performance, latency, and storage.

翻訳日:2023-12-27 20:12:05 公開日:2023-12-22

# 2時間量子ゆらぎのアプローチとBethe-Salpeter方程式との関係

Two-Time Quantum Fluctuations Approach and its Relation to the Bethe--Salpeter Equation ( http://arxiv.org/abs/2312.15034v1 )

ライセンス: Link先を確認

Erik Schroedter and Michael Bonitz

(参考訳) 平衡状態の関連量子多粒子系は、相関固体、超低温原子、高密度プラズマを含む多くの分野で高い関心を持つ。これらのシステムの正確な理論記述は、概念的にも計算資源に関しても困難である。我々は最近、非平衡 $gw$ 近似(英語版)(nonequilibrium $gw$ approximation)と同値な量子揺らぎのアプローチを提示した。 Schroedter \textit{et al。と、Cond。マット Phys 23401 (2022)] 計算コストが低い場合に高い精度を保証します。第二の出版物で. Schroedter \textit{et al。とPhys。 B \textbf{108}, 205109 (2023)] では、このアプローチは2時間交換相関関数と密度応答特性にまで拡張された。ここでは、このアプローチの特性をより詳細に分析する。一般化されたkadanoff-baym ansatz と hartree-fock propagator を適用した場合、この手法は2回交換相関関数の bethe-salpeter 方程式と等価であることを示す。

Correlated quantum many-particle systems out of equilibrium are of high interest in many fields, including correlated solids, ultracold atoms and dense plasmas. Accurate theoretical description of these systems is challenging both conceptually and with respect to computational resources. We have recently presented a quantum fluctuations approach that is equivalent to the nonequilibrium $GW$ approximation [E. Schroedter et al., Cond. Matt. Phys. 25, 23401 (2022)] and promises high accuracy at low computational cost. In a second publication [E. Schroedter et al., Phys. Rev. B 108, 205109 (2023)], this approach was extended to the two-time exchange-correlation functions and the density response properties. Here, we analyze the properties of this approach in more detail. We demonstrate that the method is equivalent to the Bethe--Salpeter equation for the two-time exchange-correlation function when the generalized Kadanoff-Baym ansatz with Hartree-Fock propagators is applied.

翻訳日:2023-12-27 20:11:46 公開日:2023-12-22

# 解釈可能な推論時間干渉によるLLMの空間誘導ホロスティック説明法

Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention ( http://arxiv.org/abs/2312.15033v1 )

ライセンス: Link先を確認

Zhen Tan, Tianlong Chen, Zhenyu Zhang, Huan Liu

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理領域において前例のないブレークスルーを達成した。しかし、llmsの謎めいた「ブラックボックス」の性質は、透過的かつ説明可能な応用を妨げる、解釈可能性にとって重要な課題である。注目の可視化、重要なサブネットワーク抽出、概念に基づく分析といった過去のアプローチは、いくつかの洞察を与えるが、彼らはしばしば1次元内の局所的またはグローバルな説明に焦点を合わせ、時には包括的明確性の提供に不足する。そこで本研究では,LLMの全体的解釈を目的とし,空間性誘導技術に係わる新たな方法論を提案する。我々のフレームワークは、SparseCBMと呼ばれ、空間性を革新的に統合し、インプット、サブネットワーク、コンセプトレベルという3つの相互解釈層を解明する。さらに、新たに導入された解釈可能な推論時間介入の次元は、展開中のモデルに対する動的調整を容易にする。実世界のデータセットに対する厳密な経験的評価を通じて、SparseCBMはLLMの振る舞いを深く理解し、モデルの不正確な解釈と改善の両面で分離することを実証した。コードはサプリメントで提供される。

Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains. However, the enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications. While past approaches, such as attention visualization, pivotal subnetwork extraction, and concept-based analyses, offer some insight, they often focus on either local or global explanations within a single dimension, occasionally falling short in providing comprehensive clarity. In response, we propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs. Our framework, termed SparseCBM, innovatively integrates sparsity to elucidate three intertwined layers of interpretation: input, subnetwork, and concept levels. In addition, the newly introduced dimension of interpretable inference-time intervention facilitates dynamic adjustments to the model during deployment. Through rigorous empirical evaluations on real-world datasets, we demonstrate that SparseCBM delivers a profound understanding of LLM behaviors, setting it apart in both interpreting and ameliorating model inaccuracies. Codes are provided in supplements.

翻訳日:2023-12-27 20:11:22 公開日:2023-12-22

# Federated Q-Learning: 通信コストの低い線形レグレット高速化

Federated Q-Learning: Linear Regret Speedup with Low Communication Cost ( http://arxiv.org/abs/2312.15023v1 )

ライセンス: Link先を確認

Zhong Zheng, Fengyu Gao, Lingzhou Xue, Jing Yang

(参考訳) 本稿では,中央サーバの協調の下で複数のエージェントが協調して環境を探索し,それらの生データを共有することなく最適な方針を学習する,表状エピソディックマルコフ決定プロセス(mdp)のためのフェデレート強化学習について検討する。収束率やサンプルの複雑さなどの指標では,エージェント数の線形スピードアップが達成されているが,通信コストの低い線形後悔スピードアップを実現するために,モデルフリーなアルゴリズムを設計できるかどうかは不明である。本稿では,FedQ-Hoeffding とFedQ-Bernstein という2つの連立Q-Learningアルゴリズムを提案し,時間的地平線が十分に大きい場合と比較して,対応する全後悔が線形なスピードアップを達成することを示し,通信コストは時間的ステップの総数$T$で対数的にスケールすることを示した。これらの結果は、エージェントとサーバ間のイベントトリガー同期機構、サーバがステートアクション値の局所的な見積を集約してグローバルな見積を形成する場合の新たなステップサイズ選択、および非マーチンゲール差の和を束縛する新しい濃度不等式に頼っている。これは、連帯強化学習におけるモデルフリーアルゴリズムによって線形後悔のスピードアップと対数コミュニケーションコストが達成できることを示す最初の研究である。

In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDP) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-Learning algorithms, termed FedQ-Hoeffding and FedQ-Bernstein, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps $T$. These results rely on an event-triggered synchronization mechanism between the agents and the server, a novel step size selection when the server aggregates the local estimates of the state-action values to form the global estimates, and a set of new concentration inequalities to bound the sum of non-martingale differences. This is the first work showing that linear regret speedup and logarithmic communication cost can be achieved by model-free algorithms in federated reinforcement learning.

翻訳日:2023-12-27 20:10:59 公開日:2023-12-22

# 統一マルチモーダル推論フレームワークに向けて

Towards a Unified Multimodal Reasoning Framework ( http://arxiv.org/abs/2312.15021v1 )

ライセンス: Link先を確認

Abhinav Arun and Dipendra Singh Mal and Mehul Soni and Tomohiro Sawada

(参考訳) 近年のディープラーニングの進歩は、様々なタスクに優れた強力な言語モデル(LM)の開発につながっている。これらの成果にもかかわらず、特に推論能力の向上とマルチモーダルデータの導入には改善の余地がある。本報告は,複数質問の解答におけるLMの精度を向上させるために,CoT推論とVQA技術を組み合わせることによる潜在的影響について検討する。テキストVQAとScienceQAを用いて、3つのテキスト埋め込み手法と3つの視覚埋め込み手法の有効性を評価した。本実験は,CoTとVQAの複合的影響を調査することによって,現在の研究のギャップを埋めることを目的としており,これらの技術がGPT-4のような最先端モデルの推論能力をいかに改善できるかの理解に寄与している。実験の結果は、LMの推論能力と質問応答能力の向上、この分野におけるさらなる研究と開発のための洞察の提供、および複数のモードにわたる複雑な推論タスクを処理可能なより正確で信頼性の高いAIシステムの実現における、これらのアプローチの可能性を実証した。

Recent advancements in deep learning have led to the development of powerful language models (LMs) that excel in various tasks. Despite these achievements, there is still room for improvement, particularly in enhancing reasoning abilities and incorporating multimodal data. This report investigates the potential impact of combining Chain-of-Thought (CoT) reasoning and Visual Question Answering (VQA) techniques to improve LM's accuracy in solving multiple-choice questions. By employing TextVQA and ScienceQA datasets, we assessed the effectiveness of three text embedding methods and three visual embedding approaches. Our experiments aimed to fill the gap in current research by investigating the combined impact of CoT and VQA, contributing to the understanding of how these techniques can improve the reasoning capabilities of state-of-the-art models like GPT-4. Results from our experiments demonstrated the potential of these approaches in enhancing LM's reasoning and question-answering capabilities, providing insights for further research and development in the field, and paving the way for more accurate and reliable AI systems that can handle complex reasoning tasks across multiple modalities.

翻訳日:2023-12-27 20:10:31 公開日:2023-12-22

# Gemini vs GPT-4V : 定性ケースによる視覚言語モデルの予備比較と組み合わせ

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases ( http://arxiv.org/abs/2312.15011v1 )

ライセンス: Link先を確認

Zhangyang Qi, Ye Fang, Mengchen Zhang, Zeyi Sun, Tong Wu, Ziwei Liu, Dahua Lin, Jiaqi Wang, Hengshuang Zhao

(参考訳) MLLM(Multi-modal Large Language Models)の急速に発展する分野は、人工知能における言語処理と視覚処理の統合の最前線にある。本稿では,GoogleのGeminiとOpenAIのGPT-4V(ision)の2つのパイオニアモデルについて,詳細な比較研究を行った。本研究は,視覚言語能力,人間とのインタラクション,時間的理解,知性と感情的商の両方における評価など,両モデルの多面的評価を含む。分析の核心は、それぞれのモデルの視覚的理解能力に分解されます。各種産業応用シナリオにおける性能評価のための構造化実験を行い,実用性に関する総合的な考察を行った。直接的なパフォーマンス比較だけでなく、均衡と公正な分析を保証するためのプロンプトやシナリオの調整も含んでいます。我々の発見は、両方のモデルのユニークな強みとニッチを照らしている。 GPT-4Vは応答の正確さと簡潔さで自分自身を区別し、ジェミニは関連する画像とリンクを伴って詳細で拡張的な回答を提供する。これらの理解は、geminiとgpt-4vの比較的な利点に光を当てただけでなく、マルチモーダル基礎モデルの進化の風景を強調し、この分野における将来の進歩への道を開いた。比較後, 2つのモデルを組み合わせることにより, より良い結果を得ることができた。最後に、GPT-4VとGeminiの開発チームに、この分野への先駆的な貢献を感謝します。当社の認定は、Yang et al の 'Dawn' で示された包括的質的分析にまで拡張されている。本研究は, 画像サンプル, プロンプト, GPT-4V関連結果の広範な収集とともに, 解析の基礎となった。

The rapidly evolving sector of Multi-modal Large Language Models (MLLMs) is at the forefront of integrating linguistic and visual processing in artificial intelligence. This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision). Our study involves a multi-faceted evaluation of both models across key dimensions such as Vision-Language Capability, Interaction with Humans, Temporal Understanding, and assessments in both Intelligence and Emotional Quotients. The core of our analysis delves into the distinct visual comprehension abilities of each model. We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility. We not only involve direct performance comparisons but also include adjustments in prompts and scenarios to ensure a balanced and fair analysis. Our findings illuminate the unique strengths and niches of both models. GPT-4V distinguishes itself with its precision and succinctness in responses, while Gemini excels in providing detailed, expansive answers accompanied by relevant imagery and links. These understandings not only shed light on the comparative merits of Gemini and GPT-4V but also underscore the evolving landscape of multimodal foundation models, paving the way for future advancements in this area. After the comparison, we attempted to achieve better results by combining the two models. Finally, we would like to express our profound gratitude to the teams behind GPT-4V and Gemini for their pioneering contributions to the field. Our acknowledgments are also extended to the comprehensive qualitative analysis presented in 'Dawn' by Yang et al. This work, with its extensive collection of image samples, prompts, and GPT-4V-related results, provided a foundational basis for our analysis.

翻訳日:2023-12-27 20:10:12 公開日:2023-12-22

# SI-MIL:ギガピクセル病理における自己解釈性のための深部MILのモデリング

SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology ( http://arxiv.org/abs/2312.15010v1 )

ライセンス: Link先を確認

Saarthak Kapse, Pushpak Pati, Srijan Das, Jingwei Zhang, Chao Chen, Maria Vakalopoulou, Joel Saltz, Dimitris Samaras, Rajarsi R. Gupta, Prateek Prasanna

(参考訳) ギガピクセルスライドの複雑さを考えると、全スライド画像(WSI)解析のための解釈可能性と推論をMIL(Multiple Instance Learning)手法に導入することは困難である。伝統的に、ミル解釈性は下流タスクに適していると考えられる突出した領域を特定することに限定されており、これらの選択の背景にある根拠についてエンドユーザー(病理学者)にほとんど洞察を与えていない。そこで本研究では,自己解釈型MIL(Self-Interpretable MIL, SI-MIL)を提案する。 SI-MILは、手作りの病理的特徴に基づく解釈可能な分岐をガイドし、線形予測を容易にする。 SI-MILは、正常な領域を識別する以外に、WSIの病理学的洞察に根ざした特徴レベルの解釈を提供する。特に、SI-MILは線形予測制約を伴い、モデル解釈可能性と性能の間の必然的なトレードオフの神話に挑戦し、3種類の癌に対してWSIレベルの予測タスクに関する最先端の手法と比較して、競争の結果を示す。さらに,si-milの局所的およびグローバル的解釈可能性について,統計的分析,ドメインエキスパート研究,解釈可能性のデシデラタ,すなわちユーザフレンドリーさと忠実性の観点から徹底的に評価した。

Introducing interpretability and reasoning into Multiple Instance Learning (MIL) methods for Whole Slide Image (WSI) analysis is challenging, given the complexity of gigapixel slides. Traditionally, MIL interpretability is limited to identifying salient regions deemed pertinent for downstream tasks, offering little insight to the end-user (pathologist) regarding the rationale behind these selections. To address this, we propose Self-Interpretable MIL (SI-MIL), a method intrinsically designed for interpretability from the very outset. SI-MIL employs a deep MIL framework to guide an interpretable branch grounded on handcrafted pathological features, facilitating linear predictions. Beyond identifying salient regions, SI-MIL uniquely provides feature-level interpretations rooted in pathological insights for WSIs. Notably, SI-MIL, with its linear prediction constraints, challenges the prevalent myth of an inevitable trade-off between model interpretability and performance, demonstrating competitive results compared to state-of-the-art methods on WSI-level prediction tasks across three cancer types. In addition, we thoroughly benchmark the local- and global-interpretability of SI-MIL in terms of statistical analysis, a domain expert study, and desiderata of interpretability, namely, user-friendliness and faithfulness.

翻訳日:2023-12-27 20:09:44 公開日:2023-12-22

# ChatGPTの算数能力に及ぼすプロンプト, ペルソナ, および思考方法の連鎖の影響の評価

Assessing the Impact of Prompting, Persona, and Chain of Thought Methods on ChatGPT's Arithmetic Capabilities ( http://arxiv.org/abs/2312.15006v1 )

ライセンス: Link先を確認

Yuhao Chen, Chloe Wong, Hanwen Yang, Juan Aguenza, Sai Bhujangari, Benthan Vu, Xun Lei, Amisha Prasad, Manny Fluss, Eric Phuong, Minghao Liu, James Davis

(参考訳) 本研究は,OpenAIの言語モデルChatGPTの数学的習熟度を,戦略的プロンプト,ペルソナ実装,思考の連鎖といった3つの規範的手法の効率に対して,デフォルトの計算能力を近似することで評価する。この評価は、数学の広い範囲と複雑さのレベルを包含する、数学、gsm8k、mmluデータセットの多様で広範な問題集合を活用した。モデルの数学的精度を高めるためにこれらの介入の有効性を判断するために洗練されたグレーディングスクリプトが設計された。期待に反して,実験手法ではchatgptのベースライン性能が大幅に向上することはなかった。いくつかのケースでは、これらの介入は不注意にモデルの応答生成を妨害した。この調査は、言語モデルの性能向上のための革新的な戦略の追求は依然として重要であるが、本研究では、ChatGPTの計算能力に大きな改善をもたらすことはなかった。これらの知見は、様々な領域にまたがるモデルの精度と信頼性を高めるために、より包括的な研究と新しい技術の探索の重要性を浮き彫りにしている。

This study critically evaluates the mathematical proficiency of OpenAI's language model, ChatGPT, by juxtaposing its default computational capabilities against the efficiency of three prescriptive methods: strategic prompting, persona implementation, and the Chain of Thought approach. The evaluation harnessed the diverse and extensive problem sets from the MATH, GSM8K, and MMLU datasets, which encompass a broad spectrum of mathematical conundrums and levels of complexity. A sophisticated grading script was designed to determine the efficacy of these interventions in enhancing the model's mathematical precision. Contrary to expectations, our empirical analysis revealed that none of the trialed methods substantially improved ChatGPT's baseline performance. In some cases, these interventions inadvertently disrupted the model's response generation. This investigation concluded that while the pursuit of innovative strategies for augmenting language model performance remains crucial, the specific methods examined within this study did not induce significant improvements in ChatGPT's computational aptitude. These findings underscore the importance of further comprehensive research and exploration of novel techniques to enhance the precision and dependability of such models across diverse domains.

翻訳日:2023-12-27 20:09:18 公開日:2023-12-22

# FineMoGen: 微粒な時空間運動生成と編集

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing ( http://arxiv.org/abs/2312.15004v1 )

ライセンス: Link先を確認

Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

(参考訳) テキスト駆動モーション生成は拡散モデルの出現によって大きく進歩した。しかし、既存の手法では、細かな記述に対応する複雑な動き列を生成するのに苦労しており、詳細かつ正確な時空間的動作を描写している。この制御性の欠如は、モーション生成の使用をより多くのオーディエンスに制限する。このような課題に対処するために,ユーザの指示に空間的時間的組成を組み込んだ微細な動きを合成できる拡散型モーション生成・編集フレームワークであるFineMoGenを提案する。具体的には、FineMoGenはSAMI(Spatio-Temporal Mixture Attention)と呼ばれる新しいトランスフォーマーアーキテクチャで拡散モデルを構築している。 SAMIは2つの視点からグローバルアテンションテンプレートの生成を最適化する。 1)時空間構成の制約を明示的にモデル化し, 2) 微粒化を適応的に抽出するために, スパース活性混合物を利用する。本研究は,2,968本の動画と102,336本の微細な時空間記述からなるHuMMan-MoGenデータセットを寄贈する。大規模な実験により、FineMoGenは最先端の手法よりも優れたモーション生成品質を示すことが示された。特に、FineMoGenは、最新の大言語モデル(LLM)の助けを借りて、よりきめ細かな命令で動きシーケンスを忠実に操作することで、ゼロショットモーション編集を可能にする。プロジェクトページ: https://mingyuan-zhang.github.io/projects/finemogen.html

Text-driven motion generation has achieved substantial progress with the emergence of diffusion models. However, existing methods still struggle to generate complex motion sequences that correspond to fine-grained descriptions, depicting detailed and accurate spatio-temporal actions. This lack of fine controllability limits the use of motion generation by a larger audience. To tackle these challenges, we present FineMoGen, a diffusion-based motion generation and editing framework that can synthesize fine-grained motions following the spatio-temporal composition of the user instructions. Specifically, FineMoGen builds upon a diffusion model with a novel transformer architecture dubbed Spatio-Temporal Mixture Attention (SAMI). SAMI optimizes the generation of the global attention template from two perspectives: 1) explicitly modeling the constraints of spatio-temporal composition; and 2) utilizing sparsely-activated mixture-of-experts to adaptively extract fine-grained features. To facilitate a large-scale study on this new fine-grained motion generation task, we contribute the HuMMan-MoGen dataset, which consists of 2,968 videos and 102,336 fine-grained spatio-temporal descriptions. Extensive experiments validate that FineMoGen exhibits superior motion generation quality over state-of-the-art methods. Notably, FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models (LLMs), faithfully manipulating motion sequences with fine-grained instructions. Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html

翻訳日:2023-12-27 20:08:54 公開日:2023-12-22

# 適応型ドメイン推論攻撃

Adaptive Domain Inference Attack ( http://arxiv.org/abs/2312.15088v1 )

ライセンス: Link先を確認

Yuechun Gu, Keke Chen

(参考訳) ディープニューラルネットワークは、医療やセキュリティといったセンシティブなアプリケーションドメインにますますデプロイされているため、これらのモデルからどのようなセンシティブな情報を推測できるかを理解する必要がある。既存のモデルターゲティング攻撃はすべて、攻撃者がアプリケーションドメインやトレーニングデータ分散を知っていると仮定する。これらの攻撃からモデルを保護するモデルAPIからドメイン情報を削除できるだろうか? 本稿では,この問題について考察する。残念なことに、最小限の知識、すなわち入力と出力の意味を漏らさずにモデルにアクセスしても、提案された適応ドメイン推論攻撃(ADI)はトレーニングデータの関連するサブセットをうまく推定することができる。抽出された関連データは,例えばモデル・インバージョン攻撃の性能が著しく向上することを示す。具体的には、利用可能な公開データセットとプライベートデータセットの集合の上に構築された概念階層と、未知のトレーニングデータに現れる葉の概念の可能性を適応的に調整する新しいアルゴリズムを利用する。 ADI攻撃は概念レベルで部分的なトレーニングデータを抽出するだけでなく、高速に収束し、他のドメイン推論攻撃であるGDIよりもはるかに少ないターゲットモデルアクセスを必要とする。

As deep neural networks are increasingly deployed in sensitive application domains, such as healthcare and security, it's necessary to understand what kind of sensitive information can be inferred from these models. Existing model-targeted attacks all assume the attacker knows the application domain or training data distribution, which plays an essential role in successful attacks. Can removing the domain information from model APIs protect models from these attacks? This paper studies this critical problem. Unfortunately, even with minimal knowledge, i.e., accessing the model as an unnamed function without leaking the meaning of input and output, the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data. We show that the extracted relevant data can significantly improve, for instance, the performance of model-inversion attacks. Specifically, the ADI method utilizes a concept hierarchy built on top of a large collection of available public and private datasets and a novel algorithm to adaptively tune the likelihood of leaf concepts showing up in the unseen training data. The ADI attack not only extracts partial training data at the concept level, but also converges fast and requires far fewer target-model accesses than another domain inference attack, GDI.

翻訳日:2023-12-27 20:02:27 公開日:2023-12-22

# HyperMix: アウトオブディストリビューションの検出と分類

HyperMix: Out-of-Distribution Detection and Classification in Few-Shot Settings ( http://arxiv.org/abs/2312.15086v1 )

ライセンス: Link先を確認

Nikhil Mehta, Kevin J Liang, Jing Huang, Fu-Jen Chu, Li Yin, Tal Hassner

(参考訳) アウト・オブ・ディストリビューション(OOD)検出は、現実世界の機械学習システムにとって重要なトピックであるが、限定的な分散サンプルによる設定は過小評価されている。モデルがOODサンプルを識別する前にデータ配布を学習する機会が少ないため、このような数ショットのOOD設定は難しい。実際、最近の最先端OOD法は、数ショット設定で単純なベースラインを上回りません。そこで我々はHyperMixと呼ばれるハイパーネットワークフレームワークを提案し、生成した分類器パラメータのMixupと、追加のoutlierデータセットを必要としない自然なout-of-episodeoutlierエクスポージャー手法を提案する。我々はCIFAR-FSとMiniImageNetで実験を行い、数ショットで他のOOD法よりも優れています。

Out-of-distribution (OOD) detection is an important topic for real-world machine learning systems, but settings with limited in-distribution samples have been underexplored. Such few-shot OOD settings are challenging, as models have scarce opportunities to learn the data distribution before being tasked with identifying OOD samples. Indeed, we demonstrate that recent state-of-the-art OOD methods fail to outperform simple baselines in the few-shot setting. We thus propose a hypernetwork framework called HyperMix, using Mixup on the generated classifier parameters, as well as a natural out-of-episode outlier exposure technique that does not require an additional outlier dataset. We conduct experiments on CIFAR-FS and MiniImageNet, significantly outperforming other OOD methods in the few-shot regime.

翻訳日:2023-12-27 20:02:06 公開日:2023-12-22
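
The HyperMix abstract above mentions applying Mixup to classifier parameters generated by a hypernetwork. As a rough illustration only, and not the authors' implementation, the sketch below shows the generic Mixup operation (a convex combination with a Beta-distributed coefficient) applied to two flat weight vectors. The function name `mixup_params`, the default `alpha`, and the flat-list representation of the weights are assumptions made for this example.

```python
import random


def mixup_params(w_a, w_b, alpha=1.0, rng=None):
    """Return a convex combination of two weight vectors and the coefficient used.

    Hypothetical helper: generic Mixup applied to parameters, with
    lam drawn from Beta(alpha, alpha). Not taken from the HyperMix paper.
    """
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    # Element-wise interpolation between the two parameter vectors.
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(w_a, w_b)]
    return mixed, lam


if __name__ == "__main__":
    mixed, lam = mixup_params([0.0, 2.0], [1.0, 4.0], rng=random.Random(0))
    print(lam, mixed)
```

In a few-shot setting the two vectors could be the generated classifier weights of two classes in an episode; how the paper pairs them and mixes the labels is not specified in the abstract.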

# 森林自動在庫:3次元深層学習による高密度空中LiDAR点雲の解析

Automated forest inventory: analysis of high-density airborne LiDAR point clouds with 3D deep learning ( http://arxiv.org/abs/2312.15084v1 )

ライセンス: Link先を確認

Binbin Xiang, Maciej Wielgosz, Theodora Kontogianni, Torben Peters, Stefano Puliti, Rasmus Astrup, Konrad Schindler

(参考訳) 詳細な森林在庫は、森林資源の持続的かつ柔軟な管理、様々な生態系の維持に不可欠である。現代の空中レーザースキャナーは、高密度の点雲を微細な森林の在庫と分析に大いに活用するが、点雲を個々の木や木の構成要素のような有意義な実体に自動的に分割することは課題である。本研究は,このギャップを埋めることを目的として,多様な森林タイプや地理的領域にまたがるセグメンテーションが可能なディープラーニングフレームワークを導入する。区分けされたデータから、個々の木の生物物理学的パラメータとスタンドを導出する。このシステムは、調査ドローンを使って5つの国で買収されたポイントクラウドのデータセットであるfor-instanceでテストされている。セグメンテーションのバックエンドは、各木の85%以上のFスコアを達成しており、それぞれ73%以上は、地上、低植生、茎、生きた枝、枯れた枝の5つの意味カテゴリーでIoUの平均値である。セグメンテーションの結果に基づいて、パイプラインは個々の木の生物物理特性(直径、クラウン径、クラウン体積、dbh、位置)とスタンドごとの特性(デジタル地形モデルとスタンド密度)を密に計算します。特にクラウン関連の特徴は,ほとんどの場合高い精度で回収されるが,DBHと位置推定の信頼性は低い。

Detailed forest inventories are critical for sustainable and flexible management of forest resources, to conserve various ecosystem services. Modern airborne laser scanners deliver high-density point clouds with great potential for fine-scale forest inventory and analysis, but automatically partitioning those point clouds into meaningful entities like individual trees or tree components remains a challenge. The present study aims to fill this gap and introduces a deep learning framework that is able to perform such a segmentation across diverse forest types and geographic regions. From the segmented data, we then derive relevant biophysical parameters of individual trees as well as stands. The system has been tested on FOR-Instance, a dataset of point clouds that have been acquired in five different countries using surveying drones. The segmentation back-end achieves over 85% F-score for individual trees and over 73% mean IoU across five semantic categories: ground, low vegetation, stems, live branches and dead branches. Building on the segmentation results, our pipeline then densely calculates biophysical features of each individual tree (height, crown diameter, crown volume, DBH, and location) and properties per stand (digital terrain model and stand density). Especially crown-related features are in most cases retrieved with high accuracy, whereas the estimates for DBH and location are less reliable, due to the airborne scanning setup.

翻訳日:2023-12-27 20:01:50 公開日:2023-12-22
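
The forest inventory abstract above reports an F-score for individual trees and a mean IoU over semantic categories. The sketch below shows how these two metrics are commonly computed from counts and per-point labels. It is a minimal illustration under stated assumptions: the helper names, the toy labels, and the choice to skip classes absent from both prediction and reference are not taken from the paper, and the paper's criterion for matching predicted trees to reference trees is not reproduced.

```python
def f_score(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_iou(pred, target, classes):
    """Mean intersection-over-union over the given class labels.

    pred and target are equal-length sequences of per-point labels.
    Classes that occur in neither sequence are skipped.
    """
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = ["ground", "stem", "stem", "ground"]
    target = ["ground", "stem", "ground", "ground"]
    print(f_score(tp=8, fp=2, fn=2))
    print(mean_iou(pred, target, ["ground", "stem"]))
```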

# リッチランキングの学習

Learning Rich Rankings ( http://arxiv.org/abs/2312.15081v1 )

ライセンス: Link先を確認

Arjun Seshadri, Stephen Ragain, Johan Ugander

(参考訳) ランク付けの基礎はよく確立されているが、ランキング文学は主に単純なユニモーダルモデル(例えば、マロとプラケット=ルースモデル)に焦点を当てており、1つの順序付けを中心に分布を定義する。明示的な混合モデルはマルチモーダルランキングデータをモデル化するためのツールを提供しているが、そのようなモデルをデータから学習することは難しいことが多い。本研究では,最近の選択モデリングの進歩を活かし,階層空間に自然な多様性と豊かさをもたらす,文脈的反復選択(crs)モデルを提案する。構造依存型テールリスクと予測リスクバウンダリによるモデルの下での最大推定の厳密な理論的保証を提供する。副産物として,多項ロジット(mnl)選択モデルとプラケットルース(pl)ランキングモデル,およびplランキングモデルに紐づけられた第1のテールリスクについて,最大確率推定値の予測リスクに関する最初の厳密な境界を設ける。 crsモデルは、レースからランク選択投票まで、さまざまな設定で現実世界のランキングデータをモデル化する既存の方法を大幅に上回っている。

Although the foundations of ranking are well established, the ranking literature has primarily been focused on simple, unimodal models, e.g. the Mallows and Plackett-Luce models, that define distributions centered around a single total ordering. Explicit mixture models have provided some tools for modelling multimodal ranking data, though learning such models from data is often difficult. In this work, we contribute a contextual repeated selection (CRS) model that leverages recent advances in choice modeling to bring a natural multimodality and richness to the rankings space. We provide rigorous theoretical guarantees for maximum likelihood estimation under the model through structure-dependent tail risk and expected risk bounds. As a by-product, we also furnish the first tight bounds on the expected risk of maximum likelihood estimators for the multinomial logit (MNL) choice model and the Plackett-Luce (PL) ranking model, as well as the first tail risk bound on the PL ranking model. The CRS model significantly outperforms existing methods for modeling real world ranking data in a variety of settings, from racing to rank choice voting.

翻訳日:2023-12-27 20:01:21 公開日:2023-12-22
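
The Learning Rich Rankings abstract above refers to the Plackett-Luce (PL) ranking model, in which a ranking is built by repeated multinomial logit (MNL) choices among the items not yet ranked. The sketch below computes the log-probability of a full ranking under plain PL. It is an illustration of the standard model only; the CRS model proposed in the paper is a contextual generalization and is not reproduced here. The function name and the dictionary of utilities are assumptions made for this example.

```python
import math


def plackett_luce_log_likelihood(ranking, utilities):
    """Log-probability of a full ranking under the Plackett-Luce model.

    ranking: item ids ordered from most to least preferred.
    utilities: dict mapping item id to a real-valued utility (log-weight).
    """
    log_lik = 0.0
    remaining = list(ranking)
    while remaining:
        # Repeated selection: the top remaining item is chosen by a softmax
        # (MNL) choice among all items that have not been ranked yet.
        max_u = max(utilities[i] for i in remaining)
        log_norm = max_u + math.log(
            sum(math.exp(utilities[i] - max_u) for i in remaining)
        )
        log_lik += utilities[remaining[0]] - log_norm
        remaining.pop(0)
    return log_lik


if __name__ == "__main__":
    equal = {"a": 0.0, "b": 0.0, "c": 0.0}
    # With equal utilities every ordering of three items has probability 1/6.
    print(math.exp(plackett_luce_log_likelihood(["a", "b", "c"], equal)))
```

Maximum likelihood estimation would maximize the sum of this quantity over the observed rankings with respect to the utilities.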

# クーパー対対光による高次光子過程のスペクトルシグネチャ

Spectral signature of high-order photon processes mediated by Cooper-pair pairing ( http://arxiv.org/abs/2312.15075v1 )

ライセンス: Link先を確認

W. C. Smith, A. Borgognoni, M. Villiers, E. Roverc'h, J. Palomo, M. R. Delbecq, T. Kontos, P. Campagne-Ibarcq, B. Douçot, Z. Leghtas

(参考訳) 個々の光子間の相互作用を誘導することは、フォトニック量子情報処理や多体光子状態に関する基礎研究に必須である。強い相互作用と低損失を組み合わせるのに適した分野は、マイクロ波量子光学と超伝導回路である。光子は典型的には$LC$の回路に格納され、ジョセフソントンネル接合によって回路が絞られると相互作用が現れる。重要な点は、接合部を横切る超伝導相の零点揺らぎが誘導相互作用の強さと秩序を制御することである。超伝導回路は、位相ゆらぎが単体よりも小さく、カー効果として知られる2光子相互作用が支配的な状態において、ほぼ独占的に動作している。この実験では、2対のクーパーペアのみをトンネルに通すダイポールで、高インピーダンスの$LC$発振器をシャットダウンした。このペアリングによって効果的に2倍になる位相変動は3.4に達する。この極端なゆらぎの状況では、無調和なはしごを登るとき、非単調に変化する遷移周波数を観測する。この測定結果から, 2-, 3-, 4-光子相互作用エネルギーを等価振幅で抽出し, すべて光子損失率を上回った。この研究は、多光子量子論理から高相関のマイクロ波放射の研究まで、マイクロ波量子光学における高次光子相互作用の新しい状態を探究する。

Inducing interactions between individual photons is essential for applications in photonic quantum information processing and fundamental research on many-body photon states. A field that is well suited to combine strong interactions and low losses is microwave quantum optics with superconducting circuits. Photons are typically stored in an $LC$ circuit, and interactions appear when the circuit is shunted by a Josephson tunnel junction. Importantly, the zero-point fluctuations of the superconducting phase across the junction control the strength and order of the induced interactions. Superconducting circuits have almost exclusively operated in the regime where phase fluctuations are smaller than unity, and two-photon interactions, known as the Kerr effect, dominate. In this experiment, we shunt a high-impedance $LC$ oscillator by a dipole that only allows pairs of Cooper pairs to tunnel. Phase fluctuations, which are effectively doubled by this pairing, reach the value of 3.4. In this regime of extreme fluctuations, we observe transition frequencies that shift non-monotonically as we climb the anharmonic ladder. From this spectroscopic measurement, we extract two-, three- and four-photon interaction energies of comparable amplitude, and all exceeding the photon loss rate. This work explores a new regime of high-order photon interactions in microwave quantum optics, with applications ranging from multi-photon quantum logic to the study of highly correlated microwave radiation.

翻訳日:2023-12-27 20:01:01 公開日:2023-12-22

# 技術的重複検出のためのシームス構造を有するGPT-3インベディングの精製

Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection ( http://arxiv.org/abs/2312.15068v1 )

ライセンス: Link先を確認

Xingfang Wu, Heng Li, Nobukazu Yoshioka, Hironori Washizaki, Foutse Khomh

(参考訳) 技術的オンラインコミュニティの1つのゴールは、開発者が一箇所で正しい答えを見つけるのを助けることである。一つの質問は異なる言葉で異なる方法で問うことができ、技術的フォーラムに重複するポストが存在する。重複投稿の発見とリンクに関する問題は、開発者コミュニティと研究者の両方の注目を集めている。例えばstack overflowでは,重複記事のマークとクローズに投票ベースのメカニズムを採用している。しかし、これら繰り返し発生する重複投稿にタイムリーに対処することは、課題を生じ続けている。そのため,技術フォーラム投稿の重複投稿を自動的に検出する手法が提案されている。既存のメソッドは、投稿の意味を十分に把握できない手作りの類似度メトリクスに依存するか、パフォーマンスを改善するための監督の欠如によって、制限に苦しめられている。さらに、これらの手法の効率は、大量のデータに対して実用的でないペアワイズ特徴生成への依存によって妨げられる。本研究では,重複検出タスクのためのgpt-3組込みを採用し,改良する。 GPT-3埋め込みはポストのセマンティクスを正確に表現できると仮定する。さらに,gpt-3組込みに基づくシャム語ベースのネットワークを訓練することにより,技術フォーラム投稿における重複関係を正確に捉えた潜在埋め込みを実現する。ベンチマークデータセットを用いた実験により,提案手法の有効性を確認し,ベースライン法と比較して優れた性能を示す。最近のStack Overflowダンプで構築したデータセットに適用すると、Top-1、Top-5、Top-30の精度はそれぞれ23.1%、43.9%、68.9%に達します。マニュアル研究により,技術フォーラムでラベルなしの複製を発見できる可能性を確認した。

One goal of technical online communities is to help developers find the right answer in one place. A single question can be asked in different ways with different wordings, leading to the existence of duplicate posts on technical forums. The question of how to discover and link duplicate posts has garnered the attention of both developer communities and researchers. For example, Stack Overflow adopts a voting-based mechanism to mark and close duplicate posts. However, addressing these constantly emerging duplicate posts in a timely manner continues to pose challenges. Therefore, various approaches have been proposed to detect duplicate posts on technical forums automatically. The existing methods suffer from limitations either due to their reliance on handcrafted similarity metrics, which cannot sufficiently capture the semantics of posts, or their lack of supervision to improve the performance. Additionally, the efficiency of these methods is hindered by their dependence on pair-wise feature generation, which can be impractical for large amounts of data. In this work, we attempt to employ and refine the GPT-3 embeddings for the duplicate detection task. We assume that the GPT-3 embeddings can accurately represent the semantics of the posts. In addition, by training a Siamese-based network based on the GPT-3 embeddings, we obtain a latent embedding that accurately captures the duplicate relation in technical forum posts. Our experiment on a benchmark dataset confirms the effectiveness of our approach and demonstrates superior performance compared to baseline methods. When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively. With a manual study, we confirm our approach's potential for finding unlabelled duplicates on technical forums.

翻訳日:2023-12-27 20:00:38 公開日:2023-12-22
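
The duplicate detection abstract above reports Top-1, Top-5, and Top-30 accuracy for retrieving duplicates from post embeddings. The sketch below shows one common way to compute such a metric: rank all candidate posts by cosine similarity to the query embedding and check whether the known duplicate is among the first k. The helper names, the toy two-dimensional vectors, and the use of cosine similarity are assumptions for illustration; the abstract does not state the exact similarity measure or evaluation protocol.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_accuracy(queries, candidates, gold, k):
    """Fraction of queries whose gold duplicate is among the k nearest candidates.

    queries: dict query id -> embedding
    candidates: dict post id -> embedding
    gold: dict query id -> post id of the known duplicate
    """
    hits = 0
    for qid, q_vec in queries.items():
        ranked = sorted(
            candidates,
            key=lambda pid: cosine(q_vec, candidates[pid]),
            reverse=True,
        )
        if gold[qid] in ranked[:k]:
            hits += 1
    return hits / len(queries) if queries else 0.0


if __name__ == "__main__":
    candidates = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
    queries = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
    gold = {"q1": "p1", "q2": "p3"}
    print(top_k_accuracy(queries, candidates, gold, k=1))
    print(top_k_accuracy(queries, candidates, gold, k=2))
```

Because each post is embedded once and compared by vector similarity, this style of retrieval avoids the pair-wise feature generation that the abstract identifies as a bottleneck.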

# 時間局所非Lindbladマスター方程式の最適形式

Optimal form of time-local non-Lindblad master equations ( http://arxiv.org/abs/2312.15066v1 )

ライセンス: Link先を確認

Tobias Becker, André Eckardt

(参考訳) 超弱系-バス結合の極限を超えた開量子系を記述する時間局所量子マスター方程式は、しばしばゴリーニ=コサコフスキー=スダルシャン=リンドブラッド形式(GKSL)ではない。代表的な例として、一般の開量子系を近似するレッドフィールド方程式や、減衰調和振動子を正確に記述したhu-paz-zhang方程式がある。ここでは、項のいくつかが負の重みを持つという事実を除いて、前者だけでなく後者もGKSL方程式に類似した散逸子で擬似Lndblad形式にすることができることを示す。さらに,擬似Lindblad方程式の散逸を変化させる変換について,正項と負項の相対重みを変化させながら体系的に検討した。これらは、最近開発された擬Lindblad方程式の量子軌道展開の収束と、GKSL方程式を得るために負項の切り離しの両方に最適である負項の重みを最小化するために使用できる。

Time-local quantum master equations that describe open quantum systems beyond the limit of ultraweak system-bath coupling are often not of Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) form. Prominent examples are the Redfield equation approximating general open quantum systems and the Hu-Paz-Zhang equation exactly describing a damped harmonic oscillator. Here, we show that not only the former, but also the latter can be brought to pseudo-Lindblad form, with a dissipator that resembles that of a GKSL equation, except for the fact that some of the terms have negative weights. Moreover, we systematically investigate transformations that leave the dissipator of pseudo-Lindblad equations unchanged, while changing the relative weight between its positive and negative terms. These can be used to minimize the weights of the negative terms, which is optimal both for the convergence of a recently developed quantum-trajectory unraveling of pseudo-Lindblad equations as well as for the truncation of the negative terms to obtain a GKSL equation.

翻訳日:2023-12-27 20:00:09 公開日:2023-12-22
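
To make the distinction in the abstract above concrete, the two forms can be written side by side. This is a schematic rendering under assumptions: units with $\hbar = 1$, and the symbols $H$, $L_k$, $\gamma_k$ and $\sigma_k$ are notation chosen for this note rather than quoted from the paper.

```latex
\begin{align}
\dot{\rho} &= -i[H,\rho] + \sum_k \gamma_k
  \Big( L_k \rho L_k^\dagger - \tfrac{1}{2}\{ L_k^\dagger L_k, \rho \} \Big),
  & \gamma_k &\ge 0
  && \text{(GKSL)} \\
\dot{\rho} &= -i[H,\rho] + \sum_k \sigma_k \gamma_k
  \Big( L_k \rho L_k^\dagger - \tfrac{1}{2}\{ L_k^\dagger L_k, \rho \} \Big),
  & \gamma_k &\ge 0,\ \sigma_k \in \{+1,-1\}
  && \text{(pseudo-Lindblad)}
\end{align}
```

The terms with $\sigma_k = -1$ are the negative-weight contributions; dropping them (truncation) leaves an equation of GKSL form, which is why minimizing their weight matters.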

# 多項集合に対する排他的有限時間相関関数:量子輸送と熱力学の理論的枠組みの連結

Exact finite-time correlation functions for multi-terminal setups: Connecting theoretical frameworks for quantum transport and thermodynamics ( http://arxiv.org/abs/2312.15065v1 )

ライセンス: Link先を確認

Gianmichele Blasi, Shishir Khandelwal, and Géraldine Haack

(参考訳) 開量子系における輸送は、量子マスター方程式、散乱行列、ハイゼンベルク運動方程式など、様々な理論的な枠組みを通して研究することができる。フレームワークの選択は、インタラクションの存在、システムと環境の結合力、定常的あるいは一時的なレジームに焦点を当てているかどうかといった要因に依存する。既存の文献はこれらの枠組みを独立して扱い、統一的な視点を欠いている。本研究は,電圧および温度バイアス下での2段階設定において,最小レベルの量子ドットモデルを用いて,これらのアプローチの役割と現状を明らかにすることで,このギャップに対処する。粒子およびエネルギー電流の解析式と定常状態と過渡状態の両方における変動を導出する。ハイゼンベルク方程式の正確な結果は、それぞれの有効範囲内で散乱行列とマスター方程式のアプローチと一致することが示されている。まず,弱結合限界のプロトコルを確立し,ハイゼンベルクとの弱結合におけるマスター方程式の適用可能性や任意の結合強度での散乱行列アプローチを橋渡しする。

Transport in open quantum systems can be explored through various theoretical frameworks, including the quantum master equation, scattering matrix, and Heisenberg equation of motion. The choice of framework depends on factors such as the presence of interactions, the coupling strength between the system and environment, and whether the focus is on steady-state or transient regimes. Existing literature treats these frameworks independently, lacking a unified perspective. Our work addresses this gap by clarifying the role and status of these approaches using a minimal single-level quantum dot model in a two-terminal setup under voltage and temperature biases. We derive analytical expressions for particle and energy currents and their fluctuations in both steady-state and transient regimes. Exact results from the Heisenberg equation are shown to align with scattering matrix and master equation approaches within their respective validity regimes. Crucially, we establish a protocol for the weak-coupling limit, bridging the applicability of master equations at weak-coupling with Heisenberg or scattering matrix approaches at arbitrary coupling strength.

翻訳日:2023-12-27 19:59:46 公開日:2023-12-22

# マルチモーダルMRIデータを用いた自己監督型コントラスト学習 : 異常神経発達予測に向けて

Joint Self-Supervised and Supervised Contrastive Learning for Multimodal MRI Data: Towards Predicting Abnormal Neurodevelopment ( http://arxiv.org/abs/2312.15064v1 )

ライセンス: Link先を確認

Zhiyuan Li, Hailong Li, Anca L. Ralescu, Jonathan R. Dillman, Mekibib Altaye, Kim M. Cecil, Nehal A. Parikh, Lili He

(参考訳) 構造,拡散テンソル,機能的磁気共鳴画像などの異なる画像モダリティの深層学習モデルとの融合により,表現特性の識別や疾患診断の強化が期待できる結果となった。このような手法の開発は、当初は異なる表現空間内に存在する異種多様特徴の効率的な融合にかかっている。マルチモーダルな特徴をネゴライズすることは相補的な情報を適切に捉えず、冗長性さえも生み出す。本研究では,マルチモーダルMRIデータから頑健な潜在特徴表現を学習し,異種特徴の共通空間への投射を可能にし,相補的情報と類似的情報の両方を様々なモダリティと類似した主題に集約する,新しい共同教師付きコントラスト学習法を提案する。提案手法と代替的な深層マルチモーダル学習手法の比較分析を行った。 2つの独立したデータセットに対する広範な実験により,本手法は異常な神経発達を予測するための他の深層マルチモーダル学習法よりも優れていることが示された。本手法は,マルチモーダルデータのパワーを活用し,臨床におけるコンピュータ支援診断を容易にする能力を有する。

The integration of different imaging modalities, such as structural, diffusion tensor, and functional magnetic resonance imaging, with deep learning models has yielded promising outcomes in discerning phenotypic characteristics and enhancing disease diagnosis. The development of such a technique hinges on the efficient fusion of heterogeneous multimodal features, which initially reside within distinct representation spaces. Naively fusing the multimodal features does not adequately capture the complementary information and could even produce redundancy. In this work, we present a novel joint self-supervised and supervised contrastive learning method to learn the robust latent feature representation from multimodal MRI data, allowing the projection of heterogeneous features into a shared common space, and thereby amalgamating both complementary and analogous information across various modalities and among similar subjects. We performed a comparative analysis between our proposed method and alternative deep multimodal learning approaches. Through extensive experiments on two independent datasets, the results demonstrated that our method is significantly superior to several other deep multimodal learning methods in predicting abnormal neurodevelopment. Our method has the capability to facilitate computer-aided diagnosis within clinical practice, harnessing the power of multimodal data.

翻訳日:2023-12-27 19:59:29 公開日:2023-12-22

# 非線形抵抗ネットワークに対する普遍近似定理

A universal approximation theorem for nonlinear resistive networks ( http://arxiv.org/abs/2312.15063v1 )

ライセンス: Link先を確認

Benjamin Scellier, Siddhartha Mishra

(参考訳) レジストレータネットワークは近年、エネルギー効率のよい自己学習マシンの基盤として注目されている。この研究は、これらの抵抗ネットワークの計算能力を研究する。電圧源,リニア抵抗器,ダイオード,電圧制御電圧源(vcvs)からなる電気ネットワークは,任意の連続機能を実現することができることを示す。これを証明するために、回路要素は理想的であり、可変抵抗器のコンダクタンスとvcvsの増幅係数は任意に小さく、あるいは任意に大きい値を取ることができると仮定する。また,本稿では,このような自己学習型電気ネットワークの設計について述べる。

Resistor networks have recently had a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous function. To prove it, we assume that the circuit elements are ideal and that the conductances of variable resistors and the amplification factors of the VCVSs can take arbitrary values -- arbitrarily small or arbitrarily large. The constructive nature of our proof could also inform the design of such self-learning electrical networks.

翻訳日:2023-12-27 19:59:05 公開日:2023-12-22

# アニマタブルヒトアバターのための変形可能な3次元ガウススプラッティング

Deformable 3D Gaussian Splatting for Animatable Human Avatars ( http://arxiv.org/abs/2312.15059v1 )

ライセンス: Link先を確認

HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam

(参考訳) 近年のニューラルラディアンス分野の進歩は、人間のアニメーションのシナリオに適用可能な、動的設定におけるフォトリアリスティック画像の新しいビュー合成を可能にする。しかし、正確なモデルを確立するために暗黙のバックボーンは、多くの入力ビューと人間のマスク、uvマップ、深度マップなどの追加アノテーションを必要とする。本研究では,1つの単細胞配列からデジタルアバターを構築するための完全明示的なアプローチであるpardy-human (parameterized dynamic human avatar)を提案する。 pardy-human は 3d gaussian splatting にパラメータ駆動ダイナミクスを導入し、3d gaussian は人間のポーズモデルによって変形してアバターをアニメーション化する。本手法は, 正準3次元ガウス多様体をsmpl頂点に従って変形する第1モジュールと, 設計したジョイント符号化を更に取り入れてガウス変形ごとに予測し, smpl頂点変形を超えるダイナミクスを扱う連続モジュールの2つの部分からなる。画像はラスタライザーによって合成される。 pardy-humanは、リアルな動的人間のアバターのための明示的なモデルを構成する。当社のアバター学習にはマスクなどの追加アノテーションが不要で,ユーザのハードウェア上でも,フル解像度の画像の推測を効率的に行うことができる。本稿では,ZJU-MoCap と THUman4.0 データセットにおいて,ParDy-Human が最先端の手法よりも定量的かつ視覚的に優れていることを示す実験的証拠を提供する。

Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: a first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per-Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatar learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.

翻訳日:2023-12-27 19:58:53 公開日:2023-12-22

# サードパーティの機械学習モデルとデータセットのドキュメンテーションプラクティスの現状

The State of Documentation Practices of Third-party Machine Learning Models and Datasets ( http://arxiv.org/abs/2312.15058v1 )

ライセンス: Link先を確認

Ernesto Lang Oreamuno, Rohan Faiyaz Khan, Abdul Ali Bangash, Catherine Stinson, Bram Adams

(参考訳) モデルストアは、プロジェクト統合が容易なサードパーティのMLモデルとデータセットを提供し、コーディング作業を最小化する。モデルカードやデータセットカードなどのドキュメント標準を活用して、これらのモデルとデータセットの詳細な仕様がドキュメントに記載されていることが期待される。本研究では,現在使用されている最大級のモデルストアの1つであるHugging Face (HF)において,モデルカードとデータセットカードの文書化の実践状況を評価するために,統計解析とハイブリッドカードソートを用いる。その結果,21,902モデル (39.62%) と1,925データセット (28.48%) のみがドキュメントを持っていることがわかった。さらに,MLモデルやデータセットに対する倫理や透明性に関する文書の一貫性の欠如を観察する。

Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today, Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.

翻訳日:2023-12-27 19:58:25 公開日:2023-12-22

# 効率的なGWAS特徴選択のための深層学習

Deep Learning for Efficient GWAS Feature Selection ( http://arxiv.org/abs/2312.15055v1 )

ライセンス: Link先を確認

Kexuan Li

(参考訳) ゲノムワイド・アソシエーション研究(GWAS)は、大きなゲノムデータの時代において、特に遺伝学的特徴の数が利用可能なサンプルを大幅に超える超高次元データセットを扱う際に、ユニークな課題に直面している。本稿では,超高次元GWASデータに関連する複雑な問題に対処するために,Mirzaeiら(2020)によって提案された特徴選択手法の拡張を提案する。拡張アプローチは,学生ネットワークにフロベニウスノルムのペナルティを導入し,多数の特徴と限られたサンプルで特徴付けられるシナリオに適応する能力を高めることで,元の手法を強化する。教師あり設定と教師なし設定の両方でシームレスに動作し、2つの重要なニューラルネットワークを用いる。 1つ目は、次元削減のためにオートエンコーダまたは教師付きオートエンコーダを利用し、超高次元ゲノムデータから顕著な特徴を抽出する。第2のネットワークは、単一の隠れ層を持つ正則化フィードフォワードモデルであり、正確な特徴選択のために設計されている。学生ネットワークにおけるフロベニウスノルムのペナルティの導入は、超高次元GWASデータセットがもたらす課題に対する方法のレジリエンスを著しく向上させる。 GWASデータの特徴選択における提案手法の有効性を実験的に検証した。この手法は超高次元設定の複雑さを扱うだけでなく、ゲノムデータに存在する微妙な構造に優れた適応性を示す。提案手法の柔軟性と汎用性は,様々な実験で良好な性能を示したことによって裏付けられている。

Genome-Wide Association Studies (GWAS) face unique challenges in the era of big genomics data, particularly when dealing with ultra-high-dimensional datasets where the number of genetic features significantly exceeds the available samples. This paper introduces an extension to the feature selection methodology proposed by Mirzaei et al. (2020), specifically tailored to tackle the intricacies associated with ultra-high-dimensional GWAS data. Our extended approach enhances the original method by introducing a Frobenius norm penalty into the student network, augmenting its capacity to adapt to scenarios characterized by a multitude of features and limited samples. Operating seamlessly in both supervised and unsupervised settings, our method employs two key neural networks. The first leverages an autoencoder or supervised autoencoder for dimension reduction, extracting salient features from the ultra-high-dimensional genomic data. The second network, a regularized feed-forward model with a single hidden layer, is designed for precise feature selection. The introduction of the Frobenius norm penalty in the student network significantly boosts the method's resilience to the challenges posed by ultra-high-dimensional GWAS datasets. Experimental results showcase the efficacy of our approach in feature selection for GWAS data. The method not only handles the inherent complexities of ultra-high-dimensional settings but also demonstrates superior adaptability to the nuanced structures present in genomics data. The flexibility and versatility of our proposed methodology are underscored by its successful performance across a spectrum of experiments.

翻訳日:2023-12-27 19:58:15 公開日:2023-12-22
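
As a rough illustration of the Frobenius norm penalty mentioned in the abstract above, the following Python sketch adds a penalty on a weight matrix to a mean-squared-error loss. The function names, the squared form of the penalty, and the coefficient `lam` are assumptions made for this example; the abstract does not give the exact form used by the author.

```python
import math


def frobenius_norm(weights):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(w * w for row in weights for w in row))


def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def penalized_loss(predictions, targets, weights, lam):
    """Data-fit term plus a Frobenius penalty on the weight matrix.

    Assumption: the penalty enters as lam * ||W||_F^2.
    """
    return mean_squared_error(predictions, targets) + lam * frobenius_norm(weights) ** 2
```

For `weights = [[3.0, 4.0]]` the norm is 5, so a coefficient of 0.1 adds 2.5 to the data-fit term. A larger `lam` pushes the optimizer toward smaller weights, which is the usual motivation for such a penalty when features far outnumber samples.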

# 変分量子アルゴリズムのための階層型マルチグリッドアンサッツ

Hierarchical Multigrid Ansatz for Variational Quantum Algorithms ( http://arxiv.org/abs/2312.15048v1 )

ライセンス: Link先を確認

Christo Meriwether Keller, Stephan Eidenbenz, Andreas Bärtschi, Daniel O'Malley, John Golden, Satyajayant Misra

(参考訳) 量子コンピューティングは、基礎物理学を用いてスーパーコンピューティングを強化することを約束する工学の新しいトピックである。短期的には、この利点を達成する最良の候補アルゴリズムは変分量子アルゴリズム(VQA)である。本稿では,変分量子固有値ソルバ(VQE)を中心に,新しいVQAアンサッツの設計と数値評価を行う。私たちのアンサッツは、古典的なマルチグリッド階層法に着想を得ているので、これを「マルチグリッド」アンサッツと呼ぶ。マルチグリッドアンサッツは、より小さなキュービット数 $j < n$ に対する回路を順次構築し最適化することにより、$n$ キュービット上の量子問題に対するパラメータ化量子回路を生成し、最適化されたパラメータ値を $j+1$ の次の階層レベルに対する初期解として再利用する。数値シミュレーションにより,Laplacian 固有値ソルバの解の品質や,MaxCut と Maximum $k$-Satisfiability を具体例とする幅広い組合せ最適化問題において,マルチグリッドアンサッツは標準的なハードウェア効率のアンサッツよりも優れていることを示す。本研究は,多くのVQAの有力な候補としてマルチグリッドアンサッツを確立し,特に組合せ最適化問題に対するQAOAアプローチの有望な代替手段を提示する。

Quantum computing is an emerging topic in engineering that promises to enhance supercomputing using fundamental physics. In the near term, the best candidate algorithms for achieving this advantage are variational quantum algorithms (VQAs). We design and numerically evaluate a novel ansatz for VQAs, focusing in particular on the variational quantum eigensolver (VQE). As our ansatz is inspired by classical multigrid hierarchy methods, we call it the "multigrid" ansatz. The multigrid ansatz creates a parameterized quantum circuit for a quantum problem on $n$ qubits by successively building and optimizing circuits for smaller qubit counts $j < n$, reusing optimized parameter values as initial solutions for the next level of the hierarchy at $j+1$. We show through numerical simulation that the multigrid ansatz outperforms the standard hardware-efficient ansatz in terms of solution quality for the Laplacian eigensolver as well as for a large class of combinatorial optimization problems, with specific examples for MaxCut and Maximum $k$-Satisfiability. Our studies establish the multigrid ansatz as a viable candidate for many VQAs and in particular present a promising alternative to the QAOA approach for combinatorial optimization problems.

翻訳日:2023-12-27 19:57:49 公開日:2023-12-22
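
The level-by-level warm start described in the abstract above can be sketched in Python as follows. This is only a classical toy: a quadratic cost stands in for a VQE energy, a finite-difference gradient descent stands in for the optimizer, and the one-parameter-per-qubit layout, `make_toy_cost`, and all function names are assumptions of this example rather than details from the paper.

```python
def optimize(cost, params, steps=200, lr=0.1, eps=1e-6):
    """Finite-difference gradient descent; a stand-in for any VQE optimizer."""
    params = list(params)
    for _ in range(steps):
        base = cost(params)
        grad = []
        for i in range(len(params)):
            shifted = params[:]
            shifted[i] += eps
            grad.append((cost(shifted) - base) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def multigrid_warm_start(make_cost, n, j_start=1):
    """Optimize levels j_start..n, reusing each level's result at the next."""
    params = []
    for j in range(j_start, n + 1):
        # Keep the parameters optimized at the smaller level and
        # initialize only the newly added ones (here: to zero).
        params = params + [0.0] * (j - len(params))
        params = optimize(make_cost(j), params)
    return params


def make_toy_cost(j):
    """Hypothetical level-j cost with a known minimum at 0.5, 1.0, 1.5, ..."""
    targets = [0.5 * (k + 1) for k in range(j)]
    return lambda p: sum((a - b) ** 2 for a, b in zip(p, targets))


final_params = multigrid_warm_start(make_toy_cost, n=3)
```

In this toy, each level starts with all previously optimized parameters already near their optimum, so only the new parameter needs substantial updates. Whether that benefit carries over to real circuits is what the paper's simulations examine.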

# 測位および通信のための最適ノイズ絡み合い試験

Optimal noisy entanglement testing for ranging and communication ( http://arxiv.org/abs/2312.15047v1 )

ライセンス: Link先を確認

Pengcheng Liao and Quntao Zhuang

(参考訳) 別のシステム$I$と絡み合った量子システム$S$が与えられたとき、$m \ge 2$ 個の同一システムの中からシステム$S$を識別するという絡み合いテストの問題が生じる。このシナリオは、量子レンジングおよび絡み合い支援通信 [Phys. Rev. Lett. 126, 240501 (2021)] で発生する測定タスクのモデルとして機能する。この文脈では、最適測定アプローチは典型的にはすべての$m+1$システムの共同測定を伴う。しかし、システム$S$を含むサブシステムがエンタングルメント破壊ノイズにさらされている場合、これはそうではないことを実証する。提案手法は,最近開発された相関-変位変換の計測手法を利用する。我々は,$m+1$システム上での局所的操作と古典的通信(LOCC)で実装可能な,絡み合いテスト計測のための構造化設計を提案する。さらに, この測定手法は, 雑音条件下で漸近的に誤差確率の観点から最適性が得られることを示す。量子照明に適用すると, 信号の輝度が低く, ノイズのレベルが高いシナリオにおいて, 最適なレンジングが可能となる。同様に、エンタングルメント支援の古典的通信に適用すると、測定設計は通信速度において、特に信号の輝度が低いシナリオで、大きな相対的優位性をもたらす。

Given a quantum system $S$ entangled with another system $I$, the entanglement testing problem arises, prompting the identification of the system $S$ within a set of $m \ge 2$ identical systems. This scenario serves as a model for the measurement task encountered in quantum ranging and entanglement-assisted communication [Phys. Rev. Lett. 126, 240501, (2021)]. In this context, the optimal measurement approach typically involves joint measurements on all $m+1$ systems. However, we demonstrate that this is not the case when the subsystems containing system $S$ are subjected to entanglement-breaking noise. Our approach utilizes the recently developed measurement technique of correlation-to-displacement conversion. We present a structured design for the entanglement testing measurement, implementable with local operations and classical communications (LOCC) on the $m+1$ systems. Furthermore, we prove that this measurement approach achieves optimality in terms of error probability asymptotically under noisy conditions. When applied to quantum illumination, our measurement design enables optimal ranging in scenarios with low signal brightness and high levels of noise. Similarly, when applied to entanglement-assisted classical communication, the measurement design leads to a significant relative advantage in communication rates, particularly in scenarios with low signal brightness.

翻訳日:2023-12-27 19:57:27 公開日:2023-12-22

# EGAIN: 拡張GANインバージョン

EGAIN: Extended GAn INversion ( http://arxiv.org/abs/2312.15116v1 )

ライセンス: Link先を確認

Wassim Kabbani, Marcel Grimmer, Christoph Busch

(参考訳) GAN(Generative Adversarial Networks)は近年顕著な進歩を遂げており、本物と区別できない、より高品質な画像を生成している。近年のGANは、解きほぐされた(disentangled)潜在空間に特徴を符号化し、ポーズ、照明、性別などの生成された顔画像の様々な意味的属性を正確に制御できることが証明されている。 GANの潜在空間に画像を投影するGANインバージョンは、実際の顔画像の顔の意味的属性を操作するための扉を開く。これは顔認識システムの性能評価など、多くのアプリケーションで有用である。本稿では,GANインバージョンモデルを構築するためのアーキテクチャであるEGAINについて述べる。このアーキテクチャは、以前のGANインバージョンモデルの欠点のいくつかを明示的に取り扱う。このアーキテクチャをベースとした同名の具体的なモデル egain も提案され、最先端モデルよりも優れた再構成品質を示し、EGAINアーキテクチャの有効性を示す。

Generative Adversarial Networks (GANs) have witnessed significant advances in recent years, generating increasingly higher quality images, which are indistinguishable from real ones. Recent GANs have proven to encode features in a disentangled latent space, enabling precise control over various semantic attributes of the generated facial images such as pose, illumination, or gender. GAN inversion, which is projecting images into the latent space of a GAN, opens the door for the manipulation of facial semantics of real face images. This is useful for numerous applications such as evaluating the performance of face recognition systems. In this work, EGAIN, an architecture for constructing GAN inversion models, is presented. This architecture explicitly addresses some of the shortcomings in previous GAN inversion models. A specific model with the same name, egain, based on this architecture is also proposed, demonstrating superior reconstruction quality over state-of-the-art models, and illustrating the validity of the EGAIN architecture.

翻訳日:2023-12-27 19:51:58 公開日:2023-12-22

# 非退化パラメトリック増幅器のベリー位相とマンデルパラメータ

Berry phase and the Mandel parameter of the non-degenerate parametric amplifier ( http://arxiv.org/abs/2312.15114v1 )

ライセンス: Link先を確認

J. C. Vega, E. Choreño, D. Ojeda-Guillén and R. D. Mota

(参考訳) 我々は、$SU(1,1)$群の代数的アプローチから非退化パラメトリック増幅器の問題を研究する。我々は、この問題のハミルトニアンを$SU(1,1)$群のボソン生成子と差分作用素の項で記述する。我々は、このハミルトニアンを正確に解くために傾き変換を適用し、そのエネルギースペクトルと固有関数を得る。そして、ハミルトニアンが時間の明示的な関数であると仮定することで、ベリー位相を計算する。最後に、光子数 $n_a$ と $n_b$ の Mandel $Q$-parameter を得る。

We study the non-degenerate parametric amplifier problem from an algebraic approach of the $SU(1,1)$ group. We write the Hamiltonian of this problem in terms of the boson generators of the $SU(1,1)$ group and the difference operator. We apply the tilting transformation to our results to exactly solve this Hamiltonian and obtain its energy spectrum and eigenfunctions. Then, by assuming that our Hamiltonian is an explicit function of time, we calculate its Berry phase. Finally, we obtain the Mandel $Q$-parameter of the photon numbers $n_a$ and $n_b$.

翻訳日:2023-12-27 19:51:44 公開日:2023-12-22
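
For readers unfamiliar with the Mandel $Q$-parameter named in the abstract above, the Python sketch below evaluates its standard definition, $Q = (\langle n^2 \rangle - \langle n \rangle^2)/\langle n \rangle - 1$, for a generic photon-number distribution. It does not reproduce the paper's $SU(1,1)$ calculation; the helper names and the truncated Poisson example are assumptions chosen for illustration.

```python
import math


def mandel_q(distribution):
    """Mandel Q for a photon-number distribution.

    `distribution` maps a photon number n to its probability P(n).
    """
    mean = sum(n * p for n, p in distribution.items())
    second_moment = sum(n * n * p for n, p in distribution.items())
    return (second_moment - mean ** 2) / mean - 1.0


def poisson(mu, cutoff=60):
    """Poisson distribution with mean mu, truncated at `cutoff` terms."""
    probabilities = {}
    p = math.exp(-mu)
    for n in range(cutoff):
        probabilities[n] = p
        p *= mu / (n + 1)
    return probabilities


q_fock = mandel_q({3: 1.0})          # number state with exactly 3 photons
q_coherent = mandel_q(poisson(2.0))  # Poissonian statistics
```

A number state gives $Q = -1$ (sub-Poissonian) and Poissonian statistics give $Q = 0$, so the sign of $Q$ indicates on which side of the Poissonian reference a distribution falls.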

# ドライバーと歩行者の相互作用を理解してドライバーの譲り行動を予測する:ミネソタ州で収集された自然主義的オープンソースデータセット

Understanding driver-pedestrian interactions to predict driver yielding: naturalistic open-source dataset collected in Minnesota ( http://arxiv.org/abs/2312.15113v1 )

ライセンス: Link先を確認

Tianyi Li, Joshua Klavins, Te Xu, Niaz Mahmud Zafri, Raphael Stern

(参考訳) 交通量、車両速度、道路特性など、ドライバーと歩行者の相互作用において道を譲るかどうかの結果に影響を与える要因は多い。これらの相互作用の個々の側面は研究されているが、特に建造環境がドライバーの譲り行動に与える影響を考慮した、包括的で自然主義的な研究は不足している。このギャップに対処するために、本研究ではミネソタ州全域の18の信号機のない交差点のビデオデータから作成した広範なオープンソースデータセットを紹介する。このデータセットは3000以上のインタラクションを記録し、ドライバーと歩行者のインタラクションと50以上の異なるコンテキスト変数の詳細なビューを提供する。個々のドライバーと歩行者のインタラクションとコンテキスト要因をカバーするデータは、https://github.com/tianyi17/pedestrian_yielding_data_MN で公開されている。ロジスティック回帰を用いて,特定された変数に基づいてドライバーが道を譲るかどうかを予測する分類モデルを開発した。分析の結果,車両の速度,駐車場の存在,公園や学校への近さ,主要道路の横断幅は,信号機のない交差点でのドライバーの譲り行動に大きな影響を及ぼすことがわかった。この研究は、米国で最も包括的なドライバーと歩行者のデータセットの1つに寄与し、交通安全改善のための貴重な洞察を提供する。この情報を利用できるようにすることで、ミネソタ州と米国中のコミュニティが歩行者の道路安全を改善するために続けている取り組みを支援する。

Many factors influence the yielding result of a driver-pedestrian interaction, including traffic volume, vehicle speed, roadway characteristics, etc. While individual aspects of these interactions have been explored, comprehensive, naturalistic studies, particularly those considering the built environment's influence on driver-yielding behavior, are lacking. To address this gap, our study introduces an extensive open-source dataset, compiled from video data at 18 unsignalized intersections across Minnesota. Documenting more than 3000 interactions, this dataset provides a detailed view of driver-pedestrian interactions and over 50 distinct contextual variables. The data, which covers individual driver-pedestrian interactions and contextual factors, is made publicly available at https://github.com/tianyi17/pedestrian_yielding_data_MN. Using logistic regression, we developed a classification model that predicts driver yielding based on the identified variables. Our analysis indicates that vehicle speed, the presence of parking lots, proximity to parks or schools, and the width of major road crossings significantly influence driver yielding at unsignalized intersections. This study contributes to one of the most comprehensive driver-pedestrian datasets in the US, offering valuable insights for traffic safety improvements. By making this information available, our study will support communities across Minnesota and the United States in their ongoing efforts to improve road safety for pedestrians.

翻訳日:2023-12-27 19:51:35 公開日:2023-12-22
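
The abstract above mentions a logistic-regression classifier for driver yielding. The following Python sketch shows what such a model looks like on a tiny synthetic example. The two features (a standardized vehicle speed and a parking-lot indicator), the eight rows, and their labels are invented for illustration and are not taken from the published dataset; in particular, the direction of the speed effect is built into the toy data and is not a result quoted from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(weights, bias, x):
    """Predicted probability that the driver yields."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic rows: [standardized speed, parking lot present]; label 1 = yielded.
rows = [[-1.5, 0], [-1.0, 1], [-0.5, 0], [-0.2, 0],
        [0.0, 1], [0.5, 0], [1.0, 1], [1.5, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic(rows, labels)
p_slow = predict_proba(weights, bias, [-1.5, 0])
p_fast = predict_proba(weights, bias, [1.5, 0])
```

With real data one would add the remaining contextual variables as extra columns and evaluate on held-out interactions; the fitting loop itself would not need to change.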

# 教師の多かれ少なかれ--知識蒸留における三方幾何学の活用

Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation ( http://arxiv.org/abs/2312.15112v1 )

ライセンス: Link先を確認

Chengming Hu, Haolun Wu, Xuan Li, Chen Ma, Xi Chen, Jun Yan, Boyu Wang, Xue Liu

(参考訳) 知識蒸留は、より大きな教師ネットワークからのソフトな監督と正解ラベル(ground truth)からのハードな監督を用いて、コンパクトな生徒ネットワークを訓練することを目的としている。しかし、これらの監督信号のバランスをとる最適な知識融合比を決定することは依然として困難である。従来の方法では、通常、一定のあるいはヒューリスティックな融合比に頼っており、しばしば適切なバランスに欠ける。本研究では,教師と生徒それぞれの予測の正しさと,各サンプルにおいて生徒がどれだけ教師を模倣できているかの両方を利用して,サンプルごとの知識融合比を学習する新しい適応的手法を提案する。本手法は,生徒の予測($S$),教師の予測($T$),正解($G$)の間のサンプル内の三者間幾何学的関係を自然に導く。外れ値の影響を相殺するため、同じクラス内のサンプルに対する教師のグローバル平均予測 $\bar{T}$ を組み込むことで、サンプル間関係へとさらに拡張する。単純なニューラルネットワークは、サンプル内およびサンプル間関係から、適応的でサンプル単位の知識融合比への暗黙のマッピングをバイレベル最適化方式で学習する。我々のアプローチは、様々なアーキテクチャやモデルサイズにまたがって適用可能な、シンプルで実用的で適応可能な知識蒸留ソリューションを提供する。広範な実験により、画像分類、攻撃検出、クリックスルー率予測において、他の損失再重み付け方法よりも一貫した改善が示されている。

Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction ($S$), teacher prediction ($T$), and ground truth ($G$). To counterbalance the impact of outliers, we further extend to the inter-sample relations, incorporating the teacher's global average prediction $\bar{T}$ for samples within the same class. A simple neural network then learns the implicit mapping from the intra- and inter-sample relations to an adaptive, sample-wise knowledge fusion ratio in a bilevel-optimization manner. Our approach provides a simple, practical, and adaptable solution for knowledge distillation that can be employed across various architectures and model sizes. Extensive experiments demonstrate consistent improvements over other loss re-weighting methods on image classification, attack detection, and click-through rate prediction.

翻訳日:2023-12-27 19:51:07 公開日:2023-12-22
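
To make the idea of a sample-wise knowledge fusion ratio from the abstract above concrete, here is a minimal Python sketch of a distillation loss in which each sample carries its own ratio `alpha`. The convention that `alpha` weights the ground-truth term and `1 - alpha` the teacher term, and the function names, are assumptions of this example. The network that would predict `alpha` from the relations among $S$, $T$, $G$ and $\bar{T}$ is not reproduced here.

```python
import math


def cross_entropy(student_probs, label):
    """Hard supervision: negative log-probability of the true class."""
    return -math.log(student_probs[label])


def kl_divergence(teacher_probs, student_probs):
    """Soft supervision: KL(teacher || student)."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)


def fused_loss(student_probs, teacher_probs, label, alpha):
    """Per-sample loss with fusion ratio alpha (assumed to lie in [0, 1])."""
    hard = cross_entropy(student_probs, label)
    soft = kl_divergence(teacher_probs, student_probs)
    return alpha * hard + (1.0 - alpha) * soft


def batch_loss(batch, alphas):
    """Mean fused loss; `batch` holds (student_probs, teacher_probs, label)."""
    losses = [fused_loss(s, t, g, a) for (s, t, g), a in zip(batch, alphas)]
    return sum(losses) / len(losses)
```

A constant ratio corresponds to passing the same `alpha` for every sample. The adaptive method described in the abstract instead supplies a different value per sample.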

# 視覚データ分析と最適化によるUASによる自動構造検査経路計画

UAS-based Automated Structural Inspection Path Planning via Visual Data Analytics and Optimization ( http://arxiv.org/abs/2312.15109v1 )

ライセンス: Link先を確認

Yuxiang Zhao, Benhao Lu, Mohamad Alipour

(参考訳) Unmanned Aerial Systems (UAS) はインフラ検査の分野で大きな注目を集めている。しかし、インフラの大規模かつ複雑な性質を考えると、自動化は検査作業の効率化と品質向上に不可欠である。この点において大きな問題の1つは、飛行時間を最小化しながらミッション目標を達成できる最適な自動飛行経路を選択することである。本稿では,構造検査の文脈における経路計画問題の効果的な定式化について述べる。カバレッジは、損傷検出性とパス長を目標として最小化するための制約として保証され、検査品質を確保しながら効率を最大化する。次に、視点の位置を決定する遺伝的アルゴリズムと、ポーズを計算する欲求アルゴリズムからなる経路計画問題を解くために、2段階のアルゴリズムを考案する。提案アルゴリズムの有効性と適用範囲を示すため,包括的感度解析を行った。また,実世界の構造検査要件を満たすため,提案手法の柔軟性を実証する手法として,飛行禁止ゾーンを用いた部分空間検査や集中検査などの応用例も提示した。結論として,本研究は,提案手法の実現可能性を強調し,uasに基づく構造検査ミッション計画に自動化を組み込むための基礎作業を確立する。

Unmanned Aerial Systems (UAS) have gained significant traction for their application in infrastructure inspections. However, considering the enormous scale and complex nature of infrastructure, automation is essential for improving the efficiency and quality of inspection operations. One of the core problems in this regard is selecting an optimal automated flight path that can achieve the mission objectives while minimizing flight time. This paper presents an effective formulation for the path planning problem in the context of structural inspections. Coverage is guaranteed as a constraint to ensure damage detectability and path length is minimized as an objective, thus maximizing efficiency while ensuring inspection quality. A two-stage algorithm is then devised to solve the path planning problem, composed of a genetic algorithm for determining the positions of viewpoints and a greedy algorithm for calculating the poses. A comprehensive sensitivity analysis is conducted to demonstrate the proposed algorithm's effectiveness and range of applicability. Applied examples of the algorithm, including partial space inspection with no-fly zones and focused inspection, are also presented, demonstrating the flexibility of the proposed method to meet real-world structural inspection requirements. In conclusion, the results of this study highlight the feasibility of the proposed approach and establish the groundwork for incorporating automation into UAS-based structural inspection mission planning.

翻訳日:2023-12-27 19:50:42 公開日:2023-12-22
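
The formulation in the abstract above (coverage as a constraint, path length as the objective) can be illustrated with a small Python sketch. The representation used here, in which each viewpoint is a 3D point paired with the set of surface-patch identifiers it can see, is a hypothetical simplification and not the paper's encoding. The genetic and greedy stages themselves are not reproduced.

```python
import math


def path_length(points):
    """Total Euclidean length of the path visiting `points` in order."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def covers_all(viewpoint_coverage, required):
    """Coverage constraint: every required patch is seen by some viewpoint."""
    covered = set().union(*viewpoint_coverage)
    return required <= covered


def evaluate_plan(viewpoints, viewpoint_coverage, required):
    """Objective value of a plan: path length if feasible, infinity otherwise."""
    if not covers_all(viewpoint_coverage, required):
        return math.inf
    return path_length(viewpoints)
```

A function of this shape could serve as the fitness in a genetic search over viewpoint positions, since infeasible plans are ranked below every feasible one.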

# 生成AIと建築史

Generative AI and the History of Architecture ( http://arxiv.org/abs/2312.15106v1 )

ライセンス: Link先を確認

Joern Ploennigs and Markus Berger

(参考訳) 最近の生成aiプラットフォームは、単純なテキストプロンプトからテキストや印象的なイメージを作成できる。これにより、アーキテクチャ履歴に関する知識を要約したり、アイデア、スケッチ、モデリングといった初期のデザインタスクで新しい創造的な仕事を引き出す強力なツールになります。しかし、建築史における生成的AIモデルの理解は、どの程度優れているのか? スタイルを適切に区別することを学んだか、あるいは情報を幻覚させるか? 本章では,これらのツールの知識の能力と境界を理解するために,異なるアーキテクチャスタイルのテキストと画像生成のための生成AIプラットフォームに対するこの問題について検討する。また、1億100万のMidjourneyクエリのデータセットを分析して、実践者がすでに特定のアーキテクチャ概念を問合っているかどうか、どのように分析しています。

Recent generative AI platforms are able to create texts or impressive images from simple text prompts. This makes them powerful tools for summarizing knowledge about architectural history or deriving new creative work in early design tasks like ideation, sketching and modelling. But how good is the generative AI models' understanding of the history of architecture? Have they learned to properly distinguish styles, or are they hallucinating information? In this chapter, we investigate this question for generative AI platforms for text and image generation across different architectural styles, to understand the capabilities and boundaries of knowledge of those tools. We also analyze a data set of 101 million Midjourney queries to see if and how practitioners are already querying for specific architectural concepts.

翻訳日:2023-12-27 19:50:22 公開日:2023-12-22

# アナログコンピューティングのためのエネルギーベース学習アルゴリズムの比較研究

Energy-based learning algorithms for analog computing: a comparative study ( http://arxiv.org/abs/2312.15103v1 )

ライセンス: Link先を確認

Benjamin Scellier, Maxence Ernoult, Jack Kendall, Suhas Kumar

(参考訳) エネルギーベースの学習アルゴリズムは最近、アナログ(ポストデジタル)ハードウェアとの互換性から、注目を集めている。既存のアルゴリズムには、コントラスト学習(CL)、平衡伝播(EP)、結合学習(CpL)があり、いずれも2つの状態を対比するもので、第1の状態から第2の状態を得るのに使用される摂動の種類が異なる。しかし、これらのアルゴリズムは、同じモデルとデータセットを用いて対等な条件で明示的に比較されたことがないため、スケーラビリティを評価し、実際にどれを選ぶかを決めるのが困難である。本研究では, 7つの学習アルゴリズム,すなわちCLと,摂動の符号に応じたEPとCpLの異なる変種を比較する。具体的には、これらの学習アルゴリズムを用いて、5つの視覚タスク(MNIST、F-MNIST、SVHN、CIFAR-10、CIFAR-100)で深層畳み込みホップフィールドネットワーク(DCHN)を訓練する。全てのアルゴリズムがMNISTでは同等の性能をもたらすが、タスクの難しさが増すにつれて、性能上の重要な違いが生じる。私たちの重要な発見は、負の摂動が正の摂動よりも優れていることを示し、EPの中心化変種(反対符号の2つの摂動を用いる)を最も優れたアルゴリズムとして強調する。また、これらの発見を理論的議論で裏付ける。さらに、DCHNを用いて5つのデータセットすべてで、性能と速度の両方において新しいSOTA結果を確立する。特に,我々のDCHNシミュレーションは,非同期更新に基づく新しいエネルギー最小化アルゴリズムと,低精度(16ビット)を併用することで,Laborieux et al.(2021)の13.5倍高速である。

Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all of which contrast two states and differ in the type of perturbation used to obtain the second state from the first one. However, these algorithms have never been explicitly compared on equal footing with the same models and datasets, making it difficult to assess their scalability and decide which one to select in practice. In this work, we carry out a comparison of seven learning algorithms, namely CL and different variants of EP and CpL depending on the signs of the perturbations. Specifically, using these learning algorithms, we train deep convolutional Hopfield networks (DCHNs) on five vision tasks (MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100). We find that, while all algorithms yield comparable performance on MNIST, important differences in performance arise as the difficulty of the task increases. Our key findings reveal that negative perturbations are better than positive ones, and highlight the centered variant of EP (which uses two perturbations of opposite sign) as the best-performing algorithm. We also support these findings with theoretical arguments. Additionally, we establish new SOTA results with DCHNs on all five datasets, both in performance and speed. In particular, our DCHN simulations are 13.5 times faster than those of Laborieux et al. (2021), which we achieve thanks to a novel energy minimisation algorithm based on asynchronous updates, combined with reduced precision (16 bits).

翻訳日:2023-12-27 19:50:09 公開日:2023-12-22
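
The role of the perturbation sign in equilibrium propagation, discussed in the abstract above, can be illustrated on a one-parameter toy model in Python. The scalar energy $E(\theta, s) = \tfrac{1}{2}s^2 - \theta x s$, the cost $C(s) = \tfrac{1}{2}(s - y)^2$, and all function names are assumptions chosen so that the equilibrium states have closed forms; this is not the DCHN setup of the paper.

```python
def free_state(theta, x):
    """Minimiser of the toy energy E(theta, s) = 0.5 * s**2 - theta * x * s."""
    return theta * x


def nudged_state(theta, x, y, beta):
    """Minimiser of E(theta, s) + beta * C(s), with C(s) = 0.5 * (s - y)**2."""
    return (theta * x + beta * y) / (1.0 + beta)


def energy_grad(x, s):
    """Partial derivative of E with respect to theta, evaluated at state s."""
    return -x * s


def ep_estimates(theta, x, y, beta):
    """Gradient estimates from positive, negative and centered perturbations."""
    g_free = energy_grad(x, free_state(theta, x))
    g_plus = energy_grad(x, nudged_state(theta, x, y, beta))
    g_minus = energy_grad(x, nudged_state(theta, x, y, -beta))
    return {
        "positive": (g_plus - g_free) / beta,
        "negative": (g_free - g_minus) / beta,
        "centered": (g_plus - g_minus) / (2.0 * beta),
    }


def true_gradient(theta, x, y):
    """Exact d/dtheta of C(free_state) = 0.5 * (theta * x - y)**2."""
    return (theta * x - y) * x


estimates = ep_estimates(theta=0.5, x=2.0, y=3.0, beta=0.1)
exact = true_gradient(theta=0.5, x=2.0, y=3.0)
```

In this toy, the one-sided estimates differ from the exact gradient at first order in `beta`, while the centered estimate differs only at second order. The example says nothing about the positive-versus-negative ranking reported in the paper, which concerns training behaviour on real tasks.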

# 肌色を伴わない顔画像品質評価のためのロバスト・スクレラ・セグメンテーション

Robust Sclera Segmentation for Skin-tone Agnostic Face Image Quality Assessment ( http://arxiv.org/abs/2312.15102v1 )

ライセンス: Link先を確認

Wassim Kabbani, Christoph Busch, Kiran Raja

(参考訳) 顔画像品質評価(FIQA)は、良好な顔認識性能を得るために重要である。 FIQAアルゴリズムは、堅牢で、人口統計学的要因に左右されないものであるべきである。眼の強膜は、年齢、民族、肌の色に関わらず、すべてのヒトにおいて一貫した白みがかった色をしている。本研究は,登録(エンロールメント)および国境管理の顔認識シナリオにおける顔画像に適した頑健な強膜セグメンテーション法を提案する。また、強膜ピクセルの統計分析が、肌の色、年齢、民族性に不変な特徴をいかに生み出すかを示し、したがって、FIQAアルゴリズムに組み込むことで人口統計学的要因に依存しないようにできることを示す。

Face image quality assessment (FIQA) is crucial for obtaining good face recognition performance. FIQA algorithms should be robust and insensitive to demographic factors. The eye sclera has a consistent whitish color in all humans regardless of their age, ethnicity and skin-tone. This work proposes a robust sclera segmentation method that is suitable for face images in the enrolment and the border control face recognition scenarios. It shows how the statistical analysis of the sclera pixels produces features that are invariant to skin-tone, age and ethnicity and thus can be incorporated into FIQA algorithms to make them agnostic to demographic factors.

翻訳日:2023-12-27 19:49:35 公開日:2023-12-22

# Fix-Con: 自動フォールトローカライゼーションとディープラーニングモデル変換の修復

Fix-Con: Automatic Fault Localization and Repair of Deep Learning Model Conversions ( http://arxiv.org/abs/2312.15101v1 )

ライセンス: Link先を確認

Nikolaos Louloudakis, Perry Gibson, José Cano, and Ajitha Rajan

(参考訳) ディープラーニングモデルをフレームワーク間で変換することは、デバイス間のモデル互換性を最大化し、ひとつのディープラーニングフレームワークでのみ提供される最適化機能を活用するための一般的なステップである。しかし、この変換プロセスはバグだらけである可能性があり、変換されたモデルはデプロイ不能または問題のあるものとなり、予測の正確性が著しく低下する。本稿では,ディープラーニングフレームワーク間のモデル変換におけるフォールトローカライズと修復のための自動アプローチであるFix-Conを提案する。 Fix-Conは、変換中にモデル入力、パラメータ、ハイパーパラメータ、モデルグラフに導入された障害を検出し、修正することができる。 Fix-Conでは、報告された変換問題の調査から抽出した一連のフォールトタイプを使用して、変換されたターゲットモデルの潜在的な変換障害をローカライズし、例えばターゲットモデルのパラメータをソースモデルのパラメータに置き換えるなど、適切な修正を行う。これは、ソースモデルと変換されたターゲットモデルの間で出力ラベルに差があるデータセットのすべての画像に対して、すべての差が解決されるまで反復的に行われる。 4つの異なるディープラーニングフレームワーク間で変換された、広く使われている3つの画像認識モデルのモデル変換バグの修正におけるFix-Conの有効性を評価した。全体として、Fix-Conは15の誤変換ケースのうち14を完全に修復するか、その性能を大幅に改善することができた。

Converting deep learning models between frameworks is a common step to maximize model compatibility across devices and leverage optimization features that may be exclusively provided in one deep learning framework. However, this conversion process may be riddled with bugs, making the converted models either undeployable or problematic, considerably degrading their prediction correctness. We propose an automated approach for fault localization and repair, Fix-Con, during model conversion between deep learning frameworks. Fix-Con is capable of detecting and fixing faults introduced in model input, parameters, hyperparameters, and the model graph during conversion. Fix-Con uses a set of fault types mined from surveying conversion issues raised to localize potential conversion faults in the converted target model, and then repairs them appropriately, e.g. replacing the parameters of the target model with those from the source model. This is done iteratively for every image in the dataset with output label differences between the source model and the converted target model until all differences are resolved. We evaluate the effectiveness of Fix-Con in fixing model conversion bugs of three widely used image recognition models converted across four different deep learning frameworks. Overall, Fix-Con was able to either completely repair, or significantly improve the performance of 14 out of the 15 erroneous conversion cases.

翻訳日:2023-12-27 19:49:23 公開日:2023-12-22
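
One repair named in the abstract above is replacing the parameters of the converted target model with those of the source model. The Python sketch below shows that single step on plain dictionaries of per-layer parameter lists. The data layout, the `tolerance` threshold, and the function name are hypothetical; Fix-Con's actual fault localization, which is driven by output-label differences, is not reproduced.

```python
def repair_parameters(source_params, target_params, tolerance=1e-6):
    """Copy source parameters over any target layer that deviates from them.

    Both arguments map a layer name to a flat list of floats.
    Returns the repaired parameters and the names of the replaced layers.
    """
    repaired = {}
    replaced = []
    for name, source_values in source_params.items():
        target_values = target_params.get(name)
        mismatch = (
            target_values is None
            or len(target_values) != len(source_values)
            or any(abs(s - t) > tolerance
                   for s, t in zip(source_values, target_values))
        )
        if mismatch:
            repaired[name] = list(source_values)
            replaced.append(name)
        else:
            repaired[name] = list(target_values)
    return repaired, replaced


source = {"conv1": [0.1, 0.2], "fc": [1.0]}
target = {"conv1": [0.1, 0.25], "fc": [1.0]}
repaired, replaced = repair_parameters(source, target)
```

Returning the list of replaced layers keeps the repair auditable, which matters when a conversion fault lies elsewhere (for example in the model graph) and parameter replacement alone does not resolve the label differences.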

# 大規模言語モデルにおける連鎖推論によるオンラインヘイトの変化

Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models ( http://arxiv.org/abs/2312.15099v1 )

ライセンス: Link先を確認

Nishant Vishwamitra, Keyan Guo, Farhan Tajwar Romit, Isabelle Ondracek, Long Cheng, Ziming Zhao, Hongxin Hu

(参考訳) オンライン憎悪はインターネットユーザーの生活に悪影響を及ぼすエスカレートする問題であり、進化する出来事によって急激な変化を招き、新たなオンライン憎悪の波が重大な脅威をもたらす。これらの新たな波の検出と緩和は、ヘイトフルコンテンツの存在を判断するために推論に基づく複雑な意思決定を要求することと、トレーニングサンプルの可用性の制限によって検出モデルの更新が妨げられる、という2つの大きな課題をもたらす。この重要な問題に対処するために、オンライン憎悪の新しい波を効果的に緩和するHATEGUARDという新しいフレームワークを提案する。 HATEGUARDは、最近導入されたチェーン・オブ・ソート(CoT)プロンプト技術を利用して、大規模言語モデル(LLM)の機能を活用する推論ベースのアプローチを採用している。 hateguardはさらに、オンラインヘイトの新しい波に効果的に対応するために、新しいデロギ的用語とターゲットによる検出プロンプトを自動生成および更新することで、プロンプトベースのゼロショット検出を実現する。このアプローチの有効性を示すために、我々は、最近目撃された3つの新しい波、2022年のロシアによるウクライナ侵攻、2021年の米国議会議事堂の暴動、COVID-19パンデミックに関するツイートからなる新しいデータセットをコンパイルした。本研究は,イベントの進化と,それに対応するための既存のモデレーションツールを迅速に更新する技術の必要性について,これらの新しい波における重要な縦断パターンを明らかにした。最先端ツールに対する比較評価は、我々のフレームワークの優位性を示し、オンライン嫌悪の3つの新しい波の検出において、22.22%から83.33%の大幅な改善を示しました。我々の研究は、オンラインヘイトの新しい波の出現によって引き起こされる深刻な脅威を強調し、この脅威に現実的に対処するパラダイムシフトを表している。

Online hate is an escalating problem that negatively impacts the lives of Internet users, and is also subject to rapid changes due to evolving events, resulting in new waves of online hate that pose a critical threat. Detecting and mitigating these new waves present two key challenges: it demands reasoning-based complex decision-making to determine the presence of hateful content, and the limited availability of training samples hinders updating the detection model. To address this critical issue, we present a novel framework called HATEGUARD for effectively moderating new waves of online hate. HATEGUARD employs a reasoning-based approach that leverages the recently introduced chain-of-thought (CoT) prompting technique, harnessing the capabilities of large language models (LLMs). HATEGUARD further achieves prompt-based zero-shot detection by automatically generating and updating detection prompts with new derogatory terms and targets in new wave samples to effectively address new waves of online hate. To demonstrate the effectiveness of our approach, we compile a new dataset consisting of tweets related to three recently witnessed new waves: the 2022 Russian invasion of Ukraine, the 2021 insurrection of the US Capitol, and the COVID-19 pandemic. Our studies reveal crucial longitudinal patterns in these new waves concerning the evolution of events and the pressing need for techniques to rapidly update existing moderation tools to counteract them. Comparative evaluations against state-of-the-art tools illustrate the superiority of our framework, showcasing a substantial 22.22% to 83.33% improvement in detecting the three new waves of online hate. Our work highlights the severe threat posed by the emergence of new waves of online hate and represents a paradigm shift in addressing this threat practically.

翻訳日:2023-12-27 19:49:01 公開日:2023-12-22

# ディープニューラルネットワークを用いた教師なし聴覚・意味学習モデル

Unsupervised Auditory and Semantic Entrainment Models with Deep Neural Networks ( http://arxiv.org/abs/2312.15098v1 )

ライセンス: Link先を確認

Jay Kejriwal, Stefan Benus, Lina M. Rojas-Barahona

(参考訳) 話者は、話し方のさまざまな側面において対話相手と似てくるという、エントレインメントとして知られる適応行動をとる傾向がある。本稿では,意味的エントレインメントをモデル化するために,テキストの特徴から有意義な表現を導き出す教師なしのディープラーニングフレームワークを提案する。本研究では,2つの人間-人間 (HH) コーパス (The Fisher Corpus English Part 1, Columbia Games corpus) と1つの人間-機械 (HM) コーパス (Voice Assistant Conversation Corpus (VACC)) を対象に,BERT モデルの変種 (DistilBERT と XLM-RoBERTa) と Google の universal sentence encoder (USE) の埋め込みを用いて特徴を抽出し,モデルの性能を検討する。意味的特徴に加えて、2種類の聴覚埋め込み (TRIpLet Loss network (TRILL) ベクトル、低レベル記述子 (LLD) 特徴) と2つの分析単位 (Inter pausal unit と Turn) を用いてDNNベースのモデルを訓練した。その結果,本モデルで意味的エントレインメントを評価できること,モデルがHHとHMの相互作用を区別できること,音響特徴を抽出する2つの分析単位が同等の結果をもたらすことが示された。

Speakers tend to engage in adaptive behavior, known as entrainment, when they become similar to their interlocutor in various aspects of speaking. We present an unsupervised deep learning framework that derives meaningful representation from textual features for developing semantic entrainment. We investigate the model's performance by extracting features using different variations of the BERT model (DistilBERT and XLM-RoBERTa) and Google's universal sentence encoder (USE) embeddings on two human-human (HH) corpora (The Fisher Corpus English Part 1, Columbia games corpus) and one human-machine (HM) corpus (Voice Assistant Conversation Corpus (VACC)). In addition to semantic features we also trained DNN-based models utilizing two auditory embeddings (TRIpLet Loss network (TRILL) vectors, Low-level descriptors (LLD) features) and two units of analysis (Inter pausal unit and Turn). The results show that semantic entrainment can be assessed with our model, that models can distinguish between HH and HM interactions and that the two units of analysis for extracting acoustic features provide comparable findings.

翻訳日:2023-12-27 19:48:11 公開日:2023-12-22

# 議論的アンサンブルによるモデル多重性下でのリコース

Recourse under Model Multiplicity via Argumentative Ensembling ( http://arxiv.org/abs/2312.15097v1 )

ライセンス: Link先を確認

Junqi Jiang, Antonio Rago, Francesco Leofante, Francesca Toni

(参考訳) モデル多重性(Model Multiplicity, MM)は、同じ予測タスクを解決するために、同等の性能を持つ複数の機械学習モデルをトレーニングできる場合に発生する。近年の研究では、MMの下で得られたモデルが同一入力に対して一貫性のない予測を生成する可能性が示されている。これが起こると、モデルの予測によって負の影響を受ける個人にリコース(救済)の推奨を提供する一般的な手段である、反実仮想説明(CE)の提供が困難になる。本稿では,recourse-aware ensemblingと名づけたこの問題を定式化し,その解決法が満たすべきいくつかの望ましい性質を明らかにする。CEを提供するようにさまざまな方法で自然に拡張した既存のアンサンブル手法は、これらの性質を満たさないことを示す。次に,MMに対するCEのロバスト性を保証するために計算論的議論(computational argumentation)を用い,カスタマイズ可能なユーザ嗜好にも対応する議論的アンサンブルを導入する。理論的および実験的に、議論的アンサンブルは既存の手法に欠けている性質を満足し、精度に関するトレードオフは最小限であることを示す。

Model Multiplicity (MM) arises when multiple, equally performing machine learning models can be trained to solve the same prediction task. Recent studies show that models obtained under MM may produce inconsistent predictions for the same input. When this occurs, it becomes challenging to provide counterfactual explanations (CEs), a common means for offering recourse recommendations to individuals negatively affected by models' predictions. In this paper, we formalise this problem, which we name recourse-aware ensembling, and identify several desirable properties which methods for solving it should satisfy. We show that existing ensembling methods, naturally extended in different ways to provide CEs, fail to satisfy these properties. We then introduce argumentative ensembling, deploying computational argumentation to guarantee robustness of CEs to MM, while also accommodating customisable user preferences. We show theoretically and experimentally that argumentative ensembling satisfies properties which the existing methods lack, and that the trade-offs with respect to accuracy are minimal.

翻訳日:2023-12-27 19:47:27 公開日:2023-12-22

# $\mathbb{Z}_3$対称性で保護される二次元トポロジカルパラマグネット:境界ハミルトニアンの性質

Two-dimensional topological paramagnets protected by $\mathbb{Z}_3$ symmetry: Properties of the boundary Hamiltonian ( http://arxiv.org/abs/2312.15095v1 )

ライセンス: Link先を確認

Hrant Topchyan, Vasilii Iugov, Mkhitar Mirumyan, Tigran S. Hakobyan, Tigran A. Sedrakyan, Ara G. Sedrakyan

(参考訳) 三角格子上に隙間のないエッジモードを持つ2次元$\mathbb{Z}_3$対称性保護トポロジー(SPT)3状態ポッツパラマグネットを体系的に構築する。まず, ギャップレスエッジの微視的格子モデルについて検討し, 密度行列再正規化群(dmrg)法を用いて, 低次励起スペクトルとエンタングルメントエントロピーの有限サイズスケーリングについて検討した。得られた結果に基づき、臨界エッジの普遍性クラス、すなわち対応する共形場理論と中心電荷を同定する。最後に、エッジモデルの固有対称性と2つのspt相を区別する創発的巻線対称性について考察する。その結果、二つの位相的に非自明な位相と自明な位相は、三重性をサポートする一般の1次元鎖を定義する。

We systematically construct two-dimensional $\mathbb{Z}_3$ symmetry-protected topological (SPT) three-state Potts paramagnets with gapless edge modes on a triangular lattice. First, we study microscopic lattice models for the gapless edge and, using the density-matrix renormalization group (DMRG) approach, investigate the finite size scaling of the low-lying excitation spectrum and the entanglement entropy. Based on the obtained results, we identify the universality class of the critical edge, namely the corresponding conformal field theory and the central charge. Finally, we discuss the inherent symmetries of the edge models and the emergent winding symmetry distinguishing between two SPT phases. As a result, the two topologically nontrivial and the trivial phases define a general one-dimensional chain supporting a tricriticality, which we argue supports a gapless SPT order in one dimension.

翻訳日:2023-12-27 19:46:56 公開日:2023-12-22

# 2つのステップと1つのステップバック:CPRAの下で販売をオプトアウトする権利

Two Steps Forward and One Step Back: The Right to Opt-out of Sale under CPRA ( http://arxiv.org/abs/2312.15094v1 )

ライセンス: Link先を確認

Jan Charatan and Eleanor Birrell

(参考訳) カリフォルニア州プライバシ・ライツ法(California Privacy Rights Act、CPRA)は、カリフォルニア州消費者プライバシ法(CCPA)を改正した法案である。プライバシの権利の拡大と強化をめざすことが多いが、以前の法律の変更とCPRAガイドラインの以前の草案の変更の両方で、テキストによる改訂の綿密な分析は、現実がより微妙なものになる可能性を示唆している。本研究では,cpraにおける販売オプトアウトの権利に悪影響を及ぼす可能性がある3つのテキストリビジョンを特定し,これらのリビジョンの効果を,(1)12ヶ月にわたる25,000サイトを対象とした大規模縦断調査,(2)多作で募集された775人の実験ユーザ調査を用いて評価した。すべてのリビジョンは、販売をオプトアウトする権利のユーザビリティ、スコープ、可視性に悪影響を及ぼすことが分かりました。その結果,インターネットのプライバシーに対するCPRAの影響を総合的に評価した。彼らはまた、法律が施行された後にガイドラインと事例法が進化するにつれて、法的要件の継続的な評価の重要性を強調している。

The California Privacy Rights Act (CPRA) was a ballot initiative that revised the California Consumer Privacy Act (CCPA). Although often framed as expanding and enhancing privacy rights, a close analysis of textual revisions -- both changes from the earlier law and changes from earlier drafts of the CPRA guidelines -- suggests that the reality might be more nuanced. In this work, we identify three textual revisions that have the potential to negatively impact the right to opt-out of sale under CPRA and evaluate the effect of these textual revisions using (1) a large-scale longitudinal measurement study of 25,000 websites over twelve months and (2) an experimental user study with 775 participants recruited through Prolific. We find that all revisions negatively impacted the usability, scope, and visibility of the right to opt-out of sale. Our results provide the first comprehensive evaluation of the impact of CPRA on Internet privacy. They also emphasize the importance of continued evaluation of legal requirements as guidelines and case law evolve after a law goes into effect.

翻訳日:2023-12-27 19:46:15 公開日:2023-12-22

# 通信遅延のない非同期確率近似の安定性に関する一考察

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays ( http://arxiv.org/abs/2312.15091v1 )

ライセンス: Link先を確認

Huizhen Yu, Yi Wan, Richard S. Sutton

(参考訳) 本稿では,通信遅延のない非同期確率近似アルゴリズムについて検討する。我々の主な貢献は、より一般的な雑音条件を調節することによってボルカーとメインの手法を拡張するこれらのアルゴリズムの安定性証明である。また, この安定性から収束結果を導出し, 重要な平均回帰強化学習問題への応用について考察した。

In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that extends a method of Borkar and Meyn by accommodating more general noise conditions. We also derive convergence results from this stability result and discuss their application in important average-reward reinforcement learning problems.

翻訳日:2023-12-27 19:45:51 公開日:2023-12-22

# 敵対的模倣学習の自動エンコーディング

Auto-Encoding Adversarial Imitation Learning ( http://arxiv.org/abs/2206.11004v4 )

ライセンス: Link先を確認

Kaifeng Zhang, Rui Zhao, Ziming Zhang, Yang Gao

(参考訳) 強化学習(rl)は意思決定のための強力なフレームワークを提供するが、実際には注意深く設計された報酬機能を必要とすることが多い。 AIL(Adversarial Imitation Learning)は、環境からの報酬信号にアクセスせずに自動ポリシー取得に光を当てる。本稿では,堅牢でスケーラブルな AIL フレームワークである Auto-Encoding Adversarial Imitation Learning (AEAIL) を提案する。 AEAILは、実証から専門家ポリシーを誘導するため、オートエンコーダの再構成エラーを報奨信号として利用し、従来の差別者ベースのものよりも、ポリシーを最適化するための情報を提供する。その後、導出した目的関数を用いてオートエンコーダとエージェントポリシーを訓練する。実験の結果,AEAILは現状および画像ベース環境において,最先端の手法よりも優れていることがわかった。さらに重要なのは、AEAILは、専門家によるデモが騒々しいときに、はるかに優れた堅牢性を示します。

Reinforcement learning (RL) provides a powerful framework for decision-making, but its application in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) sheds light on automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL utilizes the reconstruction error of an auto-encoder as a reward signal, which provides more information for optimizing policies than the prior discriminator-based ones. Subsequently, we use the derived objective functions to train the auto-encoder and the agent policy. Experiments show that AEAIL outperforms state-of-the-art methods in both state-based and image-based environments. More importantly, AEAIL shows much better robustness when the expert demonstrations are noisy.

翻訳日:2023-12-25 19:13:39 公開日:2023-12-22

# 低リソース言語に対するテキスト正規化--Ligurianの場合

Text normalization for low-resource languages: the case of Ligurian ( http://arxiv.org/abs/2206.07861v2 )

ライセンス: Link先を確認

Stefano Lusito and Edoardo Ferrante and Jean Maillard

(参考訳) テキストの正規化は、厳格な綴り規則を欠いた低リソース言語や、複数の綴り改革を行った言語にとって重要な技術である。これまでのところ、低リソースのテキスト正規化は手作りのルールに依存しており、これはニューラルネットワークよりもデータ効率が高いと考えられている。本稿では,絶滅危惧言語であるリグリア語のテキスト正規化事例について検討する。正規化バージョンと組み合わせた4,394のLigurian文と、Ligurian用の最初のオープンソースモノリンガルコーパスを収集する。少ないデータ量にもかかわらず、バックトランスや適切なトークン化を用いることで、コンパクトなトランスフォーマーベースのモデルを非常に低いエラー率を達成するように訓練できることを実証する。

Text normalization is a crucial technology for low-resource languages which lack rigid spelling conventions or that have undergone multiple spelling reforms. Low-resource text normalization has so far relied upon hand-crafted rules, which are perceived to be more data efficient than neural methods. In this paper we examine the case of text normalization for Ligurian, an endangered Romance language. We collect 4,394 Ligurian sentences paired with their normalized versions, as well as the first open source monolingual corpus for Ligurian. We show that, in spite of the small amounts of data available, a compact transformer-based model can be trained to achieve very low error rates by the use of backtranslation and appropriate tokenization.

翻訳日:2023-12-25 19:13:23 公開日:2023-12-22

# 新型コロナウイルス:パンデミックにおける病原体関連データ共有の連続的障害の探索

COVID-19: An exploration of consecutive systemic barriers to pathogen-related data sharing during a pandemic ( http://arxiv.org/abs/2205.12098v3 )

ライセンス: Link先を確認

Yo Yehudi, Lukas Hughes-Noehrer, Carole Goble and Caroline Jay

(参考訳) 2020年、新型コロナウイルスのパンデミックは世界中の政府や研究者から急速に反応した。 2023年後半には、新型コロナウイルス(COVID-19)の影響で数百万人以上が死亡し、多くの生存者が数週間、数ヶ月、数年の長期的影響を経験している。パンデミックに関連するデータを扱う人々は、このデータにアクセス、共有、再利用するための重要なシステム的障壁に直面していることが多い。本稿では、ソーシャルメディア、移動性、ウイルスゲノム、検査、感染、入院、死亡など、新型コロナウイルス関連のデータ型を扱うデータ専門家にインタビューを行った結果について報告する。これらのデータタイプは、パンデミックのスプレッド・モデリング、医療システムのストレス・アウェアネス、およびcovid-19治療の考案のために様々な用途に使用される。

In 2020, the COVID-19 pandemic resulted in a rapid response from governments and researchers worldwide. As of late 2023, millions have died as a result of COVID-19, with many COVID-19 survivors going on to experience long-term effects weeks, months, or years after their illness. Despite this staggering toll, those who work with pandemic-relevant data often face significant systemic barriers to accessing, sharing or re-using this data. In this paper we report results of a study in which we interviewed data professionals working with COVID-19-relevant data types including social media, mobility, viral genome, testing, infection, hospital admission, and deaths. These data types are variously used for pandemic spread modelling, healthcare system strain awareness, and devising therapeutic treatments for COVID-19. Barriers to data access, sharing and re-use include the cost of access to data (primarily certain healthcare sources and mobility data from mobile phone carriers), human throughput bottlenecks, unclear pathways to request access to data, unnecessarily strict access controls and data re-use policies, unclear data provenance, inability to link separate data sources that could collectively create a more complete picture, poor adherence to metadata standards, and a lack of computer-suitable data formats.

翻訳日:2023-12-25 19:13:12 公開日:2023-12-22

# 密度行列を用いた量子密度推定:量子異常検出への応用

Quantum density estimation with density matrices: Application to quantum anomaly detection ( http://arxiv.org/abs/2201.10006v4 )

ライセンス: Link先を確認

Diego H. Useche, Oscar A. Bustos-Brinez, Joseph A. Gallego, Fabio A. González

(参考訳) 密度推定は統計学と機械学習の中心的なタスクである。この問題は、観測されたデータセットに最もよく適合する基礎となる確率密度関数を決定することを目的としている。応用例としては、統計的推論、教師なし学習、異常検出などがある。その関連性にもかかわらず、量子コンピューティングの密度推定への応用を探求した研究は少ない。本稿では,密度行列の期待値と量子フーリエ特徴と呼ばれる新しい量子埋め込みに基づく,量子古典的密度行列密度推定モデルq-demdeを提案する。量子ハードウェアを用いて、混合量子状態によるトレーニングデータの確率分布を構築する。コアサブルーチンとして,量子コンピュータ上でのスペクトル分解から混合密度行列の期待値を推定する新しいアルゴリズムを提案する。さらに,本手法の量子古典的異常検出への応用について述べる。量子シミュレータと実量子コンピュータの異なるデータセット上の量子ランダムおよび量子適応フーリエ特徴を用いた密度推定モデルの評価を行った。この研究の重要な結果は、現在の量子コンピュータで高い性能で密度推定と異常検出を行うことができることを示すことである。

Density estimation is a central task in statistics and machine learning. This problem aims to determine the underlying probability density function that best aligns with an observed data set. Some of its applications include statistical inference, unsupervised learning, and anomaly detection. Despite its relevance, few works have explored the application of quantum computing to density estimation. In this article, we present a novel quantum-classical density matrix density estimation model, called Q-DEMDE, based on the expected values of density matrices and a novel quantum embedding called quantum Fourier features. The method uses quantum hardware to build probability distributions of training data via mixed quantum states. As a core subroutine, we propose a new algorithm to estimate the expected value of a mixed density matrix from its spectral decomposition on a quantum computer. In addition, we present an application of the method for quantum-classical anomaly detection. We evaluated the density estimation model with quantum random and quantum adaptive Fourier features on different data sets on a quantum simulator and a real quantum computer. An important result of this work is to show that it is possible to perform density estimation and anomaly detection with high performance on present-day quantum computers.
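As a purely classical point of reference for the idea of density estimation with Fourier features, the sketch below approximates a one-dimensional Gaussian kernel density estimate with random Fourier features. It is not the Q-DEMDE model and does not implement the paper's quantum Fourier features or any quantum subroutine; the function names (`make_rff`, `fit_density`, `exact_kde`) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_rff(n_features, bandwidth, seed=0):
    """Build a random Fourier feature map z(x) whose inner products
    approximate the Gaussian kernel exp(-(x - y)^2 / (2 * bandwidth^2))."""
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return features


def fit_density(data, features, bandwidth):
    """Summarise the training set by its mean feature vector, then score
    new points with a single inner product against that summary."""
    n = len(data)
    mean_map = [sum(col) / n for col in zip(*(features(x) for x in data))]
    norm = math.sqrt(2.0 * math.pi) * bandwidth  # 1-D Gaussian normaliser

    def density(x):
        return sum(a * b for a, b in zip(features(x), mean_map)) / norm

    return density


def exact_kde(data, bandwidth):
    """Exact Gaussian kernel density estimate, used only for comparison."""
    norm = math.sqrt(2.0 * math.pi) * bandwidth

    def density(x):
        total = sum(math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2)) for xi in data)
        return total / (len(data) * norm)

    return density
```

The design point this illustrates is that the whole training set is compressed into one fixed-size summary (here a mean feature vector), so evaluating the density at a new point does not require revisiting the data.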

翻訳日:2023-12-25 19:12:50 公開日:2023-12-22

# ランダムデータに欠落したモデルベースクラスタリング

Model-based Clustering with Missing Not At Random Data ( http://arxiv.org/abs/2112.10425v4 )

ライセンス: Link先を確認

Aude Sportisse (UCA, MAASAI), Matthieu Marbac (UR, ENSAI, CNRS, CREST), Fabien Laporte (Nantes Univ, CNRS, ITX-lab), Gilles Celeux (CELESTE), Claire Boyer (SU, LPSM (UMR_8001), MOKAPLAN), Julie Josse (IDESP, PREMEDICAL), Christophe Biernacki (CNRS, MODAL)

(参考訳) モデルベースの教師なし学習は、学習タスクとして、データが失われるとすぐに停止します。これは、欠落したデータが情報化されている場合や、不明なデータがランダムではない場合(MNAR)にさらに真実である。本稿では、mnarデータを含む非常に一般的なデータ型を扱うように設計されたモデルベースクラスタリングアルゴリズムを提案する。そこで本研究では,データ分布とMNAR機構を協調的にモデル化するために,データの種類(連続的,数的,分類的,混合的)の混合モデルを導入する。いくつかのmnarモデルについて議論され、欠落の原因は欠落した変数自体の値とクラスメンバシップの両方に依存する。しかし、MNARzと呼ばれる特定のMNARモデルに焦点をあて、欠落はクラスメンバーシップにのみ依存する。まず, 標準mar機構を考慮し, 紛失マスクと連結したデータ行列上で統計的推論を行うことにより, 推定の容易さを強調する。そこで我々は,この単純化された再解釈のために開発された期待最大化アルゴリズムを用いてクラスタリングを行う。最後に,提案手法の合成データおよび実際の医療用レジストリであるTraumaBase上での数値的性能を評価した。

Model-based unsupervised learning, like any learning task, stalls as soon as missing data occur. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the relative degrees of freedom of each. Several MNAR models are discussed, for which the cause of the missingness can depend both on the values of the missing variables themselves and on the class membership. However, we focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership. We first underline its ease of estimation by showing that statistical inference can be carried out on the data matrix concatenated with the missing mask, which reduces the problem to a standard MAR mechanism. Consequently, we propose to perform clustering using the Expectation Maximization algorithm, specially developed for this simplified reinterpretation. Finally, we assess the numerical performance of the proposed methods on synthetic data and on the real medical registry TraumaBase.
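The data-preparation step mentioned for MNARz, running inference on the data matrix concatenated with the missing mask, can be sketched as follows. This shows only how the augmented matrix might be built; the mixture model and the EM algorithm are not shown, and the function name `concat_with_mask` and the use of `None` to mark missing entries are assumptions made for illustration.

```python
def concat_with_mask(rows):
    """Append one 0/1 missingness indicator per variable to each row.

    A value of None marks a missing entry; the indicator is 1 where the
    corresponding variable is missing and 0 where it is observed.
    """
    augmented = []
    for row in rows:
        mask = [1 if value is None else 0 for value in row]
        augmented.append(list(row) + mask)
    return augmented
```

For example, `concat_with_mask([[1.0, None], [None, 3.0]])` yields rows with four columns each: the two original variables followed by their two missingness indicators, which a clustering model can then treat as ordinary observed binary variables.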

翻訳日:2023-12-25 19:12:35 公開日:2023-12-22

# 説明可能な深層学習による壁面乱流の重要領域の同定

Identifying regions of importance in wall-bounded turbulence through explainable deep learning ( http://arxiv.org/abs/2302.01250v3 )

ライセンス: Link先を確認

Andres Cremades, Sergio Hoyas, Rahul Deshpande, Pedro Quintero, Martin Lellep, Will Junghoon Lee, Jason Monty, Nicholas Hutchins, Moritz Linkmann, Ivan Marusic, Ricardo Vinuesa

(参考訳) その科学的、技術的重要性にもかかわらず、壁境界乱流は古典物理学において未解決の問題であり、新しい視点に取り組む必要がある。重要な戦略の1つは、流れ中のエネルギーを含むコヒーレント構造間の相互作用を研究することである。このような相互作用を,説明可能な深層学習法を用いて初めて検討した。乱流流シミュレーションから得られた瞬時速度場を用いて,U-netアーキテクチャを用いて時間内速度場を予測する。予測フローに基づいて,SHAP(SHapley Additive exPlanations)のゲーム理論アルゴリズムを用いて,この予測における各構造の重要性を評価する。この研究は、文献における以前の観測結果と一致し、フローにおける最も重要な構造が必ずしもレイノルズせん断応力に最も寄与した構造であるとは限らないことを明らかにすることでそれらを拡張した。また,本手法を実験データベースに適用し,その重要度に基づいて全く新しい構造を同定する。この枠組みは、流れ制御の新しい戦略を含む多数の壁境界乱流の基本的な現象に光を当てる可能性がある。

Despite its great scientific and technological importance, wall-bounded turbulence is an unresolved problem in classical physics that requires new perspectives to be tackled. One of the key strategies has been to study interactions among the energy-containing coherent structures in the flow. Such interactions are explored in this study for the first time using an explainable deep-learning method. The instantaneous velocity field obtained from a turbulent channel flow simulation is used to predict the velocity field in time through a U-net architecture. Based on the predicted flow, we assess the importance of each structure for this prediction using the game-theoretic algorithm of SHapley Additive exPlanations (SHAP). This work provides results in agreement with previous observations in the literature and extends them by revealing that the most important structures in the flow are not necessarily the ones with the highest contribution to the Reynolds shear stress. We also apply the method to an experimental database, where we can identify completely new structures based on their importance score. This framework has the potential to shed light on numerous fundamental phenomena of wall-bounded turbulence, including novel strategies for flow control.
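To make the game-theoretic idea behind SHAP concrete, here is a minimal sketch that computes exact Shapley values for a tiny cooperative game by averaging each player's marginal contribution over all orderings. It is not the SHAP library and not the paper's pipeline (which scores coherent structures through a U-net); the toy value function and the names `shapley_values` and `toy_value` are assumptions for illustration only.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution of each player
    over every ordering in which the coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payoff: 'a' is worth 1, 'b' is worth 2, 'c' is worth 0,
    and 'a' and 'b' together earn an extra interaction bonus of 3."""
    base = {"a": 1.0, "b": 2.0, "c": 0.0}
    total = sum(base[p] for p in coalition)
    if "a" in coalition and "b" in coalition:
        total += 3.0
    return total
```

In this toy game the interaction bonus is split evenly between `a` and `b`, and the three values sum to the payoff of the full coalition. Exact enumeration grows factorially with the number of players, which is why practical tools rely on approximations.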

翻訳日:2023-12-25 19:08:37 公開日:2023-12-22

# テキストスタイル転送のためのプロンプトベース編集

Prompt-Based Editing for Text Style Transfer ( http://arxiv.org/abs/2301.11997v2 )

ライセンス: Link先を確認

Guoqing Luo, Yu Tong Han, Lili Mou, Mauajama Firdaus

(参考訳) テキストプロンプト(textual prompt)は、事前学習された言語モデルにクエリし、スタイル変換されたテキストを単語毎に自己回帰的に生成するために使用される。しかし、このような生成プロセスは制御しにくく、早期予測エラーは将来の単語予測に影響を及ぼす可能性がある。本稿では,テキストスタイル転送のためのプロンプトベースの編集手法を提案する。具体的には,事前学習した言語モデルを用いてスタイル分類を行い,分類確率を用いてスタイルスコアを計算する。次に,単語レベルの編集による離散探索を行い,スタイル変換タスクの総合的スコアリング関数を最大化する。このように、プロンプトに基づく生成問題を、学習フリーなプロセスであり、文の自己回帰生成よりも制御しやすい分類問題に変換する。私たちの実験では、3つのスタイル転送ベンチマークデータセットで自動評価とヒューマン評価の両方を行い、このアプローチが20倍のパラメータを持つ最先端システムを大きく上回っていることを示した。さらなる実証分析は、我々のアプローチの有効性をさらに示します。

Prompting approaches have been recently explored in text style transfer, where a textual prompt is used to query a pretrained language model to generate style-transferred texts word by word in an autoregressive manner. However, such a generation process is less controllable and early prediction errors may affect future word predictions. In this paper, we present a prompt-based editing approach for text style transfer. Specifically, we prompt a pretrained language model for style classification and use the classification probability to compute a style score. Then, we perform discrete search with word-level editing to maximize a comprehensive scoring function for the style-transfer task. In this way, we transform a prompt-based generation problem into a classification one, which is a training-free process and more controllable than the autoregressive generation of sentences. In our experiments, we performed both automatic and human evaluation on three style-transfer benchmark datasets, and show that our approach largely outperforms the state-of-the-art systems that have 20 times more parameters. Additional empirical analyses further demonstrate the effectiveness of our approach.
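The search procedure described above, word-level edits chosen to maximize a scoring function, can be illustrated with a small greedy hill-climbing sketch. Everything here is a stand-in: the real approach scores style with a prompted language model, whereas `make_toy_score` below uses two hypothetical word lists and a simple content-overlap term, and the edit set is limited to replacement and deletion.

```python
POSITIVE = {"great", "lovely"}   # hypothetical target-style lexicon
NEGATIVE = {"terrible", "awful"}  # hypothetical source-style lexicon


def make_toy_score(original):
    """Return a scoring function combining a style term with a
    content-preservation term (overlap with the original words)."""
    original_words = set(original)

    def score(words):
        style = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        overlap = sum(w in original_words for w in words)
        return style + 1.5 * overlap

    return score


def greedy_edit_search(words, vocabulary, score, max_steps=10):
    """Repeatedly apply the single best word-level edit (replace or delete)
    until no edit improves the score."""
    current = list(words)
    best = score(current)
    for _ in range(max_steps):
        candidates = []
        for i in range(len(current)):
            for w in vocabulary:
                if w != current[i]:
                    candidates.append(current[:i] + [w] + current[i + 1:])
            if len(current) > 1:
                candidates.append(current[:i] + current[i + 1:])
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= best:
            break
        current, best = top, score(top)
    return current
```

With the source sentence `["the", "food", "was", "terrible"]` and the vocabulary `["great", "lovely"]`, the search swaps the negative word for a positive one and then stops, because further edits would lose more content overlap than they gain in style.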

翻訳日:2023-12-25 19:07:59 公開日:2023-12-22

# 相空間と水素原子における一般化力学理論

Generalized dynamical theories in phase space and the hydrogen atom ( http://arxiv.org/abs/2212.12267v2 )

ライセンス: Link先を確認

Martin Plávala and Matthias Kleinmann

(参考訳) 一般確率論の位相空間定式化は一般化された時間発展を含むように拡張でき、安定で離散エネルギー準位を持ちゼーマン効果を含む非量子水素系を記述することができる。これにより、共鳴レーザーとラザフォード散乱による水素様系の励起などの動的効果を研究することができる。我々の構成は、古典理論と量子論は位相空間における一般確率論の特定の選択と見なすことができ、他の確率論も測定可能な予測をもたらすことを示した。

We show that the phase-space formulation of general probabilistic theories can be extended to include a generalized time-evolution and that it can describe a nonquantum hydrogen-like system which is stable, has discrete energy levels, and includes the Zeeman effect. This allows us to study dynamical effects such as excitations of the hydrogen-like system by a resonant laser and Rutherford scattering. Our construction demonstrates that classical theory and quantum theory can be seen as specific choices of general probabilistic theory in phase space and that other probabilistic theories also lead to measurable predictions.

翻訳日:2023-12-25 19:07:43 公開日:2023-12-22

# Reduce&chop: より深い問題のための浅回路

Reduce&chop: Shallow circuits for deeper problems ( http://arxiv.org/abs/2212.11862v3 )

ライセンス: Link先を確認

Adrián Pérez-Salinas, Radoica Draškić, Jordi Tura, Vedran Dunjko

(参考訳) 最先端の量子コンピュータは、量子ビット数と計算深度に制限のある回路しか確実に実行できない。これにより、実行可能なアルゴリズムの範囲が大幅に削減される。数量子ビットデバイスを利用するために多くの技術が発明されているが、深さ制限計算の対応するスキームは研究されていない。本研究は、より浅いデバイスを繰り返し使用することにより、より深い量子計算の性能をどの程度模倣できるかを考察する。この目的のために、与えられた回路を2つに切断するFeynmanシミュレーションにインスパイアされた手法を提案する。第1片は早期に実行され測定され、第2片は前の結果に基づいて実行される。この方法は、可能な結果の数が多いため、直接的に適用した場合は非効率である。この問題を軽減するために,既定義の許容限界内における手法の複雑さの維持を目的とした浅変分回路を提案し,そのような回路を見つけるための新しい最適化手法を提案する。これらの手法の成分の合成は reduce&chop と呼ばれる。私たちが議論するとおり、このアプローチは特定のケースで有効です。この研究は、浅い量子コンピュータの可能性を活用するための新しい研究を刺激する可能性がある。

State-of-the-art quantum computers can only reliably execute circuits with limited qubit numbers and computational depth. This severely reduces the scope of algorithms that can be run. While numerous techniques have been invented to exploit few-qubit devices, corresponding schemes for depth-limited computations are less explored. This work investigates to what extent we can mimic the performance of a deeper quantum computation by repeatedly using a shallower device. We propose a method for this purpose, inspired by Feynman simulation, where a given circuit is chopped in two pieces. The first piece is executed and measured early on, and the second piece is run based on the previous outcome. This method is inefficient if applied in a straightforward manner due to the high number of possible outcomes. To mitigate this issue, we propose a shallow variational circuit, whose purpose is to maintain the complexity of the method within pre-defined tolerable limits, and provide a novel optimisation method to find such a circuit. The combination of these components is called reduce&chop. As we discuss, this approach works for certain cases of interest. We believe this work may stimulate new research towards exploiting the potential of shallow quantum computers.

翻訳日:2023-12-25 19:07:34 公開日:2023-12-22

# 統計的推論としての説明可能性

Explainability as statistical inference ( http://arxiv.org/abs/2212.03131v2 )

ライセンス: Link先を確認

Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei

(参考訳) 近年、様々なモデル説明アプローチが提案されており、いずれも非常に異なる理論とヒューリスティックによって導かれている。本稿では,統計的推論問題として新しい経路と解釈可能性を提案する。本稿では,解釈可能な予測を生成するために設計された一般の深部確率モデルを提案する。モデルパラメータは最大確率で学習でき、この方法は任意の予測器ネットワークアーキテクチャと任意の種類の予測問題に適用することができる。本手法は,ニューラルネットワークをセレクタとして使用し,推論時の解釈を高速に行う無形解釈モデルの一例である。いくつかの一般的な解釈可能性法は、一般モデルに対する正規化極大確率の特別な場合であることが示されている。そこで本稿では,特徴重要度マップの評価を可能にする,真理選択に基づく新しいデータセットを提案する。これらのデータセットを用いて、複数の命令を用いることでより合理的な解釈が得られることを示す。

A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground truth selection which allow for the evaluation of the feature importance map. Using these datasets, we show experimentally that using multiple imputation provides more reasonable interpretations.

翻訳日:2023-12-25 19:07:16 公開日:2023-12-22

# 開量子多体系におけるデコヒーレンス過程の準粒子:インコヒーレントン

Quasiparticles of Decoherence Processes in Open Quantum Many-Body Systems: Incoherentons ( http://arxiv.org/abs/2211.14991v2 )

ライセンス: Link先を確認

Taiki Haga, Masaya Nakagawa, Ryusuke Hamazaki, Masahito Ueda

(参考訳) 開量子系の緩和ダイナミクスは、系のコヒーレントハミルトン力学と環境との相互作用による散逸力学との競合によって決定される。したがって、コヒーレント体制から非コヒーレント体制への移行を理解することは基本的な関心事である。ヒッヘルト非認識準粒子(インコヒーレントン)は、開量子多体系の力学を支配するリウヴィリア超作用素の固有モデムにおけるコヒーレント-非コヒーレント遷移を記述する。ここで、インコヒーレントンは、系の密度行列を表す補助ラダー系において、鎖間結合状態として定義される。リウヴィリアン固有モードは、関連するインコヒーレントンの数を反映する異なる減衰率を持つ群に分類される。また、固有モードの異なるグループを分離するスペクトルギャップ(量子コヒーレンスギャップ)も導入します。我々は, 劣化を受ける格子ボソンモデルにおけるインコヒーレントンの存在を実証し, インコヒーレントンが分解されると量子コヒーレンスギャップが閉じることを示し, 指数的崩壊による非コヒーレント緩和からコヒーレント振動緩和への動的遷移を示す。さらに, 量子多体系のデコヒーレンスダイナミクスが, インコヒーレントンの生成, 局在, 拡散の観点でどのように理解できるかを考察する。

The relaxation dynamics of an open quantum system is determined by the competition between the coherent Hamiltonian dynamics of a system and the dissipative dynamics due to interactions with environments. It is therefore of fundamental interest to understand the transition from the coherent to incoherent regimes. We find that hitherto unrecognized quasiparticles -- incoherentons -- describe this coherent-to-incoherent transition in eigenmodes of a Liouvillian superoperator that governs the dynamics of an open quantum many-body system. Here, an incoherenton is defined as an interchain bound state in an auxiliary ladder system that represents the density matrix of a system. The Liouvillian eigenmodes are classified into groups with different decay rates that reflect the number of incoherentons involved therein. We also introduce a spectral gap -- quantum coherence gap -- that separates the different groups of eigenmodes. We demonstrate the existence of incoherentons in a lattice boson model subject to dephasing, and show that the quantum coherence gap closes when incoherentons are deconfined, which signals a dynamical transition from incoherent relaxation with exponential decay to coherent oscillatory relaxation. Furthermore, we discuss how the decoherence dynamics of quantum many-body systems can be understood in terms of the generation, localization, and diffusion of incoherentons.

翻訳日:2023-12-25 19:06:55 公開日:2023-12-22

# FI-ODE:ニューラル・オードにおけるロバストな前方不変性

FI-ODE: Certifiably Robust Forward Invariance in Neural ODEs ( http://arxiv.org/abs/2210.16940v4 )

ライセンス: Link先を確認

Yujia Huang, Ivan Dario Jimenez Rodriguez, Huan Zhang, Yuanyuan Shi, Yisong Yue

(参考訳) フォワード不変性(フォワード不変性、Forward invariance)とは、制御理論において、力学系が常に指定された状態の集合内に留まり、堅牢性を保証する(例えば、証明書は摂動の下で保持される)ことを証明するために用いられる長期研究された性質である。本稿では,ニューラルネットワークにおけるフォワード不変性の証明とトレーニングのための一般的なフレームワークを提案する。このフレームワークは、堅牢な継続的制御において認証された安全性を提供する。私たちの知る限りでは、このような保証のない保証でNeural ODEポリシーをトレーニングする最初の例です。さらに,画像分類の可逆的ロバスト性を証明するために,このフレームワークの汎用性について検討する。

Forward invariance is a long-studied property in control theory that is used to certify that a dynamical system stays within some pre-specified set of states for all time, and also admits robustness guarantees (e.g., the certificate holds under perturbations). We propose a general framework for training and provably certifying robust forward invariance in Neural ODEs. We apply this framework to provide certified safety in robust continuous control. To our knowledge, this is the first instance of training Neural ODE policies with such non-vacuous certified guarantees. In addition, we explore the generality of our framework by using it to certify adversarial robustness for image classification.

翻訳日:2023-12-25 19:06:28 公開日:2023-12-22

# 量子ソボレフ不等式について

On Quantum Sobolev Inequalities ( http://arxiv.org/abs/2210.03013v3 )

ライセンス: Link先を確認

Laurent Lafleche

(参考訳) 位相空間における古典ソボレフ不等式(英語版)の量子アナログを、可換子のシャッテンノルムによって定義される量子ソボレフノルムを用いて検討する。これらの不等式はウィグナー・ヤネーゼのスキュー情報に対する不確実性原理を提供し、またその記号の観点からワイル量子化のシャッテンノルムに新しい境界をもたらす。中間ツールとして、畳み込みの半古典的なアナログに対するハーディ・リトルウッド・ソボレフの不等式の類似を取得し、量子ベソフ空間を導入する。明示的な推定は最適定数で得られる。

We investigate the quantum analogue of the classical Sobolev inequalities in the phase space, with the quantum Sobolev norms defined in terms of Schatten norms of commutators. These inequalities provide an uncertainty principle for the Wigner-Yanase skew information, and also lead to new bounds on the Schatten norms of the Weyl quantization in terms of its symbol. As an intermediate tool, we obtain the analogue of Hardy-Littlewood-Sobolev's inequalities for a semiclassical analogue of the convolution, and introduce quantum Besov spaces. Explicit estimates are obtained on the optimal constants.

翻訳日:2023-12-25 19:05:40 公開日:2023-12-22

# 2つの両複素および1つの多重複素最小平均平方アルゴリズム

Two Bicomplex and One Multicomplex Least Mean Square algorithms ( http://arxiv.org/abs/2209.11899v2 )

ライセンス: Link先を確認

Daniel Alpay, Kamal Diki, Mihaela Vajiac

(参考訳) 我々は1960年にWidrow and Hoff for Adaptive Linear Neuron (ADALINE)によって発明されたLMSアルゴリズムから着想を得た、複素および複複素条件における新しい勾配作用素を研究、導入した。これらの勾配演算子は、両複素最小平均平方(BLMS)アルゴリズムの新しい学習規則を定式化するために使用され、また、多複素LMSアルゴリズム(MLMS)の場合、これらの学習規則を定式化する。このアプローチは古典的実数と複素LMSアルゴリズムの両方を拡張する。

We study and introduce new gradient operators in the complex and bicomplex settings, inspired by the well-known Least Mean Square (LMS) algorithm invented in 1960 by Widrow and Hoff for the Adaptive Linear Neuron (ADALINE). These gradient operators are used to formulate new learning rules for the Bicomplex Least Mean Square (BLMS) algorithms, and we also formulate these learning rules for the case of multicomplex LMS (MLMS) algorithms. This approach extends both the classical real and complex LMS algorithms.
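For readers unfamiliar with the starting point, the classical real-valued LMS (Widrow-Hoff) update can be sketched in a few lines. This is only the textbook real case that the bicomplex and multicomplex rules generalize, not the paper's BLMS or MLMS algorithms; the helper `make_linear_data`, the step size, and the epoch count are assumptions chosen for a noise-free toy problem.

```python
import random


def lms_fit(samples, targets, step_size=0.05, epochs=50):
    """Classical real LMS: after each sample, move the weights along the
    input direction in proportion to the prediction error,
    w <- w + step_size * error * x."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, desired in zip(samples, targets):
            predicted = sum(w * xi for w, xi in zip(weights, x))
            error = desired - predicted
            weights = [w + step_size * error * xi for w, xi in zip(weights, x)]
    return weights


def make_linear_data(true_weights, n_samples=200, seed=0):
    """Generate noise-free targets from a known linear model (toy data)."""
    rng = random.Random(seed)
    samples = [[rng.uniform(-1.0, 1.0) for _ in true_weights] for _ in range(n_samples)]
    targets = [sum(w * xi for w, xi in zip(true_weights, x)) for x in samples]
    return samples, targets
```

On noise-free data generated from the weights `[2.0, -1.0]`, the learned weights approach the true ones. With noisy targets the same rule only hovers around the least-squares solution, at a distance governed by the step size.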

翻訳日:2023-12-25 19:05:27 公開日:2023-12-22

# NELLIE: グラウンドド、コンポジション、説明可能な推論のためのニューロシンボリック推論エンジン

NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning ( http://arxiv.org/abs/2209.07662v4 )

ライセンス: Link先を確認

Nathaniel Weir, Peter Clark, and Benjamin Van Durme

(参考訳) 我々のゴールは,nlコーパスに根拠のある人間の解釈可能な証明木によって回答が支持される体系的推論を通じて,疑問に答える現代的なアプローチである。このようなシステムは、現代のlmsによる解釈可能性と幻覚の課題の緩和と、現在の説明方法(例えば、連鎖的思考)の根拠の欠如に役立つ。本稿では,手作りのルールを,ニューラルネットワークのモデリング,誘導生成,半パラメトリックな高密度検索の組み合わせに置き換える,prologに基づく推論エンジンの新たなアプローチを提案する。我々の実装であるNELLIEは、テキストから既知の事実を解説する以前の研究を超えて、包括木証明探索として完全に解釈可能でエンドツーエンドの接地されたQAを示す最初のシステムである。実験では、NELLIEは知識に基づく説明をしながら、同様の大きさの最先端の推論器(Tafjord et al., 2022)より優れています。また、NELLIEは半構造化テキストコーパスとNLテキストコーパスの両方を利用して推論を導くことができる。これらを組み合わせることで、現代のニューラルメソッドと伝統的なシンボリック推論の両方の利点を共同で享受する新しい方法が示唆される。

Our goal is a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of authoritative facts. Such a system would help alleviate the challenges of interpretability and hallucination with modern LMs, and the lack of grounding of current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work explaining known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner [Tafjord et al., 2022] while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.

翻訳日:2023-12-25 19:05:16 公開日:2023-12-22

# 部分的ラベル学習のためのメタ客観指導型曖昧さ解消

Meta Objective Guided Disambiguation for Partial Label Learning ( http://arxiv.org/abs/2208.12459v2 )

ライセンス: Link先を確認

Bo-Shi Zou, Ming-Kun Xie, Sheng-Jun Huang

(参考訳) 部分ラベル学習(pll)は典型的な弱い教師付き学習フレームワークであり、各トレーニングインスタンスは候補ラベルセットに関連付けられ、1つのラベルのみが有効である。 PLL問題を解決するには、訓練データの構造情報や自己学習方式でモデル出力を精査するといった事前知識を用いて、候補集合の曖昧さを解こうとする手法が一般的である。残念なことに、これらの手法は、モデルトレーニングの初期段階において、事前情報や信頼できない予測が欠如しているため、望ましい性能を得ることができないことが多い。本稿では,小さな検証セット上でのメタ目的を解いて,候補ラベルから基底ラベルを回収することを目的とした,メタ目的導出不曖昧化(mogd)を用いた部分ラベル学習のための新しい枠組みを提案する。具体的には、偽陽性ラベルの悪影響を軽減するため、バリデーションセットのメタ損失に基づいて各候補ラベルを再強調する。そして、重み付きクロスエントロピー損失を最小化して分類器を訓練する。提案手法は,通常のsgdオプティマイザを用いた各種深層ネットワークを用いて容易に実装できる。理論的には,メタ目的の収束特性を証明し,提案手法の推定誤差境界を導出する。様々なベンチマークデータセットと実世界のPLLデータセットに対する大規模な実験により、提案手法は最先端の手法と比較して有能な性能が得られることを示した。

Partial label learning (PLL) is a typical weakly supervised learning framework, where each training instance is associated with a candidate label set, among which only one label is valid. To solve PLL problems, methods typically try to disambiguate candidate sets either by using prior knowledge, such as structure information of the training data, or by refining model outputs in a self-training manner. Unfortunately, these methods often fail to obtain a favorable performance due to the lack of prior information or unreliable predictions in the early stage of model training. In this paper, we propose a novel framework for partial label learning with meta objective guided disambiguation (MoGD), which aims to recover the ground-truth label from the candidate label set by solving a meta objective on a small validation set. Specifically, to alleviate the negative impact of false positive labels, we re-weight each candidate label based on the meta loss on the validation set. Then, the classifier is trained by minimizing the weighted cross entropy loss. The proposed method can be easily implemented by using various deep networks with the ordinary SGD optimizer. Theoretically, we prove the convergence property of the meta objective and derive the estimation error bounds of the proposed method. Extensive experiments on various benchmark datasets and real-world PLL datasets demonstrate that the proposed method can achieve competitive performance when compared with the state-of-the-art methods.
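The weighted cross entropy over candidate labels can be sketched as below. This covers only the loss for one instance given per-candidate weights; how MoGD obtains those weights from the meta loss on the validation set is not shown, and the function names and the dictionary representation of the weights are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_candidate_loss(logits, candidate_weights):
    """Weighted cross entropy for one instance.

    candidate_weights maps each candidate label index to its weight;
    labels outside the candidate set are simply absent and contribute
    nothing to the loss.
    """
    probs = softmax(logits)
    return -sum(w * math.log(probs[j]) for j, w in candidate_weights.items())
```

If the weights concentrate on the label the classifier already favours, the loss is small; if they concentrate on a false positive candidate, the loss is large, which is what lets the re-weighting steer training toward the ground-truth label.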

翻訳日:2023-12-25 19:04:53 公開日:2023-12-22

# 複雑相互作用下での拡散に基づくマルチヒューマンモーション生成

InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions ( http://arxiv.org/abs/2304.05684v2 )

ライセンス: Link先を確認

Han Liang, Wenqian Zhang, Wenxuan Li, Jingyi Yu, Lan Xu

(参考訳) 近年、現実的な人間の動きを生成する拡散モデルは著しく進歩している。しかし、それらは複数人の相互作用をほとんど考慮していない。本稿では,人間同士のインタラクションを動作拡散プロセスに組み込んだ効果的な拡散ベースの手法であるInterGenを提案する。これにより、一般ユーザでもテキストによるガイダンスだけで高品質な2人インタラクション動作をカスタマイズできる。まず、InterHumanというマルチモーダルデータセットを提供する。様々な2人インタラクションのための約1億700万フレームで構成され、正確な骨格運動と23,337の自然言語記述を含む。アルゴリズム面では、動作拡散モデルを2人インタラクションの設定に合わせて注意深く調整する。相互作用中の人物のアイデンティティの対称性を扱うために,重みを明示的に共有する2つの協調的なTransformerベースのデノイザと,2つのデノイジング過程をさらに接続する相互注意機構を提案する。次に,世界座標系における2人の演者間の大域的関係を明示的に定式化する,新たな動作入力表現を提案する。さらに, 空間関係を符号化する2つの新しい正則化項を導入し、相互作用拡散モデルの学習時には対応する減衰スキームを併用する。広範な実験によりInterGenの有効性と汎化性を検証する。特に、従来の手法よりも多様で説得力のある2人の動作を生成し、人間のインタラクションに関する様々な下流応用を可能にする。

We have recently seen tremendous progress in diffusion advances for generating realistic human motions. Yet, they largely disregard the multi-human interactions. In this paper, we present InterGen, an effective diffusion-based approach that incorporates human-to-human interactions into the motion diffusion process, which enables layman users to customize high-quality two-person interaction motions, with only text guidance. We first contribute a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 23,337 natural language descriptions. For the algorithm side, we carefully tailor the motion diffusion model to our two-person interaction setting. To handle the symmetry of human identities during interactions, we propose two cooperative transformer-based denoisers that explicitly share weights, with a mutual attention mechanism to further connect the two denoising processes. Then, we propose a novel representation for motion input in our interaction diffusion model, which explicitly formulates the global relations between the two performers in the world frame. We further introduce two novel regularization terms to encode spatial relations, equipped with a corresponding damping scheme during the training of our interaction diffusion model. Extensive experiments validate the effectiveness and generalizability of InterGen. Notably, it can generate more diverse and compelling two-person motions than previous methods and enables various downstream applications for human interactions.

翻訳日:2023-12-25 18:57:40 公開日:2023-12-22

# 拡散橋混合輸送、Schrödinger橋問題と生成モデル

Diffusion Bridge Mixture Transports, Schrödinger Bridge Problems and Generative Modeling ( http://arxiv.org/abs/2304.00917v2 )

ライセンス: Link先を確認

Stefano Peluchetti

(参考訳) 動的Schrödinger橋問題は、2つの目標確率測度間の輸送を定義する確率過程のうち、Kullback-Leibler発散の意味で参照過程に最も近いものを求める問題である。本稿では,動的Schrödinger橋問題を解くための新しいサンプリングベースの反復アルゴリズムである反復拡散橋混合(IDBM)法を提案する。IDBM法は、各反復において目標確率測度間の有効な輸送を実現するという魅力的な性質を持つ。我々はIDBM法に関する初期的な理論的研究を行い、その収束特性を確立する。理論的結果は,IDBM法の競争力のある性能を示す数値実験によって補完される。生成モデリングの最近の進歩は、拡散過程の時間反転を用いて、単純な分布をデータ分布に近似的に輸送する生成過程を定義する。代替案として, IDBM法の最初の反復を, この輸送を近似なしで実現する手法として利用することを提案する。このアプローチは、生成過程のダイナミクスの選択においてより高い柔軟性を提供し、訓練の高速化と、より大きな離散化間隔での優れたサンプル品質を示す。実装面では、必要な修正は最小限であり、訓練損失の定義に限定される。

The dynamic Schrödinger bridge problem seeks a stochastic process that defines a transport between two target probability measures, while optimally satisfying the criteria of being closest, in terms of Kullback-Leibler divergence, to a reference process. We propose a novel sampling-based iterative algorithm, the iterated diffusion bridge mixture (IDBM) procedure, aimed at solving the dynamic Schrödinger bridge problem. The IDBM procedure exhibits the attractive property of realizing a valid transport between the target probability measures at each iteration. We perform an initial theoretical investigation of the IDBM procedure, establishing its convergence properties. The theoretical findings are complemented by numerical experiments illustrating the competitive performance of the IDBM procedure. Recent advancements in generative modeling employ the time-reversal of a diffusion process to define a generative process that approximately transports a simple distribution to the data distribution. As an alternative, we propose utilizing the first iteration of the IDBM procedure as an approximation-free method for realizing this transport. This approach offers greater flexibility in selecting the generative process dynamics and exhibits accelerated training and superior sample quality over larger discretization intervals. In terms of implementation, the necessary modifications are minimally intrusive, being limited to the training loss definition.

翻訳日:2023-12-25 18:56:36 公開日:2023-12-22

# ChatGPTは良いキーフレーズ生成器か? 予備的研究

Is ChatGPT A Good Keyphrase Generator? A Preliminary Study ( http://arxiv.org/abs/2303.13001v3 )

ライセンス: Link先を確認

Mingyang Song, Haiyun Jiang, Shuming Shi, Songfang Yao, Shilong Lu, Yi Feng, Huafeng Liu, Liping Jing

(参考訳) ChatGPTの出現は、最近、計算言語学コミュニティから大きな注目を集めている。キーフレーズ生成器としての能力を示すために,キーフレーズ生成タスクにおけるChatGPTの予備評価を行う。我々は,キーフレーズ生成プロンプト,キーフレーズ生成の多様性,長文書の理解など,様々な面でその性能を評価する。評価は6つのベンチマークデータセットに基づいており、OpenAIが提案するプロンプトを採用しつつ、6つの候補プロンプトに拡張している。ChatGPTは6つの候補プロンプトすべてにおいて非常に良好に機能し、データセット間の性能差は小さい。以上の結果から,ChatGPTはキーフレーズ生成に大きな可能性があると結論づけた。さらに,ChatGPTは本文に現れないキーフレーズ(absent keyphrase)の生成に依然として課題を抱えていることも判明した。最終節では,本報告の限界と今後の拡張についても述べる。

The emergence of ChatGPT has recently garnered significant attention from the computational linguistics community. To demonstrate its capabilities as a keyphrase generator, we conduct a preliminary evaluation of ChatGPT for the keyphrase generation task. We evaluate its performance in various aspects, including keyphrase generation prompts, keyphrase generation diversity, and long document understanding. Our evaluation is based on six benchmark datasets, and we adopt the prompt suggested by OpenAI while extending it to six candidate prompts. We find that ChatGPT performs exceptionally well on all six candidate prompts, with minor performance differences observed across the datasets. Based on our findings, we conclude that ChatGPT has great potential for keyphrase generation. Moreover, we discover that ChatGPT still faces challenges when it comes to generating absent keyphrases. Meanwhile, in the final section, we also present some limitations and future expansions of this report.

翻訳日:2023-12-25 18:56:14 公開日:2023-12-22

# 量子鍵分布系保護のための光パワーリミッタのセキュリティ境界

Security boundaries of an optical power limiter for protecting quantum key distribution systems ( http://arxiv.org/abs/2303.12355v3 )

ライセンス: Link先を確認

Qingquan Peng, Binwu Gao, Konstantin Zaitsev, Dongyang Wang, Jiangfang Ding, Yingwen Liu, Qin Liao, Ying Guo, Anqi Huang and Junjie Wu

(参考訳) 不正な光注入は、量子鍵配送(QKD)システムの実用的セキュリティにとって、常に重大な脅威である。熱光学デフォーカス効果に基づく光パワーリミッタ(OPL)が提案・実装されており, 注入されるハッキング光を制限する。ハードウェア対策として、広く展開する前に、様々な光注入攻撃下でのOPLの性能を試験し、セキュリティ境界を明らかにする必要がある。量子暗号におけるOPLのセキュリティ境界を調べるために、連続波(c.w.)光注入攻撃、およびパルス繰り返し周波数が0.5 Hz、40 MHz、1 GHzのパルス照明攻撃の下でのOPLの挙動を総合的に試験・解析する。試験結果はOPLのセキュリティ境界を明らかにし、ユースケースにおいてOPLを適切に利用することを可能にする。ここで提案する試験と解析の方法論は,QKDシステムにおける他のパワー制限コンポーネントにも適用可能である。

Unauthorized light injection has always been a vital threat to the practical security of a quantum key distribution (QKD) system. An optical power limiter (OPL) based on the thermo-optical defocusing effect has been proposed and implemented, limiting the injected hacking light. As a hardware countermeasure, the performance of the OPL under various light-injection attacks should be tested to clarify the security boundary before it is widely deployed. To investigate the OPL's security boundary in quantum cryptography, we comprehensively test and analyse the behavior of the OPL under continuous-wave (c.w.) light-injection attacks and pulse illumination attacks with pulse repetition rates of 0.5 Hz, 40 MHz, and 1 GHz. The testing results illuminate the security boundary of the OPL, which allows one to properly employ the OPL in the use cases. The methodology of testing and analysis proposed here is applicable to other power-limitation components in a QKD system.

翻訳日:2023-12-25 18:55:33 公開日:2023-12-22

# マルチエージェント強化学習による量的市場における取引戦略の最適化

Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2303.11959v2 )

ライセンス: Link先を確認

Hengxi Zhang, Zhendong Shi, Yuanquan Hu, Wenbo Ding, Ercan E. Kuruoglu, Xiao-Ping Zhang

(参考訳) 量的市場は、迅速なダイナミクスと豊富な不確実性によって特徴づけられ、利益主導の株式取引行動の追求は本質的に困難である。この文脈の中では、最適制御のための報酬中心のメカニズムで機能する強化学習(RL)が、提示される複雑な金融意思決定の難問に対する潜在的に効果的な解決策として浮上している。本論文は、固定比率ポートフォリオ保険(CPPI)と時間不変ポートフォリオ保護(TIPP)の2つの確立された金融トレーディング戦略と、マルチエージェントディープ決定主義政策勾配(MADDPG)フレームワークの融合について述べる。その結果、量的市場における戦略的取引の探索に適した2つの新しいマルチエージェントRL(MARL)手法、CPPI-MADDPGとTIPP-MADDPGを導入した。これらのイノベーションを検証するため、我々は100のリアルマーケット株を多種多様に選別して実装した。実証実験の結果,CPPI-MADDPGとTIPP-MADDPGの戦略は従来よりも一貫して優れており,定量取引の分野での有効性が確認された。

Quantitative markets are characterized by swift dynamics and abundant uncertainties, making the pursuit of profit-driven stock trading actions inherently challenging. Within this context, reinforcement learning (RL), which operates on a reward-centric mechanism for optimal control, has surfaced as a potentially effective solution to the intricate financial decision-making conundrums presented. This paper delves into the fusion of two established financial trading strategies, namely the constant proportion portfolio insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the multi-agent deep deterministic policy gradient (MADDPG) framework. As a result, we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. To validate these innovations, we implemented them on a diverse selection of 100 real-market shares. Our empirical findings reveal that the CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional counterparts, affirming their efficacy in the realm of quantitative trading.
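
For readers unfamiliar with the two classical strategies named above, the following is a minimal sketch of the textbook CPPI allocation rule and the TIPP floor update. It is not the paper's CPPI-MADDPG or TIPP-MADDPG method, and the function and parameter names (`multiplier`, `floor_fraction`) are illustrative assumptions.

```python
def cppi_allocation(portfolio_value, floor, multiplier):
    """Textbook CPPI rule: invest a multiple of the cushion in the risky asset.

    The cushion is the portfolio value above the protected floor. The risky
    exposure is capped at the portfolio value (no leverage, an assumption
    made here for simplicity). Returns (risky_amount, safe_amount).
    """
    cushion = max(portfolio_value - floor, 0.0)
    risky = min(multiplier * cushion, portfolio_value)
    return risky, portfolio_value - risky


def tipp_floor(portfolio_value, previous_floor, floor_fraction):
    """Textbook TIPP rule: the floor ratchets up with the portfolio value.

    The new floor is the larger of the previous floor and a fixed fraction
    of the current portfolio value, so it never decreases.
    """
    return max(previous_floor, floor_fraction * portfolio_value)
```

With a portfolio worth 100, a floor of 80 and a multiplier of 3, the cushion is 20 and the rule places 60 in the risky asset and 40 in the safe asset. Under TIPP, a rise of the portfolio to 120 with a floor fraction of 0.8 lifts the floor to about 96, and a later drop in value does not lower it again.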

翻訳日:2023-12-25 18:55:01 公開日:2023-12-22

# SPSysML:シミュレーション物理システムの定量的評価のためのメタモデル

SPSysML: A meta-model for quantitative evaluation of Simulation-Physical Systems ( http://arxiv.org/abs/2303.09565v3 )

ライセンス: Link先を確認

Wojciech Dudek, Narcis Miguel, Tomasz Winiarski

(参考訳) ロボットシステムは、複数のセンサーとエフェクターを備えた複雑なサイバー物理システム(CPS)である。最近のシミュレーション手法は、Digital Twin(DT)の概念の実現を可能にする。しかし、ロボットシステム開発におけるDTの雇用、例えば開発内テストは不明確である。システム開発の間、その部品は模擬モックアップから実際のハードウェアにデプロイされたソフトウェアを実行する物理部品へと進化する。したがって、シミュレーション部品と物理部品の整合性を確保するための設計ツールとフレキシブルな開発手順が必要である。我々は,CPSのシミュレーションと物理部品の統合を,様々な設定で最大化することを目的としている。統合性の向上、物理部分(ハードウェアとソフトウェア)のシミュレーションベースのテストカバレッジの向上。本稿では、SPSysML(Simulation-Physical System Modeling Language)と呼ばれるシステムモデリング言語(SysML)に基づくドメイン仕様言語(DSL)を提案する。 SPSysMLは、シミュレーション・物理システム(SPSys)の分類を定義し、少なくとも物理的またはシミュレートされた部分からなるCPSである。特に、シミュレーションされたものはDTである。本稿では,SPSys のシミュレーション・物理的整合性を最大化できる SPSys 開発手法を提案する。 SPSysDPはINCAREプロジェクトのための複雑なロボットシステムの開発に使用されている。その後のSPSysDPでは、システムのシミュレーションと物理の整合性が最大化される。結果として、システムモデルは少ないコンポーネントで構成され、システムコンポーネントの大部分は、さまざまなシステムセットアップ間で共有される。本稿では,ロボットオペレーティング・システム(ROS)とガゼボシミュレータを用いて,システムの実装とテストを行う。 SPSysDPを使用したSPSysMLは、SPSys(DTとCPSを含む)の設計を可能にし、シミュレーションと物理部品間の最大整合性を特徴とするマルチセットアップシステムの開発を可能にする。

Robotic systems are complex cyber-physical systems (CPS) commonly equipped with multiple sensors and effectors. Recent simulation methods enable the Digital Twin (DT) concept realisation. However, DT employment in robotic system development, e.g. in-development testing, is unclear. During the system development, its parts evolve from simulated mockups to physical parts which run software deployed on the actual hardware. Therefore, a design tool and a flexible development procedure ensuring the integrity of the simulated and physical parts are required. We aim to maximise the integration between a CPS's simulated and physical parts in various setups. The better integration, the better simulation-based testing coverage of the physical part (hardware and software). We propose a Domain Specification Language (DSL) based on Systems Modeling Language (SysML) that we refer to as SPSysML (Simulation-Physical System Modeling Language). SPSysML defines the taxonomy of a Simulation-Physical System (SPSys), being a CPS consisting of at least a physical or simulated part. In particular, the simulated ones can be DTs. We propose a SPSys Development Procedure (SPSysDP) that enables the maximisation of the simulation-physical integrity of SPSys by evaluating the proposed factors. SPSysDP is used to develop a complex robotic system for the INCARE project. In subsequent iterations of SPSysDP, the simulation-physical integrity of the system is maximised. As a result, the system model consists of fewer components, and a greater fraction of the system components are shared between various system setups. We implement and test the system with popular frameworks, Robot Operating System (ROS) and Gazebo simulator. SPSysML with SPSysDP enables the design of SPSys (including DT and CPS), multi-setup system development featuring maximised integrity between simulation and physical parts in its setups.

翻訳日:2023-12-25 18:54:31 公開日:2023-12-22

# 畳み込み型クロスビューポーズ推定

Convolutional Cross-View Pose Estimation ( http://arxiv.org/abs/2303.05915v3 )

ライセンス: Link先を確認

Zimin Xia, Olaf Booij, and Julian F. P. Kooij

(参考訳) 本稿では,クロスビューポーズ推定のための新しいエンドツーエンド手法を提案する。地上レベルのクエリ画像と、クエリの周辺領域をカバーする空中画像が与えられた場合、クエリの3自由度カメラポーズは、その画像記述子と、空中画像内の局所領域の記述子とのマッチングにより推定される。方向を考慮した記述子は、並進同変な畳み込み地上画像エンコーダと対照学習を用いて得られる。ローカライゼーションデコーダは、新しいローカライゼーションマッチングアップサンプリングモジュールを用いて、粗から密へと高密度な確率分布を生成する。より小さなオリエンテーションデコーダは、方向推定をローカライゼーションに条件付けるためのベクトル場を生成する。提案手法はVIGORおよびKITTIデータセットで検証され、同等の方向推定精度のもとで、ローカライゼーション誤差の中央値において最先端のベースラインを72%および36%上回る。予測された確率分布はローカライゼーションの曖昧さを表現でき、誤っている可能性のある予測を棄却できる。再訓練なしで、異なる視野を持つ地上画像に対して推論でき、利用可能であれば方向の事前情報を利用できる。Oxford RobotCarデータセットでは,自車の姿勢を経時的に確実に推定でき,14 FPSでローカライゼーション誤差の中央値1 m未満、方向誤差の中央値約1度を達成する。

We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36% in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity, and enables rejecting possible erroneous predictions. Without re-training, the model can infer on ground images with different field of views and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.

翻訳日:2023-12-25 18:54:03 公開日:2023-12-22

# 任意形状の位相物体の光学パラメータ推定精度の量子限界

Quantum limits for the precision of optical parameter estimation of arbitrarily shaped phase objects ( http://arxiv.org/abs/2302.14504v2 )

ライセンス: Link先を確認

Arturo Villegas, Marcello H. M. Passos, Silvania F. Pereira, Juan P. Torres

(参考訳) 位相物体を特徴付けるパラメータの集合を、最適な精度、すなわち光と物質の相互作用過程によって決まる最良の精度で推定する一般的な方法を示す。この方法はPezzèらによって提示されたアイデア[Phys. Rev. Lett. 119, 130504 (2017)]に由来する。我々の目的は、量子推定理論の研究で通常用いられる形式的な量子の言葉にはおそらく馴染みのない物理学コミュニティに向けて、この方法の主な特徴と応用を明らかにすることである。まず、位相物体を特徴付けるパラメータの集合の推定に対する精度限界を導出する。実験的に重要な2種類の照明、すなわち平均光子数Nの多モードコヒーレント状態と、多モード単一光子量子状態のN個のコピーに対して、Cramér-Rao下界を計算する。この2つのモデルがどのような条件で等価になるかを示す。第2に, 物体から反射・透過された光を、空間形状を設計したモードの集合に射影することにより, 最適精度を達成できることを示す。これらのモードの構成方法を説明し、これらの測定を用いた推定の精度が最適であることを明示的に示す。例として, ナノファブリケーション技術の評価のために半導体産業で重要な物体である崖状ナノ構造の高さと側壁角度の推定に、これらの結果を適用する。

We show a general method to estimate with optimum precision, i.e., the best precision determined by the light-matter interaction process, a set of parameters that characterize a phase object. The method derives from ideas presented by Pezzè et al. [Phys. Rev. Lett. 119, 130504 (2017)]. Our goal is to illuminate the main characteristics of this method as well as its applications to the physics community, probably not familiar with the formal quantum language usually employed in works related to quantum estimation theory. First, we derive precision bounds for the estimation of the set of parameters characterizing the phase object. We compute the Cramér-Rao lower bound for two experimentally relevant types of illumination: a multimode coherent state with mean photon number N, and N copies of a multimode single-photon quantum state. We show under which conditions these two models are equivalent. Second, we show that the optimum precision can be achieved by projecting the light reflected/transmitted from the object onto a set of modes with engineered spatial shape. We describe how to construct these modes, and demonstrate explicitly that the precision of the estimation using these measurements is optimum. As an example, we apply these results to the estimation of the height and sidewall angle of a cliff-like nanostructure, an object relevant in the semiconductor industry for the evaluation of nanofabrication techniques.
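
For orientation, the general form of the multiparameter quantum Cramér-Rao bound referred to above can be written as follows. This is the standard textbook statement, not a formula taken from the paper; the symbols $\boldsymbol{\theta}$, $\hat{\boldsymbol{\theta}}$ and $F_Q$ are notation assumed here, and the bound is written for unbiased estimators and $N$ independent copies of the probe state.

```latex
% Covariance of any unbiased estimator of the parameter vector \theta,
% built from N independent copies of the probe state, is bounded below
% (in the matrix sense) by the inverse quantum Fisher information matrix.
\[
  \operatorname{Cov}\bigl(\hat{\boldsymbol{\theta}}\bigr)
  \;\succeq\; \frac{1}{N}\, F_Q(\boldsymbol{\theta})^{-1}
\]
```

The $1/N$ factor reflects the additivity of Fisher information over independent copies; whether the bound can be saturated for all parameters simultaneously depends on the measurement, which is the question the abstract addresses with its engineered spatial modes.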

翻訳日:2023-12-25 18:53:17 公開日:2023-12-22

# 位置依存有効質量を持つ半閉じ込め調和振動子モデルのウィグナー関数

The Wigner function of a semiconfined harmonic oscillator model with a position-dependent effective mass ( http://arxiv.org/abs/2302.12673v5 )

ライセンス: Link先を確認

S.M. Nagiyev, A.M. Jafarova and E.I. Jafarov

(参考訳) 位置に依存して変化する質量によって半閉じ込め効果を示す量子調和振動子モデルに対して、ウィグナー関数による位相空間表現の概念を提案する。新しい手法を用いて、このような半閉じ込め量子系に対するウィグナー分布関数を厳密に計算する。この方法は、量子分布関数の定義における被積分関数の発散を抑制し、半閉じ込め振動子モデルの定常状態に対する解析的表式の計算を可能にする。この量子系について、外部一様場が印加されている場合と印加されていない場合の両方を調べる。得られたウィグナー分布関数の厳密な表式は、第一種ベッセル関数とラゲール多項式によって表される。さらに、いくつかの特殊な場合と極限についても詳細に論じる。

We propose a phase-space representation concept in terms of the Wigner function for a quantum harmonic oscillator model that exhibits the semiconfinement effect through its mass varying with the position. The new method is used to compute the Wigner distribution function exactly for such a semiconfined quantum system. This method suppresses the divergence of the integrand in the definition of the quantum distribution function and leads to the computation of its analytical expressions for the stationary states of the semiconfined oscillator model. For this quantum system, both the presence and absence of the applied external homogeneous field are studied. The exact expressions obtained for the Wigner distribution function are expressed through the Bessel function of the first kind and Laguerre polynomials. Furthermore, some of the special cases and limits are discussed in detail.

翻訳日:2023-12-25 18:52:53 公開日:2023-12-22

# 持続可能なオンデマンドライドプーリングのための将来を考慮した価格設定とマッチング

Future Aware Pricing and Matching for Sustainable On-demand Ride Pooling ( http://arxiv.org/abs/2302.10510v3 )

ライセンス: Link先を確認

Xianjie Zhang and Pradeep Varakantham and Hao Jiang

(参考訳) オンデマンドのライドプーリングの人気は、顧客(低価格)、タクシードライバー(収入増)、環境(車両数の削減によるカーボンフットプリントの低減)、そしてUberのようなアグリゲータ企業(収益増)にもたらされる利点による。これらの利点を実現するには、相互に関連する2つの重要な課題を効果的に解決する必要がある。 (a)価格設定 -- タクシーに対する顧客リクエストに価格を設定すること、(b)マッチング -- (価格を受け入れた)顧客をタクシー・車両に割り当てること。従来、これら2つの課題は個別に研究され、現在のマッチングが将来のリクエストへの対応に与える影響を考慮せず、現在のリクエストのみを考慮する近視眼的なアプローチが用いられてきた。本稿では,価格設定とマッチングの問題を同時に扱い、価格設定とマッチングの決定が将来に与える影響も考慮する新たな枠組みを提案する。実世界のタクシーデータセットを用いた実験では、一定の収益を得るのに必要な車両数(最大14%、平均10.6%)と車両の総走行距離(最大11.1%、平均3.7%)を削減しつつ、持続可能な形で収益(最大17%、平均6.4%)を大幅に改善できることを示す。つまり、顧客、ドライバー、アグリゲータ(ライドプール会社)により高い収益をもたらすと同時に、(道路上の車両数と燃料消費が少ないため)環境にも良い、すべての利害関係者(顧客、ドライバー、アグリゲータ、環境)にとって理想的なウィンウィンのシナリオを提供できる。

The popularity of on-demand ride pooling is owing to the benefits offered to customers (lower prices), taxi drivers (higher revenue), environment (lower carbon footprint due to fewer vehicles) and aggregation companies like Uber (higher revenue). To achieve these benefits, two key interlinked challenges have to be solved effectively: (a) pricing -- setting prices to customer requests for taxis; and (b) matching -- assignment of customers (that accepted the prices) to taxis/cars. Traditionally, both these challenges have been studied individually and using myopic approaches (considering only current requests), without considering the impact of current matching on addressing future requests. In this paper, we develop a novel framework that handles the pricing and matching problems together, while also considering the future impact of the pricing and matching decisions. In our experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly improve revenue (up to 17% and on average 6.4%) in a sustainable manner by reducing the number of vehicles (up to 14% and on average 10.6%) required to obtain a given fixed revenue and the overall distance travelled by vehicles (up to 11.1% and on average 3.7%). That is to say, we are able to provide an ideal win-win scenario for all stakeholders (customers, drivers, aggregator, environment) involved by obtaining higher revenue for customers, drivers, aggregator (ride pooling company) while being good for the environment (due to fewer number of vehicles on the road and lesser fuel consumed).

翻訳日:2023-12-25 18:52:42 公開日:2023-12-22

# フレームワーク税:NLP研究と展開における推論効率の相違

The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment ( http://arxiv.org/abs/2302.06117v2 )

ライセンス: Link先を確認

Jared Fernandez, Jacob Kahn, Clara Na, Yonatan Bisk, Emma Strubell

(参考訳) NLPシステムの計算効率への関心の高まりは、効率的なモデルアーキテクチャの設計と基盤となるハードウェアアクセラレータの改善を動機付けてきた。しかし、その結果得られた計算スループットの向上と浮動小数点演算数の削減は、実時間(ウォールクロック)での推論遅延の改善に直接はつながっていない。これらの乖離は、ディープラーニングフレームワークがもたらすボトルネックに大きく起因することを示す。我々は、この現象を「フレームワーク税(framework tax)」と呼び、ハードウェアの速度が時間とともに向上するにつれてこの乖離が拡大していることを観察する。本稿では,モデル設計上の決定,フレームワークのパラダイム,ハードウェアプラットフォームがモデル全体の遅延に与える影響を分析する一連のケーススタディを通して,この現象を考察する。コードはhttps://github.com/JaredFern/Framework-Tax で入手できる。

Increased focus on the computational efficiency of NLP systems has motivated the design of efficient model architectures and improvements to underlying hardware accelerators. However, the resulting increases in computational throughput and reductions in floating point operations have not directly translated to improvements in wall-clock inference latency. We demonstrate that these discrepancies can be largely attributed to bottlenecks introduced by deep learning frameworks. We denote this phenomenon as the "framework tax", and observe that the disparity is growing as hardware speed increases over time. In this work, we examine this phenomenon through a series of case studies analyzing the effects of model design decisions, framework paradigms, and hardware platforms on total model latency. Code is available at https://github.com/JaredFern/Framework-Tax.

翻訳日:2023-12-25 18:52:13 公開日:2023-12-22

# 量子ビット格子モデルにおけるニューラルネットワークによる符号規則学習の原理

Principle of learning sign rules by neural networks in qubit lattice models ( http://arxiv.org/abs/2302.02523v3 )

ライセンス: Link先を確認

Jin Cao, Shijie Hu, Zhiping Yin, and Ke Xia

(参考訳) ニューラルネットワークは、人間の直感を超えた隠れた法則を発見できる強力なツールである。しかし、複雑な非線形構造のため、しばしばブラックボックスとして現れる。Gutzwiller平均場理論に基づいて、量子ビット格子モデルにおける秩序状態の符号規則の原理を示すことができる。これらの符号規則を表現するために、単一の隠れニューロンを持つ浅いフィードフォワードニューラルネットワークを導入する。一般化Ising、スピン$1/2$ XY、(フラストレートした)Heisenberg環、トーラス上の三角形XY反強磁性体、任意のフィリングでのFermi-Hubbard環など、様々なモデルで系統的なベンチマークを行う。これらのベンチマークは、すべての主要次数の符号規則の特徴が、ピッチ角などの古典的な形で可視化できることを示している。さらに、量子ゆらぎにより、精度は定量的に不完全なものになりうる。

A neural network is a powerful tool that can uncover hidden laws beyond human intuition. However, it often appears as a black box due to its complicated nonlinear structures. By drawing upon the Gutzwiller mean-field theory, we can showcase a principle of sign rules for ordered states in qubit lattice models. We introduce a shallow feed-forward neural network with a single hidden neuron to present these sign rules. We conduct systematical benchmarks in various models, including the generalized Ising, spin-$1/2$ XY, (frustrated) Heisenberg rings, triangular XY antiferromagnet on a torus, and the Fermi-Hubbard ring at an arbitrary filling. These benchmarks show that all the leading-order sign rule characteristics can be visualized in classical forms, such as pitch angles. Besides, quantum fluctuations can result in an imperfect accuracy rate quantitatively.

翻訳日:2023-12-25 18:52:02 公開日:2023-12-22

# 設計によるIT/OT統合

IT/OT Integration by Design ( http://arxiv.org/abs/2305.19735v2 )

ライセンス: Link先を確認

Georg Sch\"afer, Hannes Waclawek, Sarah Riedmann, Christoph Binder, Christian Neureiter and Stefan Huber

(参考訳) 情報透明性、技術援助、相互接続、分散化決定の4つの設計原則は、産業システムに情報技術(IT)と運用技術(OT)を統合する際の課題を提起している。これらの異なるソリューションには矛盾する要件があり、システムと組織の両方でインターフェースが問題になる。 ITとOTの領域の仲介役として機能するIBPT(Industrial Business Process Twin)エンティティは、この状況を克服するために必要なIT/OTインターフェースの量を効果的に削減するために、以前の研究で提案されている。本研究では,設計段階におけるこのアプローチの効果について検討する。システム設計における IT と OT コンポーネント間のインターフェースを排除することによって,組織内の通信チャネルの競合を排除している,と我々は主張する。議論を検証するため、産業4.0の4つの重要な産業4.0設計原則に対処する産業4.0シナリオを用いて、参照アーキテクチャモデルインダストリー4.0(RAMI4.0)に従ってIBPT概念のモデルを開発する。結果は、IBPTアプローチがシステム設計フェーズにおいて潜在的に競合するIT/OTインターフェースを排除していることを示している。

The four Industry 4.0 design principles information transparency, technical assistance, interconnection, and decentralized decisions pose challenges in integrating information technology (IT) and operational technology (OT) solutions in industrial systems. These different solutions have conflicting requirements, making interfaces between them problematic for both systems and organizations. An Industrial Business Process Twin (IBPT) entity, acting as an intermediary between the realms of IT and OT, has been proposed in a previous work, to effectively reduce the amount of required IT/OT interfaces in an attempt of overcoming this situation. In this work, we investigate the effects of this approach during the design phase. We argue that, by eliminating interfaces between IT and OT components in the system design, this approach is therefore eliminating conflicting communication channels within the organization's communication structure. In order to verify our argument, we develop a model of our IBPT concept according to the Reference Architecture Model Industrie 4.0 (RAMI4.0) using an Industry 4.0 scenario addressing the four essential Industry 4.0 design principles. Results show that the IBPT approach indeed eliminates potentially conflicting IT/OT interfaces during the system design phase.

翻訳日:2023-12-25 18:46:23 公開日:2023-12-22

# 教師なしメロディ-歌詞生成

Unsupervised Melody-to-Lyric Generation ( http://arxiv.org/abs/2305.19228v2 )

ライセンス: Link先を確認

Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng

(参考訳) メロディと歌詞の自動生成は、与えられたメロディと共に歌詞を生成するタスクである。音楽が歌詞に追加の制約を課すため、これは、制約のない歌詞生成よりも重要な実践的関心と挑戦である。ほとんどの楽曲は著作権を侵害されるため、トレーニングデータは制限され、メロディと歌詞の複雑な相互モーダル関係に不適合なモデルとなる。本研究では,任意のメロディ・歌詞データを訓練することなく高品質な歌詞を生成する手法を提案する。具体的には、まず歌の輪郭を生成し、次に完全な歌詞を生成する階層的歌詞生成フレームワークを設計する。このフレームワークは、(純粋にテキストに基づく)トレーニングを推論(メロディ誘導テキスト生成)から切り離すことで、並列データの不足を回避する。我々はメロディと歌詞のセグメンテーションとリズムアライメントを活用し、そのメロディを推論中の指示としてデコード制約にコンパイルする。 2段階の階層デザインは、共同曲作成を民主化するための非常に望ましい機能である、歌詞概要によるコンテンツ制御を可能にする。実験結果から,本モデルは,例えば,並列データセットを用いたSOTAモデルであるSongMASSや,人間の評価に基づく全体的な品質改善率の24%といった,強靭なベースラインよりもオントピー的,歌声的,知的な,一貫性のある高品質な歌詞を生成することができることがわかった。

Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationship between melody and lyrics. In this work, we propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. Specifically, we design a hierarchical lyric generation framework that first generates a song outline and second the complete lyrics. The framework enables disentanglement of training (based purely on text) from inference (melody-guided text generation) to circumvent the shortage of parallel data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints as guidance during inference. The two-step hierarchical design also enables content control via the lyric outline, a much-desired feature for democratizing collaborative song creation. Experimental results show that our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines, for example SongMASS, a SOTA model trained on a parallel dataset, with a 24% relative overall quality improvement based on human ratings.

翻訳日:2023-12-25 18:46:02 公開日:2023-12-22

# 条件不変意味セグメンテーション

Condition-Invariant Semantic Segmentation ( http://arxiv.org/abs/2305.17349v2 )

ライセンス: Link先を確認

Christos Sakaridis, David Bruggemann, Fisher Yu, Luc Van Gool

(参考訳) セマンティックセグメンテーションネットワークを異なる視覚条件に適応させることは、自律走行車やロボットのロバストな知覚に不可欠である。しかし、従来の研究は、敵対的訓練を用い、合成から実データへの適応で検証されてきた特徴レベル適応法の多くが、条件レベルの適応ではわずかな改善しかもたらさず、スタイル化による単純なピクセルレベル適応に劣ることを示している。これらの知見を踏まえ、各入力画像の元のビューとスタイル化されたビューからネットワークのエンコーダが抽出する内部特徴を、新たな特徴不変性損失で整合させることにより、特徴レベル適応にスタイル化を活用することを提案する。これにより、エンコーダは入力のスタイルに対して既に不変な特徴を抽出するよう促され、デコーダは入力の特定のスタイルからさらに抽象化することなく、これらの特徴の解析に集中できる。本手法を条件不変セマンティックセグメンテーション(CISS)と名付け、現在の最先端のドメイン適応アーキテクチャ上に実装し、条件レベル適応で優れた結果を得る。特に、CISSは、昼から夜への適応で広く用いられるCityscapes→Dark Zurichベンチマークにおいて、新たな最高性能を達成する。さらに,本手法は,通常条件から悪条件へのCityscapes→ACDCベンチマークで2番目に高い性能を実現する。CISSは、BDD100K-nightのような訓練時に未見のドメインにもよく汎化することが示されている。コードはhttps://github.com/SysCV/CISS で公開されている。

Adaptation of semantic segmentation networks to different visual conditions is vital for robust perception in autonomous cars and robots. However, previous work has shown that most feature-level adaptation methods, which employ adversarial training and are validated on synthetic-to-real adaptation, provide marginal gains in condition-level adaptation, being outperformed by simple pixel-level adaptation via stylization. Motivated by these findings, we propose to leverage stylization in performing feature-level adaptation by aligning the internal network features extracted by the encoder of the network from the original and the stylized view of each input image with a novel feature invariance loss. In this way, we encourage the encoder to extract features that are already invariant to the style of the input, allowing the decoder to focus on parsing these features and not on further abstracting from the specific style of the input. We implement our method, named Condition-Invariant Semantic Segmentation (CISS), on the current state-of-the-art domain adaptation architecture and achieve outstanding results on condition-level adaptation. In particular, CISS sets the new state of the art in the popular daytime-to-nighttime Cityscapes→Dark Zurich benchmark. Furthermore, our method achieves the second-best performance on the normal-to-adverse Cityscapes→ACDC benchmark. CISS is shown to generalize well to domains unseen during training, such as BDD100K-night. Code is publicly available at https://github.com/SysCV/CISS .

翻訳日:2023-12-25 18:45:36 公開日:2023-12-22

# 変圧器ニューラルプロセスを用いたエンドツーエンドメタベイズ最適化

End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes ( http://arxiv.org/abs/2305.15930v4 )

ライセンス: Link先を確認

Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, Haitham Bou Ammar

(参考訳) メタベイズ最適化(meta-BO)は、関連するタスクのデータを活用することで、ベイズ最適化のサンプル効率を改善することを目的としている。従来の手法はサロゲートモデルまたは獲得関数のいずれかを独立にメタ学習することに成功したが、両コンポーネントの同時学習は依然として未解決の課題である。本稿では、Transformerアーキテクチャを介して獲得関数を学習するようニューラルプロセスを一般化した、初のエンドツーエンド微分可能なmeta-BOフレームワークを提案する。ラベル付き獲得データの欠如に対処するため、強化学習(RL)を用いてこのエンドツーエンドフレームワークを実現する。Transformerベースのニューラルプロセスをゼロから RLで訓練することは、特に報酬が疎な場合、教師信号が不十分なために困難であることに早い段階で気付いた。この主張を、報酬信号として広く用いられるリグレットの概念が、軌道長に対して対数的なスパース性パターンを示すことを示す組合せ解析によって定式化する。この問題に対処するため,アーキテクチャの一部が帰納的バイアスとして有効な確率モデルを学習するよう導く補助タスクでRLの目的を拡張する。提案手法は, 標準的なハイパーパラメータ最適化タスクの実験において様々なベースラインに対して最先端のリグレットを達成するとともに, 混合整数計画法のチューニング, 抗体設計, 電子設計自動化のための論理合成といった実世界の問題においても他の手法を上回ることを示す。

Meta-Bayesian optimisation (meta-BO) aims to improve the sample efficiency of Bayesian optimisation by leveraging data from related tasks. While previous methods successfully meta-learn either a surrogate model or an acquisition function independently, joint training of both components remains an open challenge. This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data. Early on, we notice that training transformer-based neural processes from scratch with RL is challenging due to insufficient supervision, especially when rewards are sparse. We formalise this claim with a combinatorial analysis showing that the widely used notion of regret as a reward signal exhibits a logarithmic sparsity pattern in trajectory lengths. To tackle this problem, we augment the RL objective with an auxiliary task that guides part of the architecture to learn a valid probabilistic model as an inductive bias. We demonstrate that our method achieves state-of-the-art regret results against various baselines in experiments on standard hyperparameter optimisation tasks and also outperforms others in the real-world problems of mixed-integer programming tuning, antibody design, and logic synthesis for electronic design automation.
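To make the notion of regret as a reward signal concrete, here is a minimal sketch assuming a maximisation problem with a known optimum. It does not reproduce the paper's combinatorial analysis; it only shows that simple regret changes at the steps where the best-so-far value improves, which is why the signal can be sparse. The function names are hypothetical.

```python
def simple_regret(values, optimum):
    """Return the simple regret after each step of a maximisation trajectory."""
    regrets = []
    best = float("-inf")
    for v in values:
        best = max(best, v)              # best value observed so far
        regrets.append(optimum - best)   # gap to the (assumed known) optimum
    return regrets


def improvement_steps(values):
    """Indices at which the best-so-far value strictly improves."""
    steps = []
    best = float("-inf")
    for i, v in enumerate(values):
        if v > best:
            best = v
            steps.append(i)
    return steps


trajectory = [0.2, 0.1, 0.5, 0.4, 0.9, 0.3]
print(simple_regret(trajectory, optimum=1.0))
print(improvement_steps(trajectory))
```

Only the steps returned by `improvement_steps` carry new information in the regret signal; all other steps repeat the previous value.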

翻訳日:2023-12-25 18:45:12 公開日:2023-12-22

# ウエハスケールMgB2超電導デバイス

Wafer-Scale MgB2 Superconducting Devices ( http://arxiv.org/abs/2305.15190v2 )

ライセンス: Link先を確認

Changsub Kim, Christina Bell, Jake Evans, Jonathan Greenfield, Emma Batson, Karl Berggren, Nathan Lewis, Daniel Cunnane

(参考訳) 過去10年間の超伝導デバイスと検出器技術の進歩は、量子コンピュータ、遠赤外線望遠鏡用検出器、光通信における実用的な応用を実現している。しかし、超伝導薄膜材料は依然としてほとんど変化がなく、超伝導量子ビットにはアルミニウムが、高周波・高力学インダクタンスデバイスにはニオブ化合物が今なお選ばれている。二ホウ化マグネシウム ($\mathrm{MgB}_2$) は、金属超伝導体の中で最も高い遷移温度 ($\mathrm{T}_c$ = 39 K) で知られており、THz周波数に向かう高温・高周波の超伝導デバイスのための有効な材料である。しかし、ウェハスケール薄膜の合成の難しさは、超伝導エレクトロニクスの応用基盤への$\mathrm{MgB}_2$デバイスの導入を妨げている。本稿では、直径100 mmを超える超平滑(二乗平均平方根粗さ < 0.5 nm)かつ均一な$\mathrm{MgB}_2$薄膜(< 100 nm)を初めて報告し、これらの膜を用いて作製した試作デバイスにおいて、4.5 Kで$\mathrm{10}^4$を超える内部品質係数や、40 nm膜で数十pH/sqオーダーの可変な高い力学インダクタンスなど、主要な超伝導特性を示す。この画期的な進歩は、高温・高周波の超伝導量子回路およびデバイスの開発を可能にする。

Progress in superconducting device and detector technologies over the past decade has realized practical applications in quantum computers, detectors for far-infrared telescopes, and optical communications. Superconducting thin film materials, however, have remained largely unchanged, with aluminum still being the material of choice for superconducting qubits, and niobium compounds for high frequency/high kinetic inductance devices. Magnesium diboride ($\mathrm{MgB}_2$), known for its highest transition temperature ($\mathrm{T}_c$ = 39 K) among metallic superconductors, is a viable material for elevated temperature and higher frequency superconducting devices moving towards THz frequencies. However, difficulty in synthesizing wafer-scale thin films has prevented implementation of $\mathrm{MgB}_2$ devices into the application base of superconducting electronics. Here, we report ultra-smooth (< 0.5 nm root-mean-square roughness) and uniform $\mathrm{MgB}_2$ thin (< 100 nm) films over 100 mm in diameter for the first time and present prototype devices fabricated with these films demonstrating key superconducting properties including internal quality factor over $\mathrm{10}^4$ at 4.5 K and high tunable kinetic inductance on the order of tens of pH/sq in a 40 nm film. This groundbreaking advancement will enable development of elevated temperature, high frequency superconducting quantum circuits and devices.

翻訳日:2023-12-25 18:44:44 公開日:2023-12-22

# In-Context Probing:大規模言語モデルによるロバスト分類器の構築に向けて

In-Context Probing: Toward Building Robust Classifiers via Probing Large Language Models ( http://arxiv.org/abs/2305.14171v3 )

ライセンス: Link先を確認

Afra Amini and Massimiliano Ciaramita

(参考訳) 大きな言語モデルは、新しいタスクをコンテキストで学習することができ、命令といくつかの注釈付きの例が提供されている。しかし、文脈内学習の有効性は提供されたコンテキストに依存しており、下流タスクのパフォーマンスは命令によって大きく異なる可能性がある。重要なのは、このようなコンテキストへの依存が予測不能な方法で現れる可能性があることだ。本稿では, In-Context Probing (ICP) という代替手法を提案する。文脈内学習と同様に、入力の表現を命令でコンテキスト化するが、出力予測をデコードする代わりに、文脈化された表現を探索してラベルを予測する。多様な分類タスクの一連の実験を通して、文脈内探索は命令の変化に対してはるかに堅牢であることを示す。さらに、ICPは微調整よりも優れた性能を示し、より小さなモデルの上に分類器を構築するのに特に役立ち、訓練例は100に満たない。

Large language models are able to learn new tasks in context, where they are provided with instructions and a few annotated examples. However, the effectiveness of in-context learning is dependent on the provided context, and the performance on a downstream task can vary considerably, depending on the instruction. Importantly, such dependency on the context can surface in unpredictable ways, e.g., a seemingly more informative instruction might lead to a worse performance. In this paper, we propose an alternative approach, which we term In-Context Probing (ICP). Similar to in-context learning, we contextualize the representation of the input with an instruction, but instead of decoding the output prediction, we probe the contextualized representation to predict the label. Through a series of experiments on a diverse set of classification tasks, we show that in-context probing is significantly more robust to changes in instructions. We further show that ICP performs competitively with or better than finetuning and can be particularly helpful for building classifiers on top of smaller models, with fewer than a hundred training examples.
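The probing step described above can be sketched as a small classifier trained on fixed representation vectors. This is a minimal illustration, not the authors' implementation: the vectors below are hypothetical stand-ins for contextualized hidden states that would be extracted from a language model, and the probe is assumed to be plain logistic regression.

```python
import math


def train_probe(representations, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe on fixed representation vectors."""
    dim = len(representations[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(representations)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(representations, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability of label 1
            for j in range(dim):
                grad_w[j] += (p - y) * x[j]     # gradient of the log loss
            grad_b += p - y
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return the predicted binary label for one representation vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Hypothetical 2-dimensional "contextualized representations" and labels.
reps = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
labels = [1, 1, 0, 0]
weights, bias = train_probe(reps, labels)
print([predict(weights, bias, x) for x in reps])
```

Only the probe's parameters are trained here; the model that produces the representations stays frozen, which is what distinguishes probing from finetuning.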

翻訳日:2023-12-25 18:44:04 公開日:2023-12-22

# 異なるランダム性をもつスパースランダム行列とガウスアンサンブル

Sparse random matrices and Gaussian ensembles with varying randomness ( http://arxiv.org/abs/2305.07505v2 )

ライセンス: Link先を確認

Takanori Anegawa, Norihiro Iizuka, Arkaprava Mukherjee, Sunil Kumar Sake, Sandip P. Trivedi

(参考訳) ガウス分布から結合定数を様々な方法で引いて得られるランダムハミルトニアンを持つ $N$ 量子ビットの系について検討した。この結果、GUEと固定$q$のSYK理論を含む豊富な系のクラスが得られる。私たちのモチベーションは、大きな$N$における系を理解することにある。実際には、我々の計算のほとんどは、厳密対角化の手法を用いて行われる(最大$N=24$)。GUEから始めて、ランダム性が低下するにつれて生じる振る舞いについて検討する。一般に、ランダム性が低下するにつれて、系はカオス的な状態からより秩序だった状態へと移るが、状態密度、スペクトル形状因子、準位統計、非時間順序相関関数などの様々な性質の変化は興味深いパターンを明らかにする。主に数値的である我々の解析の限界のもとで、ハミルトニアンにおける非ゼロの独立な項の数が$N$について指数関数的に大きいときに、振る舞いが急激に変化するという証拠がいくつか見つかる。また、結合の数が$N$に対して線形にスケールするSYKモデルの局所版で得られる、ランダム性が大きく低下した逆の極限についても調べ、その振る舞いを特徴付ける。我々の調査は、このクラスの系のより完全な理論解析がかなり有益であることを示唆している。

We study a system of $N$ qubits with a random Hamiltonian obtained by drawing coupling constants from Gaussian distributions in various ways. This results in a rich class of systems which include the GUE and the fixed $q$ SYK theories. Our motivation is to understand the system at large $N$. In practice most of our calculations are carried out using exact diagonalisation techniques (up to $N=24$). Starting with the GUE, we study the resulting behaviour as the randomness is decreased. While in general the system goes from being chaotic to being more ordered as the randomness is decreased, the changes in various properties, including the density of states, the spectral form factor, the level statistics and out-of-time-ordered correlators, reveal interesting patterns. Subject to the limitations of our analysis which is mainly numerical, we find some evidence that the behaviour changes in an abrupt manner when the number of non-zero independent terms in the Hamiltonian is exponentially large in $N$. We also study the opposite limit of much reduced randomness obtained in a local version of the SYK model where the number of couplings scales linearly in $N$, and characterise its behaviour. Our investigation suggests that a more complete theoretical analysis of this class of systems will prove quite worthwhile.

翻訳日:2023-12-25 18:43:46 公開日:2023-12-22

# 大規模言語モデルによるスピアフィッシング

Spear Phishing With Large Language Models ( http://arxiv.org/abs/2305.06972v3 )

ライセンス: Link先を確認

Julian Hazell

(参考訳) 人工知能(AI)の最近の進歩、特に大規模言語モデル(LLM)の領域は、強力で汎用的なデュアルユースシステムを生み出している。この知能は、様々な有益なタスクに向けられるが、害を引き起こすためにも使用できる。本研究は、標的を操り機密情報を漏洩させるサイバー犯罪の一種であるスピアフィッシングに対して、LLMがいかに利用できるかを調べることで、そのような害を探求する。まず、LLMがスピアフィッシング攻撃の偵察およびメッセージ生成の段階を補助する能力について検討し、LLMがスピアフィッシング攻撃の電子メール生成フェーズを支援できることを見出した。次に、LLMがスピアフィッシングキャンペーンの規模拡大にどのように利用されうるかを探るため、OpenAIのGPT-3.5およびGPT-4モデルを使用して、600人以上の英国議会議員に対するユニークなスピアフィッシングメッセージを作成した。私の調査結果は、これらのメッセージが現実的であるだけでなくコスト効率も高く、各メールの生成にかかる費用が1セントの数分の一に過ぎないことを示すいくつかの証拠を提供する。次に、基本的なプロンプトエンジニアリングがLLMに組み込まれたセーフガードを回避できることを示し、モデルの誤用を防ぐロバストな介入に関するさらなる研究の必要性を強調する。これらの進化するリスクにさらに対処するために、アプリケーションプログラミングインタフェースのような構造化アクセススキームとLLMベースの防御システムという2つの潜在的なソリューションを検討する。

Recent progress in artificial intelligence (AI), particularly in the domain of large language models (LLMs), has resulted in powerful and versatile dual-use systems. This intelligence can be put towards a wide variety of beneficial tasks, yet it can also be used to cause harm. This study explores one such harm by examining how LLMs can be used for spear phishing, a form of cybercrime that involves manipulating targets into divulging sensitive information. I first explore LLMs' ability to assist with the reconnaissance and message generation stages of a spear phishing attack, where I find that LLMs are capable of assisting with the email generation phase of a spear phishing attack. To explore how LLMs could potentially be harnessed to scale spear phishing campaigns, I then create unique spear phishing messages for over 600 British Members of Parliament using OpenAI's GPT-3.5 and GPT-4 models. My findings provide some evidence that these messages are not only realistic but also cost-effective, with each email costing only a fraction of a cent to generate. Next, I demonstrate how basic prompt engineering can circumvent safeguards installed in LLMs, highlighting the need for further research into robust interventions that can help prevent models from being misused. To further address these evolving risks, I explore two potential solutions: structured access schemes, such as application programming interfaces, and LLM-based defensive systems.

翻訳日:2023-12-25 18:43:24 公開日:2023-12-22

# ランダムLpノルム劣化を伴う画像分類器の破壊ロバスト性の検討

Investigating the Corruption Robustness of Image Classifiers with Random Lp-norm Corruptions ( http://arxiv.org/abs/2305.05400v3 )

ライセンス: Link先を確認

Georg Siedel, Weijia Shao, Silvia Vock, Andrey Morozov

(参考訳) 堅牢性は、安全性と信頼性を達成するために必要な機械学習分類器の基本特性である。画像分類器の対向ロバストネスの分野では、ロバストネスはp-ノルム距離内の全ての入力変化に対するモデルの安定性として定義される。しかしながら、ランダムな腐敗の堅牢性の分野では、現実世界で観測される変動が使われ、p-ノルムの腐敗はめったに考慮されない。本研究では,画像分類器のトレーニングとテストデータを強化するために,ランダムなpノルム腐敗の利用を検討する。既視的ランダムpノルム破壊に対するモデルロバスト性を評価し,新しいロバストネス指標を提案する。 p-ノルム間のロバスト性伝達とモデルがp-ノルム崩壊を訓練し評価すべき結論を導出するかどうかを実証的に検討する。 p-ノルムの汚職の組み合わせによるトレーニングデータの増大は、最先端のデータ増補スキームにおいても、汚職の堅牢性を大幅に向上させる。

Robustness is a fundamental property of machine learning classifiers required to achieve safety and reliability. In the field of adversarial robustness of image classifiers, robustness is commonly defined as the stability of a model to all input changes within a p-norm distance. However, in the field of random corruption robustness, variations observed in the real world are used, while p-norm corruptions are rarely considered. This study investigates the use of random p-norm corruptions to augment the training and test data of image classifiers. We evaluate the model robustness against imperceptible random p-norm corruptions and propose a novel robustness metric. We empirically investigate whether robustness transfers across different p-norms and derive conclusions on which p-norm corruptions a model should be trained and evaluated. We find that training data augmentation with a combination of p-norm corruptions significantly improves corruption robustness, even on top of state-of-the-art data augmentation schemes.
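A random p-norm corruption can be sketched as noise rescaled to a chosen Lp norm. The sampling scheme below (a Gaussian direction rescaled so its Lp norm equals `epsilon`) is an assumption chosen for simplicity; it is not claimed to match the paper's sampling procedure and it does not sample uniformly from the Lp ball. Images are assumed to be flat lists of pixel values in [0, 1].

```python
import random


def lp_norm(vector, p):
    """Lp norm of a list of floats; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v in vector)
    return sum(abs(v) ** p for v in vector) ** (1.0 / p)


def random_lp_corruption(dim, epsilon, p, rng):
    """Draw a Gaussian direction and rescale it to have Lp norm exactly epsilon."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = lp_norm(direction, p)
    return [epsilon * d / norm for d in direction]


def corrupt(image, epsilon, p, rng):
    """Add a random Lp corruption and clip pixel values back to [0, 1]."""
    noise = random_lp_corruption(len(image), epsilon, p, rng)
    return [min(1.0, max(0.0, px + n)) for px, n in zip(image, noise)]


rng = random.Random(0)
image = [0.5] * 16
print(corrupt(image, epsilon=0.1, p=2, rng=rng))
```

Passing an explicit `random.Random` instance keeps the corruption reproducible, which helps when the same corruptions are needed for both training augmentation and evaluation.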

翻訳日:2023-12-25 18:42:59 公開日:2023-12-22

# プロトタイプベース多段階学習による半教師付きドメイン適応

Semi-supervised Domain Adaptation via Prototype-based Multi-level Learning ( http://arxiv.org/abs/2305.02693v3 )

ライセンス: Link先を確認

Xinyang Huang, Chuang Zhu and Wenkai Chen

(参考訳) 半教師付きドメイン適応(ssda)では、各クラスのラベル付きターゲットサンプルが、モデルが完全なラベル付きソースドメインからターゲットドメインへの知識表現の転送を支援する。既存の多くのメソッドは、ラベル付きターゲットサンプルをマルチレベルから完全に利用する利点を無視している。この追加データをよりよく活用するために,ラベル付き対象サンプルの可能性をうまく活用するためのプロトタイプベース多段階学習(ProML)フレームワークを提案する。ドメイン内適応を実現するために,まず,ドメイン内最適移動に基づく擬似ラベルアグリゲーションを導入し,ラベルなしのターゲットサンプルとプロトタイプの特徴分布をモデル化する。ドメイン間レベルでは、モデルがドメイン間知識転送のターゲットプロトタイプを使用するのを助けるために、クロスドメインアライメントロスを提案する。さらに,プロトタイプ類似性と線形分類器に基づく2重一貫性を提案し,バッチレベルでのコンパクトな特徴表現の識別学習を促進する。 DomainNet, VisDA2017, Office-Homeの3つのデータセットに対する大規模な実験により,提案手法がSSDAの最先端性能を実現することを示す。

In semi-supervised domain adaptation (SSDA), a few labeled target samples of each class help the model to transfer knowledge representation from the fully labeled source domain to the target domain. Many existing methods ignore the benefits of making full use of the labeled target samples at multiple levels. To make better use of this additional data, we propose a novel Prototype-based Multi-level Learning (ProML) framework to better tap the potential of labeled target samples. To achieve intra-domain adaptation, we first introduce a pseudo-label aggregation based on the intra-domain optimal transport to help the model align the feature distribution of unlabeled target samples and the prototype. At the inter-domain level, we propose a cross-domain alignment loss to help the model use the target prototype for cross-domain knowledge transfer. We further propose a dual consistency based on prototype similarity and linear classifier to promote discriminative learning of compact target feature representation at the batch level. Extensive experiments on three datasets, including DomainNet, VisDA2017, and Office-Home, demonstrate that our proposed method achieves state-of-the-art performance in SSDA.
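The role of prototypes can be illustrated with a simplified sketch. Here each class prototype is assumed to be the mean feature of the labeled target samples, and an unlabeled sample receives the label of its most similar prototype under cosine similarity. This nearest-prototype rule is a deliberately simpler stand-in for the optimal-transport-based pseudo-label aggregation named in the abstract; all function names are hypothetical.

```python
import math


def class_prototypes(features, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_label(feature, prototypes):
    """Assign the class whose prototype is most similar to the feature."""
    return max(prototypes, key=lambda y: cosine(feature, prototypes[y]))


labeled_feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labeled_classes = ["car", "car", "bike", "bike"]
protos = class_prototypes(labeled_feats, labeled_classes)
print(pseudo_label([0.7, 0.3], protos))
```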

翻訳日:2023-12-25 18:42:40 公開日:2023-12-22

# FlightBERT++: 非自己回帰型マルチホライゾン飛行軌道予測フレームワーク

FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework ( http://arxiv.org/abs/2305.01658v2 )

ライセンス: Link先を確認

Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, and Yi Lin

(参考訳) フライト軌道予測(FTP)は、航空管制(ATC)における重要なタスクであり、航空管制官がより安全かつ効率的に空域を管理するのを助ける。既存のアプローチは、通常、自己回帰的にマルチホライゾンFTPタスクを実行するため、エラーの蓄積や低効率の問題に悩まされる。本稿では、FlightBERT++と呼ばれる新しいフレームワークを提案する。その目的は、i) 自己回帰的でない方法で直接マルチホライゾン飛行軌道を予測すること、ii) FlightBERTにおけるバイナリエンコーディング(BE)表現の制限を緩和することである。具体的には、FlightBERT++は一般化されたエンコーダ-デコーダアーキテクチャによって実装され、エンコーダは過去の観測から時空間パターンを学習し、デコーダは将来のホライゾンの飛行状態を予測する。従来のアーキテクチャと比較して、事前のホライゾン情報を考慮するために革新的なホライゾン認識コンテキスト生成器が設計されており、これにより非自己回帰的なマルチホライゾン予測が可能になる。さらに、差分系列の定常性を利用して差分予測の能力を高めるために、差分プロンプト型デコーダを提案する。実世界のデータセットでの実験の結果、FlightBERT++はFTP性能と計算効率の両面で競合するベースラインを上回った。

Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP tasks in an autoregressive manner, thereby suffering from error accumulation and low efficiency. In this paper, a novel framework, called FlightBERT++, is proposed to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) alleviate the limitation of the binary encoding (BE) representation in FlightBERT. Specifically, FlightBERT++ is implemented by a generalized encoder-decoder architecture, in which the encoder learns the temporal-spatial patterns from historical observations and the decoder predicts the flight status for the future horizons. Compared with conventional architectures, an innovative horizon-aware contexts generator is specifically designed to consider the prior horizon information, which further enables non-autoregressive multi-horizon prediction. Moreover, a differential prompted decoder is proposed to enhance the capability of the differential predictions by leveraging the stationarity of the differential sequence. The experimental results on a real-world dataset demonstrated that FlightBERT++ outperformed the competitive baselines in both FTP performance and computational efficiency.
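As a rough illustration of differential prediction, the sketch below turns a set of predicted differences for all horizons into absolute flight states. It assumes that each predicted difference is taken relative to the previous horizon, which the abstract does not state; the function name and the two-component state are hypothetical.

```python
def reconstruct_trajectory(last_observation, predicted_differences):
    """Turn per-horizon differences into absolute states by cumulative summation.

    Assumption: each difference is relative to the previous horizon, and all
    horizons are predicted at once (non-autoregressively) before this step.
    """
    trajectory = []
    current = list(last_observation)
    for diff in predicted_differences:
        current = [c + d for c, d in zip(current, diff)]
        trajectory.append(current)
    return trajectory


last_state = [100.0, 30.0]                           # hypothetical state vector
diffs = [[1.0, -0.5], [1.0, -0.5], [2.0, 0.0]]       # hypothetical model outputs
print(reconstruct_trajectory(last_state, diffs))
```

The summation is only post-processing: the model itself emits every horizon's difference in one pass, so no prediction is fed back as input.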

翻訳日:2023-12-25 18:42:19 公開日:2023-12-22

# 屈曲軟導波路の束縛状態

Bound States in Bent Soft Waveguides ( http://arxiv.org/abs/2304.14776v2 )

ライセンス: Link先を確認

Pavel Exner and Semjon Vugalter

(参考訳) 本論文の目的は、固定プロファイルの'ditch'形式のポテンシャルを持つ2次元Schrödinger演算子が、幾何学的に誘起される離散スペクトルを持ちうることを示すことである。これは、そのようなポテンシャルチャネルが1つまたは複数の曲がりを持ち、コンパクト集合の外側では直線である場合に起こる。さらに、より強い幾何学的制約の下では、この主張はチャネルの「バンク」の1つにポテンシャルバイアスが存在する場合にも成り立つ。

The aim of this paper is to show that a two-dimensional Schrödinger operator with the potential in the form of a `ditch' of a fixed profile can have a geometrically induced discrete spectrum; this happens if such a potential channel has a single or multiple bends while being straight outside a compact set. Moreover, under stronger geometric restrictions the claim remains true in the presence of a potential bias at one of the channel `banks'.

翻訳日:2023-12-25 18:41:54 公開日:2023-12-22

# AutoNeRF: 自律エージェントによる暗黙のシーン表現のトレーニング

AutoNeRF: Training Implicit Scene Representations with Autonomous Agents ( http://arxiv.org/abs/2304.11241v2 )

ライセンス: Link先を確認

Pierre Marza, Laetitia Matignon, Olivier Simonin, Dhruv Batra, Christian Wolf, Devendra Singh Chaplot

(参考訳) ニューラルレージアンス場(NeRF)のような入射表現は、新規なビュー合成に非常に有効であることが示されている。しかし、これらのモデルは通常、トレーニングのために手動で注意深い人的データ収集を必要とする。本稿では,自律型エンボディエージェントを用いたNeRF訓練に必要なデータ収集手法であるAutoNeRFを提案する。本手法では,エージェントが未知の環境を効率的に探索し,その経験を用いて暗黙の地図表現を自律的に構築できる。我々は,手作りのフロンティア探索や,訓練された高レベルプランナーと古典的な低レベルパスフォロワーからなるエンドツーエンドおよびモジュラーアプローチなど,さまざまな探索戦略の影響を比較した。我々は,この問題に適応した異なる報酬関数を持つこれらのモデルを訓練し,古典的視点レンダリング,地図再構成,計画,ポーズリファインメントという4つの下流タスクにおける学習表現の品質を評価する。実験結果から,nerfsは未発見の環境において1回の体験のみを使用して,アクティブに収集されたデータに対してトレーニングすることが可能であり,いくつかの下流ロボットタスクに使用できること,モジュール型学習された探索モデルは,他の古典的およびエンドツーエンドのベースラインよりも優れることが示された。最後に,AutoNeRFは大規模シーンの再構成が可能であり,生成した3D環境モデルをシミュレータにロードし,興味のあるポリシーを微調整できるため,シーン固有の適応を行う上で有用なツールであることを示す。

Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we present AutoNeRF, a method to collect data required to train NeRFs using autonomous embodied agents. Our method allows an agent to explore an unseen environment efficiently and use the experience to build an implicit map representation autonomously. We compare the impact of different exploration strategies including handcrafted frontier-based exploration, end-to-end and modular approaches composed of trained high-level planners and classical low-level path followers. We train these models with different reward functions tailored to this problem and evaluate the quality of the learned representations on four different downstream tasks: classical viewpoint rendering, map reconstruction, planning, and pose refinement. Empirical results show that NeRFs can be trained on actively collected data using just a single episode of experience in an unseen environment, and can be used for several downstream robotic tasks, and that modular trained exploration models outperform other classical and end-to-end baselines. Finally, we show that AutoNeRF can reconstruct large-scale scenes, and is thus a useful tool to perform scene-specific adaptation as the produced 3D environment models can be loaded into a simulator to fine-tune a policy of interest.

翻訳日:2023-12-25 18:41:24 公開日:2023-12-22

# ランダム回路サンプリングにおける相転移

Phase transition in Random Circuit Sampling ( http://arxiv.org/abs/2304.11119v2 )

ライセンス: Link先を確認

A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, J. Chau, G. Laun, R. Movassagh, A. Asfaw, L. T.A.N. Brandão, R. Peralta, D. Abanin, R. Acharya, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, J. Atalaya, J. C. Bardin, A. Bilmes, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, B. B. Buckley, D. A. Buell, T. Burger, B. Burkett, N. Bushnell, J. Campero, H. S. Chang, B. Chiaro, D. Chik, C. Chou, J. Cogan, R. Collins, P. Conner, W. Courtney, A. L. Crook, B. Curtin, D. M. Debroy, A. Del Toro Barba, S. Demura, A. Di Paolo, A. Dunsworth, L. Faoro, E. Farhi, R. Fatemi, V. S. Ferreira, L. Flores Burgos, E. Forati, A. G. Fowler, B. Foxen, G. Garcia, E. Genois, W. Giang, C. Gidney, D. Gilboa, M. Giustina, R. Gosula, A. Grajales Dau, J. A. Gross, S. Habegger, M. C. Hamilton, M. Hansen, M. P. Harrigan, S. D. Harrington, P. Heu, M. R. Hoffmann, T. Huang, A. Huff, W. J. Huggins, L. B. Ioffe, S. V. Isakov, J. Iveland, E. Jeffrey, Z. Jiang, C. Jones, P. Juhas, D. Kafri, T. Khattar, M. Khezri, M. Kieferová, S. Kim, A. Kitaev, A. R. Klots, A. N. Korotkov, F. Kostritsa, J. M. Kreikebaum, D. Landhuis, P. Laptev, K.-M. Lau, L. Laws, J. Lee, K. W. Lee, Y. D. Lensky, B. J. Lester, A. T. Lill, W. Liu, W. P. Livingston, A. Locharla, F. D. Malone, O. Martin, S. Martin, J. R. McClean, M. McEwen, K. C. Miao, A. Mieszala, S. Montazeri, W. Mruczkiewicz, O. Naaman, M. Neeley, C. Neill, A. Nersisyan, M. Newman, J. H. Ng, A. Nguyen, M. Nguyen, M. Yuezhen Niu, T. E. O'Brien, S. Omonije, A. Opremcak, A. Petukhov, R. Potter, L. P. Pryadko, C. Quintana, D. M. Rhodes, E. Rosenberg, C. Rocque, P. Roushan, N. C. Rubin, N. Saei, D. Sank, K. Sankaragomathi, K. J. Satzinger, H. F. Schurkus, C. Schuster, M. J. Shearn, A. Shorter, N. Shutty, V. Shvarts, V. Sivak, J. Skruzny, W. C. Smith, R. D. Somma, G. Sterling, D. Strain, M. Szalay, D. Thor, A. Torres, G. Vidal, C. Vollgraff Heidweiller, T. White, B. W. K. Woo, C. Xing, Z. J. Yao, P. Yeh, J. Yoo, G. Young, A. Zalcman, Y. Zhang, N. Zhu, N. Zobrist, E. G. Rieffel, R. Biswas, R. Babbush, D. Bacon, J. Hilton, E. Lucero, H. Neven, A. Megrant, J. Kelly, I. Aleiner, V. Smelyanskiy, K. Kechedzhi, Y. Chen, S. Boixo

(参考訳) 周囲環境への望ましくない結合は、量子プロセッサ上の長距離相関を破壊し、名目上利用可能な計算空間におけるコヒーレント進化を妨げる。この非コヒーレントノイズは、近い将来の量子プロセッサの計算能力を完全に活用する上で顕著な課題である。ランダム回路サンプリング (RCS) をクロスエントロピーベンチマーク (XEB) で評価することにより、コヒーレントに利用可能なヒルベルト空間の有効サイズを確実に推定できることが示されている。ノイズの存在が、与えられた量子アルゴリズムの出力をどの程度自明にしてしまうか、すなわち古典計算によって偽装(スプーフィング)可能にしてしまうかは、未解決の問題である。ここでは、RCSアルゴリズムの実装により、XEBで観測可能な2つの相転移が存在することを実験的に実証し、統計的モデルを用いて理論的に説明する。1つ目はサイクル数の関数としての動的転移であり、ノイズのない場合の反集中点の延長である。2つ目は1サイクルあたりの誤差によって制御される量子相転移であり、これを解析的および実験的に特定するために、ノイズの強さとコヒーレントな進化の比を変えられる弱リンクモデルを作成する。さらに、67量子ビット・32サイクルのRCS実験を提示することにより、避けられないノイズの存在を考慮に入れた場合でも、我々の実験の計算コストが既存の古典スーパーコンピュータの能力を超えることを示す。我々の実験的および理論的研究は、現在の量子プロセッサで到達可能な、安定で計算的に複雑な相への転移の存在を確立する。

Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benchmarking (XEB) can provide a reliable estimate of the effective size of the Hilbert space coherently available. The extent to which the presence of noise can trivialize the outputs of a given quantum algorithm, i.e. making it spoofable by a classical computation, is an unanswered question. Here, by implementing an RCS algorithm we demonstrate experimentally that there are two phase transitions observable with XEB, which we explain theoretically with a statistical model. The first is a dynamical transition as a function of the number of cycles and is the continuation of the anti-concentration point in the noiseless case. The second is a quantum phase transition controlled by the error per cycle; to identify it analytically and experimentally, we create a weak link model which allows varying the strength of noise versus coherent evolution. Furthermore, by presenting an RCS experiment with 67 qubits at 32 cycles, we demonstrate that the computational cost of our experiment is beyond the capabilities of existing classical supercomputers, even when accounting for the inevitable presence of noise. Our experimental and theoretical work establishes the existence of transitions to a stable computationally complex phase that is reachable with current quantum processors.
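For readers unfamiliar with XEB, the commonly used linear cross-entropy fidelity estimator is 2^n times the mean ideal probability of the sampled bitstrings, minus one. The sketch below implements that standard estimator on a toy distribution; the probability table and sample lists are hypothetical and unrelated to the experiment's data.

```python
def linear_xeb_fidelity(num_qubits, ideal_probabilities, sampled_bitstrings):
    """Linear XEB estimate: 2^n times the mean ideal probability of the samples, minus 1."""
    dimension = 2 ** num_qubits
    mean_prob = sum(
        ideal_probabilities[b] for b in sampled_bitstrings
    ) / len(sampled_bitstrings)
    return dimension * mean_prob - 1.0


# Hypothetical ideal output distribution of a 2-qubit circuit.
ideal = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}

# Samples spread evenly over all bitstrings carry no information about the circuit.
print(linear_xeb_fidelity(2, ideal, ["00", "01", "10", "11"]))

# Samples concentrated on high-probability bitstrings give a positive estimate.
print(linear_xeb_fidelity(2, ideal, ["00", "00", "01", "00"]))
```

An estimate near zero means the samples are indistinguishable from uniform noise, while larger values indicate that the device output tracks the ideal distribution.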

翻訳日:2023-12-25 18:41:00 公開日:2023-12-22

# 開放シュウィンガー模型のリウビリアンダイナミクス:熱媒質における弦破断と運動散逸

Liouvillian Dynamics of the Open Schwinger Model: String Breaking and Kinetic Dissipation in a Thermal Medium ( http://arxiv.org/abs/2308.03878v3 )

ライセンス: Link先を確認

Kyle Lee, James Mulligan, Felix Ringer and Xiaojun Yao

(参考訳) 境界状態形成のダイナミクスを理解することは、量子色力学(qcd)のような量子場理論を閉じ込める基本的な問題の1つである。最初にフェルミオンと反フェルミオンをつなぐ弦の破断が大きな注目を集めたハドロン化機構の1つである。シュウィンガーモデルのようなより単純で低次元のモデルでリアルタイムの弦破れ力学の理解を深めることにより、凝縮物質や統計システムで見られるQCDやその他の凝縮系におけるハドロン化過程の理解を深めることができる。本稿では,シュウィンガーモデルにおける弦破壊のダイナミクスを考察し,熱媒質中での修正を考察し,シュウィンガーモデルを熱環境に結合した開量子系として扱う。システムと環境の間の弱い結合の仕組みの中で、システムのリアルタイムな進化はリンドブラッド進化方程式によって説明できる。このリンドブラッド方程式のリウヴィリアンギャップとシステムのフォン・ノイマンエントロピーの時間依存性を解析した。環境相関時間の増加に伴い, 後期緩和速度は低下する。さらに、環境相関長が無限であるとき、系は2つの定常状態を示し、各々のチャージ共役パリティ(cp)量子数を持つセクタに1つずつを示す。初期弦が真空で壊れるパラメータ状態に対しては, 運動的消散効果により, 媒体内の弦破壊の遅れが観察される。逆に、真空時間進化において初期弦がそのまま残る状態においては、熱媒体内の弦の破れ(融解)が観察される。さらに,オープンシュウィンガーモデルのリウビリアンダイナミクスを量子コンピュータ上でシミュレートし,関連するトロッター誤差を推定する方法についても検討した。

Understanding the dynamics of bound state formation is one of the fundamental questions in confining quantum field theories such as Quantum Chromodynamics (QCD). One hadronization mechanism that has garnered significant attention is the breaking of a string initially connecting a fermion and an anti-fermion. Deepening our understanding of real-time string-breaking dynamics with simpler, lower dimensional models like the Schwinger model can improve our understanding of the hadronization process in QCD and other confining systems found in condensed matter and statistical systems. In this paper, we consider the string-breaking dynamics within the Schwinger model and investigate its modification inside a thermal medium, treating the Schwinger model as an open quantum system coupled to a thermal environment. Within the regime of weak coupling between the system and environment, the real-time evolution of the system can be described by a Lindblad evolution equation. We analyze the Liouvillian gaps of this Lindblad equation and the time dependence of the system's von Neumann entropy. We observe that the late-time relaxation rate decreases as the environment correlation length increases. Moreover, when the environment correlation length is infinite, the system exhibits two steady states, one in each of the sectors with definite charge-conjugation-parity (CP) quantum numbers. For parameter regimes where an initial string breaks in vacuum, we observe a delay of the string breaking in the medium, due to kinetic dissipation effects. Conversely, in regimes where an initial string remains intact in vacuum time evolution, we observe string breaking (melting) in the thermal medium. We further discuss how the Liouvillian dynamics of the open Schwinger model can be simulated on quantum computers and provide an estimate of the associated Trotter errors.
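For reference, the general form of a Lindblad evolution equation is shown below. The symbols are generic and assumed here: $H$ is the system Hamiltonian, $L_k$ are jump operators encoding the coupling to the environment, and $\mathcal{L}$ is the Liouvillian. The specific operators for the open Schwinger model are not given in the abstract and are not reproduced.

```latex
\frac{d\rho}{dt} = \mathcal{L}[\rho]
  = -i\,[H, \rho]
  + \sum_k \left( L_k \,\rho\, L_k^{\dagger}
  - \frac{1}{2} \left\{ L_k^{\dagger} L_k ,\, \rho \right\} \right)
```

In this language, the Liouvillian gap is usually defined as the smallest nonzero magnitude of the real part of the eigenvalues of $\mathcal{L}$, and it sets the late-time relaxation rate towards the steady state.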

翻訳日:2023-12-25 18:35:22 公開日:2023-12-22

# UnIVAL: 画像、ビデオ、オーディオ、言語タスクのための統一モデル

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks ( http://arxiv.org/abs/2307.16184v2 )

ライセンス: Link先を確認

Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord

(参考訳) 大規模言語モデル(LLM)によって、汎用エージェントという野心的な探求は、もはや幻想とは言えないものになっている。このような汎用モデルを構築する上で重要なハードルは、タスクとモダリティの多様性と異質性である。有望な解決策は統一であり、一つの統一フレームワーク内で多数のタスクとモダリティをサポートすることができる。大規模なデータセットで訓練されたFlamingo (Alayrac et al., 2022)のような少数の大規模モデルは2つを超えるモダリティをサポートできるが、現在の小規模から中規模の統一モデルは依然として2つのモダリティ(通常は画像-テキストまたはビデオ-テキスト)に制限されている。すべてのモダリティをサポートする統一モデルを効率的に構築することは可能だろうか? そこで我々は、この野心的な目標に向けてさらに一歩進んだものとして、UnIVALを提案する。巨大なデータセットや数十億のパラメータを持つモデルに頼ることなく、約0.25BパラメータのUnIVALモデルは2つのモダリティを超えて、テキスト、画像、ビデオ、オーディオを1つのモデルに統合する。我々のモデルは、タスクバランシングとマルチモーダルカリキュラム学習に基づいて、多くのタスクで効率的に事前学習される。UnIVALは、画像-テキストおよびビデオ-テキストタスクにおいて、既存の最先端アプローチと競合する性能を示す。画像-テキストおよびビデオ-テキストのモダリティから学習した特徴表現により、オーディオで事前学習されていないにもかかわらず、オーディオ-テキストタスクで微調整した場合にモデルは競争力のある性能を達成できる。統一モデルのおかげで、異なるマルチモーダルタスクで訓練されたモデルの重み補間によるマルチモーダルモデルマージに関する新しい研究を提案し、特に分布外汎化における利点を示す。最後に、タスク間の相乗効果を示すことによって、統一の動機付けを行う。モデルの重みとコードは https://github.com/mshukor/UnIVAL で公開されている。

Large Language Models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is unification, allowing the support of a myriad of tasks and modalities within one unified framework. While a few large models (e.g., Flamingo (Alayrac et al., 2022)), trained on massive datasets, can support more than two modalities, current small to mid-scale unified models are still limited to 2 modalities, usually image-text or video-text. The question that we ask is: is it possible to efficiently build a unified model that can support all modalities? To answer this, we propose UnIVAL, a step further towards this ambitious goal. Without relying on fancy dataset sizes or models with billions of parameters, the ~ 0.25B parameter UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model. Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning. UnIVAL shows competitive performance to existing state-of-the-art approaches, across image and video-text tasks. The feature representations learned from image and video-text modalities allow the model to achieve competitive performance when finetuned on audio-text tasks, despite not being pretrained on audio. Thanks to the unified model, we propose a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing their benefits in particular for out-of-distribution generalization. Finally, we motivate unification by showing the synergy between tasks. The model weights and code are released here: https://github.com/mshukor/UnIVAL.
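Model merging via weight interpolation can be sketched as a linear combination of two checkpoints that share the same architecture. The representation below (a dict mapping parameter names to flat lists of floats) and the name `interpolate_weights` are assumptions for illustration; real checkpoints would hold tensors.

```python
def interpolate_weights(weights_a, weights_b, lam):
    """Return lam * weights_a + (1 - lam) * weights_b for matching parameter names."""
    if weights_a.keys() != weights_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in weights_a:
        a, b = weights_a[name], weights_b[name]
        merged[name] = [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]
    return merged


# Hypothetical weights of two models finetuned on different multimodal tasks.
model_task_a = {"layer.w": [1.0, 2.0]}
model_task_b = {"layer.w": [3.0, 4.0]}
print(interpolate_weights(model_task_a, model_task_b, lam=0.5))
```

Interpolation of this kind is only meaningful when both models start from the same pretrained weights, which a unified model finetuned on several tasks provides.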

翻訳日:2023-12-25 18:34:49 公開日:2023-12-22

# 時間相関ノイズを有する量子デバイスの圧縮ゲート特性評価

Compressed gate characterization for quantum devices with time-correlated noise ( http://arxiv.org/abs/2307.14432v2 )

ライセンス: Link先を確認

M. J. Gullans, M. Caranti, A. R. Mills, and J. R. Petta

(参考訳) 量子デバイスは、中間スケールとフォールトトレラントな量子コンピューティングに向けて着実に進歩するので、既知のノイズ源を説明する厳密で効率的な測定プロトコルを開発することが不可欠である。ゲートセットトモグラフィやランダム化ベンチマークのような既存の量子特徴づけプロトコルの多くは、量子ビットに作用するノイズがマルコビアンであると仮定する。しかし、1/fの電荷ノイズや超微細核スピンノイズの場合のように、この仮定はしばしば有効ではない。本稿では,時間関連ノイズの存在下での量子プロセストモグラフィ(QPT)の一般的な枠組みについて述べる。さらに,マルコフ音源と非マルコフノイズの相対強度を定量化する忠実度ベンチマークも導入する。本手法の適用例として,シリコンスピン量子ビットの比較理論的および実験的解析を行った。まず, 支配的雑音源を考慮した詳細なノイズモデルを開発し, 実験データに対する評価を行った。時間関連QPTの枠組みを適用すると、完全汎用の場合と比較して、1と2のキュービットゲートを特徴付けるのに必要な独立パラメータの数を10倍、100倍圧縮できることがわかった。これらの圧縮は実験に必要なトモグラフィ測定量を減少させると同時に、時間依存のハミルトニアンシミュレーションと比較してノイズ量子回路ダイナミクスの数値シミュレーションを著しく高速化する。この圧縮雑音モデルを用いて, シリコンスピン量子ビットに関する最近の実験において, 理論的に予測されたプロセスフィデリティと2つの量子ビット間ランダム化ベンチマークフィデリティの99.8%との一致が確認された。より広範に、我々のフォーマリズムは直接拡張することができ、非マルコフノイズを持つ大規模量子デバイスの高忠実性制御のための効率的でスケーラブルなチューニングプロトコルを開発することができる。

As quantum devices make steady progress towards intermediate scale and fault-tolerant quantum computing, it is essential to develop rigorous and efficient measurement protocols that account for known sources of noise. Most existing quantum characterization protocols such as gate set tomography and randomized benchmarking assume the noise acting on the qubits is Markovian. However, this assumption is often not valid, as for the case of 1/f charge noise or hyperfine nuclear spin noise. Here, we present a general framework for quantum process tomography (QPT) in the presence of time-correlated noise. We further introduce fidelity benchmarks that quantify the relative strength of different sources of Markovian and non-Markovian noise. As an application of our method, we perform a comparative theoretical and experimental analysis of silicon spin qubits. We first develop a detailed noise model that accounts for the dominant sources of noise and validate the model against experimental data. Applying our framework for time-correlated QPT, we find that the number of independent parameters needed to characterize one- and two-qubit gates can be compressed by 10x and 100x, respectively, when compared to the fully generic case. These compressions reduce the number of tomographic measurements needed in experiment, while also significantly speeding up numerical simulations of noisy quantum circuit dynamics compared to time-dependent Hamiltonian simulation. Using this compressed noise model, we find good agreement between our theoretically predicted process fidelities and two-qubit interleaved randomized benchmarking fidelities of 99.8% measured in recent experiments on silicon spin qubits. More broadly, our formalism can be directly extended to develop efficient and scalable tuning protocols for high-fidelity control of large arrays of quantum devices with non-Markovian noise.

翻訳日:2023-12-25 18:34:18 公開日:2023-12-22

# 単一qudit符号化によるフォールトトレラント計算

Fault-Tolerant Computing with Single Qudit Encoding ( http://arxiv.org/abs/2307.10761v3 )

ライセンス: Link先を確認

Matteo Mezzadri, Alessandro Chiesa, Luca Lepori and Stefano Carretta

(参考訳) 我々は、複数の量子ビット符号の典型的なリソースエスカレーションを回避するため、単一のマルチレベルキューディットに実装された安定化器量子エラー補正符号について議論する。これらのコードはquditの特定の物理的エラーに合わせてカスタマイズすることができ、効果的に抑制することができる。分子スピンquditsに対するフォールトトレラントな実装を実証し,線形quditサイズ成長のみを用いてほぼ指数関数的誤差抑制を示す。特にこれは、数千単位のqubitコードよりも優れている。また,これら組込みコードをフォールトトレラントに実装するための汎用物理システムに必要な特性についても概説する。

We discuss stabilizer quantum error-correction codes implemented in a single multi-level qudit to avoid the resource escalation typical of multi-qubit codes. These codes can be customized to the specific physical errors on the qudit, effectively suppressing them. We demonstrate a fault-tolerant implementation on molecular spin qudits, showcasing nearly exponential error suppression with only linear qudit size growth. Notably, this outperforms qubit codes using thousands of units. We also outline the required properties for a generic physical system to fault-tolerantly implement these embedded codes.

翻訳日:2023-12-25 18:33:52 公開日:2023-12-22

# 連合基盤モデルに向けて: グループ構造学習のためのスケーラブルなデータセットパイプライン

Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning ( http://arxiv.org/abs/2307.09619v2 )

ライセンス: Link先を確認

Zachary Charles, Nicole Mitchell, Krishna Pillutla, Michael Reneer, Zachary Garrett

(参考訳) 我々は,大規模なグループ構造化(フェデレート)データセットを作成するためのライブラリであるDataset Grouperを導入し,基礎モデルのスケールでのフェデレーション学習シミュレーションを可能にする。このライブラリは、ユーザ指定のパーティションに基づいて、既存のデータセットのグループ構造バージョンの作成を容易にするとともに、既存のソフトウェアフレームワークにプラグイン可能な、さまざまな有用な異種データセットに直接つながる。 Dataset Grouperには3つの利点がある。まず、単一のグループのデータセットでさえメモリに収まるには大きすぎる設定にスケールします。第2に、基本(非分割)データセットの選択とパーティション定義の両方において、柔軟性を提供します。最後に、フレームワークに依存しない。我々は、Dataset Grouperが、以前の作業よりも桁違いに大きいデータセット上で、大規模なフェデレートされた言語モデリングシミュレーションを可能にし、数十億のパラメータを持つ言語モデルのフェデレーショントレーニングを可能にすることを実証的に実証した。実験の結果,FedAvgのようなアルゴリズムは,この規模の経験的リスク最小化手法よりもメタラーニング手法として機能し,下流のパーソナライズやタスク固有の適応に有用であることが示唆された。 dataset grouperはhttps://github.com/google-research/dataset_grouperで入手できる。

We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper enables large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work, allowing for federated training of language models with hundreds of millions, and even billions, of parameters. Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation. Dataset Grouper is available at https://github.com/google-research/dataset_grouper.
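The idea of building a group-structured dataset from a user-specified partition can be illustrated with a minimal sketch. This is not the Dataset Grouper API: `partition_by_group` and the `user` key are hypothetical names chosen for illustration, and the sketch keeps every group in memory, which is exactly the limitation the library is described as avoiding.

```python
from collections import defaultdict


def partition_by_group(examples, get_key):
    """Group examples by a user-specified key function (hypothetical helper)."""
    groups = defaultdict(list)
    for example in examples:
        groups[get_key(example)].append(example)
    return dict(groups)


# Toy base (non-partitioned) dataset; the "user" field is an assumption.
examples = [
    {"user": "a", "text": "hello"},
    {"user": "b", "text": "federated"},
    {"user": "a", "text": "world"},
]

# One possible partition: one group per user.
groups = partition_by_group(examples, get_key=lambda ex: ex["user"])
```

Swapping the `get_key` function changes the partition without touching the base dataset, which mirrors the flexibility the abstract describes.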

翻訳日:2023-12-25 18:33:42 公開日:2023-12-22

# S.T.A.R.トラック:適応時空間表現を用いたエンドツーエンド3次元物体追跡のための潜在運動モデル

S.T.A.R.-Track: Latent Motion Models for End-to-End 3D Object Tracking with Adaptive Spatio-Temporal Appearance Representations ( http://arxiv.org/abs/2306.17602v2 )

ライセンス: Link先を確認

Simon Doll, Niklas Hanselmann, Lukas Schneider, Richard Schulz, Markus Enzweiler, Hendrik P.A. Lensch

(参考訳) 本稿では、tracking-by-attentionパラダイムに従い、オブジェクト中心のトランスフォーマーベースの3D追跡フレームワークを提案する。従来のモデルベースの追跡手法は、幾何運動モデルを用いて、フレーム間のオブジェクトおよび自車(エゴ)の動きの幾何学的効果を取り入れている。これに着想を得て、我々はS.T.A.R.-Trackを提案する。S.T.A.R.-Trackは、幾何学的な運動を明示的にモデル化しつつ、新しい潜在運動モデル(LMM)を用いて、視方向や照明条件の変化を考慮したオブジェクトクエリの調整を潜在空間で直接行う。トラックの存在確率のモデル化を助ける新しい学習可能なトラック埋め込みと組み合わせることで、任意のクエリベースの検出器と統合可能な汎用的な追跡フレームワークが実現される。nuScenesベンチマークによる大規模な実験により本手法の利点を実証し、DETR3Dベースのトラッカーにおいて最先端(state-of-the-art)の性能を示すとともに、トラックのIDスイッチ数を大幅に削減した。

Following the tracking-by-attention paradigm, this paper introduces an object-centric, transformer-based framework for tracking in 3D. Traditional model-based tracking approaches incorporate the geometric effect of object- and ego motion between frames with a geometric motion model. Inspired by this, we propose S.T.A.R.-Track, which uses a novel latent motion model (LMM) to additionally adjust object queries to account for changes in viewing direction and lighting conditions directly in the latent space, while still modeling the geometric motion explicitly. Combined with a novel learnable track embedding that aids in modeling the existence probability of tracks, this results in a generic tracking framework that can be integrated with any query-based detector. Extensive experiments on the nuScenes benchmark demonstrate the benefits of our approach, showing state-of-the-art performance for DETR3D-based trackers while drastically reducing the number of identity switches of tracks at the same time.

翻訳日:2023-12-25 18:33:20 公開日:2023-12-22

# 人間中心の生成AIの次のステップ:技術的視点

Next Steps for Human-Centered Generative AI: A Technical Perspective ( http://arxiv.org/abs/2306.15774v2 )

ライセンス: Link先を確認

Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou

(参考訳) 繰り返し、学際的な議論を通じて、我々はHuman-centered Generative AI(HGAI)の次のステップを定義し、提案する。我々は、人的価値の整合性、人間の意図の同化、人間の能力の増強という3つのレベルにまたがるジェネレーティブAIの今後の方向性を示す包括的な研究課題に貢献する。これらの次のステップを特定することで、学際的な研究チームがHGAIにおける一貫したアイデアの集合を追求し、その関心事に焦点を合わせながら、将来的な作業環境の全体像を維持していくことを目指しています。

Through iterative, cross-disciplinary discussions, we define and propose next steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next steps, we intend to draw interdisciplinary research teams to pursue a coherent set of emergent ideas in HGAI, focusing on their topics of interest while maintaining a coherent big picture of the future work landscape.

翻訳日:2023-12-25 18:33:00 公開日:2023-12-22

# 量子最適輸送と弱位相

Quantum Optimal Transport and Weak Topologies ( http://arxiv.org/abs/2306.12944v3 )

ライセンス: Link先を確認

Laurent Lafleche

(参考訳) 古典的最適輸送距離の量子設定への拡張がいくつか提案されている。本稿では、Golse, Mouhot, Paul [Commun Math Phys 343:165-205, 2016] および Golse, Paul [Arch Ration Mech Anal 223:57-94, 2017] によって導入された擬距離について検討する。これらの擬距離は、位相空間上の$2$次のMonge-Kantorovich-Wasserstein距離の量子アナログとして機能する。これらが、半古典近似における正の「自己距離」に起因する小さな項を除いて、負のSobolevノルムと比較可能であることを証明する。この自己距離はWigner-Yanaseスキュー情報を用いて上から評価できる。これにより、初期データに要求する正則性を弱めることで、平均場極限および半古典極限の文脈における既知の結果を改善できる。

Several extensions of the classical optimal transport distances to the quantum setting have been proposed. In this paper, we investigate the pseudometrics introduced by Golse, Mouhot and Paul in [Commun Math Phys 343:165-205, 2016] and by Golse and Paul in [Arch Ration Mech Anal 223:57-94, 2017]. These pseudometrics serve as a quantum analogue of the Monge-Kantorovich-Wasserstein distances of order $2$ on the phase space. We prove that they are comparable to negative Sobolev norms up to a small term due to a positive "self-distance" in the semiclassical approximation, which can be bounded above using the Wigner-Yanase skew information. This enables us to improve the known results in the context of the mean-field and semiclassical limits by requiring less regularity on the initial data.
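For orientation, the classical Monge-Kantorovich-Wasserstein distance of order $2$ that these pseudometrics are said to generalize is usually defined as below. The notation is the standard textbook one and is an assumption here, not taken from the paper: $\mu$ and $\nu$ are probability measures on phase space, and $\Pi(\mu,\nu)$ is the set of couplings with marginals $\mu$ and $\nu$.

```latex
W_2(\mu,\nu)^2 \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int |z - z'|^2 \, \pi(\mathrm{d}z, \mathrm{d}z')
```

In the classical setting $W_2(\mu,\mu) = 0$, which is why a positive "self-distance" is a distinctive feature of the quantum pseudometrics discussed in the abstract.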

翻訳日:2023-12-25 18:32:49 公開日:2023-12-22

# RoboCat:ロボットマニピュレーションのための自己改善型ジェネリストエージェント

RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation ( http://arxiv.org/abs/2306.11706v2 )

ライセンス: Link先を確認

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju, Antoine Laurens, Claudio Fantacci, Valentin Dalibard, Martina Zambelli, Murilo Martins, Rugile Pevceviciute, Michiel Blokzijl, Misha Denil, Nathan Batchelor, Thomas Lampe, Emilio Parisotto, Konrad Żołna, Scott Reed, Sergio Gómez Colmenarejo, Jon Scholz, Abbas Abdolmaleki, Oliver Groth, Jean-Baptiste Regli, Oleg Sushkov, Tom Rothörl, José Enrique Chen, Yusuf Aytar, Dave Barker, Joy Ortiz, Martin Riedmiller, Jost Tobias Springenberg, Raia Hadsell, Francesco Nori, Nicolas Heess

(参考訳) 異なるロボットやタスクから異種ロボット体験を活用し、新しいスキルや体格を素早く習得できる能力は、ロボット学習を変革する可能性がある。視覚と言語の基礎モデルの最近の進歩に触発されて,ロボット操作のためのマルチアンボディメントマルチタスク汎用エージェントを提案する。このエージェントはrobocatと呼ばれ、アクションラベルの視覚体験を消費できる視覚目標条件決定トランスフォーマーである。このデータは、シミュレートされた本物のロボットアームから、さまざまな観察とアクションのセットでモーターコントロールスキルの大規模なレパートリーにまたがる。 RoboCatでは、ゼロショットだけでなく、ターゲットタスクの100-1000例のみを使用して適応することで、新しいタスクやロボットに一般化する能力を示す。また、トレーニングされたモデル自体が、その後のトレーニングイテレーションでデータを生成するためにどのように使われるかを示し、自律的な改善ループのための基本的な構築ブロックを提供する。本研究は,シミュレーションと3種類の実ロボットを用いた大規模評価を行い,エージェントの能力について検討する。トレーニングデータの拡大と多様化が進むにつれ、robocatはクロスタスク転送の兆候を示すだけでなく、新しいタスクへの適応もより効率的になります。

The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a multi-embodiment, multi-task generalist agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot as well as through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

翻訳日:2023-12-25 18:32:13 公開日:2023-12-22

# Sparse and Invisible Trigger によるバックドアアタック

Backdoor Attack with Sparse and Invisible Trigger ( http://arxiv.org/abs/2306.06209v2 )

ライセンス: Link先を確認

Yinghua Gao, Yiming Li, Xueluan Gong, Zhifeng Li, Shu-Tao Xia, Qian Wang

(参考訳) ディープニューラルネットワーク(DNN)はバックドア攻撃に対して脆弱である。この攻撃では、敵が少量のトレーニングデータを操作し、被害者モデルが良性サンプルに対しては通常どおり予測する一方で、トリガーを付与したサンプルをターゲットクラスに分類するようにする。バックドア攻撃は、新たに台頭しつつある深刻なトレーニングフェーズの脅威であり、DNNベースのアプリケーションに重大なリスクをもたらす。本稿では、既存のバックドア攻撃のトリガパターンを再検討する。それらは可視であるかスパースでないかのいずれかであり、したがって十分にステルスではないことを明らかにする。さらに重要なのは、既存の手法を単純に組み合わせるだけでは、効果的でスパースかつ不可視なバックドア攻撃を設計できないことである。この問題に対処するために、トリガ生成をスパース性と不可視性の制約を伴う二段階最適化問題として定式化し、それを解く効果的な方法を提案する。提案手法をsparse and invisible backdoor attack(SIBA)と呼ぶ。異なる設定下でベンチマークデータセットを用いた広範な実験を行い、攻撃の有効性と既存のバックドア防御に対する耐性を検証する。主な実験を再現するためのコードは https://github.com/YinghuaGao/SIBA で入手できる。

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where the adversary manipulates a small portion of training data such that the victim model predicts normally on the benign samples but classifies the triggered samples as the target class. The backdoor attack is an emerging yet threatening training-phase threat, leading to serious risks in DNN-based applications. In this paper, we revisit the trigger patterns of existing backdoor attacks. We reveal that they are either visible or not sparse and therefore are not stealthy enough. More importantly, it is not feasible to simply combine existing methods to design an effective sparse and invisible backdoor attack. To address this problem, we formulate the trigger generation as a bi-level optimization problem with sparsity and invisibility constraints and propose an effective method to solve it. The proposed method is dubbed sparse and invisible backdoor attack (SIBA). We conduct extensive experiments on benchmark datasets under different settings, which verify the effectiveness of our attack and its resistance to existing backdoor defenses. The codes for reproducing main experiments are available at https://github.com/YinghuaGao/SIBA.

翻訳日:2023-12-25 18:31:53 公開日:2023-12-22

# スケッチ美化:学習部人工物体のスケッチの美化と構造洗練

Sketch Beautification: Learning Part Beautification and Structure Refinement for Sketches of Man-made Objects ( http://arxiv.org/abs/2306.05832v2 )

ライセンス: Link先を確認

Deng Yu, Manfred Lau, Lin Gao, Hongbo Fu

(参考訳) 本稿では,人工物体の自由なスケッチを入力し,幾何学的にも構造的にも自動的に美化する,新しいフリーハンドスケッチ美化手法を提案する。スケッチの美化は、非常に抽象的で多彩な描画方法のため、難しい。既存の手法は通常、限られた訓練サンプルの分布に制限されるため、豊かなバリエーションで自由に描かれたスケッチを美化することはできない。この課題に対処するために、分割・組み合わせ戦略を採用します。具体的には、まず、入力スケッチを意味成分にパースし、部分レベルの暗黙多様体に基づく学習部美化モジュールにより個々のコンポーネントを美化し、次に構造美化モジュールを介して美化コンポーネントを再評価する。この戦略により,本手法はトレーニングサンプルを超えて,新しいフリーハンドスケッチを処理できる。本システムの有効性を広範な実験と知覚的研究で実証する。

We present a novel freehand sketch beautification method, which takes as input a freely drawn sketch of a man-made object and automatically beautifies it both geometrically and structurally. Beautifying a sketch is challenging because of its highly abstract and heavily diverse drawing manner. Existing methods are usually confined to the distribution of their limited training samples and thus cannot beautify freely drawn sketches with rich variations. To address this challenge, we adopt a divide-and-combine strategy. Specifically, we first parse an input sketch into semantic components, beautify individual components by a learned part beautification module based on part-level implicit manifolds, and then reassemble the beautified components through a structure beautification module. With this strategy, our method can go beyond the training samples and handle novel freehand sketches. We demonstrate the effectiveness of our system with extensive experiments and a perceptive study.

翻訳日:2023-12-25 18:31:33 公開日:2023-12-22

# 予測と統計のパリティの調和:因果的アプローチ

Reconciling Predictive and Statistical Parity: A Causal Approach ( http://arxiv.org/abs/2306.05059v2 )

ライセンス: Link先を確認

Drago Plecko, Elias Bareinboim

(参考訳) 公正な機械学習が調査の重要分野として台頭して以来、差別の定量化と測定方法に関する多くの異なる概念が文献で提案されている。しかし、これらの概念のいくつかは互いに相容れないことが示されている。このような結果から,多種多様な公平性が存在することが明らかとなり,公平性に関する適切な尺度についてのコンセンサスが困難となり,実用上のツールの適用が妨げられた。本稿では,統計的および予測的パリティの概念を関連づけた,これらの重要な不可能な結果の1つについて検討する。具体的には,予測パリティに関連する公平度尺度の新たな因果分解式を導出し,この基準が,異質な待遇,異質な影響,ビジネスの必要性という法的ドクトリンを通じて,統計的パリティとどのように関連しているか,新たな知見を得る。以上の結果から, 統計的・予測パリティの概念は, より慎重な因果分析を通じて, 相互排他的ではなく, ビジネスニーズという概念を通じて, 公正な概念のスペクトルを補完し, 分散していることが明らかとなった。最後に,実例における発見の重要性を実証する。

Since the rise of fair machine learning as a critical field of inquiry, many different notions on how to quantify and measure discrimination have been proposed in the literature. Some of these notions, however, were shown to be mutually incompatible. Such findings make it appear that numerous different kinds of fairness exist, thereby making a consensus on the appropriate measure of fairness harder to reach, hindering the applications of these tools in practice. In this paper, we investigate one of these key impossibility results that relates the notions of statistical and predictive parity. Specifically, we derive a new causal decomposition formula for the fairness measures associated with predictive parity, and obtain a novel insight into how this criterion is related to statistical parity through the legal doctrines of disparate treatment, disparate impact, and the notion of business necessity. Our results show that through a more careful causal analysis, the notions of statistical and predictive parity are not really mutually exclusive, but complementary and spanning a spectrum of fairness notions through the concept of business necessity. Finally, we demonstrate the importance of our findings on a real-world example.
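To make the two notions concrete, the sketch below computes, per group, the positive prediction rate (compared across groups for statistical parity) and the precision P(y=1 | pred=1) (one common way of checking predictive parity for binary predictions). These are the usual textbook quantities and an assumption here; the paper's causal decomposition is not reproduced. The group names `x0`, `x1` and the toy records are made up for illustration.

```python
def rate(values):
    """Fraction of 1s in a list of 0/1 values (0.0 for an empty list)."""
    return sum(values) / len(values) if values else 0.0


def fairness_report(records):
    """records: list of (group, y_true, y_pred) tuples with binary labels."""
    report = {}
    for group in sorted({g for g, _, _ in records}):
        preds = [p for g, _, p in records if g == group]
        true_given_pos = [y for g, y, p in records if g == group and p == 1]
        report[group] = {
            "positive_rate": rate(preds),        # P(pred=1 | group)
            "precision": rate(true_given_pos),   # P(y=1 | pred=1, group)
        }
    return report


# Toy data (illustrative only): the groups have different base rates.
records = [
    ("x0", 1, 1), ("x0", 0, 1), ("x0", 0, 0), ("x0", 0, 0),
    ("x1", 1, 1), ("x1", 1, 1), ("x1", 1, 1), ("x1", 0, 1),
]
report = fairness_report(records)
```

In this toy data both gaps are non-zero (positive rates 0.5 vs 1.0, precision 0.5 vs 0.75), so neither criterion holds; the point is only to show that the two criteria measure different conditional probabilities.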

翻訳日:2023-12-25 18:30:10 公開日:2023-12-22

# クロスモーダル検索のためのプロトタイプベースアレエータ不確かさ定量化

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval ( http://arxiv.org/abs/2309.17093v2 )

ライセンス: Link先を確認

Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

(参考訳) クロスモーダル検索手法は、共通表現空間を共同学習することにより、視覚と言語モダリティの類似性関係を構築する。しかし、この予測は、腐敗した画像、速いペースの動画、未詳のテキストなど、低品質のデータによって引き起こされるアリータティックな不確実性によって、しばしば信頼性が低下する。本稿では,不確実性から生じる不確かさを定量化することにより,信頼性の高い予測を実現するための新しいプロトタイプベースアレエータ型不確実性定量化(pau)フレームワークを提案する。具体的には、セマンティクス部分空間全体を表現するために、まず様々な学習可能なプロトタイプを各モダリティ向けに構築する。次に、デンプスター・シェーファー理論と主観論理理論を用いて、証拠とディリクレ分布パラメータを関連付けた実証的理論的枠組みを構築する。 PAUモデルは、クロスモーダル検索のための正確な不確実性と信頼性のある予測を誘導する。 MSR-VTT, MSVD, DiDeMo, MS-COCOの4つの主要なベンチマークデータセットを用いて実験を行い, 本手法の有効性を実証した。コードはhttps://github.com/leolee99/PAUでアクセスできる。

Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity. Concretely, we first construct a set of various learnable prototypes for each modality to represent the entire semantics subspace. Then Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet Distribution parameters. The PAU model induces accurate uncertainty and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets of MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
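The link between evidence and Dirichlet parameters can be sketched with the standard Subjective Logic mapping used in evidential learning: Dirichlet parameters are evidence plus one, belief masses are evidence divided by the Dirichlet strength, and the remaining mass is the uncertainty. This is the generic formulation and an assumption here; how PAU derives evidence from prototypes is not shown, and `opinion_from_evidence` is a hypothetical helper name.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to belief masses and uncertainty."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]         # Dirichlet parameters
    strength = sum(alpha)                       # Dirichlet strength S
    belief = [e / strength for e in evidence]   # b_k = e_k / S
    uncertainty = k / strength                  # u = K / S
    return belief, uncertainty


# No evidence at all: every belief mass is zero and uncertainty is maximal.
belief_none, u_none = opinion_from_evidence([0.0, 0.0, 0.0])

# Some evidence for the first class lowers the uncertainty.
belief_some, u_some = opinion_from_evidence([4.0, 1.0, 0.0])
```

Belief masses and uncertainty always sum to one under this mapping, so low total evidence, for example from an ambiguous input, shows up directly as high uncertainty.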

翻訳日:2023-12-25 18:22:45 公開日:2023-12-22

# 可変抑制によるシャープネス認識最適化の強化

Enhancing Sharpness-Aware Optimization Through Variance Suppression ( http://arxiv.org/abs/2309.15639v3 )

ライセンス: Link先を確認

Bingcong Li, Georgios B. Giannakis

(参考訳) シャープネスを意識した最小化(SAM)は、大きなデータ拡張がなくても、ディープニューラルネットワークの一般化を向上する上でのメリットを十分に文書化している。一般化能力を高める「平坦なミニマ」近傍の損失関数の幾何学を取り入れたSAMは、近隣の摂動パラメータによる最大損失を最小化して「平坦な谷」を求める。損失関数の鋭さを考慮に入れることは重要であるが、このような「過密な敵」は一般化の最も外側のレベルを縮めることができる。この貢献の新しいアプローチは、そのような親和性を避けるために分散抑制(vasso)を通じて敵の安定化を促進する。 VaSSOの証明可能な安定性は、画像分類や機械翻訳を含むモデルに依存しないタスクにおいてSAMよりも数値的に改善されている。さらに、実験により、VaSSOはSAMを高レベルのラベルノイズに対して堅牢性で支持することを確認した。

Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
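The SAM update described above (perturb the parameters towards higher loss within a small neighborhood, then descend using the gradient at the perturbed point) can be sketched on a toy quadratic loss. The loss, step sizes, and the `theta` smoothing of the adversary direction are assumptions for illustration: `theta=1.0` gives plain SAM, while `theta<1.0` averages past gradients as one plausible reading of "variance suppression". The abstract does not specify the VaSSO update, so this is not the authors' implementation.

```python
import math


def loss(w):
    """Toy quadratic loss with one sharp and one flat direction."""
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)


def grad(w):
    """Gradient of the toy loss."""
    return [10.0 * w[0], 0.1 * w[1]]


def sam_step(w, d_prev, lr=0.05, rho=0.05, theta=1.0):
    """One SAM-style step; d_prev is the previous adversary direction."""
    g = grad(w)
    # Smoothed adversary direction (theta=1.0 recovers plain SAM).
    d = [(1.0 - theta) * dp + theta * gi for dp, gi in zip(d_prev, g)]
    norm = math.sqrt(sum(x * x for x in d)) or 1.0
    # Perturb the parameters by a step of length rho along d.
    w_adv = [wi + rho * di / norm for wi, di in zip(w, d)]
    # Descend using the gradient evaluated at the perturbed point.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)], d


def run(steps=100, theta=1.0):
    """Run several steps from a fixed starting point and return the parameters."""
    w, d = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        w, d = sam_step(w, d, theta=theta)
    return w
```

With a constant `rho` the iterates hover near the minimum rather than converging to it exactly, which is expected for this kind of update.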

翻訳日:2023-12-25 18:22:14 公開日:2023-12-22

# PrNet: Androidの生のGNSS測定による測位を改善するために擬似距離を補正するニューラルネットワーク

PrNet: A Neural Network for Correcting Pseudoranges to Improve Positioning with Android Raw GNSS Measurements ( http://arxiv.org/abs/2309.12204v2 )

ライセンス: Link先を確認

Xu Weng, Keck Voon Ling, Haochen Liu

(参考訳) 本稿では、携帯電話から収集したデータによる測位性能を向上させるために、擬似距離(pseudorange)のバイアス誤差を軽減するニューラルネットワークを提案する。衛星ごとの多層パーセプトロン(MLP)は、Androidの生のGNSS(Global Navigation Satellite System)測定値から得られる、衛星、受信機、コンテキストに関連する6つの特徴量から擬似距離バイアス補正量を回帰するように設計されている。MLPを訓練するために、位置の真値と平滑化手法を用いて擬似距離バイアスの目標値を慎重に算出し、スマートフォンのクロックバイアスの推定残差を含む損失関数を最適化する。補正された擬似距離は、モデルベースの測位エンジンによる位置計算に用いられる。評価には、農村部と都市部の両方で収集されたAndroidスマートフォンデータを含むGoogle Smartphone Decimeter Challenge(GSDC)データセットを用いる。フィンガープリンティングとクロストレースの両方の測位結果から、提案手法はモデルベース手法および最先端のデータ駆動手法より優れていることが示された。

We present a neural network for mitigating biased errors in pseudoranges to improve localization performance with data collected from mobile phones. A satellite-wise Multilayer Perceptron (MLP) is designed to regress the pseudorange bias correction from six satellite, receiver, context-related features derived from Android raw Global Navigation Satellite System (GNSS) measurements. To train the MLP, we carefully calculate the target values of pseudorange bias using location ground truth and smoothing techniques and optimize a loss function involving the estimation residuals of smartphone clock bias. The corrected pseudoranges are then used by a model-based localization engine to compute locations. The Google Smartphone Decimeter Challenge (GSDC) dataset, which contains Android smartphone data collected from both rural and urban areas, is utilized for evaluation. Both fingerprinting and cross-trace localization results demonstrate that our proposed method outperforms model-based and state-of-the-art data-driven approaches.

翻訳日:2023-12-25 18:21:55 公開日:2023-12-22

# ホログラフィーの限界と量子情報プロトコルの補正

Holographic Limitations and Corrections to Quantum Information Protocols ( http://arxiv.org/abs/2309.09939v3 )

ライセンス: Link先を確認

Stefano Pirandola

(参考訳) 我々は、ベッケンシュタイン境界やススキンド球面エントロピー境界のようなホログラフィック境界による絡み合い分布、量子テレポーテーション、および量子通信に課される制限について論じる。連続可変(CV)量子情報に対して、ホログラフィック補正の単純適用が確立された結果を妨げていることを示す。これらの補正は完全cvテレポーテーションを不可能にし、損失のある量子チャネルのテレポーテーションシミュレーションにおける一様収束を妨げ、量子通信に修正されたplobバウンドを課す。これらの数学的補正は、実用的量子技術に直ちには影響しないが、量子情報理論のより深い理論的理解には重要である。

We discuss the limitations imposed on entanglement distribution, quantum teleportation, and quantum communication by holographic bounds, such as the Bekenstein bound and Susskind's spherical entropy bound. For continuous-variable (CV) quantum information, we show how the naive application of holographic corrections disrupts well-established results. These corrections render perfect CV teleportation impossible, preclude uniform convergence in the teleportation simulation of lossy quantum channels, and impose a revised PLOB bound for quantum communication. While these mathematical corrections do not immediately impact practical quantum technologies, they are critical for a deeper theoretical understanding of quantum information theory.

翻訳日:2023-12-25 18:21:35 公開日:2023-12-22

# 拡張コンパスモデルにおけるサブシステム対称性、臨界ボース表面および非移動励起

Subsystem symmetries, critical Bose surface and immobile excitations in an extended compass model ( http://arxiv.org/abs/2309.08300v2 )

ライセンス: Link先を確認

Zhidan Li, Chun-Jiong Huang, Changle Liu and Hai-Zhou Lu

(参考訳) サブシステム対称性をホストし、3d遷移金属化合物との潜在的な実験的関連性を持つ拡張コンパスモデルを提案する。サブシステム対称性はスピン励起の移動性を強く制限し、重大な結果をもたらす。量子臨界点では、$k_x$軸と$k_y$軸の全体にわたって「臨界ボース面」が存在することが分かる。臨界点を越えた側には、低温でネマティック不安定性を示すノーダルラインスピン液体が見いだされる。フェロ四極子相では、1つの励起が単独では移動できず、「フラクトン」に類似していることが分かる。

We propose an extended compass model that hosts subsystem symmetries and has potential experimental relevance to 3d transition metal compounds. The subsystem symmetries strongly constrain the mobility of spin excitations and lead to profound consequences. At the quantum critical point, we find a "critical Bose surface" along the entire $k_x$ and $k_y$ axes. Across this critical point, we find a nodal-line spin liquid that undergoes a nematic instability at low temperatures. In the ferro-quadrupole phase, we find that one excitation is individually immobile, analogous to "fractons".

翻訳日:2023-12-25 18:21:22 公開日:2023-12-22

# 微分可能なJPEG：悪魔は細部に宿る

Differentiable JPEG: The Devil is in the Details ( http://arxiv.org/abs/2309.06978v4 )

ライセンス: Link先を確認

Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar

(参考訳) JPEGは、最も広く普及している非可逆画像符号化方式の1つであり続けている。しかし、JPEGは微分不可能であるため、ディープラーニングパイプラインでの利用が制限される。この問題に対処するために、JPEGの微分可能な近似が近年いくつか提案されている。本稿では、既存の微分可能JPEG手法を包括的にレビューし、従来の手法で見落とされていた重要な詳細を特定する。この目的のために、従来の制限を克服する新しい微分可能JPEG手法を提案する。我々の手法は、入力画像、JPEG品質、量子化テーブル、色変換パラメータに関して微分可能である。既存手法と比較して、我々の微分可能JPEG手法の順方向および逆方向の性能を評価する。さらに、重要な設計上の選択を評価するために広範なアブレーションを行う。提案する微分可能JPEGは(微分不可能な)参照実装に最もよく一致し、近年で最良の微分可能手法を平均で$3.47$dB(PSNR)上回る。強い圧縮率では、PSNRを$9.51$dB改善できる。我々の微分可能JPEGは強力な敵対的攻撃の結果をもたらし、勾配近似が有効であることを示している。コードはhttps://github.com/necla-ml/Diff-JPEGで公開されている。

JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
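The main obstacle to differentiating JPEG is the rounding step in quantization, whose gradient is zero almost everywhere. One surrogate commonly used in the differentiable-JPEG literature adds a cubic term to hard rounding, which keeps the output close to the rounded value while giving a non-zero gradient. The sketch below shows that surrogate; whether the paper uses exactly this form is not stated in the abstract, and the function names are hypothetical.

```python
import math


def soft_round(x):
    """Polynomial rounding surrogate: round(x) + (x - round(x)) ** 3."""
    r = math.floor(x + 0.5)  # round half up, avoiding banker's rounding
    return r + (x - r) ** 3


def soft_round_grad(x):
    """Derivative of soft_round away from the half-integer jump points."""
    r = math.floor(x + 0.5)
    return 3.0 * (x - r) ** 2


# Quantizing a DCT coefficient with a (made-up) quantization step of 10.
coefficient, step = 23.0, 10.0
quantized = soft_round(coefficient / step)
```

The surrogate matches hard rounding exactly at integers and still jumps at half-integers; only the gradient in between becomes informative.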

翻訳日:2023-12-25 18:21:13 公開日:2023-12-22

# セルフ・スーパービジョンによるLiDARデータのセマンティックシーンセグメンテーション

Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR Data ( http://arxiv.org/abs/2309.02139v2 )

ライセンス: Link先を確認

Mariona Carós, Ariadna Just, Santi Seguí, Jordi Vitrià

(参考訳) 航空機搭載LiDARシステムは、主に3次元座標で定義される点からなる大規模な点群データを生成することで、地表面を捉えることができる。しかし、教師あり学習タスクのためにこうした点にラベルを付けるのは時間がかかる。そのため、ラベルなしデータから学習し、注釈付きサンプルの数を大幅に削減できる技術を検討する必要がある。本研究では、Barlow Twinsを用いて自己教師ありエンコーダを訓練し、セマンティックシーンセグメンテーションのタスクにおける事前学習済みネットワークとして使用することを提案する。実験の結果、教師なしの事前学習は、教師ありタスクで微調整した際の性能を向上させ、特に出現頻度の低いカテゴリで効果があることが示された。

Airborne LiDAR systems have the capability to capture the Earth's surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.
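The Barlow Twins objective mentioned above can be sketched in plain Python: embeddings of two augmented views are standardized per feature, their cross-correlation matrix is computed, and the loss pushes the diagonal towards one and the off-diagonal entries towards zero. The weight `lam=0.005` and the toy embeddings are assumptions for illustration; the point-cloud encoder and augmentations used in the paper are not shown.

```python
import math


def standardize(columns):
    """Zero-mean, unit-variance normalisation of each feature column."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        out.append([(v - mean) / std for v in col])
    return out


def barlow_twins_loss(z_a, z_b, lam=0.005):
    """z_a, z_b: per-sample embedding lists for two augmented views."""
    n = len(z_a)
    cols_a = standardize(list(zip(*z_a)))
    cols_b = standardize(list(zip(*z_b)))
    total = 0.0
    for i, ca in enumerate(cols_a):
        for j, cb in enumerate(cols_b):
            # Entry (i, j) of the cross-correlation matrix.
            c_ij = sum(a * b for a, b in zip(ca, cb)) / n
            # Invariance term on the diagonal, redundancy term elsewhere.
            total += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return total


# Two identical views with decorrelated features give zero loss.
view = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
loss_identical = barlow_twins_loss(view, view)
```

No labels appear anywhere in the loss, which is what allows the encoder to be pre-trained on unlabeled point clouds before supervised fine-tuning.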

翻訳日:2023-12-25 18:21:01 公開日:2023-12-22

# 構音障害音声に対する音響-調音逆推定：事前学習済みの自己教師あり表現は有効か？

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable? ( http://arxiv.org/abs/2309.01108v3 )

ライセンス: Link先を確認

Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava

(参考訳) 音響-調音逆推定(AAI: acoustic-to-articulatory inversion)は、音響空間から調音空間への写像を行う。MFCCのような信号処理特徴量は、AAIタスクに広く使われてきた。構音障害のある話者では、発音が不正確で不明瞭であるため、AAIは困難である。本研究では、事前学習済みの自己教師あり学習(SSL)モデルの表現を用いて、構音障害音声に対するAAIを行う。この困難なAAIタスクに対する様々な事前学習済み特徴量の影響を、低リソース条件下で示す。さらに、抽出したSSL特徴量にxベクトルを条件付けして、BLSTMネットワークを学習する。seenケースでは、3つのAAI学習スキーム(被験者固有、プール、微調整)を実験する。学習スキーム間で一貫した結果から、DeCoARは微調整スキームにおいて、MFCCに対するピアソン相関係数(Pearson Correlation Coefficient, CC)を、健常対照群と患者でそれぞれ約1.81%、約4.56%相対的に改善することが明らかになった。unseenケースでも、各SSL特徴量について同様の平均的な傾向が観察される。全体として、特徴再構成や将来のタイムステップ予測のタスクで学習されたwav2vec、APC、DeCoARといったSSLネットワークは、構音障害音声の調音軌跡の予測において良好に機能する。

Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic to the articulatory space. Signal-processing features like the MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ~1.81% and ~4.56% for healthy controls and patients, respectively, over MFCCs. We observe similar average trends for different SSL features in the unseen case. Overall, SSL networks like wav2vec, APC, and DeCoAR, trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.
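The evaluation metric named above, the Pearson Correlation Coefficient between predicted and measured articulatory trajectories, and the notion of relative improvement over a baseline can be sketched as follows. The trajectories and CC values in the example are made up for illustration and are not results from the paper.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Made-up trajectory of one articulator and a perfectly scaled prediction.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
cc = pearson_cc(measured, predicted)
```

A relative improvement of about 4.56% therefore refers to the ratio of CC values, not to an absolute gain of 4.56 correlation points.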

翻訳日:2023-12-25 18:20:48 公開日:2023-12-22

# 医用画像登録のためのオンザフライ指導

On-the-Fly Guidance Training for Medical Image Registration ( http://arxiv.org/abs/2308.15216v4 )

ライセンス: Link先を確認

Yicheng Chen, Shengxiang Ji, Yuelin Xin, Kun Han, Xiaohui Xie

(参考訳) 本研究は、学習ベースの画像レジストレーションの分野において、弱教師ありおよび教師なしの手法に内在する制限に対処する新しいアプローチを探求する。弱教師あり手法は希少なラベル付きデータに大きく依存する一方、教師なし手法は画像類似度による間接的な精度指標に依存する。とりわけ、医用画像には正確な変形の正解が存在しないため、従来の教師あり学習は用いられない。本研究は、既存のモデルを強化するために、On-the-Fly Guidance(OFG)を用いた独自のトレーニングフレームワークを導入する。このフレームワークは、トレーニング中に、現在の変形予測を我々のカスタムオプティマイザで洗練することで、数ステップ先の擬似的な正解(pseudo-ground truth)を生成する。この擬似正解は、教師あり学習の文脈でモデルを直接監督するのに用いられる。このプロセスでは、予測された変形を限られたステップ数で最適化することで、トレーニング効率を確保し、各トレーニングフェーズに達成可能な目標を設定する。OFGは、学習ベース手法の速度を維持しながら、既存の画像レジストレーション技術の精度を著しく向上させる。提案手法は、確立されたレジストレーションモデルの予測や最適化後の出力を含む、様々な擬似正解生成戦略を用いて評価した。実験は3つのベンチマークデータセットと3つの最先端モデルにわたって行い、OFGは有意かつ一貫した改善を示し、この分野の従来の最先端手法を上回った。OFGは、学習ベースの画像レジストレーションモデルのトレーニング効果を高める、容易に統合可能なプラグアンドプレイのソリューションを提供する。コード: https://github.com/miraclefactory/on-the-fly-guidance

This research explores a novel approach in the realm of learning-based image registration, addressing the limitations inherent in weakly-supervised and unsupervised methods. Weakly-supervised techniques depend heavily on scarce labeled data, while unsupervised strategies rely on indirect measures of accuracy through image similarity. Notably, traditional supervised learning is not utilized due to the lack of precise deformation ground-truth in medical imaging. Our study introduces a unique training framework with On-the-Fly Guidance (OFG) to enhance existing models. This framework, during training, generates pseudo-ground truth a few steps ahead by refining the current deformation prediction with our custom optimizer. This pseudo-ground truth then serves to directly supervise the model in a supervised learning context. The process involves optimizing the predicted deformation with a limited number of steps, ensuring training efficiency and setting achievable goals for each training phase. OFG notably boosts the precision of existing image registration techniques while maintaining the speed of learning-based methods. We assessed our approach using various pseudo-ground truth generation strategies, including predictions and optimized outputs from established registration models. Our experiments spanned three benchmark datasets and three cutting-edge models, with OFG demonstrating significant and consistent enhancements, surpassing previous state-of-the-arts in the field. OFG offers an easily integrable plug-and-play solution to enhance the training effectiveness of learning-based image registration models. Code at https://github.com/miraclefactory/on-the-fly-guidance.

翻訳日:2023-12-25 18:20:23 公開日:2023-12-22

# 音声・言語・聴覚科学における一般化可能な機械学習モデルに向けて : サンプルサイズの推定とオーバーフィッティングの低減

Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting ( http://arxiv.org/abs/2308.11197v3 )

ライセンス: Link先を確認

Hamzeh Ghasemzadeh, Robert E. Hillman, Daryush D. Mehta

(参考訳) この研究の第一の目的は、研究者がより堅牢なネストクロスバリデーション法を使う動機となる定量的証拠を提供することである。第2の目的は,MLに基づく解析のための電力分析を行うための方法とMATLABコードを提供することである。モンテカルロシミュレーションは、使用済みのクロスバリデーション法、特徴の判別力、特徴空間の次元、モデルの次元の間の相互作用を定量化するために用いられた。 MLモデルの統計力と統計的信頼度に基づいて,4種類のクロスバリデーション(シングルホールトアウト,10倍,列車バリデーションテスト,ネスト10倍)を比較した。統計学的に有意な結果を得るために最小のサンプルサイズを決定するためにヌル仮説と代替仮説の分布を用いた(α = 0.05, 1 − β = 0.8)。モデルの統計的信頼度は、正しい特徴が選択され、最終モデルに含まれる確率として定義された。分析の結果,単一ホールドアウト法に基づくモデルは非常に低い統計的パワーと統計的信頼性を示し,精度を著しく過大評価した。逆に、ネストした10倍のクロスバリデーションは、最も高い統計信頼と最も高い統計力をもたらし、その正確さの偏りのない推定を提供した。単一のホールドアウトで必要なサンプルサイズは、ネストされたクロスバリデーションを使用する場合に必要なものよりも50%高い。ネストされたクロスバリデーションに基づくモデルの信頼度は、単一のホールドアウトベースのモデルの信頼度より4倍も高かった。計算モデル、MATLAB符号およびルックアップテーブルは、将来の研究の設計において、サンプルサイズを推定する研究者を支援するために提供される。

This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust method of nested cross-validation. The second purpose is to present methods and MATLAB codes for doing power analysis for ML-based analysis during the design of a study. Monte Carlo simulations were used to quantify the interactions between the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, and the dimensionality of the model. Four different cross-validations (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and statistical confidence of the ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (α = 0.05, 1 − β = 0.8). Statistical confidence of the model was defined as the probability of correct features being selected and hence being included in the final model. Our analysis showed that the model generated based on the single holdout method had very low statistical power and statistical confidence and that it significantly overestimated the accuracy. Conversely, the nested 10-fold cross-validation resulted in the highest statistical confidence and the highest statistical power, while providing an unbiased estimate of the accuracy. The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used. Confidence in the model based on nested cross-validation was as much as four times higher than the confidence in the single holdout-based model. A computational model, MATLAB codes, and lookup tables are provided to assist researchers with estimating the sample size during the design of their future studies.
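
The difference between a single holdout and nested cross-validation comes down to how the data indices are partitioned. The following is a minimal Python sketch of that partitioning only, not the authors' MATLAB code; the function names `k_fold_indices` and `nested_cv_splits` are illustrative assumptions.

```python
import random

def k_fold_indices(indices, k, seed=0):
    """Shuffle indices and split them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (inner_splits, outer_test) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    the outer training portion, so the outer test fold never influences
    feature selection or hyperparameter tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield inner_splits, outer_test
```

In this layout, feature selection and tuning would use only the inner splits, and each outer test fold would be scored once by the model chosen for that fold. This is consistent with the abstract's finding that the nested scheme gives an unbiased accuracy estimate.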

翻訳日:2023-12-25 18:19:56 公開日:2023-12-22

# 機械学習のためのトレーニングデータの分布特性検証

Attesting Distributional Properties of Training Data for Machine Learning ( http://arxiv.org/abs/2308.09552v2 )

ライセンス: Link先を確認

Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan

(参考訳) 機械学習(ML)の成功は、その信頼性に対する懸念が高まっている。いくつかの管轄区域がML規制の枠組みを準備している。そのような懸念の1つは、モデルトレーニングデータが特定の機密属性に対して望ましい分布特性を持つことである。例えば、ドラフト規則は、トレーニングデータセットが人口の多様性を反映するなど、特定の分布特性を持つことを示すためにモデルトレーナーが必要であることを示している。本研究では,証明者(例えばモデルトレーナー)が,学習データの適切な分布特性を検証者(例えば,顧客)に公開することなく示すことができる特性証明の概念を提案する。本稿では,プロパティ推論と暗号機構を組み合わせた効果的なハイブリッド特性証明を提案する。

The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.

翻訳日:2023-12-25 18:19:25 公開日:2023-12-22

# テキスト認識のための自己蒸留正規化コネクショニスト時間的分類損失:単純かつ効果的なアプローチ

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach ( http://arxiv.org/abs/2308.08806v3 )

ライセンス: Link先を確認

Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang and Wei Peng

(参考訳) テキスト認識手法は急速に発展しつつある。強力なモジュール、言語モデル、un-および半教師なしの学習スキームなど、いくつかの高度なテクニックは、公開ベンチマークのパフォーマンスを継続的に押し上げる。しかし、損失関数の観点から、テキスト認識モデルをいかに最適化するかという問題は概ね見過ごされている。 CTCに基づく手法は、性能と推論速度のバランスが良く、精度の低下に苦慮しているため、実際に広く用いられている。 CTC損失は、個々の文字を学習することを無視しながら、シーケンスターゲット全体の最適化を強調するためである。本稿では,CTCモデルを用いた自己蒸留方式を提案する。フレームワイズ正規化項をctc損失に取り入れ、個々の監督を強調し、潜在アライメントの最大化後アライメントを活用し、ctcベースのモデル間の蒸留で生じる不整合問題を解決する。正規化ctc損失を蒸留接続主義時間的分類 (dctc) 損失と呼ぶ。 DCTCの損失はモジュールフリーで、余分なパラメータや推論遅延、追加のトレーニングデータやフェーズを必要としない。公開ベンチマークの大規模な実験は、DCTCがこれらの欠点を全くなく、テキスト認識モデルの精度を最大2.6%向上させることができることを示した。

Text recognition methods are developing rapidly. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, continually push the performance on public benchmarks forward. However, the problem of how to better optimize a text recognition model from the perspective of loss functions is largely overlooked. CTC-based methods, widely used in practice due to their good balance between performance and inference speed, still grapple with accuracy degradation. This is because CTC loss emphasizes the optimization of the entire sequence target while neglecting to learn individual characters. We propose a self-distillation scheme for CTC-based models to address this issue. It incorporates a framewise regularization term in CTC loss to emphasize individual supervision, and leverages the maximum a posteriori (MAP) estimate of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models. We refer to the regularized CTC loss as Distillation Connectionist Temporal Classification (DCTC) loss. DCTC loss is module-free, requiring no extra parameters, longer inference lag, or additional training data or phases. Extensive experiments on public benchmarks demonstrate that DCTC can boost text recognition model accuracy by up to 2.6%, without any of these drawbacks.
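
The contrast between sequence-level CTC loss and a framewise term can be sketched in plain Python. The first function is the standard CTC forward recursion. The second is a generic per-frame cross-entropy against a fixed alignment. How DCTC derives that alignment and weights the two terms is not given in the abstract, so `framewise_cross_entropy` and its `alignment` argument are illustrative assumptions.

```python
import math

def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss for one sequence via the standard forward (alpha) recursion.

    probs[t][c] is the per-frame probability of class c at frame t;
    target is a list of non-blank class indices.
    """
    ext = [blank]
    for c in target:
        ext += [c, blank]          # interleave blanks: b, c1, b, c2, b, ...
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # skipping over a blank is allowed only between different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood)

def framewise_cross_entropy(probs, alignment):
    """Mean per-frame cross-entropy against a fixed frame-level alignment
    (hypothetical stand-in for a framewise regularization term)."""
    return -sum(math.log(p[a]) for p, a in zip(probs, alignment)) / len(probs)
```

The CTC term sums over all alignments that collapse to the target, so no single frame is supervised directly. A framewise term supervises each frame individually, which is the kind of per-character signal the abstract says plain CTC neglects.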

翻訳日:2023-12-25 18:19:14 公開日:2023-12-22

# Rydberg量子アニール上の局所光シフト符号化による最適化問題の解法

Solving optimization problems with local light shift encoding on Rydberg quantum annealers ( http://arxiv.org/abs/2308.07798v2 )

ライセンス: Link先を確認

Kapil Goswami, Rick Mukherjee, Herwig Ott, Peter Schmelcher

(参考訳) 最大カット(max-cut)や最大独立集合(mis)といった組合せ最適化問題をrydberg量子アニーラー上で解くための非単位ディスクフレームワークを提供する。我々の構成は、グラフ問題をイジングスピンモデルにマッピングするために、局所制御可能な光シフトを個々のキュービットに適用する多体相互作用Rydbergシステムからなる。光トワイザーが空間配置で提供する柔軟性を生かした数値シミュレーションでは、rydberg annealerを所望の多体基底状態へとグローバルに駆動しながら局所調整プロトコルを実装し、最適化問題への解決策でもある。最適制御法を用いて, システムの寿命内, 近似比が1に近い時間スケールのプロトタイプグラフに対して, これらの解を求める。非ブロッケードアプローチは、2次元のRydberg構成で実現でき、非重み付きグラフと重み付きグラフの両方に適用できる特定のトポロジーによるグラフ問題の符号化を容易にする。システムサイズ, グラフの硬度, 解に収束するのに要するイテレーション数の観点から, 提案手法の利点を浮き彫りにした, 高速な模擬焼鈍による比較解析が提供される。

We provide a non-unit disk framework to solve combinatorial optimization problems such as Maximum Cut (Max-Cut) and Maximum Independent Set (MIS) on a Rydberg quantum annealer. Our setup consists of a many-body interacting Rydberg system where locally controllable light shifts are applied to individual qubits in order to map the graph problem onto the Ising spin model. Exploiting the flexibility that optical tweezers offer in terms of spatial arrangement, our numerical simulations implement the local-detuning protocol while globally driving the Rydberg annealer to the desired many-body ground state, which is also the solution to the optimization problem. Using optimal control methods, these solutions are obtained for prototype graphs with varying sizes at time scales well within the system lifetime and with approximation ratios close to one. The non-blockade approach facilitates the encoding of graph problems with specific topologies that can be realized in two-dimensional Rydberg configurations and is applicable to both unweighted and weighted graphs. A comparative analysis with fast simulated annealing is provided which highlights the advantages of our scheme in terms of system size, hardness of the graph, and the number of iterations required to converge to the solution.
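
The claim that the many-body ground state is also the solution to the optimization problem can be checked classically on a small graph. The sketch below uses the textbook Max-Cut to Ising correspondence with brute-force search. It does not model the Rydberg Hamiltonian or the light-shift encoding, and all function names are illustrative assumptions.

```python
from itertools import product

def ising_energy(spins, weighted_edges):
    """Ising energy E = sum over edges of w_ij * s_i * s_j, spins in {-1, +1}."""
    return sum(w * spins[i] * spins[j] for i, j, w in weighted_edges)

def cut_weight(spins, weighted_edges):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for i, j, w in weighted_edges if spins[i] != spins[j])

def brute_force_ground_state(n, weighted_edges):
    """Exhaustively find a minimum-energy spin configuration (small n only)."""
    return min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, weighted_edges))
```

For any spin configuration, the cut weight equals (W − E) / 2, where W is the total edge weight. Minimizing the Ising energy therefore maximizes the cut, for weighted and unweighted graphs alike.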

翻訳日:2023-12-25 18:18:51 公開日:2023-12-22

# ディープラーニングを用いたカスタム熱力学の構築

Constructing Custom Thermodynamics Using Deep Learning ( http://arxiv.org/abs/2308.04119v3 )

ライセンス: Link先を確認

Xiaoli Chen, Beatrice W. Soh, Zi-En Ooi, Eleonore Vissol-Gaudin, Haijun Yu, Kostya S. Novoselov, Kedar Hippalgaonkar, Qianxiao Li

(参考訳) ai(artificial intelligence)の最もエキサイティングな応用の1つは、以前に蓄積されたデータに基づく自動科学的発見であり、対称性や保存則など、既知の物理原理による制限と組み合わせられている。このような自動仮説作成と検証は、従来の物理的直観が失敗する複雑な現象の研究を支援する。本稿では,任意の確率的散逸系の巨視的力学記述を,その微視的軌跡の観察から直接学習するための一般化オンザガー原理に基づくプラットフォームを開発する。本手法は, 還元された熱力学的座標を同時に構築し, それらの座標のダイナミクスを解釈する。提案手法の有効性を理論的に検証し, 外部応用分野における長鎖の伸長を実験的に検証した。具体的には、3つの解釈可能な熱力学座標を学習し、安定状態と遷移状態の同定と伸縮速度の制御を含む、ポリマー伸長の動的景観を構築する。我々の一般的な方法論は、幅広い科学的・技術的応用に利用できる。

One of the most exciting applications of artificial intelligence (AI) is automated scientific discovery based on previously amassed data, coupled with restrictions provided by known physical principles, including symmetries and conservation laws. Such automated hypothesis creation and verification can assist scientists in studying complex phenomena, where traditional physical intuition may fail. Here we develop a platform based on a generalized Onsager principle to learn macroscopic dynamical descriptions of arbitrary stochastic dissipative systems directly from observations of their microscopic trajectories. Our method simultaneously constructs reduced thermodynamic coordinates and interprets the dynamics on these coordinates. We demonstrate its effectiveness by studying theoretically and validating experimentally the stretching of long polymer chains in an externally applied field. Specifically, we learn three interpretable thermodynamic coordinates and build a dynamical landscape of polymer stretching, including the identification of stable and transition states and the control of the stretching rate. Our general methodology can be used to address a wide range of scientific and technological applications.

翻訳日:2023-12-25 18:18:26 公開日:2023-12-22

# テキスト条件拡散モデルに基づくシーンテキスト画像の超解像

Scene Text Image Super-resolution based on Text-conditional Diffusion Models ( http://arxiv.org/abs/2311.09759v2 )

ライセンス: Link先を確認

Chihiro Noguchi, Shun Fukuda, Masao Yamanaka

(参考訳) シーンテキスト画像超解像(STISR)は,シーンテキスト認識のための前処理手法として最近大きな成功を収めている。 STISRは、現実世界の設定でぼやけた低解像度(LR)テキストイメージを、シーンテキスト認識に適した鮮明な高解像度(HR)テキストイメージに変換することを目的としている。本研究では,テキストから画像への印象的な合成能力で知られるdms(text-conditional diffusion model)をstisrタスクに活用する。実験の結果,テキスト条件DMは既存のSTISR法をはるかに上回ることがわかった。特にLRテキスト画像からのテキストが入力として与えられると、テキスト条件DMは高品質な高解像度テキスト画像を生成することができる。この機能を利用して、LR-HRペアテキスト画像データセットを合成する新しいフレームワークを提案する。このフレームワークは3つの特殊なテキスト条件DMで構成され、それぞれがテキスト画像合成、超解像、画像劣化に特化している。これらの3つのモジュールは、STISR法の訓練に適している異なるLRとHRのペア画像の合成に不可欠である。実験により,これらの合成画像対はテキストZoom評価におけるSTISR法の性能を大幅に向上させることを確認した。

Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition. STISR aims to transform blurred and noisy low-resolution (LR) text images in real-world settings into clear high-resolution (HR) text images suitable for scene text recognition. In this study, we leverage text-conditional diffusion models (DMs), known for their impressive text-to-image synthesis capabilities, for STISR tasks. Our experimental results revealed that text-conditional DMs notably surpass existing STISR methods. Especially when texts from LR text images are given as input, the text-conditional DMs are able to produce superior quality super-resolution text images. Utilizing this capability, we propose a novel framework for synthesizing LR-HR paired text image datasets. This framework consists of three specialized text-conditional DMs, each dedicated to text image synthesis, super-resolution, and image degradation. These three modules are vital for synthesizing distinct LR and HR paired images, which are more suitable for training STISR methods. Our experiments confirmed that these synthesized image pairs significantly enhance the performance of STISR methods in the TextZoom evaluation.

翻訳日:2023-12-25 18:12:41 公開日:2023-12-22

# 医用画像分類のためのAlexNetのレビュー

Review of AlexNet for Medical Image Classification ( http://arxiv.org/abs/2311.08655v2 )

ライセンス: Link先を確認

Wenhao Tang, Junding Sun, Shuihua Wang, Yudong Zhang

(参考訳) 近年, 深層学習の急速な発展が, 医用画像の分類分野に幅広い応用をもたらしている。オーバーフィッティングの緩和、一般化の改善、勾配の消失と爆発の回避など、常にパフォーマンスが向上しているニューラルネットワークモデルの変種には、いくつかの共通点がある。 AlexNetは最初にドロップアウト技術を使ってオーバーフィッティングを緩和し、ReLUアクティベーション機能を使って勾配の消滅を回避する。そこで我々は2012年のcnn開発に大きく貢献したalexnetに関する議論に焦点を当てた。ジャーナル論文やカンファレンス論文を含む40以上の論文をレビューした後、AlexNetの技術的な詳細、利点、応用分野について解説する。

In recent years, the rapid development of deep learning has led to a wide range of applications in the field of medical image classification. The variants of neural network models with ever-increasing performance share some commonalities: to try to mitigate overfitting, improve generalization, avoid gradient vanishing and exploding, etc. AlexNet first utilizes the dropout technique to mitigate overfitting and the ReLU activation function to avoid gradient vanishing. Therefore, we focus our discussion on AlexNet, which has contributed greatly to the development of CNNs in 2012. After reviewing over 40 papers, including journal papers and conference papers, we give a narrative on the technical details, advantages, and application areas of AlexNet.

翻訳日:2023-12-25 18:12:20 公開日:2023-12-22

# キーストローク検証チャレンジ(KVC: Biometric and Fairness Benchmark Evaluation)

Keystroke Verification Challenge (KVC): Biometric and Fairness Benchmark Evaluation ( http://arxiv.org/abs/2311.06000v3 )

ライセンス: Link先を確認

Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Naser Damer, Julian Fierrez, Javier Ortega-Garcia

(参考訳) 生体認証のためのキーストロークダイナミクス(KD)の分析にはいくつかの利点がある:最も差別的な行動特性の一つであり、キーボードはユーザーがテキストデータを入力するための主要な手段であり、その獲得には追加のハードウェアが必要であり、その処理は比較的軽量であり、透過的に被験者を認識することができる。しかし、実験プロトコルとメトリクスの不均一性と、文献で採用されているデータベースのサイズが限られているため、異なるシステム間の直接比較が妨げられ、キーストロークバイオメトリックスの進歩の障害となっている。そこで本稿では,Aalto Keystroke Databases から抽出したデスクトップおよびモバイルキーボードを用いて取得した185,000件以上の可変転写テキストのツイート長シーケンスに基づいて,KD に基づく生体認証性能と公平性をベンチマークする実験フレームワークを提案する。このフレームワークは、Keystroke Verification Challenge (KVC)という形でCodaLab上で動作する。さらに,新しい公平度指標であるsweted impostor ratio (sir) を導入し,検証スコアにおけるデム間およびデム内群バイアスパターンを捉えた。提案手法は,2つの最先端キーストローク検証システム「typenet」と「typeformer」を用いて異なる入力特徴の比較を行い,時間領域に拡張された特徴を優先してテキスト内容(押したキーのascii符号)の分析を破棄することで,プライバシーを侵害しないシステムを実現する。我々の実験は、このアプローチが満足なパフォーマンスを維持することができることを示している。

Analyzing keystroke dynamics (KD) for biometric verification has several advantages: it is among the most discriminative behavioral traits; keyboards are among the most common human-computer interfaces, being the primary means for users to enter textual data; its acquisition does not require additional hardware, and its processing is relatively lightweight; and it allows for transparently recognizing subjects. However, the heterogeneity of experimental protocols and metrics, and the limited size of the databases adopted in the literature impede direct comparisons between different systems, thus representing an obstacle in the advancement of keystroke biometrics. To alleviate this aspect, we present a new experimental framework to benchmark KD-based biometric verification performance and fairness based on tweet-long sequences of variable transcript text from over 185,000 subjects, acquired through desktop and mobile keyboards, extracted from the Aalto Keystroke Databases. The framework runs on CodaLab in the form of the Keystroke Verification Challenge (KVC). Moreover, we also introduce a novel fairness metric, the Skewed Impostor Ratio (SIR), to capture inter- and intra-demographic group bias patterns in the verification scores. We demonstrate the usefulness of the proposed framework by employing two state-of-the-art keystroke verification systems, TypeNet and TypeFormer, to compare different sets of input features, achieving a less privacy-invasive system by discarding the analysis of text content (ASCII codes of the keys pressed) in favor of extended features in the time domain. Our experiments show that this approach maintains satisfactory performance.
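
Dropping text content in favor of time-domain features can be illustrated with a short Python sketch. The four timing quantities below are common in the keystroke-dynamics literature. The abstract does not list the exact feature set used in KVC, so the feature names and the function `keystroke_timing_features` are assumptions.

```python
def keystroke_timing_features(events):
    """events: list of (press_time, release_time) per key, in typing order.

    Returns one feature row per consecutive key pair, using only time-domain
    quantities and no key identity (no ASCII codes).
    """
    rows = []
    for (p1, r1), (p2, r2) in zip(events, events[1:]):
        rows.append({
            "hold_time": r1 - p1,        # how long the first key was held
            "press_press": p2 - p1,      # press-to-press latency
            "release_press": p2 - r1,    # flight time (negative if keys overlap)
            "release_release": r2 - r1,  # release-to-release latency
        })
    return rows
```

For example, `keystroke_timing_features([(0, 80), (120, 190)])` yields a single row with a hold time of 80 and a press-to-press latency of 120, in the same time unit as the input.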

翻訳日:2023-12-25 18:12:08 公開日:2023-12-22

# 静的リーク検出のためのLLMに基づくリソース指向意図推論

LLM-based Resource-Oriented Intention Inference for Static Resource Leak Detection ( http://arxiv.org/abs/2311.04448v2 )

ライセンス: Link先を確認

Chong Wang, Jianan Liu, Xin Peng, Yang Liu, Yiling Lou

(参考訳) リソースリークは、買収後にリリースされないリソースによって引き起こされ、しばしばパフォーマンス上の問題やシステムクラッシュにつながる。既存の静的検出技術は、事前定義されたリソース獲得/リリースapiの機械的マッチング、事前定義されたapiの完全性、到達可能性の検証の特定、分析の複雑さなど、その有効性への挑戦に依存する。これらの課題を克服するために,我々は,機械的なapiマッチングではなく,リソース管理知識とコードコンテキスト理解に基づいて,コード内のリソース指向の意図(獲得,リリース,到達可能性検証)を直接推論するために,大規模言語モデル(llm)を活用する新しいアプローチであるinferroiを提案する。 InferROI は LLM に与えられたコードスニペットから関連する意図を推論するように指示するプロンプトを使用し、それを形式表現に変換する。これらの推論された意図を集約することにより、InferROIは軽量な静的解析に基づくアルゴリズムを使用して、コードから抽出された制御-フローパスを分析し、リソースリークを検出する。 InferROIをJavaプログラム上で評価し、リソース指向の意図推論とリソースリーク検出の両面での有効性を検討する。実験の結果、InferROIは74.6%の精度で、DroidLeaksデータセットから172のコードスニペットを意図的に推論して81.8%のリコールを達成した。さらに、InferROIは、データセットにリストされているAndroidリソースのかなりの部分をカバーしている。 DroidLeaksデータセットの86のバグに適用すると、InferROIは8つのベースライン検出器と比較して高いバグ検出率(53.5%)と低い偽アラーム率(8.1%)を示す。さらに,実世界のオープンソースプロジェクトからの100メソッドのリソースリーク検出にinferroiを適用し,未知の12のリソースリークバグを特定し,そのうち7つを開発者が確認した。

Resource leaks, caused by resources not being released after acquisition, often lead to performance issues and system crashes. Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs, posing challenges to their effectiveness, including completeness of predefined APIs, identification of reachability validation, and analysis complexity. To overcome these challenges, we propose InferROI, a novel approach that leverages large language models (LLMs) to directly infer resource-oriented intentions (acquisition, release, and reachability validation) in code, based on resource management knowledge and code context understanding, rather than mechanical API matching. InferROI uses a prompt to instruct the LLM in inferring involved intentions from a given code snippet, which are then translated into formal expressions. By aggregating these inferred intentions, InferROI utilizes a lightweight static-analysis based algorithm to analyze control-flow paths extracted from the code, thereby detecting resource leaks. We evaluate InferROI on Java programs and investigate its effectiveness in both resource-oriented intention inference and resource leak detection. Experimental results demonstrate that InferROI achieves a precision of 74.6% and a recall of 81.8% in intention inference on 172 code snippets from the DroidLeaks dataset. Additionally, InferROI covers a significant portion of concerned Android resources listed in the dataset. When applied to 86 bugs from the DroidLeaks dataset, InferROI exhibits a high bug detection rate (53.5%) and a low false alarm rate (8.1%) compared to eight baseline detectors. Moreover, we apply InferROI to resource leak detection in 100 methods from real-world open-source projects, where it identifies 12 unknown resource leak bugs, with 7 of them being confirmed by developers.
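
The final path-analysis step can be sketched in Python, assuming the intentions have already been inferred and attached to each control-flow path. The intention labels, the path representation, and `find_leaks` are hypothetical and are not InferROI's actual interface.

```python
def find_leaks(paths):
    """paths: list of control-flow paths; each path is a list of
    (intention, resource) pairs with intention in
    {"acquire", "release", "validate_absent"}.

    Returns the set of resources still held at the end of at least one path.
    """
    leaked = set()
    for path in paths:
        held = set()
        for intention, resource in path:
            if intention == "acquire":
                held.add(resource)
            elif intention == "release":
                held.discard(resource)
            elif intention == "validate_absent":
                # Branch where a check such as `resource == null` holds:
                # the resource is not reachable here, so nothing to release.
                held.discard(resource)
        leaked |= held
    return leaked
```

A resource counts as leaked if any single path ends with it still held. An early-return path that skips the release is therefore flagged even when the main path releases correctly.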

翻訳日:2023-12-25 18:11:39 公開日:2023-12-22

# PriPrune: Pruned Federated Learningにおけるプライバシの定量化と保存

PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning ( http://arxiv.org/abs/2310.19958v2 )

ライセンス: Link先を確認

Tianyue Chu, Mengwei Yang, Nikolaos Laoutaris, Athina Markopoulou

(参考訳) Federated Learning(FL)は、複数のクライアントデバイスとサーバが、ローカルなトレーニングデータを共有することなく、モデル更新のみを交換することで、グローバルモデルを協調的にトレーニングできるパラダイムである。これらのデバイスは通信や計算リソースの面で制約されることが多く、モデルプルーニング(モデルのサイズと複雑さを減らすために広く使用されるパラダイム)の恩恵を受けることができる。直観的には、ローカルモデルをより粗いものにすることで、pruningはflのコンテキストにおけるプライバシ攻撃に対する保護を提供するものと期待される。しかし、この保護は以前にも正式にも実験的にも特徴づけられておらず、最先端の攻撃に対して十分なものかどうかは不明である。本稿では,flにおけるモデルプルーニングのプライバシ保証に関する最初の調査を行う。我々は,pruned flモデルによって漏洩した情報量に関する情報理論上の上限を導出する。我々はこれらの理論的な知見を補完し、ベンチマークデータセットを用いて、最先端のプライバシー攻撃を含む包括的な実験により検証する。この評価は、プルーニングによって提供されるプライバシー保護に影響を与える可能性のある選択とパラメータに関する貴重な洞察を提供する。このアルゴリズムでは、パーソナライズされたクライアント毎の防御マスクを使用し、防御プルーニング率を適用して、プライバシとモデルパフォーマンスを共同で最適化する。 PriPruneは、クライアント上でプラインドされたFLスキームを変更せずに適用し、サーバによる逆攻撃から保護する、普遍的な方法である。私たちの経験的評価は、プライバシを考慮しない最先端のpruned flスキームと比較して、pripruneがプライバシ-精度のトレードオフを大幅に改善していることを示しています。

Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model, by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning -- a paradigm that is widely used to reduce the size and complexity of models. Intuitively, by making local models coarser, pruning is expected to also provide some protection against privacy attacks in the context of FL. However, this protection has not been previously characterized, formally or experimentally, and it is unclear if it is sufficient against state-of-the-art attacks. In this paper, we perform the first investigation of privacy guarantees for model pruning in FL. We derive information-theoretic upper bounds on the amount of information leaked by pruned FL models. We complement and validate these theoretical findings with comprehensive experiments that involve state-of-the-art privacy attacks, on several state-of-the-art FL pruning schemes, using benchmark datasets. This evaluation provides valuable insights into the choices and parameters that can affect the privacy protection provided by pruning. Based on these insights, we introduce PriPrune -- a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that it can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server. Our empirical evaluation demonstrates that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes that do not take privacy into account.
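
The basic operation behind pruning a local model or update before it leaves the client is a mask applied at a chosen pruning rate. The Python sketch below shows plain magnitude pruning only. It is not the PriPrune algorithm, whose defense mask and adaptive rate are not detailed in the abstract, and `magnitude_prune` is an illustrative name.

```python
def magnitude_prune(weights, prune_rate):
    """Zero out the fraction `prune_rate` of entries with the smallest |value|.

    Returns the pruned weights and the 0/1 mask that was applied.
    """
    n_prune = int(len(weights) * prune_rate)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return [w * m for w, m in zip(weights, mask)], mask
```

A higher pruning rate sends fewer nonzero values to the server, which coarsens the shared model at some cost in accuracy. This is the tradeoff the abstract describes optimizing per client.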

翻訳日:2023-12-25 18:11:04 公開日:2023-12-22

# 全光相関ノイズチャネルとその量子コヒーレンス回復への応用

All-optical correlated noisy channel and its application in recovering quantum coherence ( http://arxiv.org/abs/2310.16342v2 )

ライセンス: Link先を確認

Dan Lei, Disheng Guo, Jun Xin, and Xiao-Ming Lu

(参考訳) 減衰と増幅は光通信の最も一般的なプロセスである。増幅は、光学場の複素振幅の減衰を補償するために用いられるが、減衰チャネルと増幅チャネルが独立であることから、失われたコヒーレンスを回復することができない。そこで本研究では,減衰チャネルと増幅チャネルが相関したノイズを発生させると,印加した光の量子コヒーレンスを回復できることを示す。本研究では, 4波混合過程に基づく全光相関雑音チャネルを提案し, 連続変数系における量子コヒーレンス回復の可能性を示す。我々はコヒーレント状態と2モード圧縮状態のコヒーレンス回復現象を定量的に検討した。さらに,回復チャネルに依存しない他の光子損失が回復コヒーレンス性能に及ぼす影響について解析した。従来提案した電気光学変換に基づく相関ノイズチャネルとは違って,本プロトコルの相関ノイズチャネルは全光学的であり,より大きな動作帯域を有する。

Attenuation and amplification are the most common processes for optical communications. Amplification can be used to compensate for the attenuation of the complex amplitude of an optical field, but is unable to recover the lost coherence, provided that the attenuation channel and the amplification channel are independent. In this work, we show that the quantum coherence of an optical field can be regained if the attenuation channel and the amplification channel share correlated noise. We propose an all-optical correlated noisy channel relying on a four-wave mixing process and demonstrate its capability of recovering quantum coherence within continuous-variable systems. We quantitatively investigate the coherence recovery phenomena for coherent states and two-mode squeezed states. Moreover, we analyze the effect of other photon losses that are independent of the recovery channel on the performance of recovering coherence. Different from correlated noisy channels previously proposed based on electro-optic conversions, the correlated noisy channel in our protocol is all-optical and thus offers larger operational bandwidths.

翻訳日:2023-12-25 18:10:35 公開日:2023-12-22

# 絶対政策最適化

Absolute Policy Optimization ( http://arxiv.org/abs/2310.13230v3 )

ライセンス: Link先を確認

Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu

(参考訳) 近年,信頼領域の政治強化学習は,複雑な制御タスクやゲームシナリオに対処する上で,目覚ましい成果を上げている。しかし、このカテゴリの現代の最先端のアルゴリズムは、期待されるパフォーマンスの改善を強調し、最悪のパフォーマンス結果を制御する能力が欠如している。この制限に対処するため、我々は新しい目的関数を導入し、その最適化により、ほぼ全ての性能サンプル(絶対性能)の下限における単調な改善が保証される。この画期的な理論の進歩を考えると、我々はこの理論的に基礎付けられたアルゴリズムを一連の近似によって洗練し、絶対政策最適化 (apo) と呼ばれる実用的な解法を生み出した。本実験は,継続制御ベンチマークタスクに挑戦する手法の有効性を実証し,atariゲームのマスタリングへの適用性を拡張する。以上の結果から,APOは最先端のポリシー勾配アルゴリズムよりも大幅に優れており,期待される性能と最悪の性能の両方が大幅に向上することがわかった。

In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in the lower bound of near-total performance samples (absolute performance). Considering this groundbreaking theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.

翻訳日:2023-12-25 18:10:19 公開日:2023-12-22

# 構造概念はトランスフォーマー言語モデルに普遍的か? 解釈可能な言語間一般化に向けて

Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization ( http://arxiv.org/abs/2310.12794v2 )

ライセンス: Link先を確認

Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang

(参考訳) 大規模言語モデル(llm)は、言語間の知識を暗黙的に伝達する、言語横断的一般化能力を示している。しかし、この転送はすべての言語、特に低リソース言語に対して等しく成功していないため、現在進行中の課題となっている。暗黙の言語間一般化の限界に達したのか、明示的な知識伝達が可能かどうかは不明だ。本稿では,言語間の概念対応を明確に整合させ,言語間の一般化を促進する可能性を検討する。言語構文的側面をテストベッドとして用いた43言語の解析により,エンコーダのみおよびデコーダのみのLLMに対して,言語内構造概念空間間で高い整合性を示す。次に,メタラーニングに基づく概念空間の整合学習手法を提案し,概念分類におけるゼロショットおよび少数ショットの一般化を促進するとともに,言語間相互学習現象に関する洞察を提供する。構文解析タスクの実験により,本手法は最先端の手法で競争的な結果を達成し,言語間の性能ギャップを狭め,特に資源の少ない者にとって有益であることが示された。

Large language models (LLMs) have exhibited considerable cross-lingual generalization abilities, whereby they implicitly transfer knowledge across languages. However, the transfer is not equally successful for all languages, especially for low-resource ones, which poses an ongoing challenge. It is unclear whether we have reached the limits of implicit cross-lingual generalization and if explicit knowledge transfer is viable. In this paper, we investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability among the spaces of structural concepts within each language for both encoder-only and decoder-only LLMs. We then propose a meta-learning-based method to learn to align conceptual spaces of different languages, which facilitates zero-shot and few-shot generalization in concept classification and also offers insights into the cross-lingual in-context learning phenomenon. Experiments on syntactic analysis tasks show that our approach achieves competitive results with state-of-the-art methods and narrows the performance gap between languages, particularly benefiting those with limited resources.

翻訳日:2023-12-25 18:10:01 公開日:2023-12-22

# シリコンマイクロリング型貯水池計算における空洞非線形性と線形損失の影響

Effects of cavity nonlinearities and linear losses on silicon microring-based reservoir computing ( http://arxiv.org/abs/2310.09433v2 )

ライセンス: Link先を確認

Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros

(参考訳) マイクロリング共振器(MRR)は、時間遅延フォトニック貯水池コンピューティングに有望な装置であるが、MRRにおける異なる物理効果が貯水池演算性能に与える影響は、まだ完全には理解されていない。時系列タスクnarma-10の予測誤差に対する線形損失と熱光学および自由キャリア効果緩和時間の影響を数値的に解析した。入力電力と光源とマイクロリング共鳴の周波数差で定義される3つの領域の存在を実証し、線形状態から非線形状態へのキャビティ遷移を明らかにする。これらの領域の1つは、比較的低い入力パワーとノード数の下での時系列予測において非常に低いエラーを提供する一方、他の領域は非線形性を欠いているか不安定になる。本研究は,mrrの設計と物理特性の最適化に関する知見を提供し,時間分解型貯留層計算の予測性能を向上させる。

Microring resonators (MRRs) are promising devices for time-delay photonic reservoir computing, but the impact of the different physical effects taking place in the MRRs on the reservoir computing performance is yet to be fully understood. We numerically analyze the impact of linear losses, as well as the relaxation times of thermo-optic and free-carrier effects, on the prediction error of the time-series task NARMA-10. We demonstrate the existence of three regions, defined by the input power and the frequency detuning between the optical source and the microring resonance, that reveal the cavity transition from linear to nonlinear regimes. One of these regions offers very low error in time-series prediction under relatively low input power and number of nodes while the other regions either lack nonlinearity or become unstable. This study provides insight into the design of the MRR and the optimization of its physical properties for improving the prediction performance of time-delay reservoir computing.
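
The NARMA-10 task named in the abstract is a standard tenth-order nonlinear benchmark, and a reservoir is scored on how well it predicts the target series from the input series. The Python sketch below generates such a pair using the commonly cited recurrence with inputs drawn uniformly from [0, 0.5]. The abstract does not state the paper's exact settings (sequence length, seed, error metric), so those are assumptions.

```python
import random

def narma10(n_steps, seed=0):
    """Generate input u and target y for the standard NARMA-10 recurrence:

    y[t+1] = 0.3*y[t] + 0.05*y[t]*sum(y[t-i], i=0..9) + 1.5*u[t-9]*u[t] + 0.1
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.0, 0.5) for _ in range(n_steps)]
    y = [0.0] * n_steps
    for t in range(9, n_steps - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * sum(y[t - i] for i in range(10))
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y
```

Each target value depends on the last ten outputs and on a product of two inputs ten steps apart. The task therefore requires both memory and nonlinearity, which fits the abstract's observation that regions lacking nonlinearity perform poorly.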

翻訳日:2023-12-25 18:09:38 公開日:2023-12-22

# 運動誘起スピン移動の最適化

Optimising motion-induced spin transfer ( http://arxiv.org/abs/2310.08200v2 )

ライセンス: Link先を確認

Daigo Oue, Matsuo Mamoru

(参考訳) 本稿では、2つの強磁性絶縁体間のスピン移動について検討する。強磁性絶縁体の間には狭い隙間があり、互いに弱い相互作用をしている。強磁性絶縁体のうちの1つは一定速度で動き、もう1つは静止している。せん断運動の存在下では、相互作用振幅はドップラー周波数で周期的に変調される。ユニタリ変換により、相互作用振幅の周期的変調を、スピン移動を駆動する有効なポテンシャルと考えることができる。スピン電流の量は、2つの強磁性媒体間のスペクトルオーバーラップとキャリア集団差によって制御される。 2つの強磁性体のスペクトルが適度に広がると、スペクトル領域の重なりが増加し、スピン電流が増大する。しかし、過度の拡大はスペクトルの重なりを損なうため、スピン電流は低下する。これは、スピン移動を最大化する最適条件が存在することを意味する。

In this paper, the spin transfer between two ferromagnetic insulators is studied. There is a narrow gap between the ferromagnetic insulators so that they are weakly interacting with each other. One of the ferromagnetic insulators is moving at a constant speed while the other is at rest; hence, the system is out of equilibrium. In the presence of the shearing motion, the interaction amplitude is periodically modulated at the Doppler frequency. A unitary transformation allows us to regard the periodic modulation of the interaction amplitude as an effective potential, which drives the spin transfer. The amount of the spin current is controlled by the spectral overlap and the carrier population difference between the two ferromagnetic media. If the spectra of the two ferromagnets are moderately broadened, the overlap in the spectral domain increases, enlarging the spin current. However, too much broadening spoils the spectral overlap and, hence, the spin current. This implies that there is an optimal condition for maximising the spin transfer.

翻訳日:2023-12-25 18:09:22 公開日:2023-12-22

# ベイズ的アプローチによる人選好言語モデルの調整

Aligning Language Models with Human Preferences via a Bayesian Approach ( http://arxiv.org/abs/2310.05782v2 )

ライセンス: Link先を確認

Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li

(参考訳) 人間中心の自然言語生成(NLG)システムを推し進めるためには、NLGモデルと人間の嗜好の整合性を確保することが不可欠である。このアライメントのために、現在の一般的な方法は、人間からのフィードバックに基づいて訓練された報酬モデルで強化学習(RL)アプローチを利用する。しかし,人間の嗜好の主観的性質による内在的な不一致は,報酬モデルの訓練において大きな課題となり,nlgパフォーマンスの低下を招いた。この問題に対処するため、従来のアプローチは通常、複数の一貫性のない選好をマージしたものに集約するために、多数決または平均化に依存していた。理解と実行は容易であるが、このような手法は人間の不合理さを捉えることができず、個人の特別なサブセットのみを表現できるため、人間の嗜好の普遍性を定量的に開示する能力が欠如している。この課題に対処するために, ベイズ的枠組みを用いて, 選好モデルのトレーニングとして, 人選好間の不一致の分布を考慮し, d-PMと命名する手法を提案する。さらに,学習効率よりもRL戦略の非効率で複雑な訓練プロセスを考えると,NLGモデルをd-PMモデルから導出した選好スコアで学習するためのコントラスト学習戦略も提案する。感情的支援会話と整合性(Rule-of-Thumb)生成という2つの人間中心型NLGタスクに対する広範囲な実験により,本手法が従来のSOTAモデルを上回る結果が得られた。

In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of the NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods cannot capture the nuanced degrees of disagreement among humans and may only represent a specialized subset of individuals, so they lack the ability to quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach, named d-PM, which employs a Bayesian framework to account for the distribution of disagreements among human preferences when training a preference model. In addition, given the inefficient and complex training process of the RL strategy, we further propose utilizing a contrastive learning strategy to train the NLG model with the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i.e., emotional support conversation and integrity "Rule-of-Thumb" generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.
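
To make the contrast with majority voting concrete, the sketch below compares a majority-vote label with a simple Bayesian estimate of how widely a preference is shared. It is only an illustration under stated assumptions: the uniform Beta(1, 1) prior, the independence of votes, and the function names are hypothetical and not taken from the paper, and d-PM itself is a trained preference model rather than this closed-form estimate.

```python
from collections import Counter


def majority_vote(votes):
    """Collapse annotator votes (1 = prefers the response, 0 = does not) to one label."""
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def posterior_preference(votes, prior_a=1.0, prior_b=1.0):
    """Posterior mean of the share of annotators who prefer the response.

    Assumes a Beta(prior_a, prior_b) prior over that share and treats each
    vote as an independent Bernoulli draw (an illustrative assumption).
    """
    positives = sum(votes)
    negatives = len(votes) - positives
    return (prior_a + positives) / (prior_a + prior_b + positives + negatives)


# Two items with the same majority label but different levels of agreement.
unanimous = [1, 1, 1, 1, 1]
contested = [1, 1, 1, 0, 0]

print(majority_vote(unanimous), majority_vote(contested))
print(posterior_preference(unanimous), posterior_preference(contested))
```

Both items receive the same majority label, while the posterior means (6/7 and 4/7) keep the difference in agreement visible, which is the kind of information a merged label discards.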

翻訳日:2023-12-25 18:09:10 公開日:2023-12-22

# 計画トークンを用いた言語モデル推論の指導

Guiding Language Model Reasoning with Planning Tokens ( http://arxiv.org/abs/2310.05707v2 )

ライセンス: Link先を確認

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, Alessandro Sordoni

(参考訳) 大規模言語モデル(LLM)は、最近、思考連鎖(chain-of-thought)推論のような複雑な推論タスクを実行する能力に対して、かなりの関心を集めている。しかしながら、この能力を強化する既存のアプローチのほとんどは、モデルの推論能力の構造的な側面を無視しながら、データ駆動型の手法に大きく依存している。我々は、LLMが個々の推論ステップはうまく処理できる一方で、推論チェーン全体にわたる一貫性の維持に苦労することを見出した。これを解決するために、各推論ステップの始めに、モデルのガイドとして機能する「計画トークン」を導入する。これらのトークン埋め込みは、残りのモデルパラメータとともに微調整される。我々のアプローチは、訓練可能なパラメータの無視できる程度の増加(わずか0.001%)しか必要とせず、完全な微調整またはよりパラメータ効率の良い方式のいずれによっても適用できる。提案手法を3つの異なるLLMに適用してその有効性を示し、3つの数学文章題データセットにおいて、通常の思考連鎖による微調整ベースラインと比較して顕著な精度向上を確認した。

Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce 'planning tokens' at the start of each reasoning step, serving as a guide for the model. These token embeddings are then fine-tuned along with the rest of the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. plain chain-of-thought fine-tuning baselines.
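
The idea of placing a planning token at the start of each reasoning step can be sketched on the data side as follows. This is a minimal illustration, not the authors' implementation: the `<plan_k>` token format, the way plan ids are assigned, and the function name are all hypothetical. In the method described above the tokens would additionally be new vocabulary entries whose embeddings are trained, which this sketch does not cover.

```python
def add_planning_tokens(steps, plan_ids):
    """Prefix each reasoning step with a planning token such as '<plan_2>'."""
    if len(steps) != len(plan_ids):
        raise ValueError("need exactly one plan id per reasoning step")
    return [f"<plan_{pid}> {step}" for pid, step in zip(plan_ids, steps)]


# A two-step chain of thought for a toy word problem.
steps = ["48 / 2 = 24", "48 + 24 = 72"]
plan_ids = [0, 1]  # hypothetical ids, e.g. 0 = division step, 1 = addition step

augmented = add_planning_tokens(steps, plan_ids)
print("\n".join(augmented))
```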

翻訳日:2023-12-25 18:08:42 公開日:2023-12-22

# サリエンシー誘導特徴デコリレーションによる汎化可能なエージェントの学習

Learning Generalizable Agents via Saliency-Guided Features Decorrelation ( http://arxiv.org/abs/2310.05086v2 )

ライセンス: Link先を確認

Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi Chang, Lichao Sun, Bo Yang

(参考訳) 視覚に基づく強化学習(Reinforcement Learning, RL)では、エージェントは訓練中に観察されなかった状態空間の環境変動によく適応するのに苦労する。この変化は、背景雑音などのタスク非関連特徴と、最適決定に関連するロボット構成のようなタスク関連特徴の両方に生じる可能性がある。両状況の一般化を実現するために,エージェントは変化した特徴が決定に与える影響,すなわち変化した特徴と政策モデルにおける決定との真の関連性を確立することを正確に理解する必要がある。しかし、国家空間の特徴間の固有の相関関係のため、特徴と決定の関連が絡み合っており、政策がそれらの区別を困難にしている。そこで本研究では,これらの相関を除去すべく,sgfd(saliency-guided features decorrelation)を提案する。具体的には、SGFDはランダムフーリエ関数(RFF)とサリエンシマップの2つのコア技術から構成される。 RFFは高次元画像における複雑な非線形相関を推定するために利用され、サリエンシマップは変化した特徴を識別するために設計されている。サリエンシマップの指導のもと、SGFDはサンプル再重み付けを用いて、変化した特徴に関する推定相関を最小化し、視覚的RLタスクにおけるデコリレーションを実現する。実験の結果,sgfdは幅広いテスト環境において十分に一般化でき,タスクの無関係なバリエーションとタスク関連のバリエーションの両方を扱う場合,最先端の手法を著しく上回ることがわかった。

In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.

翻訳日:2023-12-25 18:08:24 公開日:2023-12-22

# 柔軟でスケーラブルな機械学習対応マルチモーダル腫瘍学データセットの構築

Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets ( http://arxiv.org/abs/2310.01438v2 )

ライセンス: Link先を確認

Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool

(参考訳) データ取得、ストレージ、処理技術の進歩は、異種医療データの急速な成長をもたらした。放射線スキャン,病理像,分子情報を臨床データと統合することは,疾患の総合的理解と治療の最適化に不可欠である。複数のソースからのデータを統合する必要性はさらに、精密医療やパーソナライズされた治療を可能にするために、がんなどの複雑な疾患で顕著である。本研究は,がん研究データコモンズ (CRDC) などの公開ソースからの異種データを相互接続型で患者中心のフレームワークに効率的に融合するための,柔軟でスケーラブルで費用対効果の高いメタデータフレームワークであるマルチモーダル・インテグレーション・オブ・オンコロジー・データ・システム (MINDS) を提案する。 MINDSはデータ型間の関係を探索し、大規模マルチモーダル機械学習モデルを開発するためのコホートを構築するためのインターフェースを提供する。 MINDSはマルチモーダルデータを調和させることで、研究者に診断と予後の洞察を明らかにし、エビデンスベースのパーソナライズされたケアを可能にする分析能力を高めることを目指している。 MINDSは詳細なエンドツーエンドのデータプロファイランスを追跡し、再現性と透明性を確保する。 MINDSのクラウドネイティブアーキテクチャは、大幅なストレージ最適化、レプリケーション回避、動的アクセス機能を確保しながら、安全でコスト最適化された方法で指数関数的なデータ成長を処理することができる。自動スケーリング、アクセス制御、その他のメカニズムは、パイプラインのスケーラビリティとセキュリティを保証する。 MINDSは、オンコロジーデータ統合の将来に向けた重要なステップである相互運用可能なメタデータ駆動アプローチを通じて、既存のバイオメディカルデータサイロの限界を克服する。

The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.

翻訳日:2023-12-25 18:07:57 公開日:2023-12-22

# 密度汎関数理論の凸条件

The Convexity Condition of Density-Functional Theory ( http://arxiv.org/abs/2309.17443v2 )

ライセンス: Link先を確認

Andrew C. Burgess, Edward Linscott, and David D. O'Regan

(参考訳) 密度汎関数理論(DFT)において、有限電子系の全エネルギーは電子数に対して凸であり、2 E_v[N_0] <= E_v[N_0 - 1] + E_v[N_0 + 1] が成り立つと長い間仮定されてきた。本論文では、無限分離極限の手法を用いて、(1)すべてのv表現可能密度に対して厳密、(2)サイズ整合、(3)並進不変であるDFTの任意の定式化について、この凸条件を証明する。類似の結果は、一体縮約密度行列汎関数理論についても証明される。基底状態に常に到達できるとは限らないDFTの定式化が知られており、そのような場合には凸性が成り立たないことを示しているが、それでもこの証明は厳密な交換相関汎関数に対する厳しい制約を確認するものである。また、近似DFTにおける凸性の十分条件も与え、これは密度汎関数近似の開発に役立つ可能性がある。この結果は、電子数に関する区分線形性条件の証明において長らく置かれてきた仮定を取り除く。区分線形性条件は、Kohn-ShamバンドギャップとDFTの交換相関微分不連続性を理解する上で中心的な役割を果たしてきた。

It has long been postulated that within density-functional theory (DFT) the total energy of a finite electronic system is convex with respect to electron count, so that 2 E_v[N_0] <= E_v[N_0 - 1] + E_v[N_0 + 1]. Using the infinite-separation-limit technique, this article proves the convexity condition for any formulation of DFT that is (1) exact for all v-representable densities, (2) size-consistent, and (3) translationally invariant. An analogous result is also proven for one-body reduced density matrix functional theory. While there are known DFT formulations in which the ground state is not always accessible, indicating that convexity does not hold in such cases, this proof nonetheless confirms a stringent constraint on the exact exchange-correlation functional. We also provide sufficient conditions for convexity in approximate DFT, which could aid in the development of density-functional approximations. This result lifts a standing assumption in the proof of the piecewise linearity condition with respect to electron count, which has proven central to understanding the Kohn-Sham band-gap and the exchange-correlation derivative discontinuity of DFT.
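
As a short worked consequence (a standard rearrangement, not part of the article's proof), the convexity inequality is equivalent to the statement that the ionization energy is never smaller than the electron affinity. The symbols $I(N_0)$ and $A(N_0)$ are introduced here only for illustration, with $I(N_0) = E_v[N_0 - 1] - E_v[N_0]$ and $A(N_0) = E_v[N_0] - E_v[N_0 + 1]$.

```latex
\begin{align}
2E_v[N_0] &\le E_v[N_0-1] + E_v[N_0+1] \\
\Longleftrightarrow\quad
E_v[N_0] - E_v[N_0+1] &\le E_v[N_0-1] - E_v[N_0] \\
\Longleftrightarrow\quad
A(N_0) &\le I(N_0).
\end{align}
```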

翻訳日:2023-12-25 18:07:32 公開日:2023-12-22

# SeasFire: 山火事ダイナミクスのための多変量地球システムデータキューブ

SeasFire as a Multivariate Earth System Datacube for Wildfire Dynamics ( http://arxiv.org/abs/2312.07199v2 )

ライセンス: Link先を確認

Ilektra Karasante, Lazaro Alonso, Ioannis Prapas, Akanksha Ahuja, Nuno Carvalhais and Ioannis Papoutsis

(参考訳) 山火事の世界的な発生、規模、頻度は、生態系サービスや人間の生活に大きな脅威をもたらす。山火事の先行条件を効果的に定量化し、その要因を特定するためには、地球システムのダイナミクスを深く理解することが不可欠である。そこで本研究では、地球観測に基づく全球の準季節から季節スケールの山火事モデリングに向けて入念に整備された時空間データセットであるSeasFireデータキューブを紹介する。SeasFireデータキューブは、気候、植生、海洋指数、人的要因を含む59の変数で構成され、時間分解能は8日、空間分解能は0.25$^{\circ}$であり、2001年から2021年までの期間を対象とする。我々は、山火事の駆動要因の変動性と季節性の探索、海洋・気候テレコネクションと山火事の因果関係のモデル化、深層学習モデルによる複数の時間スケールにわたる準季節的な山火事パターンの予測を通じて、SeasFireの汎用性を示す。我々はSeasFireデータキューブを公開し、地球システム科学者や機械学習の実践者に、山火事の理解と予測の改善に活用するよう呼びかける。

The global occurrence, scale, and frequency of wildfires pose significant threats to ecosystem services and human livelihoods. To effectively quantify and attribute the antecedent conditions for wildfires, a thorough understanding of Earth system dynamics is imperative. In response, we introduce the SeasFire datacube, a meticulously curated spatiotemporal dataset tailored for global sub-seasonal to seasonal wildfire modeling via Earth observation. The SeasFire datacube comprises 59 variables encompassing climate, vegetation, oceanic indices, and human factors, has an 8-day temporal resolution and a spatial resolution of 0.25$^{\circ}$, and spans from 2001 to 2021. We showcase the versatility of SeasFire for exploring the variability and seasonality of wildfire drivers, modeling causal links between ocean-climate teleconnections and wildfires, and predicting sub-seasonal wildfire patterns across multiple timescales with a Deep Learning model. We publicly release the SeasFire datacube and appeal to Earth system scientists and Machine Learning practitioners to use it for an improved understanding and anticipation of wildfires.
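
The resolutions quoted above imply a rough size for the datacube, which the following arithmetic sketch works out. It rests on two assumptions that the abstract does not state: that the grid is a regular global latitude-longitude grid covering 180 by 360 degrees, and that the 8-day steps restart on 1 January of each year (so every year contributes 46 steps). The variable names are illustrative only.

```python
import math

SPATIAL_RES_DEG = 0.25
TEMPORAL_RES_DAYS = 8
FIRST_YEAR, LAST_YEAR = 2001, 2021

# Assumed regular global grid: 180 degrees of latitude, 360 of longitude.
n_lat = round(180 / SPATIAL_RES_DEG)
n_lon = round(360 / SPATIAL_RES_DEG)

# Assumed: steps restart each year, so the last step of a year is shorter.
steps_per_year = math.ceil(365 / TEMPORAL_RES_DAYS)
n_years = LAST_YEAR - FIRST_YEAR + 1
n_time = steps_per_year * n_years

cells_per_variable = n_time * n_lat * n_lon
print(n_lat, n_lon, n_time, cells_per_variable)
```

Under these assumptions each variable holds about one billion grid cells (720 x 1440 x 966), which gives a sense of why chunked, cloud-friendly storage formats are typically used for data of this kind.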

翻訳日:2023-12-25 18:00:37 公開日:2023-12-22

# 人間のデータを超えた: 言語モデルによる問題解決のための自己学習のスケーリング

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models ( http://arxiv.org/abs/2312.06585v3 )

ライセンス: Link先を確認

Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

(参考訳) 人間が生成したデータで言語モデル(LM)を微調整することは、依然として一般的な手法である。しかし、こうしたモデルの性能は、高品質な人間のデータの量と多様性によって制限されることが多い。本稿では、スカラーフィードバックを利用できるタスク、例えば正しさを検証できる数学の問題において、人間のデータを超えられるかどうかを検討する。そのために、期待値最大化に基づく単純な自己学習手法を調査し、これをReST$^{EM}$と呼ぶ。この手法では、(1)モデルからサンプルを生成して二値フィードバックでフィルタリングし、(2)これらのサンプルでモデルを微調整し、(3)このプロセスを数回繰り返す。PaLM-2モデルを用いて高度なMATH推論とAPPSコーディングのベンチマークで評価したところ、ReST$^{EM}$はモデルサイズに対して良好にスケールし、人間のデータのみによる微調整を大幅に上回ることがわかった。総じて、フィードバックを用いた自己学習により、人間が生成したデータへの依存を大幅に低減できることが示唆される。

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
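
The three-step loop described above (generate and filter, fine-tune, repeat) can be sketched with a toy stand-in for the language model. Everything below is an assumption made for illustration: the "policy" is just a probability table over candidate answers to one problem, `fine_tune` is a simple interpolation toward the filtered samples rather than gradient-based training, and none of the names or hyperparameters come from the paper.

```python
import random


def generate(policy, n_samples, rng):
    """Step 1a: sample candidate answers from the current policy."""
    answers = list(policy)
    weights = [policy[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n_samples)


def fine_tune(policy, kept, lr=0.5):
    """Step 2 (toy): move the policy toward the distribution of kept samples."""
    if not kept:
        return dict(policy)  # nothing passed the filter; leave the policy unchanged
    target = {a: kept.count(a) / len(kept) for a in policy}
    return {a: (1 - lr) * policy[a] + lr * target[a] for a in policy}


def rest_em_toy(policy, is_correct, n_iters=3, n_samples=32, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iters):  # step 3: repeat the process a few times
        samples = generate(policy, n_samples, rng)
        kept = [a for a in samples if is_correct(a)]  # step 1b: binary filter
        policy = fine_tune(policy, kept)
    return policy


# Toy task: the "problem" is 6 * 7, and the policy is a distribution over answers.
initial = {40: 0.25, 41: 0.25, 42: 0.25, 43: 0.25}
final = rest_em_toy(initial, is_correct=lambda a: a == 42)
print(final[42] >= initial[42])
```

Because only verified-correct samples survive the filter, each iteration of this toy can only keep or raise the probability of the correct answer; that monotonicity is a property of the toy update and is not a claim about the behaviour of the real method.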

翻訳日:2023-12-25 17:59:38 公開日:2023-12-22

# DG-TTA:ドメイン一般化とテスト時間適応による領域外医療画像分割

DG-TTA: Out-of-domain medical image segmentation through Domain Generalization and Test-Time Adaptation ( http://arxiv.org/abs/2312.06275v2 )

ライセンス: Link先を確認

Christian Weihsbach, Christian N. Kruse, Alexander Bigalke, Mattias P. Heinrich

(参考訳) ドメイン外の画像に事前訓練された医療セグメンテーションモデルを適用すると、品質の不足を予測できる。微調整や教師なしおよびソースフリーなドメイン適応など、モデルパフォーマンスを維持するためのいくつかの戦略が提案されている。これらの戦略はデータ可用性に対する制限的な要件を設定した。本研究では,未熟な対象領域における事前学習モデルの再使用に対して,ドメインの一般化とテスト時間適応を組み合わせることを提案する。ソースデータに対するドメイン一般化事前トレーニングは、ターゲット領域で最高の初期性能を得るために使用される。本稿では,これまで画像登録タスクで用いられてきたマインドディスクリプタを,従来の手法と比較して,汎用化を実現し,小規模データセットの優れた性能を示す技術として紹介する。テスト時には、画像の増大に応じてモデルの重み付けを最適化することで、1回のスキャン毎に高品質なセグメンテーションが保証される。これにより、ソースとターゲットデータの分離使用が可能となり、現在のデータ可用性の障壁が排除される。さらに、提案手法は、特定のモデルアーキテクチャや関連するドメインやラベルの事前知識を必要としないため、高度にモジュール化されている。我々は、現在医療画像セグメンテーションの最もポピュラーで正確なフレームワークであるnnUNetに統合することでこれを実証する。本研究では,腹部,心臓,腰椎を対象とする複数のデータセットを用い,いくつかの領域外シナリオを構築した。本手法は, 事前訓練した全身CTモデルと組み合わせることで, 上記すべてのシナリオにおいて, MR画像を高精度に分割できることを実証する。オープンソースコードは以下のとおりである。 https://github.com/multimodallearning/dg-tta

Applying pre-trained medical segmentation models on out-of-domain images often yields predictions of insufficient quality. Several strategies have been proposed to maintain model performance, such as finetuning or unsupervised- and source-free domain adaptation. These strategies set restrictive requirements for data availability. In this study, we propose to combine domain generalization and test-time adaptation to create a highly effective approach for reusing pre-trained models in unseen target domains. Domain-generalized pre-training on source data is used to obtain the best initial performance in the target domain. We introduce the MIND descriptor previously used in image registration tasks as a further technique to achieve generalization and present superior performance for small-scale datasets compared to existing approaches. At test-time, high-quality segmentation for every single unseen scan is ensured by optimizing the model weights for consistency given different image augmentations. That way, our method enables separate use of source and target data and thus removes current data availability barriers. Moreover, the presented method is highly modular as it does not require specific model architectures or prior knowledge of involved domains and labels. We demonstrate this by integrating it into the nnUNet, which is currently the most popular and accurate framework for medical image segmentation. We employ multiple datasets covering abdominal, cardiac, and lumbar spine scans and compose several out-of-domain scenarios in this study. We demonstrate that our method, combined with pre-trained whole-body CT models, can effectively segment MR images with high accuracy in all of the aforementioned scenarios. Open-source code can be found here: https://github.com/multimodallearning/DG-TTA

翻訳日:2023-12-25 17:59:21 公開日:2023-12-22

# particle swarm optimization-back propagation neural network と multivariate gaussian-hidden markov model に基づくストックピッキングとタイミングの定量的融合戦略

A quantitative fusion strategy of stock picking and timing based on Particle Swarm Optimized-Back Propagation Neural Network and Multivariate Gaussian-Hidden Markov Model ( http://arxiv.org/abs/2312.05756v3 )

ライセンス: Link先を確認

Huajian Li, Longjian Li, Jiajian Liang, Weinan Dai

(参考訳) 近年、機械学習(ml)は経済的意思決定、投資予測、リスク管理などに効果的なアプローチと新しい技術をもたらし、経済・金融環境の可変かつ複雑な性質に対処している。本研究は,多変量ガウス・ハイデンマルコフモデル (MGHMM) とParticle Swarm (PSO-BPNN) に最適化されたバックプロパゲーションニューラルネットワークを活用することで,株価タイミングとピッキング戦略を組み合わせた定量的融合モデルを提案する。利得化、中和、標準化、CSI300指数の戻りを含む52の因子間の情報係数(IC)が算出された後、主成分分析(PCA)による次元減少後のPSO-BPNNの入力に向かう候補因子として、上位にランクインする要因の所定の量を選択し、次いで一定量の成分在庫を出力する。その後,過去4年間の卓越したパフォーマンスを示すBox-Cox変換後のCSI300インデックスデータを入力して訓練したMGHMMが出力するスクリーニング株と株式市場の状態に基づいて,予測と取引を行う。最終的に、従来の予測と取引の方法は、中国株式市場の戦略と比較される。本論文で提示する株式の選定とタイミングを取り入れた融合戦略は、金融分析の革新的な技術である。

In recent years, machine learning (ML) has brought effective approaches and novel techniques to economic decision-making, investment forecasting, and risk management, helping to cope with the variable and intricate nature of economic and financial environments. For stock market investment, this research introduces a pioneering quantitative fusion model that combines stock timing and stock picking strategies by leveraging the Multivariate Gaussian-Hidden Markov Model (MGHMM) and a Back Propagation Neural Network optimized by Particle Swarm (PSO-BPNN). First, the information coefficients (IC) between fifty-two factors, which have been winsorized, neutralized, and standardized, and the return of the CSI 300 index are calculated. A given number of the top-ranked factors are then chosen as candidate factors and, after dimension reduction by Principal Component Analysis (PCA), fed to the PSO-BPNN, which outputs a certain number of constituent stocks. Subsequently, prediction and trading are conducted on the basis of the screened stocks and the stock market state output by the MGHMM, which is trained on CSI 300 index data after Box-Cox transformation; the strategy shows excellent performance over the past four years. Finally, some conventional forecasting and trading methods are compared with our strategy in the Chinese stock market. The fusion strategy presented in this article, which incorporates stock picking and timing, provides an innovative technique for financial analysis.
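
The factor-screening step relies on the information coefficient (IC) between a factor and subsequent returns. The abstract does not say whether a Pearson or a rank correlation is used, so the sketch below assumes the rank variant (Spearman correlation, often called rank IC); the function names and the sample numbers are made up for illustration.

```python
def ranks(values):
    """Return 1-based average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor_values, next_returns):
    """Rank IC: Spearman correlation between factor values and later returns."""
    return pearson(ranks(factor_values), ranks(next_returns))


# Made-up cross-section of four stocks: factor exposure and next-period return.
factor = [0.1, 0.4, 0.3, 0.8]
returns = [-0.02, 0.01, 0.00, 0.05]
print(rank_ic(factor, returns))
```

Factors would then be sorted by the magnitude of their IC and the top-ranked ones kept as candidates, as the abstract describes; the selection threshold itself is not given in the source.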

翻訳日:2023-12-25 17:58:53 公開日:2023-12-22

# 大規模言語モデルを用いた脆弱性検出はどこまで進んだか

How Far Have We Gone in Vulnerability Detection Using Large Language Models ( http://arxiv.org/abs/2311.12420v3 )

ライセンス: Link先を確認

Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang

(参考訳) ソフトウェアはますます複雑になり、脆弱性が生じる傾向にあるため、自動脆弱性検出は極めて重要でありながら困難である。様々なタスクにおける大規模言語モデル(llm)の著しい成功を考えると、脆弱性検出においてその効果が期待されている。しかし、脆弱性検出におけるその可能性の定量的理解はいまだに欠けている。このギャップを埋めるために,包括的脆弱性ベンチマークvulbenchを導入する。このベンチマークは、幅広いCTF(Capture-the-Flag)課題と実世界のアプリケーションからの高品質なデータを集約し、脆弱性タイプとその根本原因を詳述した各脆弱性関数に対するアノテーションを提供する。 16のLLMと6つの最先端(SOTA)ディープラーニングベースモデルと静的アナライザを含む実験により、複数のLLMが脆弱性検出において従来のディープラーニングアプローチよりも優れており、LLMの未解決の可能性を明らかにしていることがわかった。この作業は、ソフトウェアセキュリティ強化のためのllmの理解と利用に寄与する。

As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.

翻訳日:2023-12-25 17:56:15 公開日:2023-12-22

# 直接Clifford+T格子手術コンパイルを用いた実用的量子回路の現実的な実行コスト

Realistic Cost to Execute Practical Quantum Circuits using Direct Clifford+T Lattice Surgery Compilation ( http://arxiv.org/abs/2311.10686v2 )

ライセンス: Link先を確認

Tyler LeBlond, Christopher Dean, George Watkins, and Ryan S. Bennink

(参考訳) 本稿では,clifford+tゲートで表現された量子回路を表面コード格子手術命令セットに明示的にコンパイルする資源推定パイプラインについて報告する。コンパイルされた回路からのマジック状態要求のケイデンスにより、ポストホック解析においてマジック状態の蒸留と貯蔵要求の最適化が可能となる。論理回路を格子状手術操作にコンパイルするために,オープンソースのLattice Surgery Compilerを構築した。修正されたコンパイラは、論理ゲートを抽象的なレイアウトに依存しない命令セットに変換し、第2は、特定のリソースレイアウトに従ってハードウェアタイルに割り当てられる局所格子手術命令にコンパイルする。第2段階では、フォールトトレラント層でのリソース競合を避けながら論理並列性を維持し、リアリズムを支援する。さらに、ユーザーはマジック状態が補充される専用のタイルを指定することができ、論理計算からのリソースコストはマジック状態の蒸留と貯蔵とは独立に考慮できる。分子の基底状態推定のための資源推定を提供することにより、パイプラインを大規模で実用的な量子回路に適用する可能性を示す。注意して考慮しなければ、マジック状態の消費率が異なる実回路において、マジック状態のストレージのリソースコストが支配的であることが分かる。

In this article, we report a resource estimation pipeline that explicitly compiles quantum circuits expressed using the Clifford+T gate set into a surface code lattice surgery instruction set. The cadence of magic state requests from the compiled circuit enables the optimization of magic state distillation and storage requirements in a post-hoc analysis. To compile logical circuits into lattice surgery operations, we build upon the open-source Lattice Surgery Compiler. The revised compiler operates in two stages: the first translates logical gates into an abstract, layout-independent instruction set; the second compiles these into local lattice surgery instructions that are allocated to hardware tiles according to a specified resource layout. The second stage retains logical parallelism while avoiding resource contention in the fault-tolerant layer, aiding realism. Additionally, users can specify dedicated tiles at which magic states are replenished, enabling resource costs from the logical computation to be considered independently from magic state distillation and storage. We demonstrate the applicability of our pipeline to large, practical quantum circuits by providing resource estimates for the ground state estimation of molecules. We find that, unless carefully considered, the resource costs of magic state storage can dominate in real circuits which have variable magic state consumption rates.

翻訳日:2023-12-25 17:55:58 公開日:2023-12-22

# 多エージェントpomdpにおけるファクタド・オンライン・プランニング

Factored Online Planning in Many-Agent POMDPs ( http://arxiv.org/abs/2312.11434v2 )

ライセンス: Link先を確認

Maris F.L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen

(参考訳) 集中型マルチエージェントシステムは、しばしばマルチエージェント部分観測マルコフ決定過程(MPOMDP)としてモデル化されるが、行動空間と観測空間がエージェント数に対して指数関数的に増大するため、単一エージェント向けオンライン計画の価値推定と信念推定は有効に機能しなくなる。先行研究は、いわゆるコーディネーショングラフを通じてマルチエージェント設定に固有の構造を利用し、価値推定に部分的に取り組んでいる。また、観測の尤度を近似に組み込むことで信念推定が改善されている。しかし、価値推定と信念推定の課題は個別にしか取り組まれておらず、このことが既存手法の多数エージェントへのスケーリングを妨げている。そこで我々は、これらの課題に同時に取り組む。第一に、MPOMDP向けのサンプルベースのオンラインプランナに重み付き粒子フィルタリングを導入する。第二に、信念のスケーラブルな近似を提示する。第三に、エージェント間相互作用の典型的な局所性を利用するアプローチを、いわゆるスパース粒子フィルタ木上で動作するMPOMDP向けの新しいオンライン計画アルゴリズムに取り入れる。複数の最先端ベースラインに対する実験評価により、提案手法は(1)エージェント数が少ない設定でも競争力があり、(2)エージェント数が多い場合にはベースラインを上回ることが示された。

In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation has been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.
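
Weighted particle filtering, mentioned above as the first ingredient, can be illustrated with a single belief update in which each particle is reweighted by the likelihood of the received observation. This is a generic textbook-style sketch, not the planner from the paper: the transition step is omitted, and the door example, the sensor model, and all names are hypothetical.

```python
def reweight(particles, weights, observation, likelihood):
    """One weighted-particle belief update.

    particles: list of states; weights: current weights (same length);
    likelihood(observation, state) returns P(observation | state).
    """
    new_weights = [w * likelihood(observation, s) for s, w in zip(particles, weights)]
    total = sum(new_weights)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every particle")
    return [w / total for w in new_weights]


def sensor_likelihood(observation, state):
    """Toy sensor that reports the true state 80% of the time."""
    return 0.8 if observation == state else 0.2


particles = ["open", "closed", "closed", "open"]
weights = [0.25, 0.25, 0.25, 0.25]

posterior = reweight(particles, weights, "open", sensor_likelihood)
belief_open = sum(w for s, w in zip(particles, posterior) if s == "open")
print(belief_open)
```

Keeping weights instead of discarding unlikely particles lets a small particle set still represent the belief after an unlikely joint observation, which matters when the joint observation space grows with the number of agents.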

翻訳日:2023-12-25 17:48:01 公開日:2023-12-22

# OsmLocator:非学習的生成的視点による重なり合う散乱点の探索

OsmLocator: locating overlapping scatter marks with a non-training generative perspective ( http://arxiv.org/abs/2312.11146v2 )

ライセンス: Link先を確認

Yuming Qiu, Aleksandra Pizurica, Qi Ming, Nicolas Nadisic

(参考訳) 散布図画像におけるマークの自動位置特定は、膨大な文書画像からの知識発見と理解や、視覚的質問応答AIシステムにおける推論に大いに役立つが、重なり合うマークが至る所に存在するため、非常に難しい問題である。重なり合うマークの位置特定には、テクスチャの欠如、文脈情報の少なさ、中空形状、小さなサイズなど、多くの困難がある。本稿では、学習を必要としない生成的視点から、これをクラスタリングに基づく再可視化に関する組合せ最適化問題として定式化し、目的関数が最小となるときの複数変数の状態を求めることで散布マークの位置を特定する。目的関数は、二値化した散布図画像と、そのクラスタリングに基づいて生成された再可視化との差に基づいて構成される。基本的に、再可視化はラスタ化された散布図画像のみを入力として新しい散布図を生成しようとするものであり、クラスタリングはその再可視化に必要な情報を提供するために用いられる。この方法は、学習データセットや参照に依存することなく、散布図画像内の重なりが激しくサイズや形状が可変のマークを安定して位置特定できる。あわせて、さまざまな連結領域に対して動作する適応型の焼きなまし法(シミュレーテッドアニーリング)を提案する。さらに、異なるマーカーとさまざまな重なりの程度を持つ数百枚の散布図画像からなるデータセットSML2023を構築し、提案手法を評価して既存手法と比較した。その結果、重なりの程度やマーカーの種類が異なる散布図画像において大部分のマークを正確に位置特定でき、割当コストに基づく指標で最先端手法と比べて約0.3の絶対的な向上が得られた。この研究は、大量のウェブページや文献を対象とするデータマイニングに有用であり、気泡計数などの画像計測にも新たな光を当てる。

Automated mark localization in scatter images, which is greatly helpful for discovering knowledge in and understanding enormous numbers of document images and for reasoning in visual question answering AI systems, is a highly challenging problem because of the ubiquity of overlapping marks. Locating overlapping marks faces many difficulties, such as a lack of texture, limited contextual information, hollow shapes, and tiny sizes. Here, we formulate it as a combinatorial optimization problem on clustering-based re-visualization from a non-training generative perspective, locating scatter marks by finding the state of multiple variables at which an objective function reaches its minimum. The objective function is constructed from the difference between binarized scatter images and the corresponding re-visualization generated from their clustering. Fundamentally, re-visualization tries to generate a new scatter graph taking only a rasterized scatter image as input, and clustering is employed to provide the information for such re-visualization. This method can stably locate severely overlapping, variable-size, and variable-shape marks in scatter images without depending on any training dataset or reference. We also propose an adaptive variant of simulated annealing that can work on various connected regions. In addition, we built a dataset named SML2023 containing hundreds of scatter images with different markers and various levels of overlapping severity, tested the proposed method on it, and compared it with existing methods. The results show that it can accurately locate most marks in scatter images with different overlapping severity and marker types, with an absolute increase of about 0.3 on an assignment-cost-based metric in comparison with state-of-the-art methods. This work is of value to data mining on massive collections of web pages and literature, and sheds new light on image measurement tasks such as bubble counting.

翻訳日:2023-12-25 17:47:40 公開日:2023-12-22

# Transformerの数学的展望

A mathematical perspective on Transformers ( http://arxiv.org/abs/2312.10794v2 )

ライセンス: Link先を確認

Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

(参考訳) トランスフォーマーは、大規模言語モデルの内部動作において中心的な役割を果たす。本研究では、相互作用する粒子系としての解釈に基づいてトランスフォーマーを解析するための数学的枠組みを構築し、長時間の極限でクラスターが出現することを明らかにする。我々の研究は基礎となる理論を探求し、数学者と計算機科学者の双方に新しい視点を提供する。

Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge over long times. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.

翻訳日:2023-12-25 17:47:09 公開日:2023-12-22

# 長期公平性制約付きオンラインRestless Multi-Armed Bandit

Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints ( http://arxiv.org/abs/2312.10303v2 )

ライセンス: Link先を確認

Shufan Wang, Guojun Xiong, Jian Li

(参考訳) Restless Multi-armed bandits (RMAB) は、制約のある逐次決定問題をモデル化するために広く用いられている。意思決定者(dm)は、マルコフ決定過程(mdp)に従って各アームの状態が確率的に進化する任意の決定期において、最大bアームを活性化できる「即時活性化制約」の下で、無限の地平線上で期待される総報酬を最大化することを目指している。しかし、この基本モデルは武器間の公平性を保証することができない。本稿では, RMAB-Fモデルについて述べる。RMAB-Fは「長期公正性制約」を持つ新しいRMABモデルであり, 各アームに対する最小の長期活性化率を満たすことを目的としている。オンラインRMAB-F設定(つまり、各腕に付随するMDPがDMに未知である)に対して、Fair-UCRLという新しい強化学習アルゴリズムを開発する。 Fair-UCRLは、報酬の後悔と公正性違反の両面において、確率的サブリニア境界を保証することを証明している。既定のrl法と比較して、我々のフェアucrlは、意思決定に低複雑さのインデックスポリシーを利用する新しいエクスプロイトを含んでいるため、計算効率がはるかに高い。実験の結果,Fair-UCRLの有効性がさらに示された。

Restless multi-armed bandits (RMAB) have been widely used to model sequential decision making problems with constraints. The decision maker (DM) aims to maximize the expected total reward over an infinite horizon under an "instantaneous activation constraint" that at most B arms can be activated at any decision epoch, where the state of each arm evolves stochastically according to a Markov decision process (MDP). However, this basic model fails to provide any fairness guarantee among arms. In this paper, we introduce RMAB-F, a new RMAB model with "long-term fairness constraints", where the objective now is to maximize the long term reward while a minimum long-term activation fraction for each arm must be satisfied. For the online RMAB-F setting (i.e., the underlying MDPs associated with each arm are unknown to the DM), we develop a novel reinforcement learning (RL) algorithm named Fair-UCRL. We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret. Compared with off-the-shelf RL methods, our Fair-UCRL is much more computationally efficient since it contains a novel exploitation that leverages a low-complexity index policy for making decisions. Experimental results further demonstrate the effectiveness of our Fair-UCRL.
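
The two kinds of constraint described above can be checked on a concrete activation schedule. The sketch below is illustrative only: the greedy top-B selection stands in for an index policy, the index values are made up, and the function names are hypothetical; it does not reproduce Fair-UCRL or its index computation.

```python
def select_arms(indices, budget):
    """Activate the `budget` arms with the largest index values."""
    ranked = sorted(range(len(indices)), key=lambda arm: indices[arm], reverse=True)
    return set(ranked[:budget])


def activation_fractions(schedule, n_arms):
    """Fraction of decision epochs in which each arm was active."""
    horizon = len(schedule)
    return [sum(arm in active for active in schedule) / horizon for arm in range(n_arms)]


def satisfies_constraints(schedule, n_arms, budget, min_fraction):
    """Check the instantaneous budget and the long-term fairness fractions."""
    instantaneous_ok = all(len(active) <= budget for active in schedule)
    fractions = activation_fractions(schedule, n_arms)
    fairness_ok = all(f >= m for f, m in zip(fractions, min_fraction))
    return instantaneous_ok and fairness_ok


# Made-up index values for 3 arms over 4 decision epochs, with budget B = 1.
index_history = [
    [0.9, 0.5, 0.1],
    [0.2, 0.8, 0.3],
    [0.7, 0.1, 0.6],
    [0.4, 0.9, 0.2],
]
schedule = [select_arms(indices, budget=1) for indices in index_history]

print(activation_fractions(schedule, n_arms=3))
print(satisfies_constraints(schedule, 3, 1, [0.25, 0.25, 0.25]))
```

In this toy run the purely greedy schedule never activates the third arm, so it respects the instantaneous budget but fails a fairness requirement of 25% per arm, which is the gap a fairness-aware policy has to close.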

翻訳日:2023-12-25 17:46:44 公開日:2023-12-22

# 平衡内外相互作用鎖における2つの不連続区間の絡み合いエントロピーとスピン構造

Entanglement entropy of two disjoint intervals and spin structures in interacting chains in and out of equilibrium ( http://arxiv.org/abs/2312.10028v2 )

ライセンス: Link先を確認

Vanja Marić, Saverio Bocini, Maurizio Fagotti

(参考訳) 我々は、ハイゼンベルクスピン-$\frac{1}{2}$ xxzモデルと相互作用するスピン鎖のパラダイムを基準系として、ヨルダン-ウィグナー変換と部分鎖への制限によってそれに関連する相互作用モデルを検討する。例えば、空隙のない XXZ ハミルトニアンのフェルミオン類似体は、連続的なスケーリング極限において、質量のないチューリングモデルによって記述される。基底状態における不連続ブロックの r\'enyi-$\alpha$ エントロピーを調べ、無限長の極限において r\'enyi-$\alpha$ 三成分情報を記述する普遍的スケーリング関数を抽出する。また、フォン・ノイマンのエントロピーを考えるが、大距離の限界のみを考える。スピンブロックのエントロピーを用いて、基礎となる無質量チューリングモデルのスピン構造を明らかにする方法を示す。最後に,大域的クエンチ後の三成分情報について推測し,無限時間と小クエンチの限界におけるその漸近的挙動を推測する。結果として得られる'residual tripartite information''の予想は、区間の長さが(大きな)距離よりも無限に大きい極限に対応するもので、最近、非相互作用スピン鎖の研究を行った普遍性(universality)の主張を支持する。我々の軽微な仮定は、XXZの隙間のない位相における異方性の小さなクエンチ後の残留三部体情報は、$-\log 2$と等しいことを示唆している。

We take the paradigm of interacting spin chains, the Heisenberg spin-$\frac{1}{2}$ XXZ model, as a reference system and consider interacting models that are related to it by Jordan-Wigner transformations and restrictions to sub-chains. An example is the fermionic analogue of the gapless XXZ Hamiltonian, which, in a continuum scaling limit, is described by the massless Thirring model. We work out the Rényi-$\alpha$ entropies of disjoint blocks in the ground state and extract the universal scaling functions describing the Rényi-$\alpha$ tripartite information in the limit of infinite lengths. We also consider the von Neumann entropy, but only in the limit of large distance. We show how to use the entropies of spin blocks to unveil the spin structures of the underlying massless Thirring model. Finally, we speculate about the tripartite information after global quenches and conjecture its asymptotic behaviour in the limit of infinite time and small quench. The resulting conjecture for the "residual tripartite information", which corresponds to the limit in which the intervals' lengths are infinitely larger than their (large) distance, supports the claim of universality recently made in studies of noninteracting spin chains. Our mild assumptions imply that the residual tripartite information after a small quench of the anisotropy in the gapless phase of XXZ is equal to $-\log 2$.
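As a reading aid, the quantities named in the abstract have standard definitions, written out below. The labels $A$, $B$, $C$ for three adjacent blocks, with $B$ separating the disjoint blocks $A$ and $C$, are a notational assumption and do not come from the abstract.

```latex
\begin{align}
  S_\alpha(A) &= \frac{1}{1-\alpha}\,\log \operatorname{Tr}\rho_A^{\alpha},
  \qquad
  S_1(A) = \lim_{\alpha\to 1} S_\alpha(A) = -\operatorname{Tr}\,\rho_A \log \rho_A, \\
  I_3^{(\alpha)}(A,B,C) &= S_\alpha(A) + S_\alpha(B) + S_\alpha(C)
    - S_\alpha(A\cup B) - S_\alpha(B\cup C) - S_\alpha(A\cup C)
    + S_\alpha(A\cup B\cup C).
\end{align}
```

Here $\rho_A$ is the reduced density matrix of block $A$, and the $\alpha \to 1$ limit recovers the von Neumann entropy mentioned in the abstract.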

翻訳日:2023-12-25 17:46:18 公開日:2023-12-22

# Q-Segment: 血管型診断のためのイメージインセンサー

Q-Segment: Segmenting Images In-Sensor for Vessel-Based Medical Diagnosis ( http://arxiv.org/abs/2312.09854v2 )

ライセンス: Link先を確認

Pietro Bonazzi, Julian Moosmann, Yawei Li, Sizhen Bian, Michele Magno

(参考訳) 本稿では,ディープラーニングモデルを直接センサに展開することへの関心が高まっている。本稿では,量子化リアルタイムセグメンテーションアルゴリズム"q-segment"を提案し,センサ内プロセッサであるsony imx500を用いた低消費電力エッジビジョンプラットフォームについて包括的評価を行う。このモデルの主な目的の1つは、血管ベースの診断のためのエンドツーエンドのイメージセグメンテーションを実現することである。 IMX500プラットフォーム上に展開されたQ-Segmentは、センサー内での超低推論時間と72mWの消費電力を実現している。提案したネットワークと,フロートおよび量子化の両方の最先端モデルを比較し,提案手法が計算効率の面で,例えばERFNetの75倍の係数で,様々なプラットフォーム上の既存ネットワークより優れていることを示す。このネットワークは、接続をスキップするエンコーダ・デコーダ構造を採用しており、2進法の精度は97.25%、受信器動作特性曲線(AUC)は96.97%である。また、IMX500処理コアと、低消費電力のマルチコアARM Cortex-Mマイクロコントローラ、シングルコアARM Cortex-M4を比較し、エンドツーエンドの低レイテンシ(17ms)と電力消費(254mW)でセンサ内処理を実現できることを示す。この研究は、エッジベースのイメージセグメンテーションに関する貴重な洞察をもたらし、低消費電力環境に適した効率的なアルゴリズムの基礎を築いた。

This paper addresses the growing interest in deploying deep learning models directly in-sensor. We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with an in-sensor processor, the Sony IMX500. One of the main goals of the model is to achieve end-to-end image segmentation for vessel-based medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72 mW. We compare the proposed network with state-of-the-art models, both float and quantized, demonstrating that the proposed solution outperforms existing networks on various platforms in computing efficiency, e.g., by a factor of 75x compared to ERFNet. The network employs an encoder-decoder structure with skip connections, and results in a binary accuracy of 97.25% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset. We also present a comparison of the IMX500 processing core with the Sony Spresense, a low-power multi-core ARM Cortex-M microcontroller, and a single-core ARM Cortex-M4, showing that it can achieve in-sensor processing with low end-to-end latency (17 ms) and power consumption (254 mW). This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.
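The latency and power figures quoted above can be combined into a rough energy-per-inference estimate. The short sketch below does this arithmetic under two assumptions that the abstract does not state: that power draw is constant over the quoted interval, and that each power figure applies to the time figure it is quoted with.

```python
def energy_microjoules(power_mw, duration_ms):
    """Energy for a constant power draw: 1 mW * 1 ms = 1 microjoule."""
    return power_mw * duration_ms


# In-sensor inference only: 72 mW for 0.23 ms (about 16.56 microjoules).
in_sensor_uj = energy_microjoules(72, 0.23)

# End-to-end processing: 254 mW for 17 ms (4318 microjoules, about 4.3 mJ).
end_to_end_uj = energy_microjoules(254, 17)
```

On these assumptions the inference step itself accounts for well under one percent of the end-to-end energy.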

翻訳日:2023-12-25 17:45:52 公開日:2023-12-22

# 冷却・コヒーレンス転移機構としてのフォノン光子変換

Phonon-photon conversion as mechanism for cooling and coherence transfer ( http://arxiv.org/abs/2312.09837v2 )

ライセンス: Link先を確認

Alessandro Ferreri, David Edward Bruschi, Frank K. Wilhelm, Franco Nori and Vincenzo Macrì

(参考訳) 力学カシミール効果(dynamical casimir effect)は、量子場を閉じ込めた空洞の可動壁の機械的エネルギーを場の量子量に変換することができる物理現象である。この効果は、量子場理論の最も驚くべき予測の1つとして認識されている。量子スケールでは、エネルギー変換は非一貫性、すなわち壁の物理的運動なしでも起こりうる。量子熱力学を用いて, 壁面とキャビティの温度勾配が非破壊的な場合, この現象を壁面を冷却する道具として用いることができることを示した。同時に、熱伝達の過程は、レーザーによって駆動される1つのキャビティモードから壁へのコヒーレンスを共有し、コヒーレント振動を強制することができる。最後に、他のサブシステムで構成される場合を含むシステム全体を冷却するために、1つのレーザードライブを使用する方法を示す。

The dynamical Casimir effect is the physical phenomenon where the mechanical energy of a movable wall of a cavity confining a quantum field can be converted into quanta of the field itself. This effect has been recognized as one of the most astonishing predictions of quantum field theory. At the quantum scale, the energy conversion can also occur incoherently, namely without any physical motion of the wall. We employ quantum thermodynamics to show that this phenomenon can be employed as a tool to cool down the wall when there is a non-vanishing temperature gradient between the wall and the cavity. At the same time, the process of heat transfer makes it possible to share the coherence from one cavity mode, driven by a laser, with the wall, thereby forcing its coherent oscillation. Finally, we show how to employ one laser drive to cool the entire system, including the case when it is composed of other subsystems.

翻訳日:2023-12-25 17:45:21 公開日:2023-12-22

# 大規模量子ネットワークのための真空ビームガイド

Vacuum Beam Guide for Large-Scale Quantum Networks ( http://arxiv.org/abs/2312.09372v2 )

ライセンス: Link先を確認

Yuexun Huang, Francisco Salces-Carcoba, Rana X Adhikari, Amir H. Safavi-Naeini, Liang Jiang

(参考訳) 真空ビームガイド(vbg)は、長距離量子通信における既存のファイバーや衛星技術の限界を克服するための、量子チャネルの全く異なるソリューションを提供する。 VBGは、レンズの配列を1km間隔で配置することで、幅広い光波長に対して超高透過性を提供します。現実的なパラメータでは、VBGは減衰率の点で3桁の精度で最高の繊維を上回ります。その結果、vbgは、最先端の量子衛星通信レートよりも桁違いに高い10^{13}$ qubit/sec以上の量子チャネル容量を持つ数千km以上の長距離量子通信を可能にする。驚くべきことに、量子リピータを使わずに、vbgは地上ベース、低損失、高帯域幅の量子チャネルを提供し、コンピューティング、通信、センシングのための新しい分散量子情報アプリケーションを可能にする。

The vacuum beam guide (VBG) presents a completely different solution for quantum channels to overcome the limitations of existing fiber and satellite technologies for long-distance quantum communication. With an array of aligned lenses spaced kilometers apart, the VBG offers ultra-high transparency over a wide range of optical wavelengths. With realistic parameters, the VBG can outperform the best fiber by three orders of magnitude in terms of attenuation rate. Consequently, the VBG can enable long-range quantum communication over thousands of kilometers with quantum channel capacity beyond $10^{13}$ qubit/sec, orders of magnitude higher than the state-of-the-art quantum satellite communication rate. Remarkably, without relying on quantum repeaters, the VBG can provide a ground-based, low-loss, high-bandwidth quantum channel that enables novel distributed quantum information applications for computing, communication, and sensing.
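To see why a three-orders-of-magnitude improvement in attenuation rate matters over long distances, the sketch below converts an attenuation rate in dB/km into end-to-end transmittance. The fiber value of 0.2 dB/km is an assumed typical telecom figure, not a number from the abstract, and the VBG rate is simply that value divided by 1000.

```python
def transmittance(attenuation_db_per_km, length_km):
    """Fraction of optical power surviving a link with the given loss rate."""
    total_loss_db = attenuation_db_per_km * length_km
    return 10 ** (-total_loss_db / 10)


FIBER_DB_PER_KM = 0.2                   # assumed typical fiber loss
VBG_DB_PER_KM = FIBER_DB_PER_KM / 1000  # "three orders of magnitude" lower

# Compare both channels over a 1000 km link.
fiber_t = transmittance(FIBER_DB_PER_KM, 1000)  # 200 dB of loss
vbg_t = transmittance(VBG_DB_PER_KM, 1000)      # 0.2 dB of loss
```

Under these assumptions, 1000 km of fiber passes about one photon in 10^20, while the same length of VBG passes more than 95% of the light, which is why no repeaters would be needed at this scale.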

翻訳日:2023-12-25 17:45:05 公開日:2023-12-22

# 夜間UAV追跡のための相互学習知識蒸留

Mutual-Learning Knowledge Distillation for Nighttime UAV Tracking ( http://arxiv.org/abs/2312.07884v2 )

ライセンス: Link先を確認

Yufeng Liu

(参考訳) 夜間無人航空機(UAV)の追跡は、必要不可欠なプラグアンドプレイの低照度エンハンサーによって促進されている。しかし、低照度エンハンサーの導入は、UAVの余分な計算負担を増大させ、リアルタイムUAVアプリケーションの開発を著しく妨げている。一方、これらの最先端のSOTA(State-of-the-art)エンハンサーは、高度な日中UAVトラッキングアプローチと密結合を欠いている。そこで本研究では,夜間UAV追跡のための新たな相互学習知識蒸留フレームワークであるMLKDを提案する。本フレームワークは,教師からの知識伝達と学生間の知識共有を通じて,コンパクトで迅速な夜間トラッカーを学習するために構築されている。具体的には,SOTAエンハンサーと優れたトラッキングバックボーンとに基づく上級教師を,タイトな結合認識トラッキングバックボーンのみに基づいて指導し,夜間のオブジェクト特徴を直接抽出する。一人の生徒のバイアス学習に対処するために,多様な蒸留方法を持つ多様な軽量の生徒が,教師の知識の様々な側面に焦点を合わせるように構築されている。さらに、先進的な相互学習室を設計し、上位の学生候補を選抜し、訓練段階において残りの学生をフレーム単位で支援する。さらに、テストデータセットから最後の最高の学生であるMLKD-Trackが選択される。 MLKDとMLKD-Trackの有効性と優位性を示す。 MLKD-Trackの実用性は、異なる課題のある実世界のテストで検証される。コードはhttps://github.com/lyfeng001/MLKDで公開されている。

Nighttime unmanned aerial vehicle (UAV) tracking has been facilitated with indispensable plug-and-play low-light enhancers. However, the introduction of low-light enhancers increases the extra computational burden for the UAV, significantly hindering the development of real-time UAV applications. Meanwhile, these state-of-the-art (SOTA) enhancers lack tight coupling with the advanced daytime UAV tracking approach. To solve the above issues, this work proposes a novel mutual-learning knowledge distillation framework for nighttime UAV tracking, i.e., MLKD. This framework is constructed to learn a compact and fast nighttime tracker via knowledge transferring from the teacher and knowledge sharing among various students. Specifically, an advanced teacher based on a SOTA enhancer and a superior tracking backbone is adopted for guiding the student based only on the tight coupling-aware tracking backbone to directly extract nighttime object features. To address the biased learning of a single student, diverse lightweight students with different distillation methods are constructed to focus on various aspects of the teacher's knowledge. Moreover, an innovative mutual-learning room is designed to elect the superior student candidate to assist the remaining students frame-by-frame in the training phase. Furthermore, the final best student, i.e., MLKD-Track, is selected through the testing dataset. Extensive experiments demonstrate the effectiveness and superiority of MLKD and MLKD-Track. The practicality of the MLKD-Track is verified in real-world tests with different challenging situations. The code is available at https://github.com/lyfeng001/MLKD.

翻訳日:2023-12-25 17:44:24 公開日:2023-12-22

# NeuSurf: スパースインプットビューからのニューラルサーフェスリコンストラクションのためのオンサーフェス

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views ( http://arxiv.org/abs/2312.13977v2 )

ライセンス: Link先を確認

Han Huang, Yulun Wu, Junsheng Zhou, Ge Gao, Ming Gu, Yu-Shen Liu

(参考訳) 近年,多視点再構成の分野では,神経暗黙関数が顕著な成果を上げている。しかし、既存のほとんどの手法は密集したビュー用に調整されており、スパースビューを扱う際に不満足なパフォーマンスを示す。スパースビュー再構築タスクに対処するために暗黙的再構成を一般化するために、いくつかの最新の方法が提案されているが、それらは依然として高いトレーニングコストを被り、慎重に選択された観点でのみ有効である。本稿では,表面上の事前情報を利用して高度に忠実な表面再構成を実現する新しいスパースビュー再構築フレームワークを提案する。具体的には,大域的幾何アライメントと局所幾何洗練に関する制約を設計し,粗い形状と細部を協調的に最適化する。これを実現するために、ニューラルネットワークをトレーニングし、SfMから得られる地上点からグローバルな暗黙の場を学習し、粗い幾何学的制約として活用する。局所的な幾何的整合性を利用するために、我々は地上の点を見かけや見えない視点に投影し、投影された特徴の一貫した損失を微細な幾何学的制約として扱う。 dtu と blendedmvs データセットによる2つの分散設定の実験結果は、最先端の方法よりも大幅に改善されていることを示している。

Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several recent methods have been proposed to generalize implicit reconstruction to the sparse view reconstruction task, but they still suffer from high training costs and are only effective under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistency loss of projected features as a fine geometric constraint. The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.

翻訳日:2023-12-25 17:37:13 公開日:2023-12-22

# 部分最適輸送について:シンクホーンの実用性の改善と効率的な勾配法

On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods ( http://arxiv.org/abs/2312.13970v2 )

ライセンス: Link先を確認

Anh Duc Nguyen, Tuan Dung Nguyen, Quang Minh Nguyen, Hoang H. Nguyen, Lam M. Nguyen, Kim-Chuan Toh

(参考訳) 本稿では、最大$n$の非バランスな2つの測度間の部分最適輸送(POT)問題と、色移動やドメイン適応といった様々なAIタスクへの応用について検討する。したがって、アプリケーションの原因となる問題のサイズがますます大きくなるPOTの高速な近似が必要である。我々はまず,ポットに対する最先端のシンクホーンアルゴリズムの非互換な丸め手順による実現不可能性を理論的に実験的に検討し,ポイントクラウド登録のような実世界のアプリケーションにおける質的性能を低下させる。そこで本研究では,POT の新たなラウンドリングアルゴリズムを提案し,計算複雑性を$\mathcal{\widetilde O}(n^2/\varepsilon^4)$に修正した,実行可能な Sinkhorn プロシージャを提案する。丸めアルゴリズムはポット問題を近似する2つの一階法の開発も可能にしている。最初のアルゴリズムであるadaptive primal-dual accelerated gradient descent (apdagd) は、修正されたシンクホーンよりも$\varepsilon$の方が良い$\mathcal{\widetilde o}(n^{2.5}/\varepsilon)$のポット問題に対する$\varepsilon$-approximate solutionを見つける。 2つ目の方法であるDual Extrapolationは、$\mathcal{\widetilde O}(n^2/\varepsilon)$の計算複雑性を実現する。さらに,ポットの柔軟性を標準otと比較し,二つの限界分布が不均衡な実アプリケーションにおけるアルゴリズムの実用性を示す。

This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.
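The "infeasibility" discussed above refers to a transport plan that fails the POT constraints. The sketch below checks a candidate plan against the usual POT constraint set as we understand it: nonnegative entries, row sums at most `r`, column sums at most `c`, and total transported mass equal to `s`. The function name and the toy numbers are illustrative assumptions, and this is neither the paper's rounding algorithm nor Sinkhorn.

```python
def pot_violation(plan, r, c, s):
    """Largest constraint violation of a candidate POT plan (0.0 if feasible).

    plan: list of rows (nonnegative transport amounts).
    r, c: masses of the two unbalanced measures (row and column caps).
    s:    total mass that must be transported.
    """
    rows, cols = len(r), len(c)
    row_sums = [sum(plan[i]) for i in range(rows)]
    col_sums = [sum(plan[i][j] for i in range(rows)) for j in range(cols)]
    violations = [0.0]
    # Entries must be nonnegative.
    violations += [-x for row in plan for x in row if x < 0]
    # Marginals may not exceed the available mass on either side.
    violations += [row_sums[i] - r[i] for i in range(rows)]
    violations += [col_sums[j] - c[j] for j in range(cols)]
    # Exactly s units of mass must be moved in total.
    violations.append(abs(sum(row_sums) - s))
    return max(violations)


r, c, s = [0.5, 0.5], [0.6, 0.6], 0.8
feasible = [[0.4, 0.0], [0.0, 0.4]]
infeasible = [[0.5, 0.1], [0.0, 0.2]]  # first row ships 0.6 > r[0] = 0.5
```

A plan such as `infeasible` moves the right total mass but overdraws one source, which is the kind of defect a rounding step is meant to repair.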

翻訳日:2023-12-25 17:36:52 公開日:2023-12-22

# paint3d: ライティングレステクスチャ拡散モデルによる3dペイント

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models ( http://arxiv.org/abs/2312.13913v2 )

ライセンス: Link先を確認

Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

(参考訳) 本研究では,テキストや画像の入力に条件付された非テクスチャ3Dメッシュに対して,高分解能,光レス,多彩な2KUVテクスチャマップを作成可能な,粗大かつ微細な生成フレームワークであるPaint3Dを提案する。対処すべき重要な課題は、組み込み照明情報なしで高品質なテクスチャを生成することだ。そこで本手法では,まず,事前学習した深度認識2次元拡散モデルを用いて視条件画像を生成し,マルチビューテクスチャ融合を行い,初期粗いテクスチャマップを生成する。しかし, 2次元モデルでは3次元形状を完全に表現できず, 照明効果が損なわれるため, 粗いテクスチャマップは不完全領域と照明アーチファクトを呈する。これを解決するために,不完全領域の形状認識と照明器具の除去に特化したUV塗装とUVHD拡散モデルを個別に訓練する。この粗いプロセスを通じて、Paint3Dは3Dオブジェクトのテクスチャ化において、セマンティック一貫性を維持しながらセマンティック一貫性を維持する高品質な2KUVテクスチャを生成することができる。

This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within modern graphics pipelines. To achieve this, our method first leverages a pre-trained depth-aware 2D diffusion model to generate view-conditional images and perform multi-view texture fusion, producing an initial coarse texture map. However, as 2D models cannot fully represent 3D shapes and disable lighting effects, the coarse texture map exhibits incomplete areas and illumination artifacts. To resolve this, we train separate UV Inpainting and UVHD diffusion models specialized for the shape-aware refinement of incomplete areas and the removal of illumination artifacts. Through this coarse-to-fine process, Paint3D can produce high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state-of-the-art in texturing 3D objects.

翻訳日:2023-12-25 17:36:20 公開日:2023-12-22

# AppAgent: スマートフォンユーザとしてのマルチモーダルエージェント

AppAgent: Multimodal Agents as Smartphone Users ( http://arxiv.org/abs/2312.13771v2 )

ライセンス: Link先を確認

Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、複雑なタスクを実行できるインテリジェントエージェントの開発につながっている。本稿では,スマートフォンアプリケーションを操作するための新しいLLMベースのマルチモーダルエージェントフレームワークを提案する。本フレームワークは,タッピングやスワイプなどのヒューマンライクなインタラクションを模倣した,簡易なアクションスペースによるスマートフォンアプリケーションの操作を可能にする。この新しいアプローチは、システムバックエンドアクセスの必要性を回避し、様々なアプリに適用性を広げる。エージェントの機能の中心は、その革新的な学習方法です。エージェントは、自律的な探索または人間のデモを観察することで、ナビゲートと新しいアプリの使用を学習する。このプロセスは、エージェントが異なるアプリケーション間で複雑なタスクを実行するために参照する知識ベースを生成する。エージェントの実用性を実証するため,ソーシャルメディア,メール,地図,ショッピング,高度な画像編集ツールなど10種類のアプリケーションで50以上のタスクを広範囲にテストした。以上の結果から,エージェントの多種多様なハイレベルタスクの処理能力が確認できた。

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.

翻訳日:2023-12-25 17:35:53 公開日:2023-12-22

# NeRFをベースとした色とオパクティを持つガウススメッティング

Gaussian Splatting with NeRF-based Color and Opacity ( http://arxiv.org/abs/2312.13729v2 )

ライセンス: Link先を確認

Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek

(参考訳) neural radiance fields (nerfs) は、3dオブジェクトの複雑さを捉えるためのニューラルネットワークの驚くべき可能性を実証している。ニューラルネットワークの重みの中に形状と色情報をエンコードすることで、NeRFは3Dオブジェクトの驚くほどシャープな新しいビューを生み出すのに優れています。近年, 生成モデルを用いたNeRFの一般化が数多く現れ, その汎用性が高まっている。対照的に、gaussian splatting (gs) はニューラルネットワークを必要とせず、より高速なトレーニングと推論で同様のレンダリング品質を提供する。ガウス分布の集合に3Dオブジェクトに関する情報をエンコードし、古典的メッシュと同様に3Dで描画できる。残念ながら、GSは通常数十万のガウス成分を必要とするため、条件付けが難しい。両モデルの欠点を軽減するために、3Dオブジェクトの形状のGS表現とNeRFによる色と不透明度の符号化を用いたハイブリッドモデルを提案する。我々のモデルは、ガウス分布とトレーニング可能な位置(すなわちガウスの手段)、形状(ガウスの共分散)、色と不透明度、ニューラルネットワークを用いており、ガウス分布と視方向のパラメータを使って色と不透明度の変化を生成する。その結果、3dオブジェクトのシャドウ、光反射、透明性をよりよく記述した。

Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar rendering quality with faster training and inference, as it does not need neural networks to work. We encode information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since it usually requires around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses a GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e., the means of the Gaussians), shape (i.e., the covariances of the Gaussians), color and opacity, and a neural network, which takes the parameters of a Gaussian and the viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects.

翻訳日:2023-12-25 17:35:39 公開日:2023-12-22

# 運転シーンに対する弱監督型セマンティックセグメンテーション

Weakly Supervised Semantic Segmentation for Driving Scenes ( http://arxiv.org/abs/2312.13646v2 )

ライセンス: Link先を確認

Dongseob Kim, Seungho Lee, Junsuk Choe, Hyunjung Shim

(参考訳) 画像レベルラベルを用いたweakly supervised semantic segmentation(wsss)における最先端技術は、都市景観などの運転シーンデータセットにおいて深刻な性能低下を示す。この課題に対処するため、シーンデータセットの駆動に適した新しいWSSSフレームワークを開発しました。データセットの特徴を広範囲に分析し,提案するベースラインとしてコントラスト言語画像事前学習(CLIP)を用いて擬似マスクを得る。しかし、CLIPは、(1)CLIPの擬似マスクが小さなオブジェクトクラスを表現していないこと、(2)これらのマスクが顕著なノイズを含んでいること、の2つの主要な課題を紹介している。それぞれの問題に対する解決策を次のように提案する。 1)モデルトレーニング中に小規模パッチをシームレスに組み込んだグローバルローカルビュートレーニングを考案し,モデルが運転シーン(例えば交通信号)において小型で重要なオブジェクトを扱う能力を高める。 2)CLIPマスクとセグメンテーション予測の整合性を評価することによって,信頼性と雑音の領域を識別する新しい手法であるCARBを導入する。適応的な損失重み付けによってノイズの多いピクセルよりも信頼性の高いピクセルを優先する。特に,提案手法はCityscapesテストデータセット上で51.8\% mIoUを達成し,シーンデータセットを駆動するWSSSベースラインとしての可能性を示した。 camvidとwilddash2の実験結果は、小規模のデータセットや視覚的に困難な状況でも、さまざまなデータセットにまたがる手法の有効性を示しています。コードはhttps://github.com/k0u-id/CARBで公開されている。

State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on extensive analysis of dataset characteristics, we employ Contrastive Language-Image Pre-training (CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP fail to represent small object classes, and (2) these masks contain notable noise. We propose solutions for each issue as follows. (1) We devise Global-Local View Training that seamlessly incorporates small-scale patches during model training, thereby enhancing the model's capability to handle small-sized yet critical objects in driving scenes (e.g., traffic lights). (2) We introduce Consistency-Aware Region Balancing (CARB), a novel technique that discerns reliable and noisy regions by evaluating the consistency between CLIP masks and segmentation predictions. It prioritizes reliable pixels over noisy pixels via adaptive loss weighting. Notably, the proposed method achieves 51.8% mIoU on the Cityscapes test dataset, showcasing its potential as a strong WSSS baseline on driving scene datasets. Experimental results on CamVid and WildDash2 demonstrate the effectiveness of our method across diverse datasets, even with small-scale datasets or visually challenging conditions. The code is available at https://github.com/k0u-id/CARB.
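The idea of prioritizing reliable pixels through loss weighting can be illustrated with a deliberately simplified sketch. Here a pixel counts as reliable when the CLIP pseudo-label and the current segmentation prediction agree, and disagreeing pixels receive a fixed smaller weight. The fixed `noisy_weight`, the function name, and the toy values are assumptions for illustration; the abstract does not specify CARB's actual weighting rule.

```python
def consistency_weighted_loss(pixel_losses, clip_labels, pred_labels, noisy_weight=0.2):
    """Weighted mean of per-pixel losses, down-weighting pixels where the
    CLIP pseudo-label and the segmentation prediction disagree.
    noisy_weight must be positive so the weights never sum to zero."""
    if not (len(pixel_losses) == len(clip_labels) == len(pred_labels)):
        raise ValueError("all inputs must have one entry per pixel")
    # Reliable (consistent) pixels get weight 1.0, noisy ones get noisy_weight.
    weights = [1.0 if clip == pred else noisy_weight
               for clip, pred in zip(clip_labels, pred_labels)]
    weighted_total = sum(w * loss for w, loss in zip(weights, pixel_losses))
    return weighted_total / sum(weights)


# Four pixels; the third has inconsistent labels and is down-weighted.
loss = consistency_weighted_loss([1.0, 2.0, 3.0, 4.0], [0, 1, 1, 2], [0, 1, 2, 2])
```

With these toy values the weighted mean is about 2.375, slightly below the plain mean of 2.5 because the inconsistent pixel contributes less.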

翻訳日:2023-12-25 17:35:16 公開日:2023-12-22

# 光とマイクロ波フォトニック量子ビットの量子絡み合い

Quantum entanglement between optical and microwave photonic qubits ( http://arxiv.org/abs/2312.13559v2 )

ライセンス: Link先を確認

Srujan Meesala, David Lake, Steven Wood, Piero Chiappina, Changchun Zhong, Andrew D. Beyer, Matthew D. Shaw, Liang Jiang, and Oskar Painter

(参考訳) 絡み合いは量子力学の異常な特徴である。絡み合った光子源はベルの不等式を破って量子物理学の基礎をテストするのに不可欠であった。近年、マイクロ波回路と超伝導量子ビットの強い非線形相互作用により、絡み合った多体状態が実現されている。ここでは、光およびマイクロ波フォトニック量子ビットを絡み合うチップスケールの源を示す。我々のデバイスプラットフォームは、圧電オプトメカニカルトランスデューサと、光照射下で頑健な超伝導共振器を統合している。我々は光子対生成過程を駆動し、マイクロ波および光子の絡み合った状態を作成するために、本システムに固有のデュアルレール符号化を用いる。 2つの直交基底におけるマイクロ波および光光子を測定することにより、絡み合う状態の忠実度を低くする。この絡み合い源は、量子通信と計算のための確立された2つのプラットフォームである通信波長のタイムビン量子ビットとghz周波数超伝導量子ビットを直接接続することができる。

Entanglement is an extraordinary feature of quantum mechanics. Sources of entangled optical photons were essential to test the foundations of quantum physics through violations of Bell's inequalities. More recently, entangled many-body states have been realized via strong non-linear interactions in microwave circuits with superconducting qubits. Here we demonstrate a chip-scale source of entangled optical and microwave photonic qubits. Our device platform integrates a piezo-optomechanical transducer with a superconducting resonator which is robust under optical illumination. We drive a photon-pair generation process and employ a dual-rail encoding intrinsic to our system to prepare entangled states of microwave and optical photons. We place a lower bound on the fidelity of the entangled state by measuring microwave and optical photons in two orthogonal bases. This entanglement source can directly interface telecom wavelength time-bin qubits and GHz frequency superconducting qubits, two well-established platforms for quantum communication and computation, respectively.

翻訳日:2023-12-25 17:34:47 公開日:2023-12-22

# 対話型観光計画システムの開発 : 大規模言語モデルを用いた対話ロボットシステム

Developing Interactive Tourism Planning: A Dialogue Robot System Powered by a Large Language Model ( http://arxiv.org/abs/2312.13545v2 )

ライセンス: Link先を確認

Katsumasa Yoshikawa and Takato Yamazaki and Masaya Ohagi and Tomoya Mizumoto and Keiya Sato

(参考訳) 近年,大規模言語モデル (LLM) が急速に普及し,対話システムの研究など,様々なタスクに活用されている。我々は, LLMの柔軟な会話能力を活用するだけでなく, 人間の会話負荷を低減し, 旅行を効率的に計画できるシステムを構築することを目指していた。さらに,旅行代理店の複雑なタスクを複数のサブタスクに分割し,それぞれを個別のフェーズとして管理し,効果的にタスクを実現する手法を提案する。提案システムは,対話ロボットコンペティション2023のプリリミナリーラウンドにおいて,第4位に到達し,一定の成功を収めた。競技を通して特定した課題について報告する。

In recent years, large language models (LLMs) have rapidly proliferated and have been utilized in various tasks, including research in dialogue systems. We aimed to construct a system that leverages not only the flexible conversational abilities of LLMs but also their advanced planning capabilities to reduce the speaking load on human interlocutors and efficiently plan trips. Furthermore, we propose a method that divides the complex task of a travel agency into multiple subtasks, managing each as a separate phase to effectively accomplish the task. Our proposed system confirmed a certain level of success by achieving fourth place in the preliminary round of the Dialogue Robot Competition 2023. We report on the challenges identified through the competition.

翻訳日:2023-12-25 17:34:31 公開日:2023-12-22

# Zero-1-to-3:3つの診断対象に対する早期学生の1バッチによるドメインレベルのゼロショット認知診断

Zero-1-to-3: Domain-level Zero-shot Cognitive Diagnosis via One Batch of Early-bird Students towards Three Diagnostic Objectives ( http://arxiv.org/abs/2312.13434v2 )

ライセンス: Link先を確認

Weibo Gao, Qi Liu, Hao Wang, Linan Yue, Haoyang Bi, Yin Gu, Fangzhou Yao, Zheng Zhang, Xin Li, Yuanjing He

(参考訳) 認知診断は、記録された実践クイズデータを探索することで、学生の認知状態を推定しようとする。知的教育システムにおけるパーソナライズされた学習指導において重要な役割を果たす。本稿では,新たに立ち上げられたドメインに学生の実践ログがないために生じる,ドメインレベルのゼロショット認知診断(DZCD)という,重要かつ実用的だがしばしば未発見の課題に焦点を当てる。最近のクロスドメイン診断モデルはDZCDにとって有望な戦略であることが示されている。これらの手法は主に、ドメイン間で学生状態を転送する方法に焦点を当てている。しかし、生徒の表現に不注意な情報を組み込むことで、知識伝達の有効性を制限できる。そこで本研究では,早期学習者の3つの診断目的に向けて,ドメインレベルのゼロショット認知診断フレームワークZero-1-to-3を提案する。本手法は, 学生状態をドメイン共有部分とドメイン固有部分に分離する2つの正則化器を用いた診断モデルの事前学習から始める。共有された認知信号は対象領域に転送することができ、新しい領域の認知的事前を豊かにすることにより、認知状態の伝播目標が保証される。その後,早期学習者の行動パターンを解析し,ドメイン適応目標を達成し,冷間開始学生のための模擬実践ログを作成する戦略を考案した。その結果, コールドスタート学生の認知状態は, 仮想データによる診断結果として洗練され, 診断目標と一致した。最後に、実世界の6つのデータセットに対する広範な実験により、DZCDに対する我々のモデルの有効性と、その課題に対する実践的応用を強調した。

Cognitive diagnosis seeks to estimate the cognitive states of students by exploring their logged practice quiz data. It plays a pivotal role in personalized learning guidance within intelligent education systems. In this paper, we focus on an important, practical, yet often underexplored task: domain-level zero-shot cognitive diagnosis (DZCD), which arises due to the absence of student practice logs in newly launched domains. Recent cross-domain diagnostic models have been demonstrated to be a promising strategy for DZCD. These methods primarily focus on how to transfer student states across domains. However, they might inadvertently incorporate non-transferable information into student representations, thereby limiting the efficacy of knowledge transfer. To tackle this, we propose Zero-1-to-3, a domain-level zero-shot cognitive diagnosis framework via one batch of early-bird students towards three diagnostic objectives. Our approach initiates with pre-training a diagnosis model with dual regularizers, which decouples student states into domain-shared and domain-specific parts. The shared cognitive signals can be transferred to the target domain, enriching the cognitive priors for the new domain, which ensures the cognitive state propagation objective. Subsequently, we devise a strategy to generate simulated practice logs for cold-start students through analyzing the behavioral patterns from early-bird students, fulfilling the domain-adaption goal. Consequently, we refine the cognitive states of cold-start students as diagnostic outcomes via virtual data, aligning with the diagnosis-oriented goal. Finally, extensive experiments on six real-world datasets highlight the efficacy of our model for DZCD and its practical application in question recommendation.

翻訳日:2023-12-25 17:34:18 公開日:2023-12-22

# ancilla qubits を伴わない多対数奥行き制御なしゲート

Polylogarithmic-depth controlled-NOT gates without ancilla qubits ( http://arxiv.org/abs/2312.13206v2 )

ライセンス: Link先を確認

Baptiste Claudon, Julien Zylberman, César Feniou, Fabrice Debbasch, Alberto Peruzzo, Jean-Philip Piquemal

(参考訳) 制御された操作は量子アルゴリズムの基本構成要素である。 n$-control-not ゲート(c^n(x)$) を任意のシングルキュービットと cnot ゲートに分解することは、重要ではあるが非自明な作業である。本研究は、漸近的および非漸近的レジームにおいて、従来の方法に匹敵する$c^n(x)$回路を導入する。回路深度$\Theta\left(\log(n)^{\log_2(12)}\right)$、回路深度$\mathcal O \left(\log(n)^{\log_2(12)}\log(1/\epsilon)\right)$、m\leq n$ ancilla qubitsを用いた調整可能な深度回路を持つ正確なもの。その結果生じる指数関数的スピードアップは、量子化学から物理学、ファイナンス、量子機械学習に至るまで、無数の量子アルゴリズムの複雑さを改善することによって、フォールトトレラントな量子コンピューティングに大きな影響を与える可能性がある。

Controlled operations are fundamental building blocks of quantum algorithms. Decomposing $n$-control-NOT gates ($C^n(X)$) into arbitrary single-qubit and CNOT gates is a crucial but non-trivial task. This study introduces $C^n(X)$ circuits outperforming previous methods in the asymptotic and non-asymptotic regimes. Three distinct decompositions are presented: an exact one using one borrowed ancilla with a circuit depth $\Theta\left(\log(n)^{\log_2(12)}\right)$, an approximating one without ancilla qubits with a circuit depth $\mathcal O \left(\log(n)^{\log_2(12)}\log(1/\epsilon)\right)$, and an exact one with an adjustable-depth circuit using $m\leq n$ ancilla qubits. The resulting exponential speedup is likely to have a substantial impact on fault-tolerant quantum computing by improving the complexities of countless quantum algorithms with applications ranging from quantum chemistry to physics, finance and quantum machine learning.
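For readers unfamiliar with the gate, the sketch below shows what $C^n(X)$ does to a computational basis state (it flips the target bit only when every control bit is 1) and evaluates the exponent $\log_2(12)$ that appears in the depth bounds. It models only the classical truth table, not the circuit decompositions of the paper, and the function name is an illustrative assumption.

```python
import math


def apply_cnx(bits):
    """Action of an n-controlled NOT on a basis state.

    bits: list of 0/1 values; the last entry is the target,
    all earlier entries are controls.
    """
    *controls, target = bits
    if all(controls):
        target ^= 1  # flip the target only if every control is 1
    return [*controls, target]


flipped = apply_cnx([1, 1, 1, 0])    # all controls set: target flips
unchanged = apply_cnx([1, 0, 1, 0])  # one control is 0: nothing happens

# Exponent of the polylogarithmic depth, roughly 3.585.
depth_exponent = math.log2(12)
```

The depth bounds are asymptotic statements with unspecified constants, so the exponent alone should not be read as a concrete gate count for any particular $n$.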

翻訳日:2023-12-25 17:33:50 公開日:2023-12-22

# MoSAR:微分シェーディングを用いた単眼アバター再構成モデル

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading ( http://arxiv.org/abs/2312.13091v2 )

ライセンス: Link先を確認

Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-Andre Carbonneau

(参考訳) ポートレート画像からアバターを再構築することはマルチメディアに多くの応用があるが、依然として困難な研究課題である。 1つの画像から反射率マップと幾何を抽出することは不良設定問題であり、幾何の復元は1対多のマッピング問題であり、反射率と光の分離は困難である。正確な幾何学と反射率をライトステージの制御条件下で捉えることはできるが、この方法で大規模なデータセットを取得するにはコストがかかる。さらに、この種のデータのみでのトレーニングは、in-the-wild画像に対する一般化の貧弱さにつながる。これが、単眼画像から3Dアバターを生成する手法であるMoSARの導入を動機付けている。本研究では、ライトステージとin-the-wildの両方のデータセットから学習することで一般化を向上させる半教師付きトレーニング手法を提案する。これは、新しい微分可能なシェーディング定式化を用いて達成される。提案手法は、本質的な顔パラメータを効果的に分離し、再照明可能なアバターを生成することを示す。その結果、MoSARはよりリッチな皮膚反射率マップを推定し、既存の最先端手法よりも現実的なアバターを生成する。また、FFHQ-UV-Intrinsicsという名の新しいデータセットも導入する。これは、合計1万人の被験者に対して、内在的な顔属性(拡散、鏡面、アンビエントオクルージョン、半透明性マップ)を大規模に提供する最初の公開データセットである。プロジェクトのwebサイトとデータセットは次のリンクで利用可能である: https://ubisoft-laforge.github.io/character/mosar/

Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available on the following link: https://ubisoft-laforge.github.io/character/mosar/

翻訳日:2023-12-25 17:33:21 公開日:2023-12-22

# DiffPortrait3D:ゼロショットポートレートビュー合成のための制御可能な拡散

DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis ( http://arxiv.org/abs/2312.13016v3 )

ライセンス: Link先を確認

Yuming Gu, You Xie, Hongyi Xu, Guoxian Song, Yichun Shi, Di Chang, Jing Yang, Linjie Luo

(参考訳) 本稿では,わずか1枚のin-the-wildポートレートから3D一貫性のあるフォトリアリスティックな新規ビューを合成できる条件付き拡散モデルDiffPortrait3Dを提案する。具体的には、単一のRGB入力が与えられたとき、アイデンティティと表情の両方を保持したまま、新しいカメラビューからレンダリングされた、もっともらしく一貫した顔の詳細を合成することを目的としている。時間を要する最適化や微調整を必要とせず,本ゼロショット手法は,カメラポーズが未知のビュー,極端な表情,多彩な芸術的描写を含む任意の顔ポートレートにうまく一般化する。その中心として,大規模画像データセットで事前学習した2次元拡散モデルの生成事前分布をレンダリングバックボーンとして活用し,外観とカメラポーズを分離したアテンション制御によってデノイジングを誘導する。そのために,まず参照画像の外観コンテキストを凍結したUNetの自己注意層に注入する。次に,同じビューから見た別の被写体の条件画像を参照してカメラポーズを解釈する新しい条件制御モジュールによって,レンダリングビューを操作する。さらに,学習可能なクロスビューアテンションモジュールを挿入してビュー一貫性を高め,推論時の新しい3D対応ノイズ生成プロセスによってこれをさらに強化する。挑戦的なin-the-wildおよびマルチビューのベンチマークにおいて,定性的にも定量的にも最先端の結果を示す。

We present DiffPortrait3D, a conditional diffusion model that is capable of synthesizing 3D-consistent photo-realistic novel views from as few as a single in-the-wild portrait. Specifically, given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views while retaining both identity and facial expression. In lieu of time-consuming optimization and fine-tuning, our zero-shot method generalizes well to arbitrary face portraits with unposed camera views, extreme facial expressions, and diverse artistic depictions. At its core, we leverage the generative prior of 2D diffusion models pre-trained on large-scale image datasets as our rendering backbone, while the denoising is guided with disentangled attentive control of appearance and camera pose. To achieve this, we first inject the appearance context from the reference image into the self-attention layers of the frozen UNets. The rendering view is then manipulated with a novel conditional control module that interprets the camera pose by watching a condition image of a crossed subject from the same view. Furthermore, we insert a trainable cross-view attention module to enhance view consistency, which is further strengthened with a novel 3D-aware noise generation process during inference. We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.

翻訳日:2023-12-25 17:32:56 公開日:2023-12-22

# 内部状態、制約のない接続、離散的アクティベーションを用いたニューラルネットワークのトレーニング

Training Neural Networks with Internal State, Unconstrained Connectivity, and Discrete Activations ( http://arxiv.org/abs/2312.14359v1 )

ライセンス: Link先を確認

Alexander Grushin

(参考訳) 今日の最も強力な機械学習アプローチは、通常、事前に定義された層と微分可能な活性化関数を持つステートレスなアーキテクチャをトレーニングするように設計されている。これらのアプローチは、自然言語処理や画像認識といった分野で前例のない成功を収めている一方で、トレーニングされたモデルは、人間ならしないような間違いを犯しやすい。本稿では、真の知性には機械学習モデルが内部状態を管理する能力が必要かもしれないが、そのようなモデルを訓練するための最も効果的なアルゴリズムはまだ発見されていない、という立場をとる。さらに、そのようなアルゴリズムは必ずしも深いアーキテクチャ上の勾配降下に基づくものではなく、むしろ、離散的な活性化を持ち、(複数の事前定義された層のような)初期のトポロジー的制約が少ないアーキテクチャで最もうまく機能するかもしれないと仮定する。我々は,そのような学習アルゴリズムを設計する継続的な取り組みにおける1つの試みを示す。これをバイナリ活性化と単一の重み行列のみを持つアーキテクチャに適用し,自然言語テキストの有用な表現を形成できる一方で,大量のトレーニングデータを活用する能力には限界があることを示す。次に、このアルゴリズムの改善と、類似したアーキテクチャのための他のトレーニングアルゴリズムの設計に向けたアイデアを提供する。最後に,効果的な学習アルゴリズムが見つかった場合に得られる潜在的な利点について議論し,それらの利点が実際に存在するかどうかを評価する実験を提案する。

Today's most powerful machine learning approaches are typically designed to train stateless architectures with predefined layers and differentiable activation functions. While these approaches have led to unprecedented successes in areas such as natural language processing and image recognition, the trained models are also susceptible to making mistakes that a human would not. In this paper, we take the view that true intelligence may require the ability of a machine learning model to manage internal state, but that we have not yet discovered the most effective algorithms for training such models. We further postulate that such algorithms might not necessarily be based on gradient descent over a deep architecture, but rather, might work best with an architecture that has discrete activations and few initial topological constraints (such as multiple predefined layers). We present one attempt in our ongoing efforts to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights, and show that it is able to form useful representations of natural language text, but is also limited in its ability to leverage large quantities of training data. We then provide ideas for improving the algorithm and for designing other training algorithms for similar architectures. Finally, we discuss potential benefits that could be gained if an effective training algorithm is found, and suggest experiments for evaluating whether these benefits exist in practice.

翻訳日:2023-12-25 16:39:37 公開日:2023-12-22

# Kac-Luttingerモデルにおける相互作用するボース気体中のボース・アインシュタイン凝縮について

On Bose-Einstein condensation in interacting Bose gases in the Kac-Luttinger model ( http://arxiv.org/abs/2312.14357v1 )

ライセンス: Link先を確認

Chiara Boccato, Joachim Kerner, and Maximilian Pechmann

(参考訳) Kac-Luttingerモデルとして知られるランダムモデルにおいて、次元 $2\le d \in \mathbb N$ の相互作用するボース気体をゼロ温度で研究する。ボソン間の対相互作用を平均場型に選ぶことで、ハートリー型汎関数の最小化元への(完全な)ボース・アインシュタイン凝縮を、確率収束の意味で、あるいはほぼ1の確率で証明する。これは、非相互作用ボース気体のスペクトルギャップに関するAlain-Sol Sznitmanのごく最近の結果に基づいて達成される。

We study interacting Bose gases of dimensions $2\le d \in \mathbb N$ at zero temperature in a random model known as the Kac-Luttinger model. Choosing the pair-interaction between the bosons to be of a mean-field type, we prove (complete) Bose-Einstein condensation in probability or with probability almost one into the minimizer of a Hartree-type functional. We accomplish this by building upon very recent results by Alain-Sol Sznitman on the spectral gap of the noninteracting Bose gas.

翻訳日:2023-12-25 16:39:15 公開日:2023-12-22

# すべてを信じるな - 大言語モデルにおける幻覚の自動識別による要約解釈可能性の向上

Don't Believe Everything You Read: Enhancing Summarization Interpretability through Automatic Identification of Hallucinations in Large Language Models ( http://arxiv.org/abs/2312.14346v1 )

ライセンス: Link先を確認

Priyesh Vakharia, Devavrat Joshi, Meenal Chavan, Dhananjay Sonawane, Bhrigu Garg, Parsa Mazaheri, Ian Lane

(参考訳) 大規模言語モデル(LLM)は、機械翻訳やテキスト要約といったタスクのテキスト操作に適しています。しかし、これらのモデルは幻覚を引き起こす傾向があり、それはモデルが提供する答えの忠実さを損なう可能性がある。 llmにおける幻覚と闘う最近の研究は、幻覚文の同定と、モデルが幻覚を起こす異なる方法の分類を扱う。本稿では,幻覚に対する LLM の振る舞いを深く掘り下げ,異なる種類の幻覚を識別するためのトークンレベルのアプローチを定義し,さらに,このトークンレベルのタグ付けを用いて対話要約タスクにおける LLM の解釈性と忠実性を改善する。そこで本稿では,新たな拡張データセットと新たなトレーニングパラダイムを提案する。

Large Language Models (LLMs) are adept at text manipulation -- tasks such as machine translation and text summarization. However, these models can also be prone to hallucination, which can be detrimental to the faithfulness of any answers that the model provides. Recent works in combating hallucinations in LLMs deal with identifying hallucinated sentences and categorizing the different ways in which models hallucinate. This paper takes a deep dive into LLM behavior with respect to hallucinations, defines a token-level approach to identifying different kinds of hallucinations, and further utilizes this token-level tagging to improve the interpretability and faithfulness of LLMs in dialogue summarization tasks. Through this, the paper presents a new, enhanced dataset and a new training paradigm.

翻訳日:2023-12-25 16:39:04 公開日:2023-12-22

# Logic-Scaffolding: LLMを用いたパーソナライズされたアスペクト誘導型推奨説明生成

Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs ( http://arxiv.org/abs/2312.14345v1 )

ライセンス: Link先を確認

Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras and Branislav Kveton

(参考訳) 自然言語テキスト生成能力のようなLarge Language Models(LLMs)のユニークな能力は、レコメンデーションの説明を提供する強力な候補としてそれらを位置づけている。しかし、LLMのサイズにもかかわらず、既存のモデルのほとんどはゼロショットの説明を確実に作成するのに苦労している。この問題に対処するために、アスペクトベースの説明とChain-of-Thoughtプロンプティングのアイデアを組み合わせ、中間的な推論ステップを通じて説明を生成するLogic-Scaffoldingというフレームワークを提案する。本稿では,フレームワーク構築の経験を共有し,結果を探索するためのインタラクティブなデモンストレーションを示す。

The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanation for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps. In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.

翻訳日:2023-12-25 16:38:50 公開日:2023-12-22

# 量子多重グラフ状態と多重ハイパーグラフ状態

Quantum multigraph states and multihypergraph states ( http://arxiv.org/abs/2312.14399v1 )

ライセンス: Link先を確認

Xiao-Dong Zhang, Bin-Bin Cai, and Song Lin

(参考訳) 我々は、エッジとハイパーエッジのユニークな操作によって定義される2種類の多粒子交絡状態、マルチグラフ状態とマルチハイパーグラフ状態を提案した。重要な発見は、提案された多重ハイパーグラフ状態と、dが素数であるときの一般化実重み付け状態との1対1対応である。合成 d に対して、多重超グラフ状態は一般化された実重み付け状態の部分集合を形成する。一方,超グラフ状態から実等重み付け状態を構築する方法を詳述し,d-次元超グラフ状態から生成できない一般化実重み付け状態を明らかにする。

We proposed two classes of multiparticle entangled states, the multigraph states and multihypergraph states, defined by unique operations on the edges and hyperedges. A key discovery is the one-to-one correspondence between the proposed multihypergraph states and the generalized real equally weighted states when d is prime. While for composite d, multihypergraph states form a subset of the generalized real equally weighted states. Meanwhile, we detailed a method for constructing real equally weighted states from hypergraph states and revealed the generalized real equally weighted states which cannot be generated from d-dimensional hypergraph states.

翻訳日:2023-12-25 16:28:33 公開日:2023-12-22

# 教師なし深層学習画像検証法

Unsupervised Deep Learning Image Verification Method ( http://arxiv.org/abs/2312.14395v1 )

ライセンス: Link先を確認

Enoch Solomon, Abraham Woubie and Eyael Solomon Emiru

(参考訳) ディープラーニングは一般的に画像認識に使用されるが、通常は大量のラベル付きトレーニングデータが必要であり、それは常に容易に入手できるとは限らない。これにより、最先端の教師なし顔認証技術と比較すると、顕著な性能格差が生じる。本研究では,顔画像ベクトルを新しい表現に変換するオートエンコーダを利用して,このギャップを狭める手法を提案する。特に、オートエンコーダは、元の入力画像ベクトルではなく、近傍の顔画像ベクトルを再構成するように訓練される。これらの近傍顔画像ベクトルは、訓練顔画像ベクトルとの最高コサインスコアに基づいて教師なしプロセスにより選択される。提案手法は,Labeled Faces in the Wild(LFW)データセットにおいて,ベースラインシステムに対してEERで56\%の相対的改善を達成する。これにより、コサインとPLDAのスコアリングシステム間のパフォーマンスギャップを狭めることに成功した。

Although deep learning is commonly employed for image recognition, it usually requires a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by leveraging an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56\% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This has successfully narrowed down the performance gap between cosine and PLDA scoring systems.
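The neighbor-selection step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each vector gets exactly one neighbor (the abstract only says the highest cosine scores are used, without giving the count), and the toy 2-D vectors stand in for real face image vectors.

```python
import math


def cosine(u, v):
    """Cosine score between two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor_targets(vectors):
    """For each vector, return the index of the *other* vector with the
    highest cosine score. No labels are used, so the step is unsupervised.
    In the scheme described above, that neighbor (not the input itself)
    would serve as the autoencoder's reconstruction target.
    """
    targets = []
    for i, u in enumerate(vectors):
        candidates = [j for j in range(len(vectors)) if j != i]
        best = max(candidates, key=lambda j: cosine(u, vectors[j]))
        targets.append(best)
    return targets


if __name__ == "__main__":
    # Toy data (hypothetical): two pairs of nearly parallel vectors.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(nearest_neighbor_targets(toy))
```

On the toy data each vector is paired with its near-parallel partner, so training pairs would be (vector 0, vector 1), (vector 2, vector 3), and so on.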

翻訳日:2023-12-25 16:28:22 公開日:2023-12-22

# AdapTraj:マルチエージェント軌道予測のためのマルチソースドメイン一般化フレームワーク

AdapTraj: A Multi-Source Domain Generalization Framework for Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2312.14394v1 )

ライセンス: Link先を確認

Tangwen Qian, Yile Chen, Gao Cong, Yongjun Xu, Fei Wang

(参考訳) 近年,動的システムにおけるオブジェクトの複雑な相互作用をモデル化するための重要な課題として,マルチエージェント軌道予測が注目されている。有望な進歩にもかかわらず、既存の研究はすべて、モデル学習中に観測されるデータ分布が実世界のデプロイメントで遭遇する分布と一致するという仮定に従っている。しかし、デプロイ環境のモビリティパターンには本質的な分布シフトが存在する可能性があり、ドメイン一般化の不足と性能低下につながるため、この仮定は実際にはしばしば成り立たない。したがって、マルチエージェント軌道予測タスクにおけるそのような不一致を緩和するために、複数のソースドメインの軌跡を利用することが望ましい。しかし,本課題におけるマルチソースドメイン一般化の開発には,(1)負の転移,(2)外部要因のモデリング不足という2つの顕著な課題がある。これらの課題に対処するために、焦点エージェントと隣接エージェントの両方について、ドメイン不変およびドメイン固有の特徴という4種類の特徴を明示的にモデル化する新しい因果定式化を提案する。この新たな定式化に基づいて,マルチエージェント軌道予測に特化したマルチソースドメイン一般化フレームワークAdapTrajを提案する。 AdapTrajは様々なモデルに適応可能なプラグアンドプレイモジュールとして機能する。異なるドメインを持つ4つのデータセットに対する大規模な実験は、AdapTrajが他のベースラインを一貫して大幅に上回ることを示している。

Multi-agent trajectory prediction, as a critical task in modeling complex interactions of objects in dynamic systems, has attracted significant research attention in recent years. Despite the promising advances, existing studies all follow the assumption that data distribution observed during model learning matches that encountered in real-world deployments. However, this assumption often does not hold in practice, as inherent distribution shifts might exist in the mobility patterns for deployment environments, thus leading to poor domain generalization and performance degradation. Consequently, it is appealing to leverage trajectories from multiple source domains to mitigate such discrepancies for multi-agent trajectory prediction task. However, the development of multi-source domain generalization in this task presents two notable issues: (1) negative transfer; (2) inadequate modeling for external factors. To address these issues, we propose a new causal formulation to explicitly model four types of features: domain-invariant and domain-specific features for both the focal agent and neighboring agents. Building upon the new formulation, we propose AdapTraj, a multi-source domain generalization framework specifically tailored for multi-agent trajectory prediction. AdapTraj serves as a plug-and-play module that is adaptable to a variety of models. Extensive experiments on four datasets with different domains demonstrate that AdapTraj consistently outperforms other baselines by a substantial margin.

翻訳日:2023-12-25 16:28:10 公開日:2023-12-22

# 平面符号による二項符号の結合

Concatenating Binomial Codes with the Planar Code ( http://arxiv.org/abs/2312.14390v1 )

ライセンス: Link先を確認

Juliette Soule, Andrew C. Doherty, Arne L. Grimsmo

(参考訳) 回転対称ボソニック符号は、特に超伝導量子ビット実験において、量子ビットを振動子の自由度に符号化する魅力的な方法である。これらの符号はかなりの損失と位相緩和を許容できるが、大規模なデバイスを実現するためには、より高いレベルの符号と組み合わせる必要がある。フォールトトレラント量子計算のための測定型スキームにおいて,これらの符号と平面符号の連接を検討する。基本レベルの符号化として二項符号に着目し, 各種の測定プロトコルにおいて、損失下でのそのような符号化の損益分岐点を推定する。これらの符号は光子損失エラーに対してより耐性があるが、ゲート操作や測定にはより高い平均光子数とより高い位相分解能の両方が必要である。二項符号量子ビットを用いた平面符号で良好な性能を得るには,適応位相測定,最尤量子状態推定,重み付き最小重み復号を実装する必要があることがわかった。

Rotation symmetric bosonic codes are an attractive encoding for qubits into oscillator degrees of freedom, particularly in superconducting qubit experiments. While these codes can tolerate considerable loss and dephasing, they will need to be combined with higher level codes to achieve large-scale devices. We investigate concatenating these codes with the planar code in a measurement-based scheme for fault-tolerant quantum computation. We focus on binomial codes as the base level encoding, and estimate break-even points for such encodings under loss for various types of measurement protocol. These codes are more resistant to photon loss errors, but require both higher mean photon numbers and higher phase resolution for gate operations and measurements. We find that it is necessary to implement adaptive phase measurements, maximum likelihood quantum state inference, and weighted minimum weight decoding to obtain good performance for a planar code using binomial code qubits.

翻訳日:2023-12-25 16:27:48 公開日:2023-12-22

# StyleRetoucher:GAN事前知識による汎用ポートレート画像レタッチ

StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors ( http://arxiv.org/abs/2312.14389v1 )

ライセンス: Link先を確認

Wanchao Su, Can Wang, Chen Liu, Hangzhou Han, Hongbo Fu, Jing Liao

(参考訳) 精細にレタッチされたポートレート画像の作成は、プロのアーティストにとっても退屈で時間がかかる。自動レタッチ手法は存在するが、過度な平滑化によるアーティファクトに悩まされるか、一般化能力に欠ける。そこで本研究では,StyleGANの生成能力と一般化能力を活かし,顔の細部を保ちつつ入力ポートレート画像の肌状態を改善する,新しい自動ポートレート画像レタッチフレームワークStyleRetoucherを提案する。事前学習済みStyleGANの事前知識を活用することで,本手法は (a) 少ないトレーニングサンプルでも安定して動作し,(b) ドメイン外のデータにもうまく一般化するという優れた堅牢性を示す。さらに,入力画像の空間的特徴とStyleGAN層の中間特徴を混合することにより,入力の特性を最大限に保持する。また,肌の傷を効果的に識別して除去し,画像の肌状態を改善する,新しい傷認識型の特徴選択機構を提案する。定性的および定量的な評価により,本手法の高い一般化能力が確認された。さらなる実験により、StyleRetoucherは画像レタッチタスクにおいて代替手法よりも優れた性能を示す。また,既存の最先端手法に対する本手法のレタッチ性能の優位性を確認するため,ユーザ知覚調査も実施した。

Creating fine-retouched portrait images is tedious and time-consuming even for professional artists. There exist automatic retouching methods, but they either suffer from over-smoothing artifacts or lack generalization ability. To address such issues, we present StyleRetoucher, a novel automatic portrait image retouching framework, leveraging StyleGAN's generation and generalization ability to improve an input portrait image's skin condition while preserving its facial details. Harnessing the priors of pretrained StyleGAN, our method shows superior robustness: (a) performing stably with fewer training samples and (b) generalizing well on out-of-domain data. Moreover, by blending the spatial features of the input image and intermediate features of the StyleGAN layers, our method preserves the input characteristics to the largest extent. We further propose a novel blemish-aware feature selection mechanism to effectively identify and remove the skin blemishes, improving the image skin condition. Qualitative and quantitative evaluations validate the great generalization capability of our method. Further experiments show StyleRetoucher's superior performance to the alternative solutions in the image retouching task. We also conduct a user perceptive study to confirm the superior retouching performance of our method over the existing state-of-the-art alternatives.

翻訳日:2023-12-25 16:27:32 公開日:2023-12-22

# 対話型画像セグメンテーションのための分散非依存かつターゲット保存型のマスク改良

Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation ( http://arxiv.org/abs/2312.14387v1 )

ライセンス: Link先を確認

Chaowei Fang, Ziyin Zhou, Junye Chen, Hanjing Su, Qingyao Wu, Guanbin Li

(参考訳) ポイントベースのインタラクティブな画像セグメンテーションは、セマンティックセグメンテーションや画像編集といったアプリケーションにおけるマスクアノテーションの負担を軽減できる。しかし,限られたユーザ入力でターゲットマスクを完全に抽出することは依然として困難である。本稿では,より少ないユーザ入力でセグメンテーション品質を向上する新しい手法である、分散非依存かつターゲット保存型のマスク改良(Variance-Insensitive and Target-Preserving Mask Refinement)を提案する。直前のセグメンテーション結果を初期マスクとみなし、その初期マスクを継続的に改善する反復的な改良プロセスが一般的に用いられる。しかし、従来の手法は初期マスクのばらつきに敏感である。この問題を回避するため,提案手法では,異なる種類の初期マスクから一貫した推論を保証するマスクマッチングアルゴリズムを組み込んだ。また,ダウンサンプリング時にオブジェクト情報を保存し,効率と精度のバランスをとるターゲット認識型ズームアルゴリズムを導入する。 GrabCut、Berkeley、SBD、DAVISデータセットの実験は、インタラクティブな画像セグメンテーションにおける本手法の最先端性能を実証している。

Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing. However, fully extracting the target mask with limited user inputs remains challenging. We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs. Regarding the last segmentation result as the initial mask, an iterative refinement process is commonly employed to continually enhance the initial mask. Nevertheless, conventional techniques suffer from sensitivity to the variance in the initial mask. To circumvent this problem, our proposed method incorporates a mask matching algorithm for ensuring consistent inferences from different types of initial masks. We also introduce a target-aware zooming algorithm to preserve object information during downsampling, balancing efficiency and accuracy. Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.

翻訳日:2023-12-25 16:27:12 公開日:2023-12-22

# LLMを超えた生成AI:マルチモーダル生成のシステム意味

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation ( http://arxiv.org/abs/2312.14385v1 )

ライセンス: Link先を確認

Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

(参考訳) 大規模な生成AIモデルの開発がテキスト(1D)生成を超えて進化し、画像(2D)とビデオ(3D)生成を含むようになると、空間的および時間的情報の処理は品質、パフォーマンス、効率に固有の課題をもたらす。本稿では,マルチモーダルテキスト・ツー・イメージ(TTI)とテキスト・ツー・ビデオ(TTV)生成モデルに対する新しいシステム設計空間の理解に向けた最初の取り組みを示す。現在のモデルアーキテクチャ設計は、拡散モデルとトランスフォーマーモデルという2つのカテゴリに分けられる。 8種類のTTI/TTVモデルの系統的性能評価では,Flash Attentionのような最先端の最適化手法を適用した後,ConvolutionはDiffusionベースのTTIモデルの実行時間の最大44%を占め,Linear層はTransformerベースのモデルの実行時間の最大49%を消費している。また,Diffusion ベースの TTI モデルは LLM 推論の Prefill 段階に似ており,Decode フェーズに類似した Transformer ベースの TTI モデルよりも Flash Attention の 1.1-2.5 倍の高速化が期待できる。 LLM向けに設計された最適化は、直接TTI/TTVモデルにマッピングされないため、新たな最適化機会を得るために、これらのワークロードを徹底的に評価する必要がある。このようにして、TTI/TTVモデルの文脈でシーケンス長を定義し、拡散モデル推論において、シーケンス長は最大4倍まで変化する。さらに,ttvワークロードの時間的側面がユニークなシステムボトルネックをもたらし,時間的注意が全体の注意時間の60%以上を占めることを観察した。全体として、当社のシステムパフォーマンス評価は、新たなTTI/TTVワークロードのために効率的でデプロイ可能なシステムを設計するための重要な第一歩です。

As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation models. Current model architecture designs are bifurcated into 2 categories: Diffusion- and Transformer-based models. Our systematic performance characterization on a suite of eight representative TTI/TTV models shows that after state-of-the-art optimization techniques such as Flash Attention are applied, Convolution accounts for up to 44% of execution time for Diffusion-based TTI models, while Linear layers consume up to 49% of execution time for Transformer-based models. We additionally observe that Diffusion-based TTI models resemble the Prefill stage of LLM inference, and benefit from 1.1-2.5x greater speedup from Flash Attention than Transformer-based TTI models that resemble the Decode phase. Since optimizations designed for LLMs do not map directly onto TTI/TTV models, we must conduct a thorough characterization of these workloads to gain insights for new optimization opportunities. In doing so, we define sequence length in the context of TTI/TTV models and observe sequence length can vary up to 4x in Diffusion model inference. We additionally observe temporal aspects of TTV workloads pose unique system bottlenecks, with Temporal Attention accounting for over 60% of total Attention time. Overall, our in-depth system performance characterization is a critical first step towards designing efficient and deployable systems for emerging TTI/TTV workloads.
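The execution-time shares quoted in this abstract bound how much end-to-end speedup an operator-level optimization can deliver. The Python sketch below applies Amdahl's law, a standard model brought in here for illustration rather than something the abstract itself uses, to the 44% Convolution share reported for Diffusion-based TTI models. The function name `amdahl_speedup` and the example acceleration factors are assumptions.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """End-to-end speedup when a `fraction` of the runtime is accelerated
    by `factor` and the remaining runtime is unchanged (Amdahl's law).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    conv_share = 0.44  # upper share of runtime quoted for Convolution
    for factor in (2.0, 4.0, float("inf")):
        speedup = amdahl_speedup(conv_share, factor)
        print(f"Convolution {factor}x faster -> {speedup:.2f}x end to end")
```

Under this model, even an infinitely fast Convolution kernel gives at most 1 / 0.56, roughly 1.79x, end to end at a 44% share, which is one way to read the abstract's point that the remaining operators also need workload-specific attention.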

翻訳日:2023-12-25 16:26:57 公開日:2023-12-22

# van der Waalsトポロジカル強磁性金属 鉄ゲルマニウムテルル化物における反転対称性の破れ

Broken inversion symmetry in van der Waals topological ferromagnetic metal iron germanium telluride ( http://arxiv.org/abs/2312.14384v1 )

ライセンス: Link先を確認

Kai-Xuan Zhang, Hwiin Ju, Hyuncheol Kim, Jingyuan Cui, Jihoon Keum, Je-Geun Park, and Jong Seok Lee

(参考訳) 反転対称性の破れは多くの量子効果にとって重要であり、次世代のスピントロニクスに不可欠なスピン軌道トルクの基礎である。近年, トポロジカルなファンデルワールス (vdW) 磁性体である鉄ゲルマニウムテルル化物において, 新しいタイプの巨大な内因性スピン軌道トルクが確立された。しかし、層間反転対称性の破れに関する明確な証拠が存在しないため、これは謎のままである。本稿では,第2高調波発生(SHG)法により直接測定された、鉄ゲルマニウムテルル化物における反転対称性の破れの決定的な証拠を報告する。我々のデータは、結晶対称性が中心対称なP63/mmcから非中心対称な極性P3m1空間群へと低下し、面外分極が支配的な3回対称のSHGパターンを与えることを示す。さらに、SHG応答は、Fe欠損の増加に伴って等方的なパターンから鋭い3回対称性へと変化し、これは主にランダムな欠陥から秩序化したFe空孔への転移によるものである。このようなSHG応答は温度に対して堅牢であり、強磁性転移温度の上下で結晶対称性が変化しないことを保証する。これらの発見は、この興味深いvdW金属である鉄ゲルマニウムテルル化物の、バンドトポロジー、内因性スピン軌道トルク、トポロジカルvdW極性金属状態についての理解に重要な新しい情報を与える。

Inversion symmetry breaking is critical for many quantum effects and fundamental for spin-orbit torque, which is crucial for next-generation spintronics. Recently, a novel type of gigantic intrinsic spin-orbit torque has been established in the topological van-der-Waals (vdW) magnet iron germanium telluride. However, it remains a puzzle because no clear evidence exists for interlayer inversion symmetry breaking. Here, we report the definitive evidence of broken inversion symmetry in iron germanium telluride directly measured by the second harmonic generation (SHG) technique. Our data show that the crystal symmetry reduces from centrosymmetric P63/mmc to noncentrosymmetric polar P3m1 space group, giving the three-fold SHG pattern with dominant out-of-plane polarization. Additionally, the SHG response evolves from an isotropic pattern to a sharp three-fold symmetry upon increasing Fe deficiency, mainly due to the transition from random defects to ordered Fe vacancies. Such SHG response is robust against temperature, ensuring unaltered crystalline symmetries above and below the ferromagnetic transition temperature. These findings add crucial new information to our understanding of this interesting vdW metal, iron germanium telluride: band topology, intrinsic spin-orbit torque and topological vdW polar metal states.

翻訳日:2023-12-25 16:26:25 公開日:2023-12-22

# 可視性透かし除去のための干渉除去とコンテンツ回収

Removing Interference and Recovering Content Imaginatively for Visible Watermark Removal ( http://arxiv.org/abs/2312.14383v1 )

ライセンス: Link先を確認

Yicheng Leng, Chaowei Fang, Gen Li, Yixiang Fang, Guanbin Li

(参考訳) 可視的な透かしは、画像の著作権を保護するのに役立つが、しばしば基礎となるコンテンツを歪め、シーンの解釈や画像編集といった作業を複雑にする。可視透かし除去は、透かしの干渉をなくし、背景コンテンツを復元することを目的としている。しかし, 既存の手法は, 透かし成分の除去と背景復元のタスクを単一のブランチ内で実装することが多く, 予測結果に透かしが残存し、透かしが背景を大きく覆い隠す場合を無視している。これらの制約に対処するために、本研究ではRemoving Interference and Recovering Content Imaginatively (RIRCI)フレームワークを導入する。 RIRCIは2段階のアプローチを採用している。第1段階は透かし成分の識別と分離に集中し、第2段階は背景コンテンツの復元に焦点を当てる。精緻な背景復元を実現するため、提案モデルは、半透明な透かしの下にある固有の背景情報と、影響を受けていない領域からの周辺コンテキスト情報を十分に探索できる2経路ネットワークを用いる。さらに、背景復元段階における包括的な表現モデリングのために、多層パーセプトロンと双方向特徴変換に基づくグローバルおよびローカルのコンテキスト相互作用モジュールを構築する。提案手法の有効性は2つの大規模データセットで実証的に検証され,既存の透かし除去技術に対する顕著な向上が示された。

Visible watermarks, while instrumental in protecting image copyrights, frequently distort the underlying content, complicating tasks like scene interpretation and image editing. Visible watermark removal aims to eliminate the interference of watermarks and restore the background content. However, existing methods often implement watermark component removal and background restoration tasks within a singular branch, leading to residual watermarks in the predictions and ignoring cases where watermarks heavily obscure the background. To address these limitations, this study introduces the Removing Interference and Recovering Content Imaginatively (RIRCI) framework. RIRCI embodies a two-stage approach: the initial phase centers on discerning and segregating the watermark component, while the subsequent phase focuses on background content restoration. To achieve meticulous background restoration, our proposed model employs a dual-path network capable of fully exploring the intrinsic background information beneath semi-transparent watermarks and peripheral contextual information from unaffected regions. Moreover, a Global and Local Context Interaction module is built upon multi-layer perceptrons and bidirectional feature transformation for comprehensive representation modeling in the background restoration phase. The efficacy of our approach is empirically validated across two large-scale datasets, and our findings reveal a marked enhancement over existing watermark removal techniques.

翻訳日:2023-12-25 16:26:02 公開日:2023-12-22

# インドシアニングリーンの共鳴増強量子もつれ2光子吸収断面積の実験的上限

Experimental Upper Bounds for Resonance-Enhanced Entangled Two-Photon Absorption Cross Section of Indocyanine Green ( http://arxiv.org/abs/2312.14382v1 )

ライセンス: Link先を確認

Manni He, Bryce P. Hickam, Nathan Harper, Scott K. Cushing

(参考訳) 共鳴中間状態は、量子もつれ2光子吸収(ETPA)の効率を高めるものとして提案されている。共鳴増強ETPA(r-ETPA)は、明るいスクイーズド真空を用いて原子系で実証されているが、有機分子では研究されていない。本研究では,有機分子色素であるインドシアニングリーン (ICG) を近赤外の広帯域もつれ光子で励起した場合のr-ETPAを初めて調べる。報告されている多くの仮想状態媒介ETPA(v-ETPA)測定と同様に、r-ETPA信号は測定されず、断面積の実験的上限は $6 \times 10^{-23}$ cm$^2$/molecule である。さらに、ICGの800nmにおける古典的な共鳴増強2光子吸収(r-TPA)断面積を初めて測定し $20(\pm13)$ GM と求めた。これは、共鳴中間状態の存在がICGの2光子過程を著しく増強しないことを示唆している。もつれ光子によって励起されたICGのスペクトル・時間分解発光シグネチャも、この結論を支持するものとして示す。

Resonant intermediate states have been proposed to increase the efficiency of entangled two-photon absorption (ETPA). Although resonance-enhanced ETPA (r-ETPA) has been demonstrated in atomic systems using bright squeezed vacuum, it has not been studied in organic molecules. We investigate for the first time r-ETPA in an organic molecular dye, indocyanine green (ICG), when excited by broadband entangled photons in near-IR. Similar to many reported virtual state mediated ETPA (v-ETPA) measurements, no r-ETPA signals are measured, with an experimental upper bound for the cross section placed at $6 \times 10^{-23}$ cm$^2$/molecule. In addition, the classical resonance-enhanced two-photon absorption (r-TPA) cross section of ICG at 800 nm is measured for the first time to be $20(\pm13)$ GM, suggesting that having a resonant intermediate state does not significantly enhance two-photon processes in ICG. The spectrotemporally resolved emission signatures of ICG excited by entangled photons are also presented to support this conclusion.

翻訳日:2023-12-25 16:25:37 公開日:2023-12-22

# 投影軌道正規化による連合学習

Federated Learning with Projected Trajectory Regularization ( http://arxiv.org/abs/2312.14380v1 )

ライセンス: Link先を確認

Tiejin Chen, Yuanpu Cao, Yujia Wang, Cho-Jui Hsieh, Jinghui Chen

(参考訳) フェデレーション学習は、ローカルデータを共有せずに、分散クライアントから機械学習モデルの共同トレーニングを可能にする。フェデレーション学習における重要な課題の1つは、クライアント間で同一に分布していないデータを扱うことであり、これはモデルトレーニングの性能低下につながる。この分野の先行研究は、主に直前ステップのグローバルモデルパラメータ/勾配や過去のモデルパラメータ/勾配の線形結合の利用に焦点を当てており、モデル訓練軌道からのグローバル情報の可能性を完全には活用していない。本稿では、データ不均一性問題に対処するための投影軌道正規化を備えた新しいフェデレーション学習フレームワーク(FedPTR)を提案する。これは、モデル訓練軌道から本質的なグローバル情報をより良く抽出する独自の方法を提案するものである。具体的には、FedPTRでは、ローカルクライアントまたはサーバが、最近のモデル更新の学習ダイナミクスを模倣する補助(合成)データセットを最適化し、それを利用して次ステップのモデル軌道を投影し、ローカルトレーニングの正規化に用いる。非凸確率的設定の下で提案フレームワークの厳密な理論解析を行い,不均質なデータ分布下での高速な収束を検証する。各種ベンチマークデータセットと非i.i.d.設定での実験により,提案フレームワークの有効性が検証された。

Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients, which leads to deteriorated model training performances. Prior works in this line of research mainly focus on utilizing last-step global model parameters/gradients or the linear combinations of the past model parameters/gradients, which do not fully exploit the potential of global information from the model training trajectory. In this paper, we propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data heterogeneity issue, which proposes a unique way to better extract the essential global information from the model training trajectory. Specifically, FedPTR allows local clients or the server to optimize an auxiliary (synthetic) dataset that mimics the learning dynamics of the recent model update and utilizes it to project the next-step model trajectory for local training regularization. We conduct rigorous theoretical analysis for our proposed framework under nonconvex stochastic settings to verify its fast convergence under heterogeneous data distributions. Experiments on various benchmark datasets and non-i.i.d. settings validate the effectiveness of our proposed framework.

翻訳日:2023-12-25 16:25:14 公開日:2023-12-22

# 音声認識と音声イベント分類の改善を目的としたマルチモーダルアテンションマージ

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification ( http://arxiv.org/abs/2312.14378v1 )

ライセンス: Link先を確認

Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

(参考訳) ラベルなしデータに対する自己教師付き目標を用いた大規模基礎モデルのトレーニングと下流タスクの微調整が標準手順として登場している。残念ながら、このアプローチの有効性は、制限された微調整計算とラベル付き下流データの不足によって制約されることが多い。マルチモーダル・アテンション・マージング(MAM)は、高リソースのモダリティ(テキストと画像)に根ざしたモデルの注意行列から、リソース制約のある領域(音声とオーディオ)のモデルへ、ゼロショット・パラダイムを用いて直接的な知識伝達を容易にする試みである。 MAMは、自動音声認識(ASR)モデルの相対的な単語誤り率(WER)を最大6.70%削減し、オーディオイベント分類(AEC)モデルの相対的な分類誤差を10.63%削減する。データ/計算が利用可能である場合、注意行列をマージするためのデータ駆動アプローチであるLearnable-MAMを提示し、その結果、微調整と比べてASRのWERがさらに相対的に2.90%減少し、AECでは相対的に18.42%減少する結果となった。

Training large foundation models using self-supervised objectives on unlabeled data, followed by fine-tuning on downstream tasks, has emerged as a standard procedure. Unfortunately, the efficacy of this approach is often constrained by both limited fine-tuning compute and scarcity in labeled downstream data. We introduce Multimodal Attention Merging (MAM), an attempt that facilitates direct knowledge transfer from attention matrices of models rooted in high-resource modalities, text and images, to those in resource-constrained domains, speech and audio, employing a zero-shot paradigm. MAM reduces the relative Word Error Rate (WER) of an Automatic Speech Recognition (ASR) model by up to 6.70%, and the relative classification error of an Audio Event Classification (AEC) model by 10.63%. In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and an 18.42% relative reduction in classification error for AEC compared to fine-tuning.
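As a rough illustration of what merging attention matrices could look like, the sketch below linearly interpolates two same-shaped weight matrices and also shows how a relative error reduction is computed. The interpolation rule, the mixing weight `lam`, and both function names are assumptions for illustration and are not taken from the paper; the WER values in the usage lines are hypothetical numbers chosen only to show the arithmetic.

```python
def merge_attention(target, source, lam=0.5):
    """Element-wise interpolation (1 - lam) * target + lam * source.

    target, source -- same-shaped matrices given as lists of lists
    lam            -- assumed mixing weight in [0, 1]
    """
    same_shape = len(target) == len(source) and all(
        len(t_row) == len(s_row) for t_row, s_row in zip(target, source)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [
        [(1.0 - lam) * t + lam * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(target, source)
    ]


def relative_reduction(baseline, new):
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    print(merge_attention([[1.0, 0.0]], [[0.0, 1.0]], lam=0.25))
    # Hypothetical WER values, used only to illustrate the formula.
    print(relative_reduction(10.0, 9.33))
```

A relative reduction is measured against the baseline error, so a drop from a WER of 10.0 to 9.33 would be reported as roughly 6.7% relative, not 0.67 points.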

翻訳日:2023-12-25 16:24:52 公開日:2023-12-22

# マルチエージェント軌道予測のための社会時間グラフの学習

Learning Socio-Temporal Graphs for Multi-Agent Trajectory Prediction ( http://arxiv.org/abs/2312.14373v1 )

ライセンス: Link先を確認

Yuke Li, Lixiong Chen, Guangyi Chen, Ching-Yao Chan, Kun Zhang, Stefano Anzellotti, Donglai Wei

(参考訳) 歩行者の軌道を正確に予測するためには、他の歩行者との社会的・時間的相互作用を常に考慮しなければならない。関連する情報を分離、部分的、または暗黙的に表現する既存の作業とは異なり、それを完全かつ明示的に捉えて分析するための完全な表現を提案する。特に, 社会時間グラフ (stg) と呼ぶ有向非循環グラフ構造を導入し, 空間と時間にまたがる集団間の対方向の社会時間相互作用を明示的に捉えた。我々のモデルは、STGの構造を決定する潜在変数を持つ時間変化生成プロセスに基づいて構築される。軌道予測のためのSTGの構造を学習するためのエンドツーエンドパイプラインを提供するSTGformerというアテンションベースモデルを設計する。提案手法は,2つの大規模ベンチマークデータセットにおいて,最先端の予測精度を実現する。本分析は, 過去の軌跡が, 他人の進路を予測する上で重要であることを示す。我々のモデルは、社会時間的局所性の強い概念でこの関係を学習する。統計学は、この情報を明示的に予測するために利用すると、軌道のみのアプローチに対して顕著な性能向上が得られることを示した。

In order to predict a pedestrian's trajectory in a crowd accurately, one has to take into account her/his underlying socio-temporal interactions with other pedestrians consistently. Unlike existing work that represents the relevant information separately, partially, or implicitly, we propose a complete representation for it to be fully and explicitly captured and analyzed. In particular, we introduce a Directed Acyclic Graph-based structure, which we term Socio-Temporal Graph (STG), to explicitly capture pair-wise socio-temporal interactions among a group of people across both space and time. Our model is built on a time-varying generative process, whose latent variables determine the structure of the STGs. We design an attention-based model named STGformer that affords an end-to-end pipeline to learn the structure of the STGs for trajectory prediction. Our solution achieves overall state-of-the-art prediction accuracy in two large-scale benchmark datasets. Our analysis shows that a person's past trajectory is critical for predicting another person's future path. Our model learns this relationship with a strong notion of socio-temporal localities. Statistics show that utilizing this information explicitly for prediction yields a noticeable performance gain with respect to the trajectory-only approaches.

翻訳日:2023-12-25 16:24:24 公開日:2023-12-22

# KamLAND-Zenシミュレーションのための生成モデル

Generative Models for Simulation of KamLAND-Zen ( http://arxiv.org/abs/2312.14372v1 )

ライセンス: Link先を確認

Z. Fu, C. Grant, D. M. Krawiec, A. Li, L. Winslow

(参考訳) ニュートリノのない二重ベータ崩壊($0\nu\beta\beta$)の次世代の探索は、ニュートリノの性質と宇宙の物質-反物質非対称性の源についての深い疑問に答えるものである。これらの探索では、計測対象の同位体1トンあたり年間1事象未満の事象率を探すことになる。発見を主張するには、$0\nu\beta\beta$を模倣する検出器事象の正確かつ効率的なシミュレーションが重要である。伝統的なモンテカルロ(MC)シミュレーションは機械学習に基づく生成モデルによって補うことができる。本研究では,KamLANDのようなモノリシック液体シンチレータ検出器向けに設計され,事前定義された物理モデルなしで高精度なシミュレーションデータを生成する生成モデルの性能について述べる。低レベルの特徴を復元し、補間を行う能力を示す。将来、これらの生成モデルの結果は、高品質な豊富な生成データを提供することで、イベントの分類と背景拒絶を改善するのに使うことができる。

The next generation of searches for neutrinoless double beta decay ($0\nu\beta\beta$) is poised to answer deep questions on the nature of neutrinos and the source of the Universe's matter-antimatter asymmetry. They will be looking for event rates of less than one event per ton of instrumented isotope per year. To claim discovery, accurate and efficient simulations of detector events that mimic $0\nu\beta\beta$ are critical. Traditional Monte Carlo (MC) simulations can be supplemented by machine-learning-based generative models. In this work, we describe the performance of generative models designed for monolithic liquid scintillator detectors like KamLAND to produce highly accurate simulation data without a predefined physics model. We demonstrate their ability to recover low-level features and perform interpolation. In the future, the results of these generative models can be used to improve event classification and background rejection by providing high-quality abundant generated data.

翻訳日:2023-12-25 16:24:05 公開日:2023-12-22

# 合成データを用いた学習のための品質多様性生成サンプリング

Quality-Diversity Generative Sampling for Learning with Synthetic Data ( http://arxiv.org/abs/2312.14369v1 )

ライセンス: Link先を確認

Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis

(参考訳) 生成モデルは、合成トレーニングデータセットを作成することによって、実際のデータソースのサロゲートとして機能することができるが、その際にバイアスを下流タスクに持ち込む可能性がある。合成トレーニングデータセットを生成する際の品質と多様性の保護に注力する。バイアスのある生成器から得られるデータにもかかわらず、ユーザ定義測度空間を均一にサンプリングするフレームワークである品質多様性生成サンプリング(QDGS)を提案する。 QDGSはモデルに依存しないフレームワークで、生成モデルを微調整することなく、プロンプトガイダンスを用いて、合成によって生成されたデータの多様性の尺度全体で品質目標を最適化する。 QDGSが生成するバランスのとれた合成データセットを用いて,まず,カラーバイアス形状データセットで学習した分類器を概念実証としてデバイアスする。顔データ合成にQDGSを適用することで、肌の色調や年齢といった所望の意味概念をプロンプトで指定し、視覚特徴を組み合わせた交叉データセットを作成する。このバランスの取れたデータを分類器のトレーニングに利用することで、顔認識ベンチマークの精度を維持しながら公平性が向上する。 https://github.com/Cylumn/qd-generative-sampling

Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, despite the data coming from a biased generator. QDGS is a model-agnostic framework that uses prompt guidance to optimize a quality objective across measures of diversity for synthetically generated data, without fine-tuning the generative model. Using balanced synthetic datasets generated by QDGS, we first debias classifiers trained on color-biased shape datasets as a proof-of-concept. By applying QDGS to facial data synthesis, we prompt for desired semantic concepts, such as skin tone and age, to create an intersectional dataset with a combined blend of visual features. Leveraging this balanced data for training classifiers improves fairness while maintaining accuracy on facial recognition benchmarks. Code available at: https://github.com/Cylumn/qd-generative-sampling

翻訳日:2023-12-25 16:23:48 公開日:2023-12-22

# GROOD:補間多様体における勾配認識外分布検出

GROOD: GRadient-aware Out-Of-Distribution detection in interpolated manifolds ( http://arxiv.org/abs/2312.14427v1 )

ライセンス: Link先を確認

Mostafa ElAraby, Sabyasachi Sahoo, Yann Pequignot, Paul Novello, Liam Paull

(参考訳) ディープニューラルネットワーク(DNN)は、アウト・オブ・ディストリビューション(OOD)サンプルの過信予測でサイレントに失敗することが多く、現実のデプロイメントにおいてリスクを生じさせる。既存の手法は主にDNNパラメータに関して計算される特徴表現空間や勾配ノルムを強調するが、それらは複雑な勾配分布と分類領域のトポロジを見落としている。このギャップに対処するために, 勾配空間の識別力に依存する新しい枠組みである補間多様体における勾配認識外分布検出(GROOD)を導入する。このスペースを構築するために、GROODはOODの特徴を特に捉えるプロトタイプとともに、クラスプロトタイプに依存している。この手法はDNNの初期中間層において,IDとOODサンプル間の勾配空間の分離を改良するために,目的とする混合演算を取り入れている。トレーニングセットから最も近い隣接勾配までの距離を用いてOOD検出の有効性を定量化し,より堅牢なOODスコアを得た。実験的評価は、ターゲット入力混合の導入が勾配空間におけるIDとOODの分離を増幅し、多様なデータセット間で印象的な結果をもたらすことを裏付けるものである。特に、ImageNet-1kに対してベンチマークすると、GROODは最先端のベースラインの確立した堅牢性を上回る。本研究により,画像分類におけるDNNのOOD検出を向上するために,勾配空間とクラスプロトタイプを利用する方法を確立した。

Deep neural networks (DNNs) often fail silently with over-confident predictions on out-of-distribution (OOD) samples, posing risks in real-world deployments. Existing techniques predominantly emphasize either the feature representation space or the gradient norms computed with respect to DNN parameters, yet they overlook the intricate gradient distribution and the topology of classification regions. To address this gap, we introduce GRadient-aware Out-Of-Distribution detection in interpolated manifolds (GROOD), a novel framework that relies on the discriminative power of gradient space to distinguish between in-distribution (ID) and OOD samples. To build this space, GROOD relies on class prototypes together with a prototype that specifically captures OOD characteristics. Uniquely, our approach incorporates a targeted mix-up operation at an early intermediate layer of the DNN to refine the separation of gradient spaces between ID and OOD samples. We quantify OOD detection efficacy using the distance to the nearest neighbor gradients derived from the training set, yielding a robust OOD score. Experimental evaluations substantiate that the introduction of targeted input mix-up amplifies the separation between ID and OOD in the gradient space, yielding impressive results across diverse datasets. Notably, when benchmarked against ImageNet-1k, GROOD surpasses the established robustness of state-of-the-art baselines. Through this work, we establish the utility of leveraging gradient spaces and class prototypes for enhanced OOD detection for DNNs in image classification.
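The nearest-neighbor scoring step mentioned in the abstract can be sketched generically as follows. This is a minimal illustration of a k-nearest-neighbor distance score, assuming each sample has already been mapped to a fixed-length vector; the function name `knn_ood_score`, the Euclidean metric, and the toy 2-D vectors (stand-ins for gradient features) are assumptions, and the prototype construction and mix-up steps of GROOD are not reproduced here.

```python
import math


def knn_ood_score(query, train_vectors, k=1):
    """Distance from `query` to its k-th nearest training vector.

    A larger score means the query lies farther from the training data,
    i.e. it looks more out-of-distribution under this simple criterion.
    """
    if not 1 <= k <= len(train_vectors):
        raise ValueError("k must be between 1 and the number of training vectors")
    distances = sorted(math.dist(query, v) for v in train_vectors)
    return distances[k - 1]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for per-sample gradient features.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(knn_ood_score((0.1, 0.0), train))  # close to the training set
    print(knn_ood_score((3.0, 4.0), train))  # far from the training set
```

In practice a threshold on this score (chosen on held-out ID data) would separate ID from OOD inputs; that choice is left open here.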

翻訳日:2023-12-25 16:17:05 公開日:2023-12-22

# Room Occupancy Prediction: 機械学習のパワーと時間的洞察を探る

Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights ( http://arxiv.org/abs/2312.14426v1 )

ライセンス: Link先を確認

Siqi Mao, Yaping Yuan, Yinpu Li, Ziren Wang, Yuanxin Yao, Yixin Kang

(参考訳) 建物の省エネルギーは温室効果ガス排出対策や気候変動対策において最重要課題である。照明制御や気候調整といった行動を伴う部屋の効率の良い管理は、エネルギー消費を削減するための重要な戦略である。監視技術が実現できない状況では、部屋の占有率を推定するために非侵入センサーが使用される。本研究では,ランダムフォレストが連続的に最も高い予測精度を達成し,多様な機械学習モデルを用いた部屋占有率予測フレームワークを提案する。特にこのデータセットは、時間次元と空間次元の両方を包含し、豊富な情報を明らかにする。興味深いことに、我々のフレームワークは明示的な時間的モデリングがなくても堅牢な性能を示す。これらの発見は、従来の機械学習モデルの顕著な予測力を強調している。この成功は、特徴冗長性の存在、線形空間パターンと時間パターンの単純さ、高周波データサンプリングの利点に起因する。これらの結果は説得力があるが、時間次元を明示的にモデル化することで深い洞察を解き放ち、特定のシナリオにおける予測能力をさらに高める可能性があることには、オープンにしておくことが不可欠である。まとめると,本研究は,連続的および分類的タスクに対する予測フレームワークの有効性を検証するだけでなく,時間的側面の包含による改善の可能性も強調する。この研究は、エネルギー効率のよいプラクティスと部屋の占有管理を形作る機械学習の約束を強調している。

Energy conservation in buildings is a paramount concern for reducing greenhouse gas emissions and combating climate change. The efficient management of room occupancy, involving actions like lighting control and climate adjustment, is a pivotal strategy to curtail energy consumption. In contexts where surveillance technology isn't viable, non-intrusive sensors are employed to estimate room occupancy. In this study, we present a predictive framework for room occupancy that leverages a diverse set of machine learning models, with Random Forest consistently achieving the highest predictive accuracy. Notably, this dataset encompasses both temporal and spatial dimensions, revealing a wealth of information. Intriguingly, our framework demonstrates robust performance even in the absence of explicit temporal modeling. These findings underscore the remarkable predictive power of traditional machine learning models. The success can be attributed to the presence of feature redundancy, the simplicity of linear spatial and temporal patterns, and the advantages of high-frequency data sampling. While these results are compelling, it's essential to remain open to the possibility that explicitly modeling the temporal dimension could unlock deeper insights or further enhance predictive capabilities in specific scenarios. In summary, our research not only validates the effectiveness of our prediction framework for continuous and classification tasks but also underscores the potential for improvements through the inclusion of temporal aspects. The study highlights the promise of machine learning in shaping energy-efficient practices and room occupancy management.

翻訳日:2023-12-25 16:16:38 公開日:2023-12-22

# ロジスティカル・ファンハウスにおける損失--合成メディア企業としての投機的デザイン

Lost in the Logistical Funhouse: Speculative Design as Synthetic Media Enterprise ( http://arxiv.org/abs/2312.14424v1 )

ライセンス: Link先を確認

Zoe Horn, Liam Magee, Anna Munster

(参考訳) ウォルマートなどの企業による調達交渉機関としてのチャットボットの展開から、オーバーブックされたフライトを管理するための「差別化されたチャット」を提供する自律エージェントに至るまで、合成メディアはロジスティクスの世界を「自然」な環境にしている。ここでは、商品、部品、労働の協調が問題を設計し、「ソリューション」を合成できるトレーニングセットを作成する。しかし、MidJourneyやOpenAIといったプロトプラットフォームや、Eleven LabsやD:IDといったアプリを通じて、合成メディアはどの程度まで、ロジスティックメディアとして理解されるのか? 本稿では,GPTをベースとしたロジスティクスデザインビジネス開発を支援するボットChatFOSを用いた合成メディア実験について述べる。素早い生成メディア出力を用いて、ロジカルワールド内のAIの出現する機能のシミュレーションとパロディを組み立てる。この過程では,大規模言語モデルがメディアルータやスイッチとなり,画像プロンプト,Webサイトコード,プロモーションコピー,投資家ピッチシナリオの生成を管理する過程が説明される。これらの要素は、企業ウェブサイトやプロモーションビデオなどのメディアアンサンブルにチェーン化され、当社が「設立」した架空の物流視覚化会社を刺激します。 ChatFOSを介して投機的シナリオを創出するプロセスと方法により,合成メディアをロジスティックメディアとして再配置する方法について考察する。我々の実験は、メディアのロジスティクスとメディアのロジスティクスの展開の仕方を探るものである。現代計算と資本の政治と美学について、ロジスティクスと合成メディリティの両面から(実践ベースで)具体的に説明できることは何だろうか?

From the deployment of chatbots as procurement negotiators by corporations such as Walmart to autonomous agents providing 'differentiated chat' for managing overbooked flights, synthetic media are making the world of logistics their 'natural' habitat. Here the coordination of commodities, parts and labour design the problems and produce the training sets from which 'solutions' can be synthesised. But to what extent might synthetic media, surfacing via proto-platforms such as MidJourney and OpenAI and apps such as Eleven Labs and D:ID, be understood as logistical media? This paper details synthetic media experiments with 'ChatFOS', a GPT-based bot tasked with developing a logistics design business. Using its prompt-generated media outputs, we assemble a simulation and parody of AI's emerging functionalities within logistical worlds. In the process, and with clunky 'human-in-the-loop' stitching, we illustrate how large language models become media routers or switches, governing production of image prompts, website code, promotional copy, and investor pitch scenarios. Together these elements become links chained together in media ensembles such as the corporate website or the promotional video, fuelling the fictive logistics visualisation company we have 'founded'. The processes and methods of producing speculative scenarios via ChatFOS lead us to consider how synthetic media might be re-positioned as logistical media. Our experiments probe the ways in which the media of logistics and the logistics of media are increasingly enfolded. We ask: what can a (practice-based) articulation of this double-becoming of logistics and synthetic mediality tell us about the politics and aesthetics of contemporary computation and capital?

翻訳日:2023-12-25 16:16:18 公開日:2023-12-22

# 機械生成指示の有効性

Efficacy of Machine-Generated Instructions ( http://arxiv.org/abs/2312.14423v1 )

ライセンス: Link先を確認

Samaksh Gulati and Anshit Verma and Manoj Parmar and Palash Chaudhary

(参考訳) 大きな"インストラクションチューニング"言語モデル(命令に応答するために微調整された)は、ゼロショットを新しいタスクに一般化する驚くべき能力を示している。それでも、それらはしばしば量、多様性、創造性に制限される人間による命令データに大きく依存しているため、チューニングされたモデルの一般化を妨げる。我々は,機械生成アノテーションの有効性を定量的に検討し,細調整されたBERTモデルと人間のv/s機械生成アノテーションとの比較を行った。我々の手法をバニラGPT-3モデルに適用すると、機械が生成したアノテーションは78.54%正確であり、微調整されたモデルは、人間のラベル付きアノテーションと比較して96.01%の性能を達成した。この結果は、マシン生成アノテーションがリソースであり、ダウンストリームモデルを微調整するコスト効率のよい方法であることを示している。

Large "instruction-tuned" language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We conducted a quantitative study of the efficacy of machine-generated annotations, in which we compare the results of a BERT model fine-tuned with human- vs. machine-generated annotations. Applying our methods to the vanilla GPT-3 model, we saw that machine-generated annotations were 78.54% correct and that the fine-tuned model achieved 96.01% of the performance obtained with human-labelled annotations. This result shows that machine-generated annotations are a resource- and cost-effective way to fine-tune downstream models.

翻訳日:2023-12-25 16:15:44 公開日:2023-12-22

# Base-Equivalent Conceptual-Relevance を用いた動作可能な形式的概念識別の強化

Enhancing Actionable Formal Concept Identification with Base-Equivalent Conceptual-Relevance ( http://arxiv.org/abs/2312.14421v1 )

ライセンス: Link先を確認

Ayao Bobi, Rokia Missaoui and Mohamed Hamza Ibrahim

(参考訳) 知識発見アプリケーションでは、データから生成されたパターンは極めて大きく、アナリストによる探索は困難である。形式的概念分析(FCA)フレームワークでは、安定性指標やその他の品質指標を通じて重要な形式的概念を特定する研究が行われている。本稿では,行動可能な概念の識別を改善するための新しい概念関連性指標であるBase-Equivalent Conceptual Relevance(BECR)スコアを紹介する。概念的観点からは、基本属性と等価属性は意味のある情報と見なされ、概念の概念的構造を維持するために非常に不可欠である。したがって、BECRの基本的な考え方は、より基本的で等価な属性と概念意図が持つ最小のジェネレータがより関連性が高いことである。そのため、BECRはこれらの属性と最小限のジェネレータを概念意図ごとに定量化する。合成および実世界のデータセットに関する予備実験は、よく知られた安定性指標と比較してBECRの効率を示す。

In knowledge discovery applications, the pattern set generated from data can be tremendously large and hard to explore by analysts. In the Formal Concept Analysis (FCA) framework, there have been studies to identify important formal concepts through the stability index and other quality measures. In this paper, we introduce the Base-Equivalent Conceptual Relevance (BECR) score, a novel conceptual relevance interestingness measure for improving the identification of actionable concepts. From a conceptual perspective, the base and equivalent attributes are considered meaningful information and are highly essential to maintain the conceptual structure of concepts. Thus, the basic idea of BECR is that the more base and equivalent attributes and minimal generators a concept intent has, the more relevant it is. As such, BECR quantifies these attributes and minimal generators per concept intent. Our preliminary experiments on synthetic and real-world datasets show the efficiency of BECR compared to the well-known stability index.
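For readers unfamiliar with Formal Concept Analysis, the standard derivation operators behind terms such as "concept intent" and "generator" can be sketched in a few lines. The toy context and the function names below are illustrative assumptions; the sketch shows only textbook FCA closure, not the BECR score itself, whose exact formula is not given in the abstract.

```python
def extent(context, attrs):
    """Objects that have every attribute in `attrs`."""
    return {obj for obj, has in context.items() if attrs <= has}


def intent(context, objs):
    """Attributes shared by every object in `objs`."""
    common = set().union(*context.values())  # start from all attributes
    for obj in objs:
        common &= context[obj]
    return common


def closure(context, attrs):
    """Smallest concept intent containing `attrs`."""
    return intent(context, extent(context, attrs))


if __name__ == "__main__":
    # Hypothetical toy context: object -> set of attributes.
    ctx = {
        "duck": {"flies", "swims", "bird"},
        "eagle": {"flies", "hunts", "bird"},
        "shark": {"swims", "hunts"},
    }
    # {"flies"} is not closed: every flying object is also a bird.
    print(sorted(closure(ctx, {"flies"})))
```

Here `{"flies"}` generates the intent `{"bird", "flies"}`; a set such as this, with no smaller subset generating the same intent, is what FCA calls a minimal generator.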

翻訳日:2023-12-25 16:15:28 公開日:2023-12-22

# 目標測度拡散写像のシャープな誤差推定とコミッタ問題への応用

Sharp error estimates for target measure diffusion maps with applications to the committor problem ( http://arxiv.org/abs/2312.14418v1 )

ライセンス: Link先を確認

Shashank Sule, Luke Evans and Maria Cameron

(参考訳) 重要サンプリングを特徴とする拡散マップの変種である目標測度拡散マップ(TMDmap)(Banisch et al. 2020)の一貫性エラーに対する漸近的に鋭い誤差推定を行い,任意の密度から入力データを抽出できるようにした。導出誤差推定にはバイアス誤差と分散誤差が含まれる。結果として得られる収束率はグラフラプラシアンの近似理論と一致する。結果の重要新しさは、先行項上のすべての前因子の明示的な定量化にある。また, TMDmapを用いて得られたディリクレBVPの解に対して, 解誤差が整合誤差によって制御されることを示す。これらの結果を用いて,遷移経路理論(TPT)の枠組みを用いた過減衰ランジュバン力学が制御する系における希少事象の解析におけるTMDmapの応用について検討した。 TPTの礎石成分はコミッタ問題の解であり、後ろ向きコルモゴロフ PDE に対する境界値問題である。注目すべきことに、TMDmapアルゴリズムは、プレファクタ式におけるいくつかのエラー項のキャンセルによるコミッタ問題に対するメッシュレス解法として特に適している。さらに, 準均一サンプリング密度を用いた場合, バイアスおよび分散誤差の顕著な改善が生じる。 TMDmapアルゴリズムの空間的均一な入力として$\delta$-netsを使用することで,これらの精度の向上が実現可能であることを示す。

We obtain asymptotically sharp error estimates for the consistency error of the Target Measure Diffusion map (TMDmap) (Banisch et al. 2020), a variant of diffusion maps featuring importance sampling and hence allowing input data drawn from an arbitrary density. The derived error estimates include the bias error and the variance error. The resulting convergence rates are consistent with the approximation theory of graph Laplacians. The key novelty of our results lies in the explicit quantification of all the prefactors on leading-order terms. We also prove an error estimate for solutions of Dirichlet BVPs obtained using TMDmap, showing that the solution error is controlled by the consistency error. We use these results to study an important application of TMDmap in the analysis of rare events in systems governed by overdamped Langevin dynamics using the framework of transition path theory (TPT). The cornerstone ingredient of TPT is the solution of the committor problem, a boundary value problem for the backward Kolmogorov PDE. Remarkably, we find that the TMDmap algorithm is particularly suited as a meshless solver for the committor problem due to the cancellation of several error terms in the prefactor formula. Furthermore, significant improvements in bias and variance errors occur when using a quasi-uniform sampling density. Our numerical experiments show that these improvements in accuracy are realizable in practice when using $\delta$-nets as spatially uniform inputs to the TMDmap algorithm.
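The committor problem has a simple discrete analogue that may help fix ideas. The sketch below is not the TMDmap algorithm: it assumes a symmetric nearest-neighbor random walk on states 0..N-1 with A = {0} and B = {N-1}, and solves for the probability q(i) of reaching B before A by Gauss-Seidel sweeps. The function name and tolerance values are illustrative choices.

```python
def committor_random_walk(n_states, tol=1e-12, max_iter=100_000):
    """Committor of a symmetric random walk on 0..n_states-1.

    Boundary conditions: q = 0 on A = {0}, q = 1 on B = {n_states - 1}.
    Interior states satisfy q[i] = 0.5 * (q[i - 1] + q[i + 1]),
    the discrete counterpart of the boundary value problem.
    """
    if n_states < 2:
        raise ValueError("need at least two states (A and B)")
    q = [0.0] * n_states
    q[-1] = 1.0
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(1, n_states - 1):
            new_value = 0.5 * (q[i - 1] + q[i + 1])
            max_change = max(max_change, abs(new_value - q[i]))
            q[i] = new_value  # Gauss-Seidel: reuse updated neighbors at once
        if max_change < tol:
            break
    return q


if __name__ == "__main__":
    print(committor_random_walk(5))
```

For this walk the exact solution is linear, q(i) = i / (N - 1), which gives a convenient check on the iteration.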

翻訳日:2023-12-25 16:15:13 公開日:2023-12-22

# パラメトリック駆動非線形共振器の臨界量子幾何テンソル

Critical quantum geometric tensors of parametrically-driven nonlinear resonators ( http://arxiv.org/abs/2312.14414v1 )

ライセンス: Link先を確認

Hao-Long Zhang, Jia-Hao Lv, Ken Chen, Xue-Jia Yu, Fan Wu, Zhen-Biao Yang, and Shi-Biao Zheng

(参考訳) パラメトリック駆動非線形共振器は、フォールトトレラント量子計算を実現するためのビルディングブロックであり、臨界量子センシングに有用である。基本的な観点からすると、そのような系の最も興味深い特徴はおそらく、他の量子系との相互作用なしに生じる臨界現象である。固有スペクトルの非解析的挙動は実質的に研究されているが、基底状態波動関数に関連するものはほとんど未調査のままである。量子基底状態幾何学的テンソルを指標として、駆動パラメータ $\varepsilon$ と phase $\phi$ を含む位相図を包括的に確立する。その結果、$\varepsilon$の増加に伴い、システムは通常から超ラジアント相への量子相転移を行い、臨界点は$\phi$の影響を受けないことが明らかとなった。さらに, 臨界指数とスケーリング次元は, 従来の作業と一致した, 厳密な数値的手法によって求めた。その結果、位相遷移は量子ラビモデルの普遍性クラスに含まれることがわかった。この研究は、量子計量とベリー曲率が量子相転移の様々な挙動を示すことを示した。

Parametrically driven nonlinear resonators represent a building block for realizing fault-tolerant quantum computation and are useful for critical quantum sensing. From a fundamental viewpoint, the most intriguing feature of such a system is perhaps the critical phenomena, which can occur without interaction with any other quantum system. The non-analytic behaviors of its eigenspectrum have been substantially investigated, but those associated with the ground state wavefunction have largely remained unexplored. Using the quantum ground state geometric tensor as an indicator, we comprehensively establish a phase diagram involving the driving parameter $\varepsilon$ and phase $\phi$. The results reveal that with the increase in $\varepsilon$, the system undergoes a quantum phase transition from the normal to the superradiant phase, with the critical point unaffected by $\phi$. Furthermore, the critical exponent and scaling dimension are obtained by an exact numerical method and are consistent with previous works. Our numerical results show that the phase transition falls within the universality class of the quantum Rabi model. This work reveals that the quantum metric and Berry curvature display diverging behaviors across the quantum phase transition.
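For reference, the textbook definition of the quantum geometric tensor of a ground state $|\psi_0\rangle$ that depends on parameters $\mu,\nu \in \{\varepsilon, \phi\}$ is given below, together with the quantum metric $g_{\mu\nu}$ and Berry curvature $F_{\mu\nu}$ obtained from it. The abstract does not spell out its conventions, so the sign and normalization used here are one common choice and should be read as an assumption.

```latex
Q_{\mu\nu}
  = \langle \partial_\mu \psi_0 |
    \bigl( 1 - |\psi_0\rangle\langle\psi_0| \bigr)
    | \partial_\nu \psi_0 \rangle ,
\qquad
g_{\mu\nu} = \operatorname{Re} Q_{\mu\nu} ,
\qquad
F_{\mu\nu} = -2 \operatorname{Im} Q_{\mu\nu} .
```

The real part is symmetric in $\mu,\nu$ and measures the distance between nearby ground states, while the imaginary part is antisymmetric and gives the Berry curvature.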

翻訳日:2023-12-25 16:14:48 公開日:2023-12-22

# マルチモーダル歩行認識のための多段適応型特徴融合ニューラルネットワーク

A Multi-Stage Adaptive Feature Fusion Neural Network for Multimodal Gait Recognition ( http://arxiv.org/abs/2312.14410v1 )

ライセンス: Link先を確認

Shinan Zou and Jianbo Xiong and Chao Fan and Shiqi Yu and Jin Tang

(参考訳) 歩行認識は生体計測技術であり、広く注目を集めている。多くの既存の歩行認識アルゴリズムは単一モーダルであり、少数のマルチモーダル歩行認識アルゴリズムは一度だけマルチモーダル融合を行う。これらのアルゴリズムは、複数のモダリティの相補的な利点を完全に活用できていない可能性がある。本稿では,歩行データの時間的・空間的特性を考慮して,特徴抽出過程において異なる段階のマルチモーダル融合を行う多段特徴融合戦略(MSFFS)を提案する。また,シルエットと骨格のセマンティックな関連性を考慮した適応的特徴融合モジュール(AFFM)を提案する。融合プロセスは、異なるシルエット領域を、より関連の深い骨格関節と融合する。歩行周期において視覚的外見の変化と時間経過が同時に起こるため,時空間の連関特徴を十分に学習するマルチスケール時空間特徴抽出器(MSSTFE)を提案する。特に、MSSTFEは異なる空間スケールで時空間リンク情報を抽出し集約する。上記の戦略とモジュールを組み合わせることで,多段階適応特徴融合(MSAFF)ニューラルネットワークを提案し,3つのデータセットでの多くの実験で最先端の性能を示す。さらに、MSAFFは特徴次元プーリング(FDプール)を備えており、精度を損なうことなく歩行表現の次元を大幅に削減することができる。 https://github.com/ShinanZou/MSAFF

Gait recognition is a biometric technology that has received extensive attention. Most existing gait recognition algorithms are unimodal, and the few multimodal gait recognition algorithms perform multimodal fusion only once. These algorithms may not fully exploit the complementary advantages of the multiple modalities. In this paper, by considering the temporal and spatial characteristics of gait data, we propose a multi-stage feature fusion strategy (MSFFS), which performs multimodal fusions at different stages in the feature extraction process. Also, we propose an adaptive feature fusion module (AFFM) that considers the semantic association between silhouettes and skeletons. The fusion process fuses different silhouette areas with their more related skeleton joints. Since visual appearance changes and time passage co-occur in a gait period, we propose a multiscale spatial-temporal feature extractor (MSSTFE) to learn the spatial-temporal linkage features thoroughly. Specifically, MSSTFE extracts and aggregates spatial-temporal linkage information at different spatial scales. Combining the strategy and modules mentioned above, we propose a multi-stage adaptive feature fusion (MSAFF) neural network, which shows state-of-the-art performance in many experiments on three datasets. Besides, MSAFF is equipped with feature dimensional pooling (FD Pooling), which can significantly reduce the dimension of the gait representations without hindering the accuracy. https://github.com/ShinanZou/MSAFF

翻訳日:2023-12-25 16:14:26 公開日:2023-12-22

# サービス効率と平等のバランスをとるための拡張p中間問題

Extended p-median problems for balancing service efficiency and equality ( http://arxiv.org/abs/2312.14408v1 )

ライセンス: Link先を確認

Yunfeng Kong, Chenchen Lian, Guangli Zhang, Shiyan Zhai

(参考訳) この記事では、サービス効率と平等のバランスをとるためのロケーション問題を取り上げます。公共サービスシステムでは、他のサービスにアクセスするのに長い旅行距離が必要な場合、うらやましい思いをする人もいます。走行距離をサービス施設としきい値距離と比較することにより、エンビーの強度を測定することができる。サービス効率と等価性の間のトレードオフに関して,全エンビー関数を用いて4つの拡張p中間問題を提案する。新しい問題の5つの分析的性質は数学的に証明されている。新しい問題は、よく設計された3つのインスタンスでテストされた。実験により,旅行コストと空間的エンビーを最小化することにより,標準偏差,平均絶対偏差,旅行距離間のジーニ係数などの等式を著しく改善できることを示した。また, 施設数の観点からサービス供給が提供される場合, 走行距離をわずかに増加させることでサービス平等性を大幅に向上させることができることを示した。施設数でサービス供給量が増えると、サービス効率と空間平等の両方を著しく向上させることができる。

This article deals with the location problem of balancing service efficiency and equality. In public service systems, some people may feel envy if they must travel farther than others to access services. The strength of the envy can be measured by comparing one's travel distance to the service facility with a threshold distance. Using the total envy function, four extended p-median problems are proposed to trade off service efficiency against equality. Five analytical properties of the new problems are mathematically proven. The new problems were tested on three sets of well-designed instances. The experimentation shows that equality measures, such as the standard deviation, the mean absolute deviation, and the Gini coefficient of travel distances, can be substantially improved by minimizing the travel cost and the spatial envy. The experimentation also shows that, when the service supply is given in terms of the number of facilities, the service equality can be considerably improved by slightly increasing the travel distance. When the service supply is increased in terms of the number of facilities, both the service efficiency and spatial equality can be significantly improved.
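The equality measures named in the abstract are standard statistics and can be computed directly from a list of travel distances, as in the sketch below. The `total_envy` function is only an assumed form (a sum of squared excesses over a threshold distance); the abstract does not give the paper's exact envy function, and demand weights are omitted for brevity.

```python
import math


def std_dev(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def mean_abs_dev(xs):
    """Mean absolute deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def gini(xs):
    """Gini coefficient: mean absolute difference over twice the mean."""
    n = len(xs)
    m = sum(xs) / n
    if m == 0:
        return 0.0
    return sum(abs(a - b) for a in xs for b in xs) / (2.0 * n * n * m)


def total_envy(distances, threshold):
    """Assumed envy form: squared excess of each distance over the threshold."""
    return sum(max(0.0, d - threshold) ** 2 for d in distances)


if __name__ == "__main__":
    distances = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
    print(std_dev(distances), mean_abs_dev(distances), gini(distances))
    print(total_envy(distances, threshold=5.0))
```

All four quantities are zero when everyone travels the same distance (with that distance at or below the threshold), which is the sense in which minimizing them promotes spatial equality.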

翻訳日:2023-12-25 16:14:04 公開日:2023-12-22

# AdvCloak: プライバシー保護のためにカスタマイズされた Adversarial Cloak

AdvCloak: Customized Adversarial Cloak for Privacy Protection ( http://arxiv.org/abs/2312.14407v1 )

ライセンス: Link先を確認

Xuannan Liu and Yaoyao Zhong and Xing Cui and Yuhang Zhang and Peipei Li and Weihong Deng

(参考訳) ソーシャルメディアで広範な顔画像が共有されているため、プライバシーに関する懸念が顕著に高まっている。本稿では,生成モデルを用いたプライバシー保護のための革新的なフレームワークであるAdvCloakを提案する。 AdvCloakは、機能レベルの一般化機能を提供しながら、優れた画像レベルの自然性を維持することができる、クラスワイドの対向マスクを自動でカスタマイズするように設計されている。具体的には、AdvCloakは、2段階のトレーニング戦略を用いて、生成する敵ネットワークを逐次最適化する。この戦略は、最初は、イメージ固有のトレーニングを通じて、個々の顔にマスクを適応させることに焦点を当て、続いて、特徴レベルの一般化能力を、個人固有のトレーニングを通じて、個人の顔の多様なバリエーションに拡張する。限られたトレーニングデータを完全に活用するために,AdvCloakと幾何的モデリング手法を組み合わせることで,情報源の特徴部分空間をより正確に記述する。 AdvCloakが既存の最先端の手法よりも効率と有効性で優れていることを示す。

With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that can maintain superior image-level naturalness while providing enhanced feature-level generalization ability. Specifically, AdvCloak sequentially optimizes the generative adversarial networks by employing a two-stage training strategy. This strategy initially focuses on adapting the masks to the unique individual faces via image-specific training and then enhances their feature-level generalization ability to diverse facial variations of individuals via person-specific training. To fully utilize the limited training data, we combine AdvCloak with several general geometric modeling methods, to better describe the feature subspace of source identities. Extensive quantitative and qualitative evaluations on both common and celebrity datasets demonstrate that AdvCloak outperforms existing state-of-the-art methods in terms of efficiency and effectiveness.

翻訳日:2023-12-25 16:13:47 公開日:2023-12-22

# スケールにおける生成的事前学習: 不正検出のためのトランスフォーマーに基づくトランザクション行動の符号化

Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection ( http://arxiv.org/abs/2312.14406v1 )

ライセンス: Link先を確認

Ze Yu Zhao (1), Zheng Zhu (1), Guilin Li (1), Wenhan Wang (1), Bo Wang (1) ((1) Tencent, WeChat Pay)

(参考訳) 本稿では,支払いシステムにおける不正検出に適したGPT(Generative Pretrained Transformer)アーキテクチャを活用した,革新的な自己回帰モデルを提案する。本手法は,トークン爆発に対して革新的に対処し,行動シーケンスを再構築し,時間的および文脈的分析によるトランザクション動作の微妙な理解を提供する。教師なし事前トレーニングを利用することで,ラベル付きデータを必要とせず,特徴表現に優れる。さらに,異常検出を強化するための差分畳み込みアプローチを統合し,中国最大級のオンライン決済業者のセキュリティと有効性を強化する。我々のモデルのスケーラビリティと適応性は、様々なトランザクションコンテキストにおける幅広い適用性を約束します。

In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems. Our approach innovatively confronts token explosion and reconstructs behavioral sequences, providing a nuanced understanding of transactional behavior through temporal and contextual analysis. Utilizing unsupervised pretraining, our model excels in feature representation without the need for labeled data. Additionally, we integrate a differential convolutional approach to enhance anomaly detection, bolstering the security and efficacy of one of the largest online payment merchants in China. The scalability and adaptability of our model promise broad applicability in various transactional contexts.

翻訳日:2023-12-25 16:13:28 公開日:2023-12-22

# グラフ注意に基づくアナログ回路の対称性制約抽出

Graph Attention-Based Symmetry Constraint Extraction for Analog Circuits ( http://arxiv.org/abs/2312.14405v1 )

ライセンス: Link先を確認

Qi Xu, Lijie Wang, Jing Wang, Song Chen, Lin Cheng, Yi Kang

(参考訳) 近年、アナログ回路は広く注目され、多くの新興アプリケーションで広く利用されている。アナログ回路の高需要は、より短い回路設計サイクルを必要とする。所望のパフォーマンスと仕様を達成するためには、アナログレイアウトプロセス中に様々な幾何学的対称性の制約を慎重に考慮する必要がある。しかし、経験豊富なアナログエンジニアによるこれらの制約の手動ラベリングは、手間と時間がかかるプロセスである。本稿では,アナログ回路レイアウトにおける対称制約を自動的に抽出するグラフベースの学習フレームワークを提案する。提案フレームワークは,回路の接続特性とデバイス情報を利用して対称制約の一般的な規則を学習し,回路網上のデバイスレベルの制約を効果的に抽出する。実験結果は,最先端の対称制約検出手法と比較して,高い精度と低い偽陽性率を実現することを実証した。

In recent years, analog circuits have received extensive attention and are widely used in many emerging applications. The high demand for analog circuits necessitates shorter circuit design cycles. To achieve the desired performance and specifications, various geometrical symmetry constraints must be carefully considered during the analog layout process. However, the manual labeling of these constraints by experienced analog engineers is a laborious and time-consuming process. To handle the costly runtime issue, we propose a graph-based learning framework to automatically extract symmetric constraints in analog circuit layout. The proposed framework leverages the connection characteristics of circuits and the devices' information to learn the general rules of symmetric constraints, which effectively facilitates the extraction of device-level constraints on circuit netlists. The experimental results demonstrate that, compared to state-of-the-art symmetric constraint detection approaches, our framework achieves higher accuracy and a lower false positive rate.

翻訳日:2023-12-25 16:13:16 公開日:2023-12-22

# クロスコヴァリエートな歩行認識:ベンチマーク

Cross-Covariate Gait Recognition: A Benchmark ( http://arxiv.org/abs/2312.14404v1 )

ライセンス: Link先を確認

Shinan Zou and Chao Fan and Jianbo Xiong and Chuanfu Shen and Shiqi Yu and Jin Tang

(参考訳) 歩行データセットは歩行研究に不可欠である。しかし,本研究では,従来の制約付きデータセットや新興実世界のデータセットが,共変量多様性に関して不足していることを示す。このギャップを埋めるため、私たちは、CCGRデータセットの収集に20ヶ月の懸命な努力を払っています。 CCGRデータセットには970人の被験者と約160万のシーケンスがあり、ほぼすべての被験者は33のビューと53の異なる共変量を持っている。既存のデータセットと比較すると、CCGRは個体数と個体レベルの多様性の両方を持っている。さらに、ビューと共変量はよくラベル付けされ、異なる要因の影響を分析することができる。 CCGRは、RGB、パース、シルエット、ポーズなど、さまざまな種類の歩行データを提供し、研究者に探索のための包括的なリソースを提供する。本稿では,新たに提案する解析データを用いて,クロスコヴァリエート歩行認識に深く取り組むために,解析に基づく歩行認識(ParsingGait)を提案する。我々は広範な実験を行った。私たちの主な結果は以下のとおりです。 1) 歩行認識の実用的応用において, クロスコヴァリエートが重要な課題として出現する。 2)ParsingGaitは,さらなる進歩の可能性を示す。 3)既存のSOTA法はCCGRで43%未満の精度しか達成できず,クロスコヴァリエート歩行認識の探求の緊急性を強調した。リンク: https://github.com/ShinanZou/CCGR

Gait datasets are essential for gait research. However, this paper observes that present benchmarks, whether conventional constrained or emerging real-world datasets, fall short regarding covariate diversity. To bridge this gap, we undertake an arduous 20-month effort to collect a cross-covariate gait recognition (CCGR) dataset. The CCGR dataset has 970 subjects and about 1.6 million sequences; almost every subject has 33 views and 53 different covariates. Compared to existing datasets, CCGR has both population and individual-level diversity. In addition, the views and covariates are well labeled, enabling the analysis of the effects of different factors. CCGR provides multiple types of gait data, including RGB, parsing, silhouette, and pose, offering researchers a comprehensive resource for exploration. In order to delve deeper into addressing cross-covariate gait recognition, we propose parsing-based gait recognition (ParsingGait) by utilizing the newly proposed parsing data. We have conducted extensive experiments. Our main results show: 1) Cross-covariate emerges as a pivotal challenge for practical applications of gait recognition. 2) ParsingGait demonstrates remarkable potential for further advancement. 3) Alarmingly, existing SOTA methods achieve less than 43% accuracy on the CCGR, highlighting the urgency of exploring cross-covariate gait recognition. Link: https://github.com/ShinanZou/CCGR.

翻訳日:2023-12-25 16:13:03 公開日:2023-12-22

# フェアネスフェア:人間の認識を集団的意思決定に持ち込む

The Fairness Fair: Bringing Human Perception into Collective Decision-Making ( http://arxiv.org/abs/2312.14402v1 )

ライセンス: Link先を確認

Hadi Hosseini

(参考訳) 公正は集団意思決定において最も望ましい社会的原則の1つである。過去数十年間、その公理的性質について広く研究され、アルゴリズム決定における理論的・計算的な側面から、近年、マルチエージェントシステムコミュニティからかなりの注目を集めている。しかし、これらの研究はしばしば、現実世界の問題の曖昧な性質における人間の公正性に対する認識の複雑さを捉えるのに十分ではない。我々は、公正な解決策は、社会的プランナー(設計者)によって望ましいものとみなすだけでなく、人間と社会的認知によって支配され、人間の判断に基づいて認識された結果が検討され、検証可能であるべきであると論じる。この目標を達成するには、コンピューティングやAIから行動経済学、人間とAIの相互作用まで幅広い学際的なアプローチが必要である。その際,現在のフェア・ディビジョン文学の欠点と長期的な課題を特定し,最近の取り組みを解説し,さらに重要なこととして,一連のオープン・リサーチの方向性を強調する。

Fairness is one of the most desirable societal principles in collective decision-making. It has been extensively studied in the past decades for its axiomatic properties and has received substantial attention from the multiagent systems community in recent years for its theoretical and computational aspects in algorithmic decision-making. However, these studies are often not sufficiently rich to capture the intricacies of human perception of fairness in the ambivalent nature of real-world problems. We argue that fair solutions should not only be deemed desirable by social planners (designers), but should also be governed by human and societal cognition, consider perceived outcomes based on human judgement, and be verifiable. We discuss how achieving this goal requires a broad transdisciplinary approach ranging from computing and AI to behavioral economics and human-AI interaction. In doing so, we identify shortcomings and long-term challenges of the current literature on fair division, describe recent efforts in addressing them, and more importantly, highlight a series of open research directions.

翻訳日:2023-12-25 16:12:39 公開日:2023-12-22

# CLIPのバックボーン効果の解明 : 表現の相乗効果と変異

Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances ( http://arxiv.org/abs/2312.14400v1 )

ライセンス: Link先を確認

Cristian Rodriguez-Opazo and Edison Marrese-Taylor and Ehsan Abbasnejad and Hamed Damirchi and Ignacio M. Jara and Felipe Bravo-Marquez and Anton van den Hengel

(参考訳) コントラスト言語-画像事前学習(CLIP)は画像表現学習において顕著な手法である。ビジョントランスフォーマー(ViT)のようなトランスフォーマーベースのモデルから、ResNetのような畳み込みネットワーク(ConvNet)まで、さまざまなニューラルアーキテクチャがCLIPでトレーニングされ、さまざまなビジョンタスクにわたって普遍的なバックボーンとして機能する。同じデータとトレーニング目標を活用しているにも関わらず、これらのアーキテクチャによって学習される表現の有効性は重要な疑問を提起する。本研究は,これらのバックボーンアーキテクチャ間のCLIP性能の違いを調査し,その分類の相違を明らかにした。特に、これらの表現の正規化は、かなりの性能変化をもたらす。その結果,適切なバックボーンの選択により20%以上の改善が期待できるバックボーン予測の相乗効果が顕著に示された。さらに,複数のバックボーンからの予測を組み合わせることで,最大6.34%の性能向上をもたらす,単純かつ効果的な手法を提案する。結果を再現するためのコードをリリースします。

Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning. Various neural architectures, spanning Transformer-based models like Vision Transformers (ViTs) to Convolutional Networks (ConvNets) like ResNets, are trained with CLIP and serve as universal backbones across diverse vision tasks. Despite utilizing the same data and training objectives, the effectiveness of representations learned by these architectures raises a critical question. Our investigation explores the differences in CLIP performance among these backbone architectures, revealing significant disparities in their classifications. Notably, normalizing these representations results in substantial performance variations. Our findings showcase a remarkable possible synergy between backbone predictions that could reach an improvement of over 20% through informed selection of the appropriate backbone. Moreover, we propose a simple, yet effective approach to combine predictions from multiple backbones, leading to a notable performance boost of up to 6.34%. We will release the code for reproducing the results.
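
The abstract does not say how the backbone predictions are combined. As a rough illustration only, the sketch below averages per-backbone class probabilities with optional weights. The function names, the weighting scheme and the toy logits are assumptions made here, not the authors' method.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_backbones(logits_per_backbone, weights=None):
    """Weighted average of class probabilities from several backbones.

    `weights` is a hypothetical per-backbone trust value; uniform if omitted.
    """
    n = len(logits_per_backbone)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(logits_per_backbone[0])
    combined = [0.0] * num_classes
    for w, logits in zip(weights, logits_per_backbone):
        probs = softmax(logits)
        for c in range(num_classes):
            combined[c] += w * probs[c]
    return combined


def predict(logits_per_backbone, weights=None):
    """Index of the class with the highest combined probability."""
    probs = combine_backbones(logits_per_backbone, weights)
    return max(range(len(probs)), key=probs.__getitem__)


# Toy logits for one image from two hypothetical backbones.
vit_logits = [2.0, 1.0, 0.0]
resnet_logits = [0.0, 3.0, 0.0]
ensemble_class = predict([vit_logits, resnet_logits])
```

Averaging probabilities rather than raw logits keeps backbones with different logit scales comparable, which matters given the abstract's remark that normalization changes performance substantially.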

翻訳日:2023-12-25 16:12:21 公開日:2023-12-22

# 多様化による適応的微分進化:最適化への挑戦

Adaptive Differential Evolution with Diversification: Addressing Optimization Challenges ( http://arxiv.org/abs/2312.14464v1 )

ライセンス: Link先を確認

Sarit Maitra

(参考訳) 微分進化(DE)アルゴリズムの既存の変種は、局所探索の貧弱さや早期収束に対する感受性など、一定の制限がある。本研究では,周辺構造を動的に修飾する手法であるaded(adaptive differential evolution with diversification)を提案する。凸関数と非凸関数の両方を扱うために開発されたADEDは、Rosenbrock、Rastrigin、Ackley、DeVilliers-Glasser02を含む22のベンチマーク関数で検証されている。開発はGoogle CloudでJupyter NotebookとPython v3.10.12を使って行われ、マルチオブジェクトベンチマークのZDTテストスイートで追加のテストが行われた。 ADEDは適応的かつ多様なアプローチで、適応的突然変異とクロスオーバーレート、多様な突然変異戦術、多様化測定、局所探索機構、収束監視を含む。これらの特徴の組み合わせは、複雑で多様な風景をナビゲートするADEDの有効性を強化し、単目的と多目的の両方の最適化シナリオにおける課題に対処するための有望なツールとして位置づけている。

The existing variants of the Differential Evolution (DE) algorithm come with certain limitations, such as poor local search and susceptibility to premature convergence. This study introduces Adaptive Differential Evolution with Diversification (ADED), a method that dynamically modifies the neighborhood structure by evaluating the trial solutions' fitness. Developed to work with both convex and nonconvex objective functions, ADED is validated with 22 benchmark functions, including Rosenbrock, Rastrigin, Ackley, and DeVilliers-Glasser02. The development is carried out in Google Cloud using Jupyter Notebook and Python v3.10.12, with additional testing conducted on the multi-objective benchmark ZDT test suite. ADED distinguishes itself with its adaptive and diverse approach, which includes adaptive mutation and crossover rates, diverse mutation tactics, diversification measurements, local search mechanisms, and convergence monitoring. The combination of these features enhances ADED's effectiveness in navigating complex and diverse landscapes, positioning it as a promising tool for addressing challenges in both single- and multi-objective optimization scenarios.
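
To make the ingredients concrete, here is a minimal sketch of classic DE (rand/1/bin) with one diversity-triggered adjustment of the mutation factor. It is not ADED: the diversity measure, the 0.1 threshold and the values of `F` and `CR` are assumptions chosen for illustration.

```python
import random


def sphere(x):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in x)


def diversity(pop):
    """Mean per-coordinate spread (max - min) of the population."""
    dim = len(pop[0])
    spread = 0.0
    for d in range(dim):
        column = [p[d] for p in pop]
        spread += max(column) - min(column)
    return spread / dim


def differential_evolution(f, bounds, pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    initial_div = diversity(pop)
    CR = 0.9  # crossover rate (fixed here; ADED adapts it)
    for _ in range(generations):
        # Hypothetical adaptation rule: mutate more strongly once the
        # population has collapsed to under 10% of its initial spread.
        ratio = diversity(pop) / initial_div if initial_div > 0 else 0.0
        F = 0.9 if ratio < 0.1 else 0.5
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_fit = f(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, trial_fit
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Raising `F` when the spread collapses is one crude way to counter premature convergence; the local search and convergence monitoring the abstract lists are not modelled here.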

翻訳日:2023-12-25 16:05:30 公開日:2023-12-22

# ビザンチン系ロバスト集団の高次元攻撃

Attacking Byzantine Robust Aggregation in High Dimensions ( http://arxiv.org/abs/2312.14461v1 )

ライセンス: Link先を確認

Sarthak Choudhary, Aashish Kolluri and Prateek Saxena

(参考訳) 現代のニューラルネットワークやモデルのトレーニングには、一般的に高次元ベクトルのサンプルを平均化する必要がある。毒殺攻撃は、モデルトレーニングに使用される平均ベクターを歪めたり偏ったりし、モデルに特定のパターンを学習させたり、有用なものを学ぶのを避けたりする。ビザンチンのロバストアグリゲーションは、そのようなバイアスに対するアルゴリズムによる防御である。ロバストアグリゲータは、たとえ一部の入力が任意に破損したとしても、平均のような中央値統計計算における最大バイアスを制限できる。このようなアグリゲータの設計は、高次元を扱う場合に難しい。しかし、バイアスの強い理論的境界を持つ最初の多項式時間アルゴリズムが最近提案されている。彼らの境界線は数次元とは無関係であり、防衛戦における毒殺の威力に対する概念的な制限を約束している。本稿では,次元非依存バイアスの主張を覆す強力な防御の実現に向けたHIDRAと呼ばれる新たな攻撃を示す。 HIDRAは、それまでの情報理論分析には関心がなかった、新しい計算ボトルネックを強調している。実験結果から,本攻撃はモデル性能をほぼ完全に破壊するが,同じ目標を持つ既存攻撃は大きな効果をもたらさないことが示された。我々の発見は、毒殺と証明可能な防御の間の武器競争を広範囲に開放している。

Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or avoid learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as mean, even when some fraction of inputs are arbitrarily corrupted. Designing such aggregators is challenging when dealing with high dimensions. However, the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we show a new attack called HIDRA on practical realizations of strong defenses, which subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analysis. Our experimental evaluation shows that our attacks almost completely destroy the model performance, whereas existing attacks with the same goal fail to have much effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.
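
The gap between plain averaging and robust aggregation is easy to see on a toy example. The sketch below compares the mean with two classic robust aggregators, the coordinate-wise median and the trimmed mean. These are not the polynomial-time defenses the paper attacks, and HIDRA itself is not reproduced; the vectors are made up for illustration.

```python
from statistics import median


def mean_aggregate(vectors):
    """Plain coordinate-wise average: a single outlier can shift it arbitrarily."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def coordinate_median(vectors):
    """Coordinate-wise median: tolerates a minority of corrupted inputs."""
    return [median(col) for col in zip(*vectors)]


def trimmed_mean(vectors, trim):
    """Drop the `trim` smallest and largest values per coordinate, then average."""
    result = []
    for col in zip(*vectors):
        kept = sorted(col)[trim:len(col) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four honest gradient-like vectors and one poisoned vector (toy data).
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]
poisoned = [[100.0, -100.0]]
updates = honest + poisoned

plain = mean_aggregate(updates)
robust_median = coordinate_median(updates)
robust_trimmed = trimmed_mean(updates, trim=1)
```

Coordinate-wise rules like these accumulate error across coordinates, which is why dimension-independent bias bounds are the property the abstract singles out.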

翻訳日:2023-12-25 16:05:08 公開日:2023-12-22

# ヒト脳波とTD3深部強化学習における共有自律性のためのマルチエージェントコパイロットアプローチ

Multiagent Copilot Approach for Shared Autonomy between Human EEG and TD3 Deep Reinforcement Learning ( http://arxiv.org/abs/2312.14458v1 )

ライセンス: Link先を確認

Chun-Ren Phang and Akimasa Hirata

(参考訳) 深層強化学習(RL)アルゴリズムは、環境と対話できる完全自律エージェントの開発を可能にする。脳コンピュータインタフェース(BCI)システムは、明示的な環境に関係なく人間の暗黙の脳信号を解読する。本研究では,深層RLとBCIを統合し,環境要因を考慮し,自律系における有益なヒューマン介入と脳活動のデコード性能を向上させる。人間の脳波(EEG)からデコードされた行動指令と、与えられた環境に対するtwin delayed DDPG(TD3)エージェントから生成された行動との間で、共有自律性を実現した。提案するフルブロッカー付きコパイロット制御方式(Co-FB)は,EEG単独(EEG-NB)やTD3単独の制御よりも有意に優れていた。 Co-FBモデルは、EEG-NBモデルよりも高い目標接近スコア、低い故障率、低いヒューマンワークロードを達成した。 Co-FB制御方式はTD3モデルよりも、見えない目標に対するスコアと許容される人間の介入のレベルが高い。また,エージェント決定の矛盾がコパイロットモデルの制御精度と権限に与える影響を評価するために,差分d-インデックスを提案した。我々は,TD3エージェントの制御権限と,d-インデックスに対するヒト脳波分類の性能改善との間に有意な相関を見出した。また,制御権限をTD3エージェントに移行することで,BCI復号が最適でない場合の性能が向上した。これらの結果から, コパイロットシステムは複雑な環境を効果的に扱えること, 環境要因を考慮することでBCI性能の向上が期待できることがわかった。今後の作業は、コパイロットの性能を評価するために、連続的な行動空間と異なるマルチエージェントアプローチを採用するべきである。

Deep reinforcement learning (RL) algorithms enable the development of fully autonomous agents that can interact with the environment. Brain-computer interface (BCI) systems decipher human implicit brain signals regardless of the explicit environment. In this study, we integrated deep RL and BCI to improve beneficial human interventions in autonomous systems and the performance in decoding brain activities by considering environmental factors. Shared autonomy was allowed between the action command decoded from the electroencephalography (EEG) of the human agent and the action generated from the twin delayed DDPG (TD3) agent for a given environment. Our proposed copilot control scheme with a full blocker (Co-FB) significantly outperformed the individual EEG (EEG-NB) or TD3 control. The Co-FB model achieved a higher target approaching score, lower failure rate, and lower human workload than the EEG-NB model. The Co-FB control scheme had a higher invisible target score and level of allowed human intervention than the TD3 model. We also proposed a disparity d-index to evaluate the effect of contradicting agent decisions on the control accuracy and authority of the copilot model. We found a significant correlation between the control authority of the TD3 agent and the performance improvement of human EEG classification with respect to the d-index. We also observed that shifting control authority to the TD3 agent improved performance when BCI decoding was not optimal. These findings indicate that the copilot system can effectively handle complex environments and that BCI performance can be improved by considering environmental factors. Future work should employ continuous action space and different multi-agent approaches to evaluate copilot performance.

翻訳日:2023-12-25 16:04:47 公開日:2023-12-22

# QUAR-VLA:四足歩行ロボットの視覚言語行動モデル

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots ( http://arxiv.org/abs/2312.14457v1 )

ライセンス: Link先を確認

Pengxiang Ding, Han Zhao, Zhitao Wang, Zhenyu Wei, Shangke Lyu, Donglin Wang

(参考訳) ロボット知性の重要な発現は、自然と対話し、自律的に意思決定する能力である。従来のロボット制御のアプローチは、知覚、計画、意思決定を分割し、システム設計を単純化するが、異なる情報ストリーム間のシナジーを制限する。この区画化は、シームレスな自律的推論、意思決定、行動実行を達成する上での課題を提起する。これらの制約に対処するために、QUAdruped Robots (QUAR-VLA) のためのビジョン・ランゲージ・アクションタスクと呼ばれる新しいパラダイムが導入された。このアプローチでは、視覚情報と指示を密に統合して実行可能なアクションを生成し、知覚、計画、意思決定を効果的に融合する。中心となるアイデアは、ロボット全体の知性を高めることだ。この枠組みの中で注目すべき課題は、きめ細かい指示を視覚的知覚情報と整合させることである。これは、ロボットが視覚観察と調和して詳細な指示を正しく解釈し行動することを保証するのに必要な複雑さを強調している。そこで本研究では,VLAモデルのファミリーである QUAdruped Robotic Transformer (QUART) を提案し,様々なモダリティからの視覚情報と指示を入力として統合し,実世界のロボットに対して実行可能なアクションを生成するとともに, QUARTモデルの訓練のために,ナビゲーション,複雑な地形での移動,全身操作タスクを含む大規模マルチタスクデータセット QUAdruped Robot Dataset (QUARD) を提示する。大規模な評価(4000回の評価試行)により,本手法が高性能なロボットポリシーにつながり,QUARTがさまざまな創発的能力を獲得できることを示した。

An important manifestation of robot intelligence is the ability to interact naturally and make decisions autonomously. Traditional approaches to robot control often compartmentalize perception, planning, and decision-making, simplifying system design but limiting the synergy between different information streams. This compartmentalization poses challenges in achieving seamless autonomous reasoning, decision-making, and action execution. To address these limitations, a novel paradigm, named Vision-Language-Action tasks for QUAdruped Robots (QUAR-VLA), is introduced in this paper. This approach tightly integrates visual information and instructions to generate executable actions, effectively merging perception, planning, and decision-making. The central idea is to elevate the overall intelligence of the robot. Within this framework, a notable challenge lies in aligning fine-grained instructions with visual perception information. This emphasizes the complexity involved in ensuring that the robot accurately interprets and acts upon detailed instructions in harmony with its visual observations. Consequently, we propose QUAdruped Robotic Transformer (QUART), a family of VLA models that integrate visual information and instructions from diverse modalities as input and generate executable actions for real-world robots, and we present QUAdruped Robot Dataset (QUARD), a large-scale multi-task dataset including navigation, complex terrain locomotion, and whole-body manipulation tasks for training QUART models. Our extensive evaluation (4000 evaluation trials) shows that our approach leads to performant robotic policies and enables QUART to obtain a range of emergent capabilities.

翻訳日:2023-12-25 16:04:18 公開日:2023-12-22

# 分散検出のための次元の呪いを克服する方法

How to Overcome Curse-of-Dimensionality for Out-of-Distribution Detection? ( http://arxiv.org/abs/2312.14452v1 )

ライセンス: Link先を確認

Soumya Suvra Ghosal, Yiyou Sun, and Yixuan Li

(参考訳) ワイルドにデプロイされた機械学習モデルは、未知のクラスからのout-of-distribution(ood)データに挑戦できる。 OOD検出の最近の進歩は、分布内(ID)データから比較的離れたサンプルを識別するための距離測定に依存している。約束にもかかわらず、距離ベースの手法は、高次元の特徴空間における有効性を制限している次元の呪いに悩まされる。この問題に対処するために,OOD検出のための新しいフレームワーク,Subspace Nearest Neighbor (SNN)を提案する。トレーニングにおいて,本手法は,次元の最も関連性の高い部分集合(部分空間)を活用することにより,モデルとその特徴表現を正規化する。サブスペース学習は、IDとOODデータの間の高度に区別可能な距離測定を行う。我々はSNNの有効性を検証するための総合的な実験と改善を行った。現在の最良の距離ベースの手法と比較して、SNNはCIFAR-100ベンチマークで平均FPR95を15.96%削減している。

Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite the promise, distance-based methods can suffer from the curse-of-dimensionality problem, which limits the efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training, our method regularizes the model and its feature representation by leveraging the most relevant subset of dimensions (i.e. subspace). Subspace learning yields highly distinguishable distance measures between ID and OOD data. We provide comprehensive experiments and ablations to validate the efficacy of SNN. Compared to the current best distance-based method, SNN reduces the average FPR95 by 15.96% on the CIFAR-100 benchmark.
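
The two quantities this abstract relies on, a distance-based OOD score and FPR95, can be sketched in a few lines. This is not SNN itself, which learns the subspace during training: here the subset of dimensions `dims` is simply passed in by hand, and the feature bank and score lists are toy values assumed for illustration.

```python
import math


def knn_score(x, bank, k, dims):
    """Distance from `x` to its k-th nearest in-distribution feature,
    measured only on the coordinates listed in `dims`.
    A larger score means the sample looks more out-of-distribution."""
    dists = sorted(
        math.sqrt(sum((x[d] - b[d]) ** 2 for d in dims)) for b in bank
    )
    return dists[k - 1]


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID at the threshold that
    accepts 95% of the ID samples (lower is better)."""
    ranked = sorted(id_scores)
    # Integer arithmetic for ceil(0.95 * n) avoids floating-point surprises.
    idx = max(0, (95 * len(ranked) + 99) // 100 - 1)
    threshold = ranked[idx]
    false_positives = sum(1 for s in ood_scores if s <= threshold)
    return false_positives / len(ood_scores)


# Toy ID feature bank: the third coordinate is pure noise.
bank = [[0.0, 0.0, 9.0], [1.0, 0.0, -9.0], [0.0, 1.0, 5.0]]
query = [0.0, 0.0, 0.0]
score_subspace = knn_score(query, bank, k=1, dims=(0, 1))
score_full = knn_score(query, bank, k=1, dims=(0, 1, 2))

id_scores = [float(i) for i in range(1, 21)]
ood_scores = [5.0, 25.0, 30.0, 40.0]
fpr95 = fpr_at_95_tpr(id_scores, ood_scores)
```

In the toy bank the noisy third coordinate dominates the full-space distance, while the two-dimensional subspace recovers an exact match; that is the effect the curse-of-dimensionality argument points to.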

翻訳日:2023-12-25 16:03:48 公開日:2023-12-22

# セッションベースレコメンデーションにおけるアンラーニングの効果について

On the Effectiveness of Unlearning in Session-Based Recommendation ( http://arxiv.org/abs/2312.14447v1 )

ライセンス: Link先を確認

Xin Xin, Liu Yang, Ziqi Zhao, Pengjie Ren, Zhumin Chen, Jun Ma, Zhaochun Ren

(参考訳) セッションベースのレコメンデーションは、セッション内の前のインタラクションからユーザの将来の関心を予測する。歴史的なサンプルを記憶しているにも関わらず、特定のトレーニングサンプルの影響を取り除こうとする未学習の要求も、ユーザのプライバシやモデルの忠実性といった理由から発生する。しかし、未学習に関する既存の研究はセッションベースの推薦には適していない。一方、これらの手法は、セッション中の未学習項目と残りの項目との協調的相関や逐次的接続により、未学習効果を満足することができない。一方,セッションベースのレコメンデーションシナリオにおいて,未学習の有効性を検証する研究はほとんど行われていない。本稿では,セッションベースレコメンデーションにおける高い学習効率,正確なレコメンデーション性能,学習効率の向上を実現する,セッションベースのレコメンデーションアンラーニングフレームワークsruを提案する。具体的には、まず、セッション間の類似性に応じてトレーニングセッションを個別のサブモデルに分割し、次に、セッションとサブモデル内のデータのセントロイドの相関関係に応じて隠れた状態を融合させる注意ベースの集約層を利用する。さらに,未学習の有効性を向上させるために,協調追加削除(ced),隣接追加削除(ned),ランダム追加削除(red)という3つの追加データ削除戦略を提案する。さらに,データ削除後に未学習サンプルを推測できるかどうかを測定し,未学習の有効性を検証する評価指標を提案する。 3つの代表的なセッションベースレコメンデーションモデルでSRUを実装し、3つのベンチマークデータセットで実験を行う。実験の結果,本手法の有効性が示された。

Session-based recommendation predicts users' future interests from previous interactions in a session. Although such models memorize historical samples, requests for unlearning, i.e., removing the effect of certain training samples, also arise for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored for session-based recommendation. On the one hand, these approaches cannot achieve satisfying unlearning effects due to the collaborative correlations and sequential connections between the unlearning item and the remaining items in the session. On the other hand, little work has verified unlearning effectiveness in the session-based recommendation scenario. In this paper, we propose SRU, a session-based recommendation unlearning framework, which enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness in session-based recommendation. Specifically, we first partition the training sessions into separate sub-models according to the similarity across the sessions, then we utilize an attention-based aggregation layer to fuse the hidden states according to the correlations between the session and the centroid of the data in the sub-model. To improve the unlearning effectiveness, we further propose three extra data deletion strategies, including collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). Besides, we propose an evaluation metric that measures whether the unlearning sample can be inferred after the data deletion to verify the unlearning effectiveness. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. Experimental results demonstrate the effectiveness of our methods.

翻訳日:2023-12-25 16:03:33 公開日:2023-12-22

# modality-aware fusion networkと大規模データセットによるクロスモーダルオブジェクト追跡

Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset ( http://arxiv.org/abs/2312.14446v1 )

ライセンス: Link先を確認

Lei Liu, Mengya Zhang, Cheng Li, Chenglong Li, and Jin Tang

(参考訳) ビジュアルトラッキングは、RGB画像シーケンスのみに依存する場合、無効なターゲットや低照度環境でのパフォーマンス低下といった課題に直面することが多い。深度データや赤外線データといった追加のモダリティは有効であることが証明されているが、既存のマルチモーダルイメージングプラットフォームは複雑で、現実の応用性に欠ける。対照的に、監視カメラで一般的に使用される近赤外線(NIR)イメージングは、光強度に基づいてRGBとNIRを切り替えることができる。しかしながら、これらの不均質なモダリティを横断するオブジェクトの追跡は、特に追跡中にモダリティスイッチ信号がないため、大きな課題となる。これらの課題に対処するため,我々はmodality-aware fusion network (mafnet) と呼ばれる適応型クロスモーダルオブジェクトトラッキングアルゴリズムを提案する。 MAFNetは、適応重み付け機構を用いてRGBとNIRの両方のモダリティからの情報を効率的に統合し、外観ギャップを効果的にブリッジし、モダリティ対応ターゲット表現を可能にする。適応重み付けモジュールとモダリティ固有の表現モジュール...の2つのキーコンポーネントで構成されている。

Visual tracking often faces challenges such as invalid targets and decreased performance in low-light conditions when relying solely on RGB image sequences. While incorporating additional modalities like depth and infrared data has proven effective, existing multi-modal imaging platforms are complex and lack real-world applicability. In contrast, near-infrared (NIR) imaging, commonly used in surveillance cameras, can switch between RGB and NIR based on light intensity. However, tracking objects across these heterogeneous modalities poses significant challenges, particularly due to the absence of modality switch signals during tracking. To address these challenges, we propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism, effectively bridging the appearance gap and enabling a modality-aware target representation. It consists of two key components: an adaptive weighting module and a modality-specific representation module......

翻訳日:2023-12-25 16:03:04 公開日:2023-12-22

# 最適制御による時間反転支援量子メトロロジー

Time-reversal assisted quantum metrology with an optimal control ( http://arxiv.org/abs/2312.14443v1 )

ライセンス: Link先を確認

Da-Wei Luo, Ting Yu

(参考訳) 本稿では, 量子最適制御と時間反転戦略を用いて, ショットノイズ限界を克服し, パラメータ推定のためのハイゼンベルクスケーリング限界に達するプロトコルを提案する。量子ナビゲーションと測定において重要な役割を果たす位相推定を例に、系の光子数測定から生じる不確実性は、推定される位相とは無関係に、補助的なCramér-Rao境界を飽和させることができることを示す。光子損失の現実的な場合、最適な推定は最適な制御とフォトニックモードに結合したアンシラ2レベル系の射影測定によって達成可能であることを示す。

We propose a protocol to overcome the shot noise limit and reach the Heisenberg scaling limit for parameter estimation by using quantum optimal control and a time-reversal strategy. Taking phase estimation, which can play an important role in quantum navigation and measurement, as an example, we show that the uncertainty arising from a photon number measurement of the system can saturate the assisted Cramér-Rao bound, independent of the phase being estimated. In a realistic case with photon loss, we show that the optimal estimation may still be attainable by optimal control and a projective measurement on an ancilla two-level system coupled to photonic modes.
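
For orientation, the two scaling regimes named in this abstract can be written out explicitly. These are the standard textbook forms, not formulas quoted from the paper. The symbols $N$ (number of photons or probes), $\nu$ (number of repetitions) and $F_Q$ (quantum Fisher information) are notation assumed here, and the paper's "assisted" bound may differ in detail from the generic Cramér-Rao bound shown on the second line.

```latex
\begin{align}
  \Delta\phi_{\mathrm{SNL}} &\propto \frac{1}{\sqrt{N}}, &
  \Delta\phi_{\mathrm{HL}}  &\propto \frac{1}{N}, \\
  \Delta\phi &\ge \frac{1}{\sqrt{\nu\, F_Q(\phi)}} .
\end{align}
```

With $N = 10^{4}$ photons, for instance, the shot-noise scaling gives an uncertainty of order $10^{-2}$ and the Heisenberg scaling one of order $10^{-4}$, up to constant prefactors.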

翻訳日:2023-12-25 16:02:43 公開日:2023-12-22

# DMC4ML: 機械学習のためのデータ移動複雑性

DMC4ML: Data Movement Complexity for Machine Learning ( http://arxiv.org/abs/2312.14441v1 )

ライセンス: Link先を確認

Chen Ding, Christopher Kanan, Dylan McKellips, Toranosuke Ozawa, Arian Shahmirza, Wesley Smith

(参考訳) 今日のコンピューティングの最大の需要は機械学習です。本稿では,変圧器,空間畳み込み,FFTという3つの機械学習アルゴリズムを解析する。その分析は3つの点で新しい。まず、従来の時間や空間の複雑さではなく、抽象的なメモリ階層におけるメモリアクセスのコストを測定する。第2に、解析は漸近的であり、メモリコストの主な源を特定する。最後に、結果はシンボリックであり、任意の次元サイズとヘッド数に対してグループ化されたクエリアテンションにおけるグループサイズや、任意の画像サイズとカーネルサイズに対してバッチ化された畳み込みのためのバッチサイズなどのアルゴリズムパラメータを選択するために使用できる。

The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources of the memory cost. Finally, the result is symbolic, which can be used to select algorithmic parameters such as the group size in grouped query attention for any dimension size and number of heads and the batch size for batched convolution for any image size and kernel size.

翻訳日:2023-12-25 16:02:30 公開日:2023-12-22

# 逆攻撃によるテキスト・画像生成における非対称バイアス

Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks ( http://arxiv.org/abs/2312.14440v1 )

ライセンス: Link先を確認

Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong

(参考訳) コンテンツ生成におけるテキスト・ツー・イメージ(T2I)モデルの普及は、敵対的攻撃に対する堅牢性を含む安全性を慎重に検査する必要がある。これに関する広範な研究にもかかわらず、その効果の理由は未解明である。本稿では,攻撃成功率(ASR)に関連する要因の分析に焦点をあて,T2Iモデルに対する敵攻撃に関する実証的研究を行った。敵接尾辞と2つの勾配に基づく攻撃アルゴリズムを用いた新たな攻撃目標であるエンティティスワップを導入する。人間と自動評価は、エンティティスワップ上でのASRの非対称性を明らかにし、例えば、「雨の中で踊る人間」というプロンプトで「人間」を「ロボット」に置き換えるのは容易であるが、逆の逆の接尾辞は極めて困難である。さらに、モデルの信念から敵対的ASRへの示唆的信号を確立するための測度を提案する。我々は、敵攻撃の60%の成功確率と、この確率が5%以下に低下する状況を特定する。

The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research into this, the reasons for their effectiveness are underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated with attack success rates (ASRs). We introduce a new attack objective - entity swapping using adversarial suffixes and two gradient-based attack algorithms. Human and automatic evaluations reveal the asymmetric nature of ASRs on entity swap: for example, it is easier to replace "human" with "robot" in the prompt "a human dancing in the rain." with an adversarial suffix but is significantly harder in reverse. We further propose probing metrics to establish indicative signals from the model's beliefs to the adversarial ASR. We identify conditions resulting in a 60% success probability for adversarial attacks and others where this likelihood drops below 5%.

翻訳日:2023-12-25 16:02:19 公開日:2023-12-22

# PUMA: グラフ凝縮を用いた効率的な連続グラフ学習

PUMA: Efficient Continual Graph Learning with Graph Condensation ( http://arxiv.org/abs/2312.14439v1 )

ライセンス: Link先を確認

Yilun Liu, Ruihong Qiu, Yanran Tang, Hongzhi Yin, Zi Huang

(参考訳) ストリーミンググラフを扱う場合、既存のグラフ表現学習モデルは破滅的忘却問題に遭遇する。これに対し、連続グラフ学習は、静的グラフからストリーミンググラフへのグラフ表現学習を可能にする新しいパラダイムとして出現する。これまでの作業であるCaTは、連続的な学習手順をバランスよく行うリプレイベースのフレームワークで、入ってくるグラフを凝縮してデータを再生するための、小さいが効果的なメモリバンクを設計する。 CaTは破滅的忘却問題を緩和するが,(1)CaTから派生したグラフ凝縮アルゴリズムはラベル付きノードにのみ焦点をあて,ラベルなしノードが持つ豊富な情報を無視している;(2)CaTの継続トレーニングスキームは,これまでに学習した知識に重きを置いて,新たに追加された記憶から学習するモデル能力を制限する;(3)CaTの凝縮過程と再生過程はいずれも時間を要する。本稿では,CaT から拡張した Pseudo-label guided memory bank (PUMA) CGL フレームワークを提案する。グラフ内の情報をフル活用するために、PUMAはラベル付きノードと非ラベル付きノードの両方でグラフ凝縮時のノードのカバレッジを拡大する。さらに,過去の連続学習スキームを改良し,歴史と新しいグラフのバランスのとれたトレーニングを行うための,スクラッチからのトレーニング戦略を提案する。さらにPUMAは、ワンタイムの伝播(propagation)とワイドグラフエンコーダを使用して、トレーニングステージにおけるグラフ凝縮とグラフエンコーディングプロセスを加速し、フレームワーク全体の効率を向上させる。 4つのデータセットに関する広範な実験は、既存のメソッドに対する最先端のパフォーマンスと効率を示している。

When handling streaming graphs, existing graph representation learning models encounter a catastrophic forgetting problem, where previously learned knowledge of these models is easily overwritten when learning with newly incoming graphs. In response, Continual Graph Learning (CGL) emerges as a novel paradigm enabling graph representation learning from static to streaming graphs. Our prior work, CaT, is a replay-based framework with a balanced continual learning procedure, which designs a small yet effective memory bank for replaying data by condensing incoming graphs. Although CaT alleviates the catastrophic forgetting problem, three issues remain: (1) the graph condensation algorithm derived in CaT focuses only on labelled nodes while neglecting the abundant information carried by unlabelled nodes; (2) the continual training scheme of CaT overemphasises previously learned knowledge, limiting the model's capacity to learn from newly added memories; (3) both the condensation process and the replaying process of CaT are time-consuming. In this paper, we propose a pseudo-label guided memory bank (PUMA) CGL framework, extending CaT to enhance its efficiency and effectiveness by overcoming the above-mentioned weaknesses and limits. To fully exploit the information in a graph, PUMA expands the coverage of nodes during graph condensation with both labelled and unlabelled nodes. Furthermore, a training-from-scratch strategy is proposed to upgrade the previous continual learning scheme for a balanced training between the historical and the new graphs. Besides, PUMA uses a one-time propagation and wide graph encoders to accelerate the graph condensation and the graph encoding process in the training stage to improve the efficiency of the whole framework. Extensive experiments on four datasets demonstrate the state-of-the-art performance and efficiency over existing methods.

翻訳日:2023-12-25 16:02:01 公開日:2023-12-22

# PC-Conv:2次元フィルタリングによるホモフィリーとヘテロフィリーの統合

PC-Conv: Unifying Homophily and Heterophily with Two-fold Filtering ( http://arxiv.org/abs/2312.14438v1 )

ライセンス: Link先を確認

Bingheng Li, Erlin Pan, Zhao Kang

(参考訳) 近年,入念に設計された多くのグラフ表現学習法が,強いヘテロフィリックグラフとホモフィリックグラフのいずれか一方では優れた性能を達成しているが、両方では達成できていない。したがって、それらは異なるホモフィリーのレベルを持つ実世界のグラフをまたいでうまく一般化できない。これは、ヘテロフィリックグラフにおけるホモフィリーの無視と、その逆によるものである。本稿では,ヘテロフィリックグラフからホモフィリーを抽出し,またその逆も行うための二重のフィルタリング機構を提案する。特に、グラフ熱方程式を拡張して、長距離からの大域情報のヘテロフィリックな集約を行う。結果のフィルタは Poisson-Charlier (PC) 多項式によって正確に近似することができる。複数の次数の情報をさらに活用するために,ノード分類タスクのための強力なグラフ畳み込みPC-ConvとそのインスタンスPCNetを導入する。最先端のGNNと比較すると、PCNetはよく知られたホモフィリックグラフとヘテロフィリックグラフで競合性能を示す。私たちの実装は https://github.com/uestclbh/PC-Conv で利用可能です。

Recently, many carefully crafted graph representation learning methods have achieved impressive performance on either strong heterophilic or homophilic graphs, but not both. Therefore, they are incapable of generalizing well across real-world graphs with different levels of homophily. This is attributed to their neglect of homophily in heterophilic graphs, and vice versa. In this paper, we propose a two-fold filtering mechanism to extract homophily in heterophilic graphs and vice versa. In particular, we extend the graph heat equation to perform heterophilic aggregation of global information from a long distance. The resultant filter can be exactly approximated by the Poisson-Charlier (PC) polynomials. To further exploit information at multiple orders, we introduce a powerful graph convolution PC-Conv and its instantiation PCNet for the node classification task. Compared with state-of-the-art GNNs, PCNet shows competitive performance on well-known homophilic and heterophilic graphs. Our implementation is available at https://github.com/uestclbh/PC-Conv.
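
As background for the graph heat equation mentioned above, its standard form and solution are given below. This is textbook material, not the paper's extended heterophilic filter or its Poisson-Charlier expansion, neither of which is specified in the abstract. The notation is assumed here: $L$ is a symmetric graph Laplacian with eigendecomposition $U \Lambda U^{\top}$, $x(t)$ is the node signal, and $t \ge 0$ is the diffusion time.

```latex
\begin{align}
  \frac{\mathrm{d}x(t)}{\mathrm{d}t} &= -L\,x(t), &
  x(t) &= e^{-tL}\,x(0), \\
  L &= U \Lambda U^{\top}
  \;\Longrightarrow\;
  e^{-tL} = U\, e^{-t\Lambda}\, U^{\top}.
\end{align}
```

Each graph frequency $\lambda$ is scaled by $e^{-t\lambda}$, so the heat kernel acts as a low-pass filter that smooths neighbouring nodes toward each other. That suits homophilic graphs, and it is the behaviour a heterophilic extension has to modify.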

翻訳日:2023-12-25 16:01:27 公開日:2023-12-22

# REBEL:人間のフィードバックによる強化学習におけるリワード過最適化のための正規化に基づく解法

REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback ( http://arxiv.org/abs/2312.14436v1 )

ライセンス: Link先を確認

Souradip Chakraborty, Amisha Bhaskar, Anukriti Singh, Pratap Tokekar, Dinesh Manocha, and Amrit Singh Bedi

(参考訳) 本研究では,人間のフィードバック(RRLHF)からのロボット強化学習を応用した,効率的な報酬正規化アルゴリズムREBELを提案する。連続制御ロボットタスクの強化学習(RL)性能は、基礎となる報酬関数に敏感である。実際には、報酬機能は人間の意図や価値観、社会的規範などと不一致に陥り、現実世界で壊滅的な失敗に繋がることが多い。人間の好みを利用して、正規化された報酬機能を学び、最終的にエージェントを真の意図した行動に合わせる。エージェント選好と呼ばれる既存のRRLHFフレームワークに報酬正規化という新たな概念を導入する。そこで我々は,人間のフィードバックを嗜好の観点から考えるだけでなく,報酬関数を学習しながら,基礎となるRLエージェントの嗜好を考慮することを提案する。このことは,RLにおける報酬関数の設計に伴う過度な最適化の改善に役立つことを示す。 PEBBLEやPEBBLE+SURFのような最先端の手法と比較して,REBELは試料効率を最大70%向上させ,同程度の報酬を得られることを示した。

In this work, we propose REBEL, an algorithm for sample-efficient, reward-regularization-based robotic reinforcement learning from human feedback (RRLHF). Reinforcement learning (RL) performance for continuous control robotics tasks is sensitive to the underlying reward function. In practice, the reward function often ends up misaligned with human intent, values, social norms, etc., leading to catastrophic failures in the real world. We leverage human preferences to learn regularized reward functions and eventually align the agents with the true intended behavior. We introduce a novel notion of reward regularization to the existing RRLHF framework, which we term agent preferences. Thus, we not only consider human feedback in terms of preferences but also propose to take into account the preference of the underlying RL agent while learning the reward function. We show that this helps to mitigate the over-optimization associated with the design of reward functions in RL. We experimentally show that REBEL exhibits up to 70% improvement in sample efficiency to achieve a similar level of episodic reward returns as compared to the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.
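
The human-preference part of such a pipeline is commonly modelled with a Bradley-Terry loss over pairs of trajectory segments. The sketch below shows only that standard loss. The agent-preference regularizer that REBEL adds is not specified in the abstract and is deliberately left out, and the function names and reward values are assumptions for illustration.

```python
import math


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    based on the summed predicted rewards of each segment."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the human label and the model's preference.

    `label` is 1.0 if the human preferred A and 0.0 if they preferred B.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Toy per-step predicted rewards for two short segments.
segment_a = [1.0, 1.0]
segment_b = [0.0, 0.0]
p_a = preference_probability(segment_a, segment_b)
loss_agree = preference_loss(segment_a, segment_b, label=1.0)
loss_disagree = preference_loss(segment_a, segment_b, label=0.0)
```

Minimizing this loss over labelled pairs pushes the learned reward to rank segments the way the human did; a regularizer such as the one the abstract describes would be added on top of it.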

翻訳日:2023-12-25 16:01:09 公開日:2023-12-22

# オンライン機械学習に基づく単一粒子x線回折画像からのスケーラブルな3次元再構成

Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning ( http://arxiv.org/abs/2312.14432v1 )

ライセンス: Link先を確認

Jay Shenoy, Axel Levy, Fr\'ed\'eric Poitevin, Gordon Wetzstein

(参考訳) X線自由電子レーザー(XFEL)は、生体分子の構造と力学を計測し、生命の基本的な構成要素を理解するのに役立つ。特に、高い繰り返し速度のXFELは、低温または結晶化状態では捕獲できないフリーティング状態にアクセスする機会として、個々の弱い散乱生体分子をほぼ生理的条件下で撮像する単一粒子イメージング(X線SPI)を可能にする。既存のX線SPI再構成アルゴリズムは、各撮像画像中の粒子の未知の向きと共有3次元構造を推定するが、これらの新興XFELによって生成された大量のデータセットを扱うには不十分である。本稿では,大規模なX線SPIデータセットから3次元マクロ分子の構造を推定するオンライン再構成フレームワークであるX-RAIを紹介する。 X-RAIは畳み込みエンコーダ(convolutional encoder)で構成されており、大きなデータセットに対するポーズ推定をアモーティズするとともに、暗黙の神経表現を用いてエンドツーエンドで自己管理的な高品質な3D再構成を可能にする物理ベースのデコーダ(decoder)も備えている。我々は、X-RAIがシミュレーションと挑戦的な実験環境において、数百万の回折画像を含む大規模なデータセットをオンライン形式で処理する前例のない能力を示した。これらの能力は、リアルタイムのキャプチャと再構築に向けたX線SPIのパラダイムシフトを表している。

X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI) where individual, weakly scattering biomolecules are imaged under near-physiological conditions with the opportunity to access fleeting states that cannot be captured in cryogenic or crystallized conditions. Existing X-ray SPI reconstruction algorithms, which estimate the unknown orientation of a particle in each captured image as well as its shared 3D structure, are inadequate in handling the massive datasets generated by these emerging XFELs. Here, we introduce X-RAI, an online reconstruction framework that estimates the structure of a 3D macromolecule from large X-ray SPI datasets. X-RAI consists of a convolutional encoder, which amortizes pose estimation over large datasets, as well as a physics-based decoder, which employs an implicit neural representation to enable high-quality 3D reconstruction in an end-to-end, self-supervised manner. We demonstrate that X-RAI achieves state-of-the-art performance for small-scale datasets in simulation and challenging experimental settings and demonstrate its unprecedented ability to process large datasets containing millions of diffraction images in an online fashion. These abilities signify a paradigm shift in X-ray SPI towards real-time capture and reconstruction.

翻訳日:2023-12-25 16:00:52 公開日:2023-12-22

# スマートマニュファクチャリングにおける一元的産業大知識モデルフレームワーク

A Unified Industrial Large Knowledge Model Framework in Smart Manufacturing ( http://arxiv.org/abs/2312.14428v1 )

ライセンス: Link先を確認

Jay Lee, Hanqi Su

(参考訳) 近年の大規模言語モデル(LLM)の出現は、汎用人工知能の可能性を示し、インダストリー4.0とスマート製造の新しい機会を明らかにしている。しかし、これらのLLMを産業に適用する際、主にドメイン固有の知識ではなく、一般的な知識に関するトレーニングのために顕著なギャップが存在する。このような専門的なドメイン知識は、産業アプリケーションの複雑なニーズに効果的に対処するために不可欠である。このギャップを埋めるために,スマートマニュファクチャリングにおける産業に革命をもたらす可能性を強調する産業大知識モデル(ILKM)フレームワークを提案する。さらに、ILKMとLLMは8つの視点から比較される。最後に、スマート製造におけるILKMの開発指針として「6S原則」を提案する。

The recent emergence of large language models (LLMs) shows the potential for artificial general intelligence, revealing new opportunities in Industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes an Industrial Large Knowledge Model (ILKM) framework, emphasizing its potential to revolutionize the industry in smart manufacturing. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, the "6S Principle" is proposed as the guideline for the development of ILKMs in smart manufacturing.

翻訳日:2023-12-25 16:00:24 公開日:2023-12-22

# 等分散に基づく幻覚の理論

Theory of Hallucinations based on Equivariance ( http://arxiv.org/abs/2312.14504v1 )

ライセンス: Link先を確認

Hisaichi Shibata

(参考訳) 等分散は、言語モデルを含む機械学習において重要な特徴である。同じ意味の句列が一貫して解釈されることを保証する。例えば、"There is a cat on the table"という文は、トークンレベルの表現のバリエーションに関係なく、言語モデルによって解釈されるべきである。この知見に基づいて,言語モデルの等分散性の不足が幻覚に繋がる可能性を示唆する新しい理論を提案する。この理論によれば、比較的小さなデータセットで訓練された言語モデルは、入力テキストを誤解釈したり、誤ったテキスト(すなわち幻覚)を生成する傾向がある。この理論をテストするために、私はキャラクターレベルの置換暗号である「dancing men」として知られる玩具モデルを開発した。さらに,T5(Text To Text Transfer Transformer)モデルに基づく新しい手法を提案する。私は、このT5モデルは暗号をほぼ完全に解き、このフレームで同値を得る能力を示した。この方法は、トークンや辞書を使わずに、大きな言語モデルに類似した、単語レベルおよび文レベルの置換暗号にスケールできる。このスケーラビリティは、不適切な同値獲得と幻覚の出現の間の関係を調査するのに適している。

Equivariance is an important feature in machine learning, including language models. It ensures that any sequences of phrases with the same meanings are interpreted consistently. For example, the sentence 'There is a cat on the table' should be interpreted by language models as it is, regardless of variations in its token-level expression. Building on this insight, I propose a new theory suggesting that insufficient equivariance in language models can lead to hallucinations. According to this theory, which is both intuitive and novel, language models trained on relatively small datasets tend to misinterpret input texts and/or generate incorrect texts (i.e., hallucinations). To test this theory, I developed a toy model known as 'dancing men', which is a character-level substitution cipher. Additionally, I propose a novel technique based on the T5 (Text To Text Transfer Transformer) model to efficiently decipher these codes without relying on frequency analysis. I have found that this T5 model can almost completely solve the cipher, demonstrating its ability to acquire equivariance in this frame. This method could be scaled up to word-level and sentence-level substitution ciphers, analogous to large language models without tokenizers or dictionaries. This scalability makes it suitable for investigating the proposed link between inadequate equivariance acquisition and the emergence of hallucinations.

翻訳日:2023-12-25 15:55:48 公開日:2023-12-22
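
To make the character-level substitution cipher in the abstract above concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's setup: the "dancing men" cipher maps letters to pictographic symbols, whereas this sketch maps lowercase letters to other lowercase letters, and the helper names (`make_cipher`, `apply_mapping`, `invert`) are hypothetical. The T5-based deciphering step is not reproduced.

```python
import random
import string


def make_cipher(seed: int) -> dict[str, str]:
    """Build a random one-to-one mapping over lowercase letters."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def apply_mapping(text: str, mapping: dict[str, str]) -> str:
    """Substitute each mapped character; leave spaces and others unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)


def invert(mapping: dict[str, str]) -> dict[str, str]:
    """Reverse the mapping so ciphertext can be turned back into plaintext."""
    return {v: k for k, v in mapping.items()}


key = make_cipher(seed=0)
plain = "there is a cat on the table"
cipher = apply_mapping(plain, key)
recovered = apply_mapping(cipher, invert(key))
```

Because the mapping is a bijection, every re-encoding of the same sentence carries the same meaning, which is the kind of consistency the abstract asks a language model to preserve.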

# vistripformer:汎用ビデオ復元のためのトークン効率の高いトランスフォーマー

ViStripformer: A Token-Efficient Transformer for Versatile Video Restoration ( http://arxiv.org/abs/2312.14502v1 )

ライセンス: Link先を確認

Fu-Jen Tsai, Yan-Tsung Peng, Chen-Yu Chang, Chan-Yu Li, Yen-Yu Lin, Chung-Chi Tsai, and Chia-Wen Lin

(参考訳) ビデオ復元は、画質の劣化したフレームからクリーンでシャープなビデオを復元する、低レベルの視覚タスクである。隣接するフレームからの時間情報を使ってビデオの復元を成功させる。近年,トランスフォーマーの成功はコンピュータビジョンコミュニティにおいて認知度を高めている。しかし、その自己保持機構は大量のメモリを必要とするため、ビデオ復元のような高解像度の視覚タスクには適さない。本稿では,空間的および時間的情報を抽出するために,フレーム内ストリップ注意 (intra-sa) とフレーム間ストリップ注意 (inter-sa) からなる長距離データ相関を捉えるために時空間的ストリップ注意を利用するvistripformer (video stripformer) を提案する。ビデオフレームを水平方向と垂直方向のストリップ状の特徴に分解し,様々な方向や大きさの劣化パターンに対処する。さらに、ViStripformerはバニラ変圧器よりもメモリ使用量の少ない効率的かつ効率的なトランスアーキテクチャである。広範に実験した結果,提案手法は,ビデオデブラリング,デモレーリング,デレイニングなどの映像復元作業において,高速な推定時間で優れた結果が得られることがわかった。

Video restoration is a low-level vision task that seeks to restore clean, sharp videos from quality-degraded frames. One would use the temporal information from adjacent frames to make video restoration successful. Recently, the success of the Transformer has raised awareness in the computer-vision community. However, its self-attention mechanism requires much memory, which is unsuitable for high-resolution vision tasks like video restoration. In this paper, we propose ViStripformer (Video Stripformer), which utilizes spatio-temporal strip attention to catch long-range data correlations, consisting of intra-frame strip attention (Intra-SA) and inter-frame strip attention (Inter-SA) for extracting spatial and temporal information. It decomposes video frames into strip-shaped features in horizontal and vertical directions for Intra-SA and Inter-SA to address degradation patterns with various orientations and magnitudes. Besides, ViStripformer is an effective and efficient transformer architecture with much lower memory usage than the vanilla transformer. Extensive experiments show that the proposed model achieves superior results with fast inference time on video restoration tasks, including video deblurring, demoireing, and deraining.

翻訳日:2023-12-25 15:55:26 公開日:2023-12-22

# 高次元・高次物理インフォームドニューラルネットワークのハッチンソントレース推定

Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks ( http://arxiv.org/abs/2312.14499v1 )

ライセンス: Link先を確認

Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

(参考訳) 物理学に変形したニューラルネットワーク(pinns)は偏微分方程式(pdes)の解法として有効であることが証明されている。しかし, PINNを高次元かつ高次元のPDEに拡張することは, 残留損失の自動微分に伴う計算コストが大きな課題となる。本稿では,Hutchinson Trace Estimation (HTE)を導入し,高次元・高次PDE処理におけるPINNの限界に対処する。科学計算においてユビキタスな2階高次元PDEから始め、HTEはヘッセン行列全体の計算をヘッセンベクトル積(HVP)に変換する。このアプローチはテイラーモードの自動微分による計算ボトルネックを緩和し、ヘッセン行列からHVPへのメモリ消費を大幅に削減する。我々はさらに,hteのオリジナルのピン損失への収束と,その偏りのない挙動を特定の条件下で示す。 Stochastic Dimension Gradient Descent (SDGD)との比較は、特に次元間で大きな差異があるシナリオにおいて、HTEの明確な利点を強調している。さらにHTEを高次および高次元PDEに拡張し、特にバイハーモニック方程式に対処する。テンソルベクトル積(TVP)を用いることで、HTEは、4階高次元バイハーモニック方程式に関連する余剰テンソルを効率的に計算し、メモリを節約し、高速な計算を可能にする。 HTEの有効性は実験的な設定を通じて説明され、メモリと速度制約の下でSDGDと同等の収束率を示す。さらに、HTEは、グラディエント強化PINN(gPINN)バージョンとバイハーモニック方程式の加速に有用である。全体として、HTEは高次および高次元PDEに対処する科学的機械学習の新たな能力を開く。

Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available, by seamlessly blending data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein, we address the limitations of PINNs in handling high-dimensional and high-order PDEs by introducing Hutchinson Trace Estimation (HTE). Starting with the second-order high-dimensional PDEs ubiquitous in scientific computing, HTE transforms the calculation of the entire Hessian matrix into a Hessian vector product (HVP). This approach alleviates the computational bottleneck via Taylor-mode automatic differentiation and significantly reduces memory consumption from the Hessian matrix to HVP. We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions. Comparisons with Stochastic Dimension Gradient Descent (SDGD) highlight the distinct advantages of HTE, particularly in scenarios with significant variance among dimensions. We further extend HTE to higher-order and higher-dimensional PDEs, specifically addressing the biharmonic equation. By employing tensor-vector products (TVP), HTE efficiently computes the colossal tensor associated with the fourth-order high-dimensional biharmonic equation, saving memory and enabling rapid computation. The effectiveness of HTE is illustrated through experimental setups, demonstrating comparable convergence rates with SDGD under memory and speed constraints. Additionally, HTE proves valuable in accelerating the Gradient-Enhanced PINN (gPINN) version as well as the biharmonic equation. Overall, HTE opens up a new capability in scientific machine learning for tackling high-order and high-dimensional PDEs.

翻訳日:2023-12-25 15:54:52 公開日:2023-12-22
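
The core idea in the abstract above, replacing a full Hessian with products against random probe vectors, can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: it relies on the identity E[v^T H v] = trace(H) for Rademacher probes v, and it approximates v^T H v with a central finite difference instead of the Taylor-mode automatic differentiation the paper uses. The toy field `u` and all function names are assumptions made for the example.

```python
import random


def u(x):
    """Toy scalar field: a quadratic with one off-diagonal coupling term."""
    coeffs = [1.0, 2.0, 3.0, 4.0]
    return sum(c * xi * xi for c, xi in zip(coeffs, x)) + x[0] * x[1]


def directional_second_derivative(f, x, v, h=1e-3):
    """Approximate v^T H v with a central difference along direction v."""
    x_plus = [xi + h * vi for xi, vi in zip(x, v)]
    x_minus = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(x_plus) - 2.0 * f(x) + f(x_minus)) / (h * h)


def hutchinson_laplacian(f, x, num_probes, rng):
    """Average v^T H v over Rademacher probes v to estimate trace(H)."""
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in x]
        total += directional_second_derivative(f, x, v)
    return total / num_probes


rng = random.Random(0)
point = [0.5, -1.0, 2.0, 0.1]
estimate = hutchinson_laplacian(u, point, num_probes=4000, rng=rng)
exact = 2.0 * (1.0 + 2.0 + 3.0 + 4.0)  # trace of the Hessian of u
```

Each probe costs three function evaluations regardless of the dimension, and the d-by-d Hessian is never formed. The off-diagonal term in `u` is what makes individual probes noisy, so the estimate matches `exact` only on average.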

# ビジョンランゲージモデルによるFew-Shot物体検出の再検討

Revisiting Few-Shot Object Detection with Vision-Language Models ( http://arxiv.org/abs/2312.14494v1 )

ライセンス: Link先を確認

Anish Madan, Neehar Peri, Shu Kong, Deva Ramanan

(参考訳) few-shot object detection (fsod)ベンチマークには、制限されたアノテーションで新しいカテゴリを検出するための高度な技術がある。既存のベンチマークでは、COCOのような確立されたデータセットを、それぞれ、事前トレーニングと微調整のためのベースクラスと新しいクラスに分割することで再利用している。しかし、これらのベンチマークは、実際にfsodをデプロイする方法を反映していない。少数のベースカテゴリを事前学習するよりも、ターゲットドメインに対して基礎モデル(例えば、webスケールデータで事前学習された視覚言語モデル(vlm))を微調整することがより実用的であると主張する。驚いたことに、GroundingDINOのようなVLMからのゼロショット推論はCOCO上の最先端(48.3対33.1 AP)よりも著しく優れている。しかし、そのようなゼロショットモデルは、それでも対象とする興味ある概念と一致しない。例えば、web上のトレーラーは、自動運転車の文脈でトレーラーとは異なるかもしれない。本研究では,任意の外部データセット上で事前学習し,ターゲットクラス毎のKショットを微調整した検出器を評価するための新しいベンチマークプロトコルであるFoundational FSODを提案する。さらに、現在のfsodベンチマークは、実際にはデータサブセット上の各カテゴリに対する徹底したアノテーションを含むフェデレーションデータセットである点にも注目する。我々はこの知見を利用して、連合的損失を伴う微調整VLMの簡単な戦略を提案する。我々は LVIS と nu Images に対するアプローチの有効性を実証し,5.9 AP による先行作業よりも改善した。

Few-shot object detection (FSOD) benchmarks have advanced techniques for detecting new categories with limited annotations. Existing benchmarks repurpose well-established datasets like COCO by partitioning categories into base and novel classes for pre-training and fine-tuning respectively. However, these benchmarks do not reflect how FSOD is deployed in practice. Rather than only pre-training on a small number of base categories, we argue that it is more practical to fine-tune a foundation model (e.g., a vision-language model (VLM) pre-trained on web-scale data) for a target domain. Surprisingly, we find that zero-shot inference from VLMs like GroundingDINO significantly outperforms the state-of-the-art (48.3 vs. 33.1 AP) on COCO. However, such zero-shot models can still be misaligned to target concepts of interest. For example, trailers on the web may be different from trailers in the context of autonomous vehicles. In this work, we propose Foundational FSOD, a new benchmark protocol that evaluates detectors pre-trained on any external datasets and fine-tuned on K-shots per target class. Further, we note that current FSOD benchmarks are actually federated datasets containing exhaustive annotations for each category on a subset of the data. We leverage this insight to propose simple strategies for fine-tuning VLMs with federated losses. We demonstrate the effectiveness of our approach on LVIS and nuImages, improving over prior work by 5.9 AP.

翻訳日:2023-12-25 15:53:57 公開日:2023-12-22

# 単一画像物体検出のためのコンテキスト拡張トランス

Context Enhanced Transformer for Single Image Object Detection ( http://arxiv.org/abs/2312.14492v1 )

ライセンス: Link先を確認

Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

(参考訳) 実世界のアプリケーションにおけるビデオデータの重要性が高まっているため、時間情報を利用する効率的なオブジェクト検出手法の必要性が高まっている。既存のビデオオブジェクト検出(VOD)技術では、この課題に対処するための様々な戦略が採用されているが、通常は、近隣のフレームやクリップ内のランダムなサンプル画像に依存する。近年の Transformer ベースのVOD 法は有望な結果を示しているが,時間的情報を組み込むネットワークの複雑さにより,実用性は制限されている。本稿では,新たに設計されたメモリモジュールを用いて,detrに時間的コンテキストを組み込むことにより,コンテキストエンハンストランス(cetr)と呼ばれる単一画像オブジェクト検出手法を提案する。時間情報を効率的に保存するために,データ間で文脈情報を収集するクラスメモリを構築する。さらに,現在の画像の関連メモリを選択的に活用するための分類に基づくサンプリング手法を提案する。本テストでは,テスト分布を考慮し,個々のメモリ機能を更新するテスト時間メモリ適応手法を提案する。 citycamとimagenet vidデータセットを用いた実験は、様々なビデオシステムにおけるフレームワークの効率を示す。プロジェクトページとコードは、https://ku-cvlab.github.io/cetr.com/で利用可能になる。

With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), by incorporating temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. At test time, we introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments with CityCam and ImageNet VID datasets exhibit the efficiency of the framework on various video systems. The project page and code will be made available at: https://ku-cvlab.github.io/CETR.

翻訳日:2023-12-25 15:53:17 公開日:2023-12-22

# 言語モデルは同時機械翻訳のための分岐予測器である

Language Model is a Branch Predictor for Simultaneous Machine Translation ( http://arxiv.org/abs/2312.14488v1 )

ライセンス: Link先を確認

Aoxiong Yin, Tianyun Zhong, Haoyuan Li, Siliang Tang, Zhou Zhao

(参考訳) 同時機械翻訳(SiMT)の主な目的は、最終翻訳の品質を維持しながらレイテンシを最小限にすることである。本稿では,CPU分岐予測技術からインスピレーションを得て,SiMTタスクに分岐予測技術を取り入れて翻訳遅延を低減することを提案する。具体的には,言語モデルを分岐予測器として活用し,潜在的な分岐方向,すなわち未来語を予測している。その後、予測されたソース語を用いて事前に出力を復号する。実際のソースワードが予測されたソースワードから逸脱すると、実際のソースワードを使用して再び出力をデコードし、予測された出力を置き換える。計算コストをさらに削減するため,エンコーダと分岐予測器のパラメータを共有し,事前学習した言語モデルを用いて初期化を行う。提案手法は任意のSiMTモデルとシームレスに統合できる。広範な実験結果から,本手法は翻訳品質とレイテンシを同時に向上できることが示された。私たちのコードはhttps://github.com/YinAoXiong/simt_branch_predictorで利用可能です。

The primary objective of simultaneous machine translation (SiMT) is to minimize latency while preserving the quality of the final translation. Drawing inspiration from CPU branch prediction techniques, we propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency. Specifically, we utilize a language model as a branch predictor to predict potential branch directions, namely, future source words. Subsequently, we utilize the predicted source words to decode the output in advance. When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output. To further reduce computational costs, we share the parameters of the encoder and the branch predictor, and utilize a pre-trained language model for initialization. Our proposed method can be seamlessly integrated with any SiMT model. Extensive experimental results demonstrate that our approach can improve translation quality and latency at the same time. Our code is available at https://github.com/YinAoXiong/simt_branch_predictor .

翻訳日:2023-12-25 15:52:24 公開日:2023-12-22
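
The predict, decode early, and roll back on a miss loop described in the abstract above can be sketched as follows. Everything here is a hypothetical stand-in chosen for illustration: a bigram table (`NEXT_WORD`) plays the role of the language model, a word-for-word dictionary (`LEXICON`) plays the role of the SiMT decoder, and the function names are not taken from the authors' repository.

```python
# Hypothetical stand-ins: a bigram table plays the language model,
# and a word-for-word dictionary plays the SiMT decoder.
NEXT_WORD = {"guten": "morgen", "morgen": "welt"}
LEXICON = {"guten": "good", "morgen": "morning", "welt": "world", "freund": "friend"}


def predict_next(prefix):
    """Branch predictor: guess the next source word from the last one seen."""
    return NEXT_WORD.get(prefix[-1]) if prefix else None


def decode(prefix):
    """Toy decoder: translate the source prefix word by word."""
    return [LEXICON.get(w, w) for w in prefix]


def translate_stream(source):
    """Decode speculatively on a predicted word; redo the work on a miss."""
    prefix, output, hits, misses = [], [], 0, 0
    speculative = None  # (predicted_word, output decoded in advance)
    for word in source:
        prefix.append(word)
        if speculative is not None and speculative[0] == word:
            output = speculative[1]  # prediction was right: reuse early output
            hits += 1
        else:
            output = decode(prefix)  # prediction missing or wrong: decode again
            misses += 1
        guess = predict_next(prefix)
        speculative = (guess, decode(prefix + [guess])) if guess else None
    return output, hits, misses


result, hits, misses = translate_stream(["guten", "morgen", "freund"])
```

On a hit the translation is already available when the real word arrives, which is where the latency saving comes from. On a miss the cost is one extra decode, as with a CPU pipeline flush.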

# 極小核散乱における絡み合い

Entanglement in few-nucleon scattering events ( http://arxiv.org/abs/2312.14484v1 )

ライセンス: Link先を確認

Tanja Kirchner, Wael Elkamhawy, Hans-Werner Hammer

(参考訳) 核子と重陽子を含む少数核子散乱過程におけるスピンの絡み合いを調べる。この目的のために、 Beane らが導入した絡み合い力を考える。強相互作用の絡み合い力を定義するために異なる絡み合いエントロピーを分析し、陽子-ニュートロン、中性子重陽子、陽子重陽子、重陽子-重陽子散乱の絡み合い力を計算する。後者の2つのプロセスでは、クーロン相互作用の修正も考慮に入れます。陽子-ニュートロン散乱とは対照的に、中性子-重陽子、陽子-重陽子散乱、重陽子-重陽子散乱におけるスピンの絡み合いには普遍的な低エネルギーの特徴はない。

We investigate the spin entanglement in few-nucleon scattering processes involving nucleons and deuterons. For this purpose, we consider the entanglement power introduced by Beane et al. We analyze different entanglement entropies as a basis to define the entanglement power of the strong interaction and calculate the corresponding entanglement powers for proton-neutron, neutron-deuteron, proton-deuteron, and deuteron-deuteron scattering. For the latter two processes, we also take into account the modification from the Coulomb interaction. In contrast to proton-neutron scattering, no universal low-energy features are evident in the spin entanglement in neutron-deuteron, proton-deuteron, and deuteron-deuteron scattering.

翻訳日:2023-12-25 15:52:05 公開日:2023-12-22

# 手術器具分割のための協調的プロンプト

Part to Whole: Collaborative Prompting for Surgical Instrument Segmentation ( http://arxiv.org/abs/2312.14481v1 )

ライセンス: Link先を確認

Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

(参考訳) Segment Anything Model (SAM)のような基礎モデルでは、ジェネリックオブジェクトセグメンテーションが約束されている。しかし,手術器具のセグメンテーションにSAMを直接適用することは重要な課題である。まずSAMは、外科医とコンピュータの相互作用を複雑にするフレーム単位のポイント・オー・ボックスプロンプトに依存する。また、SAMは、手術前訓練に不十分な手術データ、複雑な構造、各種手術器具の細部の詳細などにより、外科器具の分節化に最適である。これらの課題に対処するため,本論文では,テキスト・プロンプト可能な手術器具のセグメンテーションについて検討し,手術器具の構造知識とSAMの汎用セグメンテーション知識を統合した,新しい効率的なチューニング手法であるSP-SAM(Surgical Part-SAM)を提案する。 Specifically, we achieve this by proposing (1) collaborative prompts in the text form "[part name] of [instrument category name]" that decompose instruments into fine-grained parts; (2) a Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) a Part-to-Whole Selective Fusion and a Hierarchical Decoding strategy that selectively assemble the part-level representations into a whole for accurate instrument segmentation. SP-SAMは、手術器具の構造を理解し、様々なカテゴリーを区別するより良い能力を得る。 EndoVis2018とEndoVis2017の両方のデータセットに対する大規模な実験は、最小限のチューニング可能なパラメータでSP-SAMの最先端のパフォーマンスを示している。コードはhttps://github.com/wenxi-yue/SurgicalPart-SAMにある。

Foundation models like the Segment Anything Model (SAM) have demonstrated promise in generic object segmentation. However, directly applying SAM to surgical instrument segmentation presents key challenges. First, SAM relies on per-frame point-or-box prompts which complicate surgeon-computer interaction. Also, SAM yields suboptimal performance on segmenting surgical instruments, owing to insufficient surgical data in its pre-training as well as the complex structure and fine-grained details of various surgical instruments. To address these challenges, in this paper, we investigate text promptable surgical instrument segmentation and propose SP-SAM (SurgicalPart-SAM), a novel efficient-tuning approach that integrates surgical instrument structure knowledge with the generic segmentation knowledge of SAM. Specifically, we achieve this by proposing (1) collaborative prompts in the text form "[part name] of [instrument category name]" that decompose instruments into fine-grained parts; (2) a Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) a Part-to-Whole Selective Fusion and a Hierarchical Decoding strategy that selectively assemble the part-level representations into a whole for accurate instrument segmentation. Built upon them, SP-SAM acquires a better capability to comprehend surgical instrument structures and distinguish between various categories. Extensive experiments on both the EndoVis2018 and EndoVis2017 datasets demonstrate SP-SAM's state-of-the-art performance with minimal tunable parameters. Code is at https://github.com/wenxi-yue/SurgicalPart-SAM.

翻訳日:2023-12-25 15:51:34 公開日:2023-12-22

# MetaAID 2.5: 大規模言語モデルによるメタバースアプリケーション開発のためのセキュアフレームワーク

MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models ( http://arxiv.org/abs/2312.14480v1 )

ライセンス: Link先を確認

Hongyin Zhu

(参考訳) 大規模言語モデル(LLM)は、動的で現実的なコンテンツを生成し、ノンプレイヤーキャラクター(NPC)の振る舞いを制御するために、メタバース環境でますます使われている。しかし、LLMに関連するサイバーセキュリティの懸念はますます顕著になっている。これまでの研究は主に、セキュリティを強化するためにシステムの脆弱性にパッチを当てることに重点を置いてきたが、これらのアプローチは、仮想空間がより複雑で、LLMが脆弱であり、倫理的なユーザインタラクションが重要となるMetaverseには適していない。さらに、メタバースにおけるサイバーセキュリティの範囲は大幅に拡大すると予想されている。本稿では,LLMとのユーザインタラクションシミュレーションによるサイバーセキュリティ向上手法を提案する。我々の目標は、ユーザーを教育し、総合的なシミュレーションシステムに触れることで防衛能力を強化することである。このシステムには広範なMetaverseサイバーセキュリティQ&Aと攻撃シミュレーションシナリオが含まれる。ユーザーはこれらに取り組むことで、リスクを認識し、耐える能力を向上させる。さらに,ユーザ入力の倫理的意味に対処するため,5次元のユーザコンテンツを評価するための評価器としてLLMを提案する。さらに、語彙拡張トレーニングを通じてモデルに適応し、パーソナライズされた入力やエモティコンをよりよく理解する。複数のLLMで実験を行い,本手法が有効であることを確認した。

Large language models (LLMs) are increasingly being used in Metaverse environments to generate dynamic and realistic content and to control the behavior of non-player characters (NPCs). However, the cybersecurity concerns associated with LLMs have become increasingly prominent. Previous research has primarily focused on patching system vulnerabilities to enhance cybersecurity, but these approaches are not well-suited to the Metaverse, where the virtual space is more complex, LLMs are vulnerable, and ethical user interaction is critical. Moreover, the scope of cybersecurity in the Metaverse is expected to expand significantly. This paper proposes a method for enhancing cybersecurity through the simulation of user interaction with LLMs. Our goal is to educate users and strengthen their defense capabilities through exposure to a comprehensive simulation system. This system includes extensive Metaverse cybersecurity Q&A and attack simulation scenarios. By engaging with these, users will improve their ability to recognize and withstand risks. Additionally, to address the ethical implications of user input, we propose using LLMs as evaluators to assess user content across five dimensions. We further adapt the models through vocabulary expansion training to better understand personalized inputs and emoticons. We conduct experiments on multiple LLMs and find that our approach is effective.

翻訳日:2023-12-25 15:51:08 公開日:2023-12-22

# 入出力協調蒸留による連合学習

Federated Learning via Input-Output Collaborative Distillation ( http://arxiv.org/abs/2312.14478v1 )

ライセンス: Link先を確認

Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann

(参考訳) Federated Learning(FL)は、個別に保持されたプライベートデータを共有せずに、分散ローカルノードが協調的に中央モデルをトレーニングする機械学習パラダイムである。既存のFLメソッドは、ローカルモデルパラメータを反復的に共有するか、共蒸留をデプロイする。しかし、前者はプライベートデータ漏洩の影響を受けやすく、後者の設計はタスク関連実データの前提条件に依存している。代わりに,直接入力と出力空間利用を用いた局所-中央協調蒸留に基づくデータフリーflフレームワークを提案する。我々の設計では、知識を伝達するための再帰的ローカルパラメータ交換や補助タスク関連データの要求を排除し、ローカルユーザーに直接プライバシ制御を行う。特に,ローカルモデル間の固有のデータの不均一性に対処するために,各ローカルモデルが各専門知識を表現するためのコンセンサスかつユニークな結果を生成する入力を蒸留することを学ぶ。提案するFLフレームワークは,自然画像と医用画像の両方において,実世界の異質なフェデレーション学習環境下での画像分類とセグメンテーションタスクに関する広範な実験により,顕著なプライバシー利用トレードオフを実現する。

Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-relevant real data. Instead, we propose a data-free FL framework based on local-to-central collaborative distillation with direct input and output space exploitation. Our design eliminates any requirement of recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users. In particular, to cope with the inherent data heterogeneity across locals, our technique learns to distill input on which each local model produces consensual yet unique results to represent each expertise. Our proposed FL framework achieves notable privacy-utility trade-offs with extensive experiments on image classification and segmentation tasks under various real-world heterogeneous federated learning settings on both natural and medical images.

翻訳日:2023-12-25 15:50:50 公開日:2023-12-22

# MonoLSS: モノクロ3D検出のための学習可能なサンプル選択

MonoLSS: Learnable Sample Selection For Monocular 3D Detection ( http://arxiv.org/abs/2312.14474v1 )

ライセンス: Link先を確認

Zhenjia Li and Jinrang Jia and Yifeng Shi

(参考訳) 自律運転の分野では、1つのRGB画像における物体の3次元特性(深さ、寸法、方向)を推定する1つの重要なタスクである。以前の作品では、不適切な特徴が悪影響を及ぼすことを考慮せずに、ヒューリスティックな方法で3d特性を学ぶために機能を使用してきた。本稿では, 3d 特性を回帰させるために適切なサンプルのみを訓練することを提案する。サンプルを適応的に選択するために,Gumbel-Softmaxと相対距離サンプル分割器をベースとしたLearningable Sample Selection (LSS)モジュールを提案する。 LSSモジュールはウォームアップ戦略の下で動作し、トレーニングの安定性が向上する。さらに、3Dプロパティのサンプル選択専用のLSSモジュールは、オブジェクトレベルの特徴に依存しているため、曖昧さを伴わずに画像の原理に適合した3Dプロパティのサンプルを濃縮するMixUp3Dというデータ拡張手法をさらに発展させる。 2つの直交法として、LSSモジュールとMixUp3Dは独立または共同で使用できる。十分な実験により、それらの組み合わせが相乗効果をもたらし、個々のアプリケーションの合計を超越する改善をもたらすことが示されている。 LSSモジュールとMixUp3Dを利用すると、余分なデータなしでMonoLSSというメソッドがKITTIの3Dオブジェクト検出ベンチマークで3つのカテゴリ(カー、サイクリスト、ペデストリアン)で1位にランクされ、WaymoデータセットとKITTI-nuScenesのクロスデータセット評価の両方で競合する結果が得られる。コードは補助資料に含まれており、関連する学術・工業研究を促進するためにリリースされる。

In the field of autonomous driving, monocular 3D detection is a critical task which estimates 3D properties (depth, dimension, and orientation) of objects in a single RGB image. Previous works have used features in a heuristic way to learn 3D properties, without considering that inappropriate features could have adverse effects. In this paper, we introduce sample selection so that only suitable samples are used to train the regression of the 3D properties. To select samples adaptively, we propose a Learnable Sample Selection (LSS) module, which is based on Gumbel-Softmax and a relative-distance sample divider. The LSS module works under a warm-up strategy, leading to an improvement in training stability. Additionally, since the LSS module dedicated to 3D property sample selection relies on object-level features, we further develop a data augmentation method named MixUp3D to enrich 3D property samples, which conforms to imaging principles without introducing ambiguity. As two orthogonal methods, the LSS module and MixUp3D can be utilized independently or in conjunction. Sufficient experiments have shown that their combined use can lead to synergistic effects, yielding improvements that transcend the mere sum of their individual applications. Leveraging the LSS module and MixUp3D, without any extra data, our method named MonoLSS ranks 1st in all three categories (Car, Cyclist, and Pedestrian) on the KITTI 3D object detection benchmark, and achieves competitive results on both the Waymo dataset and KITTI-nuScenes cross-dataset evaluation. The code is included in the supplementary material and will be released to facilitate related academic and industrial studies.

翻訳日:2023-12-25 15:50:31 公開日:2023-12-22
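
The abstract above names Gumbel-Softmax as the basis of the LSS module but does not spell out how it is wired in. The sketch below therefore shows only the standard Gumbel-Softmax relaxation, which turns per-sample scores into a differentiable, nearly one-hot selection. The scores, the temperature, and the function name are assumptions for illustration, and the relative-distance sample divider is not reproduced.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    scaled = []
    for logit in logits:
        uniform = max(rng.random(), 1e-12)  # guard against log(0)
        gumbel = -math.log(-math.log(uniform))  # standard Gumbel(0, 1) noise
        scaled.append((logit + gumbel) / temperature)
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


rng = random.Random(0)
sample_scores = [2.0, 0.5, -1.0, 0.0]  # hypothetical per-sample logits
weights = gumbel_softmax(sample_scores, temperature=0.5, rng=rng)
```

Lower temperatures push `weights` toward a hard one-hot choice, while higher temperatures keep the selection soft. A soft start is one plausible reason a warm-up phase helps training stability, although the abstract does not state the mechanism.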

# すべてのタスクが同じくらい難しいわけではない:動的深さルーティングによるマルチタスク強化学習

Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing ( http://arxiv.org/abs/2312.14472v1 )

ライセンス: Link先を確認

Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

(参考訳) マルチタスク強化学習は、一つのポリシーで異なるタスクセットを達成する。複数のタスクにまたがるパラメータを共有することでデータ効率を向上させるため、一般的なプラクティスでは、ネットワークを異なるモジュールに分割し、これらのモジュールをタスク固有のポリシーに再結合するようにルーティングネットワークを訓練する。しかしながら、既存のルーティングアプローチでは、すべてのタスクに一定数のモジュールを使用するため、さまざまな困難を伴うタスクには通常、さまざまな知識が必要になることを無視する。この研究は動的深度ルーティング(D2R)フレームワークを示し、特定の中間モジュールの戦略的スキップを学習し、各タスクに対して異なる数のモジュールを柔軟に選択する。この枠組みでは,オフ・ポリシー・トレーニング中の行動と対象ポリシーの異なる経路の問題に対処するための再ルーティング手法についても紹介する。さらに,マスタードタスクのルーティングを乱すことなく,未マスタータスクの経路探索を継続させる自動経路バランス機構の設計を行った。メタワールドベンチマークでは,D2Rが最先端性能を実現し,学習効率が大幅に向上した。

Multi-task reinforcement learning endeavors to accomplish a set of different tasks with a single policy. To enhance data efficiency by sharing parameters across multiple tasks, a common practice segments the network into distinct modules and trains a routing network to recombine these modules into task-specific policies. However, existing routing approaches employ a fixed number of modules for all tasks, neglecting that tasks with varying difficulties commonly require varying amounts of knowledge. This work presents a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of certain intermediate modules, thereby flexibly choosing different numbers of modules for each task. Under this framework, we further introduce a ResRouting method to address the issue of disparate routing paths between behavior and target policies during off-policy training. In addition, we design an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks without disturbing the routing of mastered ones. We conduct extensive experiments on various robotics manipulation tasks in the Meta-World benchmark, where D2R achieves state-of-the-art performance with significantly improved learning efficiency.

翻訳日:2023-12-25 15:49:58 公開日:2023-12-22

# プロトタイプを用いたクロスモーダル物体追跡

Prototype-based Cross-Modal Object Tracking ( http://arxiv.org/abs/2312.14471v1 )

ライセンス: Link先を確認

Lei Liu, Chenglong Li, Futian Wang, Longfeng Shen, and Jin Tang

(参考訳) クロスモーダル物体追跡は情報融合分野における重要な研究課題であり、切替可能な可視光と近赤外モードを統合することで、困難なシナリオにおける画像制限に対処することを目的としている。しかし,既存の追跡手法では,モダリティスイッチの存在下での客観性の変化に適応することが困難である。例えば、モデル更新に基づくトラッキング手法は、モダリティ切り替え中に安定したトラッキング結果を維持するのに苦労し、エラーの蓄積とモデルドリフトにつながる。テンプレートベースのトラッキング手法は、最初のフレームおよび/または最後のフレームからのテンプレート情報のみに依存している。この問題に対処するために,prototrackと呼ばれるプロトタイプベースのクロスモーダルオブジェクトトラッカを提案する。特に,対象情報を表すマルチモーダルプロトタイプを,第1フレームからの固定サンプルと異なるモダリティの2つの代表サンプルを含む,多種多様なサンプルで設計する。さらに、2つの新しいモジュールに基づくプロトタイプ生成アルゴリズムを開発し、異なる課題におけるプロトタイプ代表性を保証する。

Cross-modal object tracking is an important research topic in the field of information fusion, and it aims to address imaging limitations in challenging scenarios by integrating switchable visible and near-infrared modalities. However, existing tracking methods face some difficulties in adapting to significant target appearance variations in the presence of modality switch. For instance, model update based tracking methods struggle to maintain stable tracking results during modality switching, leading to error accumulation and model drift. Template based tracking methods solely rely on the template information from first frame and/or last frame, which lacks sufficient representation ability and poses challenges in handling significant target appearance changes. To address this problem, we propose a prototype-based cross-modal object tracker called ProtoTrack, which introduces a novel prototype learning scheme to adapt to significant target appearance variations, for cross-modal object tracking. In particular, we design a multi-modal prototype to represent target information by multi-kind samples, including a fixed sample from the first frame and two representative samples from different modalities. Moreover, we develop a prototype generation algorithm based on two new modules to ensure the prototype representative in different challenges......

翻訳日:2023-12-25 15:49:37 公開日:2023-12-22

# 即時制約による安全強化学習:積極的な探索の役割

Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration ( http://arxiv.org/abs/2312.14470v1 )

ライセンス: Link先を確認

Honghao Wei, Xin Liu, Lei Ying

(参考訳) 本稿では、線形関数近似を用いた安全強化学習(safe RL)を、各ステップで安全でない行動を回避しなければならない厳密な瞬時制約の下で検討する。既存の研究も厳密な瞬時制約を持つ安全なRLを扱っているが、そのアプローチはいくつかの重要な仮定に依存している。すなわち、$(i)$ RLエージェントがすべての状態について安全な行動集合を知っているか、すべての状態-行動-状態の三つ組が安全であるような安全グラフを知っていること、$(ii)$ 制約/コスト関数が線形であること、である。本稿では、仮定$(i)$を置かず、$(ii)$を再生核ヒルベルト空間(RKHS)へ一般化した、厳密な瞬時制約付きの安全なRLを考える。提案アルゴリズムLSVI-AEは、コスト関数が線形の場合に$\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$のリグレットと$\tilde{\mathcal{O}}(H \sqrt{dK})$の厳密な制約違反を、コスト関数がRKHSに属する場合に$\mathcal{O}(H\gamma_K \sqrt{K})$の厳密な制約違反を達成する。ここで$K$は学習ホライズン、$H$は各エピソードの長さ、$\gamma_K$はコスト関数の近似に用いるカーネルに関する情報ゲインである。これらの結果は学習ホライズン$K$への最適な依存性を達成しており、本論文で示す下界と一致し、LSVI-AEの効率性を示している。特に、本手法の設計は積極的な方策探索を促すものであり、一般的なコスト関数を持ち安全な行動に関する事前知識のない安全なRLに対して独自の視点を提供する。これは独立した関心の対象となりうる。

This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions: $(i)$ the RL agent knows a safe action set for every state or knows a safe graph in which all the state-action-state triples are safe, and $(ii)$ the constraint/cost functions are linear. In this paper, we consider safe RL with instantaneous hard constraints without assumption $(i)$ and generalize $(ii)$ to Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves $\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$ regret and $\tilde{\mathcal{O}}(H \sqrt{dK})$ hard constraint violation when the cost function is linear and $\mathcal{O}(H\gamma_K \sqrt{K})$ hard constraint violation when the cost function belongs to RKHS. Here $K$ is the learning horizon, $H$ is the length of each episode, and $\gamma_K$ is the information gain with respect to the kernel used to approximate cost functions. Our results achieve the optimal dependency on the learning horizon $K$, matching the lower bound we provide in this paper and demonstrating the efficiency of LSVI-AE. Notably, the design of our approach encourages aggressive policy exploration, providing a unique perspective on safe RL with general cost functions and no prior knowledge of safe actions, which may be of independent interest.

翻訳日:2023-12-25 15:49:16 公開日:2023-12-22

# FM-OV3D:オープン語彙3D検出のための基礎モデルに基づくクロスモーダル知識ブレンディング

FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection ( http://arxiv.org/abs/2312.14465v1 )

ライセンス: Link先を確認

Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang

(参考訳) 様々な視覚タスクにおける事前訓練された基盤モデルの優れた性能は、2Dモデルのオープン語彙能力を高める可能性を示している。既存の手法は3D空間における類似の応用を探っている。しかし、そのほとんどは単一の基盤モデルからの知識抽出のみに集中しており、3Dモデルのオープン語彙能力を制限している。我々は、様々な基盤モデルから相補的な事前学習知識を活用することで、2Dの事前学習済み視覚言語モデルから3D空間への知識伝達を改善できると仮定する。本研究では、複数の事前学習済み基盤モデルの知識をブレンドすることで3Dモデルのオープン語彙での位置特定と認識の能力を向上させ、元の3Dデータセットの制約を受けずに真のオープン語彙を実現する、基盤モデルに基づくクロスモーダル知識ブレンディング手法FM-OV3Dを提案する。具体的には、オープン語彙の3D位置特定能力を学習するために、Grounded-Segment-Anythingモデルのオープン語彙位置特定の知識を採用する。オープン語彙の3D認識能力については、GPT-3やStable Diffusionモデルなどの生成基盤モデルの知識と、CLIPのようなクロスモーダル識別モデルを活用する。オープン語彙3D物体検出の2つの代表的なベンチマークにおける実験結果は、本モデルが複数の基盤モデルから知識を効率的に学習して3Dモデルのオープン語彙能力を高め、オープン語彙3D物体検出タスクで最先端の性能を達成することを示している。コードはhttps://github.com/dmzhang0425/FM-OV3D.gitで公開されている。

The superior performance of pre-trained foundation models in various visual tasks underscores their potential to enhance the open-vocabulary ability of 2D models. Existing methods explore analogous applications in the 3D space. However, most of them center only on knowledge extraction from a single foundation model, which limits the open-vocabulary ability of 3D models. We hypothesize that leveraging complementary pre-trained knowledge from various foundation models can improve knowledge transfer from 2D pre-trained visual language models to the 3D space. In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of the 3D model by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary detection without being constrained by the original 3D datasets. Specifically, to learn the open-vocabulary 3D localization ability, we adopt the open-vocabulary localization knowledge of the Grounded-Segment-Anything model. For the open-vocabulary 3D recognition ability, we leverage the knowledge of generative foundation models, including GPT-3 and Stable Diffusion models, and cross-modal discriminative models like CLIP. The experimental results on two popular benchmarks for open-vocabulary 3D object detection show that our model efficiently learns knowledge from multiple foundation models to enhance the open-vocabulary ability of the 3D model and successfully achieves state-of-the-art performance in open-vocabulary 3D object detection tasks. Code is released at https://github.com/dmzhang0425/FM-OV3D.git.

翻訳日:2023-12-25 15:48:39 公開日:2023-12-22

# CaptainCook4D: 手続き的アクティビティにおけるエラーを理解するデータセット

CaptainCook4D: A dataset for understanding errors in procedural activities ( http://arxiv.org/abs/2312.14556v1 )

ライセンス: Link先を確認

Rohith Peddi, Shivvrat Arya, Bharath Challa, Likhitha Pallapothula, Akshay Vyas, Jikai Wang, Qifan Zhang, Vasundhara Komaragiri, Eric Ragan, Nicholas Ruozzi, Yu Xiang, Vibhav Gogate

(参考訳) ステップバイステップの手順に従うことは、日常生活において個人が行う様々な活動に不可欠な要素である。これらの手順は、家具の組み立てであれレシピの調理であれ、目標を効率的に達成するための指針となる。しかし、手続き的活動の複雑さと所要時間の長さは、本質的にエラーを起こす可能性を高める。このような手続き的活動を一連のフレームから理解することは、視覚情報の正確な解釈と活動の構造を推論する能力を必要とする難しいタスクである。そこで、本研究では、人々が実際のキッチン環境でレシピを実行する様子を収めた384件の記録(94.5時間)からなる、新しいエゴセントリックな4DデータセットCaptainCook4Dを収集した。このデータセットは2種類の活動で構成されている。1つは参加者が提供されたレシピの指示に従うもの、もう1つは指示から逸脱してエラーを誘発するものである。我々は5.3Kのステップアノテーションと10Kのきめ細かいアクションアノテーションを提供し、教師付きエラー認識、マルチステップローカライゼーション、手続き学習のタスクでデータセットをベンチマークする。

Following step-by-step procedures is an essential component of various activities carried out by individuals in their daily lives. These procedures serve as a guiding framework that helps to achieve goals efficiently, whether it is assembling furniture or preparing a recipe. However, the complexity and duration of procedural activities inherently increase the likelihood of making errors. Understanding such procedural activities from a sequence of frames is a challenging task that demands an accurate interpretation of visual information and the ability to reason about the structure of the activity. To this end, we collect a new egocentric 4D dataset, CaptainCook4D, comprising 384 recordings (94.5 hours) of people performing recipes in real kitchen environments. This dataset consists of two distinct types of activity: one in which participants adhere to the provided recipe instructions and another in which they deviate and induce errors. We provide 5.3K step annotations and 10K fine-grained action annotations and benchmark the dataset for the following tasks: supervised error recognition, multistep localization, and procedure learning.

翻訳日:2023-12-25 15:41:29 公開日:2023-12-22

# 構造誘導材料のための機械学習とプロセス設計

Machine learning for structure-guided materials and process design ( http://arxiv.org/abs/2312.14552v1 )

ライセンス: Link先を確認

Lukas Morand, Tarek Iraki, Johannes Dornheim, Stefan Sandfeld, Norbert Link, Dirk Helm

(参考訳) 近年は、研究と産業の両方において、材料革新の加速への関心が高まっている。しかし、新しい先端材料の開発に真に価値を加えるためには、製造工程を考慮し、下流のプロセス設計アプローチをサポートする材料設計アプローチを調整することが不可欠である。この方向への大きなステップとして、材料プロセス-構造-プロパティチェーン全体を網羅する全体最適化アプローチを提案する。本手法では,2つの重要な識別問題に対処するために,機械学習技術を用いる。 1つ目は、望まれるマクロな特性を示す準最適材料構造を識別する材料設計問題の解決である。 2つ目は、これらの材料構造を製造するための最適な処理経路を見つけるプロセス設計問題を解決することである。どちらの識別問題も典型的には不十分であり、ソリューションアプローチにおいて重要な課題となる。しかし、これらの問題の非特異性もまた、処理に重要な利点をもたらす: 同様に機能するターゲット構造を複数持つことにより、対応するプロセスは、最適な到達可能な構造を製造するために効率的にガイドすることができる。特に,材料設計のためのマルチタスク学習に基づく最適化手法と組み合わせて,プロセス設計に深層強化学習を適用する。このアプローチの機能は、金属成形プロセスにおいて所望の特性を有する結晶テクスチャを製造するために使用することで実証される。

In recent years, there has been a growing interest in accelerated materials innovation in both research and industry. However, to truly add value to the development of new advanced materials, it is essential to take manufacturing processes into account and thereby tailor materials design approaches to support downstream process design approaches. As a major step in this direction, we present a holistic optimization approach that covers the entire materials process-structure-property chain. Our approach specifically employs machine learning techniques to address two critical identification problems. The first is to solve a materials design problem, which involves identifying near-optimal material structures that exhibit desired macroscopic properties. The second is to solve a process design problem, which involves finding an optimal processing path to manufacture these material structures. Both identification problems are typically ill-posed, which presents a significant challenge for solution approaches. However, the non-unique nature of these problems also offers an important advantage for processing: by having several target structures that perform similarly well, the corresponding processes can be efficiently guided towards manufacturing the best reachable structure. In particular, we apply deep reinforcement learning for process design in combination with a multi-task learning-based optimization approach for materials design. The functionality of the approach will be demonstrated by using it to manufacture crystallographic textures with desired properties in a metal forming process.

翻訳日:2023-12-25 15:41:13 公開日:2023-12-22

# 因果独立源を持つ絡み合いスワッピング量子ネットワークにおける実量子理論の排除の提案

Proposals for ruling out the real quantum theories in an entanglement-swapping quantum network with causally independent sources ( http://arxiv.org/abs/2312.14547v1 )

ライセンス: Link先を確認

Jian Yao, Hu Chen, Ya-Li Mao, Zheng-Da Li, Jingyun Fan

(参考訳) 量子論において複素数が基本的な役割を果たすかどうかという問題は、量子力学の誕生以来議論されてきた。近年、ベル非局所性のテスト手法に基づいて実量子論と複素量子論を区別する実現可能な提案が現れた [Nature 600, 625-629 (2021)]。この方法に基づき、実量子論はフォトニック量子系と超伝導量子系の両方で実験的に反証されている [Phys. Rev. Lett. 128, 040402 (2022), Phys. Rev. Lett. 128, 040403 (2022)]。因果的につながっていない複数の独立したソースを持つ量子ネットワークは、非局所性の研究に新たな視点を提供するため、大きな関心を集めている。これらのソースの独立性は、観測可能な共分散にさらなる制約を課し、古典的および量子的相関に対する新しい境界をもたらす。本研究では、2つのソースが因果的に独立であるという、先行研究では置かれていなかったより強い仮定の下で、エンタングルメントスワッピングのシナリオにおける実量子論と複素量子論の区別について検討する。改良されたNavascués-Pironio-Acín法とベイズ最適化を用いて、既存の提案と比べて実量子論と量子論をより大きく区別できる、相関関数の最適な係数を持つ提案を見いだす。この研究は、因果的に独立なパーティを特徴とする複雑な量子ネットワークにおいて、実量子論と複素量子論の区別をさらに探求するための道を開く。

The question of whether complex numbers play a fundamental role in quantum theory has been debated since the inception of quantum mechanics. Recently, a feasible proposal to differentiate between real and complex quantum theories based on the technique of testing Bell nonlocalities has emerged [Nature 600, 625-629 (2021)]. Based on this method, the real quantum theory has been falsified experimentally in both photonic and superconducting quantum systems [Phys. Rev. Lett. 128, 040402 (2022), Phys. Rev. Lett. 128, 040403 (2022)]. Quantum networks with multiple independent sources that are not causally connected have gained significant interest as they offer a new perspective on studying nonlocality. The independence of these sources imposes additional constraints on observable covariances and leads to new bounds for classical and quantum correlations. In this study, we examine the discrimination between the real and complex quantum theories in an entanglement-swapping scenario under the stronger assumption that the two sources are causally independent, which was not made in previous works. Using a revised Navascués-Pironio-Acín method and Bayesian optimization, we find a proposal with optimal coefficients of the correlation function that could give a larger discrimination between the real and quantum theories compared with existing proposals. This work opens up avenues for further exploration of the discrimination between real and complex quantum theories within intricate quantum networks featuring causally independent parties.

翻訳日:2023-12-25 15:40:54 公開日:2023-12-22

# パスポートフォーマットへの顔画像の包括的正規化

Inclusive normalization of face images to passport format ( http://arxiv.org/abs/2312.14544v1 )

ライセンス: Link先を確認

Hongliu Cao, Minh Nhat Do, Alexis Ravanel, Eoin Thomas

(参考訳) 近年、顔認識は現実世界のアプリケーションでますます使われている。しかし、肌の色バイアスと過酷な照明などの個人内変異を組み合わせると、人間の検査中にも顔認識タスクが失敗する可能性が高くなる。顔の正規化手法は、同一性を保ちながら入力画像から個人内変動を取り除き、このような課題に対処しようとする。しかし、ほとんどの顔の正規化法は1つまたは2つのバリエーションだけを取り除き、肌の色バイアスのようなデータセットバイアスを無視することができる。多くの顔正規化法の出力も人間の観察者には現実的ではない。本研究では、ポーズ、悪い照明、低解像度、ぼやけ、表情、サングラスのようなアクセサリーなどの大きな変化を含む、ほとんどの個人内変異を取り除くために、スタイルベースの顔正規化モデル(StyleFNM)を提案する。この論文では、事前訓練されたGANを制御して、パスポートのような画像のバランスの取れたデータセットを生成することにより、データセットバイアスも扱う。実験により、StyleFNMはよりリアルな出力を生成でき、顔認識システムの精度と公平性を大幅に向上できることが示された。

Face recognition has been used more and more in real-world applications in recent years. However, when the skin color bias is coupled with intra-personal variations like harsh illumination, the face recognition task is more likely to fail, even during human inspection. Face normalization methods try to deal with such challenges by removing intra-personal variations from an input image while keeping the identity the same. However, most face normalization methods can only remove one or two variations and ignore dataset biases such as skin color bias. The outputs of many face normalization methods are also not realistic to human observers. In this work, a style-based face normalization model (StyleFNM) is proposed to remove most intra-personal variations, including large changes in pose, bad or harsh illumination, low resolution, blur, facial expressions, and accessories like sunglasses, among others. The dataset bias is also dealt with in this paper by controlling a pretrained GAN to generate a balanced dataset of passport-like images. The experimental results show that StyleFNM can generate more realistic outputs and can significantly improve the accuracy and fairness of face recognition systems.

翻訳日:2023-12-25 15:40:26 公開日:2023-12-22

# 言語横断要約のための自動データ検索

Automatic Data Retrieval for Cross Lingual Summarization ( http://arxiv.org/abs/2312.14542v1 )

ライセンス: Link先を確認

Nikhilesh Bhatnagar, Ashok Urlana, Vandan Mujadia, Pruthwik Mishra, Dipti Misra Sharma

(参考訳) 言語間要約では、ある言語で書かれたテキストを別の言語で要約する。英語から他のヨーロッパ諸言語への言語間要約を扱う研究は一定数存在する。本研究では、英語からヒンディー語への言語間要約を行うことを目的とする。ニュース価値のある出来事を扱うテキスト形式と動画形式の報道を対にすることが、言語間要約のためのデータ取得に役立つことを提案する。データを分析し、文書と要約のペアとなるよう、記事と動画の説明文を対応付ける手法を提案する。また、要約の正しさを確保するため、合理的なしきい値に基づくフィルタリング手法を概説する。さらに、28,583件の単言語および言語間の記事と要約のペアをhttps://github.com/tingc9/Cross-Sum-News-Alignedで公開する。また、収集したデータで複数のベースラインを構築して分析し、誤り分析を報告する。

Cross-lingual summarization involves summarizing text written in one language into a different one. There is a body of research addressing cross-lingual summarization from English to other European languages. In this work, we aim to perform cross-lingual summarization from English to Hindi. We propose that pairing up the coverage of newsworthy events in textual and video formats can help with data acquisition for cross-lingual summarization. We analyze the data and propose methods to match articles to video descriptions that serve as document and summary pairs. We also outline filtering methods over reasonable thresholds to ensure the correctness of the summaries. Further, we make available 28,583 mono and cross-lingual article-summary pairs at https://github.com/tingc9/Cross-Sum-News-Aligned. We also build and analyze multiple baselines on the collected data and report error analysis.

翻訳日:2023-12-25 15:40:05 公開日:2023-12-22

# 戦略学習による適応的再収束駆動AIG書き換え

Adaptive Reconvergence-driven AIG Rewriting via Strategy Learning ( http://arxiv.org/abs/2312.14536v1 )

ライセンス: Link先を確認

Liwei Ni, Zonglin Yang, Jiaxi Zhang, Junfeng Liu, Huawei Li, Biwei Xie, Xinquan Li

(参考訳) リライトは、回路の性能、電力、面積(PPA)を改善することを目的とした、論理合成における一般的な手順である。従来の再収束駆動型And-Inverter Graph(AIG)書き換え手法は、ブール代数の最小化による再収束コーンの最適化にのみ焦点を当てている。しかし、特定のコーンにより適した他のノード書き換えアルゴリズムを組み込む余地がある。本稿では、マルチストラテジーに基づくAIG書き換えと、戦略学習に基づくアルゴリズム選択という2つの重要な手法を組み合わせた、適応的な再収束駆動型AIG書き換えアルゴリズムを提案する。マルチストラテジーに基づく書き換え手法は、複数のノード書き換えアルゴリズムのサポートを取り入れることで従来のアプローチを拡張し、最適化空間を広げる。さらに、戦略学習に基づくアルゴリズム選択手法は、与えられたコーンに最も適したノード書き換えアルゴリズムを決定する。実験の結果、提案手法はサイズで平均5.567%、深さで平均5.327%の有意な改善をもたらすことが示された。

Rewriting is a common procedure in logic synthesis aimed at improving the performance, power, and area (PPA) of circuits. The traditional reconvergence-driven And-Inverter Graph (AIG) rewriting method focuses solely on optimizing the reconvergence cone through Boolean algebra minimization. However, there exist opportunities to incorporate other node-rewriting algorithms that are better suited for specific cones. In this paper, we propose an adaptive reconvergence-driven AIG rewriting algorithm that combines two key techniques: multi-strategy-based AIG rewriting and strategy learning-based algorithm selection. The multi-strategy-based rewriting method expands upon the traditional approach by incorporating support for multi-node-rewriting algorithms, thus expanding the optimization space. Additionally, the strategy learning-based algorithm selection method determines the most suitable node-rewriting algorithm for a given cone. Experimental results demonstrate that our proposed method yields a significant average improvement of 5.567% in size and 5.327% in depth.

翻訳日:2023-12-25 15:39:51 公開日:2023-12-22

# ADA-GAD:グラフ異常検出のための異常除去オートエンコーダ

ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection ( http://arxiv.org/abs/2312.14535v1 )

ライセンス: Link先を確認

Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Qingming Huang

(参考訳) グラフ異常検出は、グラフ内で通常の振る舞いから逸脱するノードを特定するうえで極めて重要であり、不正検出やソーシャルネットワークなどの様々な領域に役立つ。既存の再構成に基づく手法はかなりの成功を収めているが、グラフ内の異常パターンによって生じるAnomaly OverfittingとHomophily Trapの問題に直面し、正常なノードは異常なノードよりもよく再構成されるという仮定が崩れることがある。我々の観察では、異常の少ないグラフで学習したモデルほど検出性能が高い。この知見に基づき、Anomaly-Denoised Autoencoders for Graph Anomaly Detection(ADA-GAD)と呼ばれる新しい2段階のフレームワークを導入する。第1段階では、異常レベルを低減したグラフを生成する、学習不要の異常除去型拡張手法を設計する。これらの拡張グラフ上でグラフオートエンコーダを複数のレベルで事前訓練することにより、グラフオートエンコーダは正常なパターンを捉えられるようになる。次の段階では、デコーダを元のグラフ上で検出のために再訓練し、前段で学習したマルチレベル表現の恩恵を受ける。同時に、ノード異常分布正則化を提案し、Anomaly Overfittingをさらに緩和する。合成データと実世界データの両方で広範な実験を行い、本手法の有効性を検証する。

Graph anomaly detection is crucial for identifying nodes that deviate from regular behavior within graphs, benefiting various domains such as fraud detection and social networks. Although existing reconstruction-based methods have achieved considerable success, they may face the Anomaly Overfitting and Homophily Trap problems caused by the abnormal patterns in the graph, breaking the assumption that normal nodes are often better reconstructed than abnormal ones. Our observations indicate that models trained on graphs with fewer anomalies exhibit higher detection performance. Based on this insight, we introduce a novel two-stage framework called Anomaly-Denoised Autoencoders for Graph Anomaly Detection (ADA-GAD). In the first stage, we design a learning-free anomaly-denoised augmentation method to generate graphs with reduced anomaly levels. We pretrain graph autoencoders on these augmented graphs at multiple levels, which enables the graph autoencoders to capture normal patterns. In the next stage, the decoders are retrained for detection on the original graph, benefiting from the multi-level representations learned in the previous stage. Meanwhile, we propose the node anomaly distribution regularization to further alleviate Anomaly Overfitting. We validate the effectiveness of our approach through extensive experiments on both synthetic and real-world datasets.

翻訳日:2023-12-25 15:39:30 公開日:2023-12-22

# 個人情報のないユーザマッチングのための多視点ユーザ表現学習

Multi-view user representation learning for user matching without personal information ( http://arxiv.org/abs/2312.14533v1 )

ライセンス: Link先を確認

Hongliu Cao, Ilias El Baamrani, Eoin Thomas

(参考訳) 旅行産業のデジタル化が加速するにつれ、旅行者の行動を分析し理解することがますます重要になっている。しかし、旅行者データは、ユーザと旅行プロバイダとのやり取りの頻度が比較的低いため、高いスパース性を示すことが多い。これに加えて、旅行商品をオンラインで閲覧する際に使われるデバイス、アカウント、プラットフォームが増えることで、データの分散も生じる。これらの課題に対処するために、確率的な旅行者マッチングを利用できる。旅行者の閲覧履歴は一般に短く、旅行業界のURLは多くのトークンを含み非常に異質であるため、既存のユーザマッチング手法のほとんどは旅行者マッチングに適していない。これらの課題に対処するために、URLを多視点データとして扱うことでURLからより良いユーザ表現を学習する、類似性に基づく多視点情報融合を提案する。実験の結果、提案した多視点ユーザ表現学習は、異なる視点からの相補的な情報を活用し、URLの重要な情報を強調でき、ユーザマッチングタスクにおいて他の表現学習手法よりも大幅に優れた性能を発揮することが示された。

As the digitization of the travel industry accelerates, analyzing and understanding travelers' behaviors becomes increasingly important. However, traveler data frequently exhibit high data sparsity due to the relatively low frequency of user interactions with travel providers. Compounding this effect, the multiplication of devices, accounts, and platforms used while browsing travel products online also leads to data dispersion. To deal with these challenges, probabilistic traveler matching can be used. Most existing solutions for user matching are not suitable for traveler matching, as a traveler's browsing history is typically short and URLs in the travel industry are very heterogeneous with many tokens. To deal with these challenges, we propose a similarity-based multi-view information fusion to learn a better user representation from URLs by treating the URLs as multi-view data. The experimental results show that the proposed multi-view user representation learning can take advantage of the complementary information from different views, highlight the key information in URLs and perform significantly better than other representation learning solutions for the user matching task.

翻訳日:2023-12-25 15:39:07 公開日:2023-12-22
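
The idea of treating a URL as multi-view data can be illustrated with a small, hand-crafted sketch. This is not the representation learning method of the paper: the three views (host, path, query), the Jaccard measure, the equal weights, and the example URLs are all assumptions chosen only to show how per-view similarities might be fused into one score.

```python
import re
from urllib.parse import urlsplit, parse_qsl

def url_views(url):
    """Split a URL into three token sets ("views"): host, path, and query."""
    parts = urlsplit(url)
    host = set(parts.netloc.lower().split("."))
    path = {t for t in re.split(r"[/\-_.]+", parts.path.lower()) if t}
    query = {f"{k}={v}" for k, v in parse_qsl(parts.query)}
    return {"host": host, "path": path, "query": query}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def fused_similarity(url_a, url_b, weights=None):
    """Weighted average of the per-view Jaccard similarities (hypothetical fusion rule)."""
    weights = weights or {"host": 1.0, "path": 1.0, "query": 1.0}
    va, vb = url_views(url_a), url_views(url_b)
    total = sum(weights.values())
    return sum(w * jaccard(va[k], vb[k]) for k, w in weights.items()) / total

# Made-up URLs: a and b differ only in the destination, c is unrelated.
a = "https://travel.example.com/flights/paris-london?adults=2&cabin=eco"
b = "https://travel.example.com/flights/paris-rome?adults=2&cabin=eco"
c = "https://news.example.org/sports/results"
print(fused_similarity(a, b), fused_similarity(a, c))
```

In a learned setting the fixed weights would presumably be replaced by parameters fitted to matching data; the sketch only shows why separate views can keep a shared host or query from being drowned out by a long, heterogeneous path.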

# DuaLight: シナリオ特有かつシナリオ共有知識を活用した交通信号制御の強化

DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge ( http://arxiv.org/abs/2312.14532v1 )

ライセンス: Link先を確認

Jiaming Lu, Jingqing Ruan, Haoyuan Jiang, Ziyue Li, Hangyu Mao and Rui Zhao

(参考訳) 強化学習は従来の交通信号制御タスクに革命をもたらしており、混雑を緩和し効率を向上させる有望な力を示している。しかし、既存の手法には、特定のシナリオに固有の動的情報と、様々なシナリオにまたがって普遍的に適用できる動的情報を吸収できる効果的な学習機構が欠けている。さらに、それぞれのシナリオにおいて、隣接する交差点と対象の交差点をどのように協調させるかに関する本質的な経験を十分に捉えられず、システム全体として準最適な結果をもたらす。これらの問題を踏まえ、単一シナリオ内の経験的情報と様々なシナリオにわたる一般化可能な情報の両方を活用して意思決定を強化することを目的としたDuaLightを提案する。具体的には、DuaLightは、Intersection-wiseとFeature-wiseという2つの学習可能な部分を持つシナリオ固有の経験的重みモジュールを導入し、シナリオごとに隣接交差点と入力特徴を適応的に利用する方法を導くことで、異なる交差点をよりきめ細かく理解できるようにする。さらに、シナリオ共有のCo-Trainモジュールを実装し、異なるシナリオにまたがる一般化可能な動的情報の学習を促進する。実世界と合成の両方のシナリオにおける実験結果から、DuaLightは様々な指標で競争力のある性能を達成し、3-7%の改善により交通渋滞を緩和する有望なソリューションを提供することが示された。コードはhttps://github.com/lujiaming-12138/DuaLightで入手できる。

Reinforcement learning has been revolutionizing the traditional traffic signal control task, showing promising power to relieve congestion and improve efficiency. However, the existing methods lack effective learning mechanisms capable of absorbing dynamic information inherent to a specific scenario and universally applicable dynamic information across various scenarios. Moreover, within each specific scenario, they fail to fully capture the essential empirical experiences about how to coordinate between neighboring and target intersections, leading to sub-optimal system-wide outcomes. In view of these issues, we propose DuaLight, which aims to leverage both the experiential information within a single scenario and the generalizable information across various scenarios for enhanced decision-making. Specifically, DuaLight introduces a scenario-specific experiential weight module with two learnable parts: Intersection-wise and Feature-wise, guiding how to adaptively utilize neighbors and input features for each scenario, thus providing a more fine-grained understanding of different intersections. Furthermore, we implement a scenario-shared Co-Train module to facilitate the learning of generalizable dynamics information across different scenarios. Empirical results on both real-world and synthetic scenarios show DuaLight achieves competitive performance across various metrics, offering a promising solution to alleviate traffic congestion, with 3-7% improvements. The code is available under: https://github.com/lujiaming-12138/DuaLight.

翻訳日:2023-12-25 15:38:47 公開日:2023-12-22

# ZodiacEdge: インクリメンタルルールセットメンテナンスを備えたDatalogエンジン

ZodiacEdge: a Datalog Engine With Incremental Rule Set Maintenance ( http://arxiv.org/abs/2312.14530v1 )

ライセンス: Link先を確認

Weiqin Xu and Olivier Curé

(参考訳) 本稿では、ルールセットが更新されうる場合のDatalog推論のマテリアライゼーションのインクリメンタルなメンテナンスに取り組む。これはモノのインターネットとエッジコンピューティングの文脈において特に重要であり、スマートデバイスはDatalogルールとして表現された新たに獲得した知識に基づいて推論する必要があるかもしれない。提案手法は、Datalogプログラム内のルールセットに対応するノードを持つ依存ハイパーグラフに適用される階層化戦略の適応に基づいている。本実装は、否定と集約の両方を含む再帰的ルールをサポートする。本システムの有効性を実データおよび合成データで実証する。

In this paper, we tackle the incremental maintenance of Datalog inference materialisation when the rule set can be updated. This is particularly relevant in the context of the Internet of Things and Edge computing where smart devices may need to reason over newly acquired knowledge represented as Datalog rules. Our solution is based on an adaptation of a stratification strategy applied to a dependency hypergraph whose nodes correspond to rule sets in a Datalog program. Our implementation supports recursive rules containing both negation and aggregation. We demonstrate the effectiveness of our system on real and synthetic data.

翻訳日:2023-12-25 15:38:22 公開日:2023-12-22
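
To make the notion of maintaining a materialisation when the rule set grows more concrete, here is a minimal sketch restricted to positive (negation-free, aggregation-free) Datalog. It is not ZodiacEdge's algorithm: the stratified dependency hypergraph described in the abstract is not modelled, rule deletion is not handled, and the tuple encoding of atoms and all function names are assumptions. It relies only on the fact that positive Datalog is monotone, so after adding a rule the fixpoint can resume from the previously materialised facts.

```python
def match(atom, fact, env):
    """Try to unify one body atom with one ground fact under the bindings in env."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if term[:1].isupper():            # variables start with an upper-case letter
            if env.setdefault(term, value) != value:
                return None
        elif term != value:               # constants must match exactly
            return None
    return env

def apply_rule(rule, facts):
    """Return every head fact derivable from `facts` by one application of `rule`."""
    head, body = rule
    envs = [{}]
    for atom in body:
        envs = [e2 for e in envs for f in facts
                if (e2 := match(atom, f, e)) is not None]
    return {(head[0],) + tuple(e.get(t, t) for t in head[1:]) for e in envs}

def materialise(rules, facts):
    """Naive fixpoint: apply all rules until no new fact appears."""
    facts = set(facts)
    while True:
        new = set().union(*(apply_rule(r, facts) for r in rules)) - facts
        if not new:
            return facts
        facts |= new

edges = {("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")}
r1 = (("path", "X", "Y"), [("edge", "X", "Y")])
r2 = (("path", "X", "Z"), [("path", "X", "Y"), ("edge", "Y", "Z")])

m1 = materialise([r1], edges)        # materialisation under the initial rule set
m2 = materialise([r1, r2], m1)       # rule r2 added: resume from m1, not from scratch
```

Resuming from `m1` yields the same facts as recomputing from `edges`, while skipping the derivations already done. Once negation or aggregation is involved this shortcut no longer holds in general, which is where a stratification-based strategy such as the one the paper describes becomes necessary.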

# 一層ニューラルネットワークのための効果的かつ効率的なグリーンフェデレーション学習法

An effective and efficient green federated learning method for one-layer neural networks ( http://arxiv.org/abs/2312.14528v1 )

ライセンス: Link先を確認

Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas, Elena Hernández-Pereira, Beatriz Pérez-Sánchez

(参考訳) 現在、機械学習アルゴリズムは複雑さを増し続けており、かなりの量の計算資源とエネルギーを必要としている。こうした理由から、新しいグリーンアルゴリズムの開発に対する意識が高まっており、分散AIはこれに貢献できる。フェデレーション学習(federated learning, FL)は、協調モデルを分散的に訓練できるため、機械学習で最も活発な研究分野の1つである。IoT(Internet of Things)など多くの実環境において興味深い選択肢であり、エッジコンピューティングデバイスでこれらのモデルを利用できるようにする。本研究では、隠れ層のないニューラルネットワークに基づくFL手法を提案する。収束に複数のラウンドを要する従来のFL手法とは異なり、単一の訓練ラウンドでグローバルな協調モデルを生成できる。これにより、訓練プロセスの管理を簡素化する、効果的で効率的なモデルを得ることができる。さらに、この手法は設計段階からデータのプライバシを保護しており、これは現在のデータ保護規制において重要な側面である。大規模データセットと多数のフェデレーションクライアントを用いて実験を行った。隠れ層を持たないネットワークモデルに基づいているにもかかわらず、より複雑な最先端の機械学習モデルと比較して、すべてのケースで競争力のある精度を維持する。さらに、本手法は同一分布のシナリオでも非同一分布のシナリオでも同等に良好に動作することを示す。最後に、中央集権的な手法と比べて訓練プロセス中に大幅な省エネルギーを可能にするため、環境に優しいアルゴリズムである。

Nowadays, machine learning algorithms continue to grow in complexity and require a substantial amount of computational resources and energy. For these reasons, there is a growing awareness of the need to develop new green algorithms, and distributed AI can contribute to this. Federated learning (FL) is one of the most active research lines in machine learning, as it allows the training of collaborative models in a distributed way, an interesting option in many real-world environments, such as the Internet of Things, allowing the use of these models in edge computing devices. In this work, we present an FL method, based on a neural network without hidden layers, capable of generating a global collaborative model in a single training round, unlike traditional FL methods that require multiple rounds for convergence. This allows obtaining an effective and efficient model that simplifies the management of the training process. Moreover, this method preserves data privacy by design, a crucial aspect in current data protection regulations. We conducted experiments with large datasets and a large number of federated clients. Despite being based on a network model without hidden layers, it maintains in all cases competitive accuracy results compared to more complex state-of-the-art machine learning models. Furthermore, we show that the method performs equally well in both identically and non-identically distributed scenarios. Finally, it is an environmentally friendly algorithm as it allows significant energy savings during the training process compared to its centralized counterpart.

翻訳日:2023-12-25 15:38:13 公開日:2023-12-22
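
Why a model without hidden layers can be trained federatedly in a single round can be seen in a simplified setting. The sketch below swaps in plain linear least squares (a one-layer network with identity activation and squared loss), which is an assumption and not necessarily the formulation used in the paper. Each client sends only the aggregates $X^\top X$ and $X^\top y$; the server adds them and solves once. All function names and the synthetic data are hypothetical.

```python
import random

def client_statistics(X, y):
    """Computed locally: returns (Xb^T Xb, Xb^T y); raw rows are not transmitted."""
    rows = [x + [1.0] for x in X]                      # append a bias term
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return A, b

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w

def server_aggregate(stats):
    """One round: add up the client aggregates element-wise, then solve once."""
    d = len(stats[0][1])
    A = [[sum(s[0][i][j] for s in stats) for j in range(d)] for i in range(d)]
    b = [sum(s[1][i] for s in stats) for i in range(d)]
    return solve(A, b)

random.seed(0)
true_w, bias = [2.0, -1.0], 0.3
clients = []
for n in (40, 60, 25):                                 # three clients of different sizes
    X = [[random.gauss(0, 1) for _ in true_w] for _ in range(n)]
    y = [sum(w * v for w, v in zip(true_w, x)) + bias + random.gauss(0, 0.01) for x in X]
    clients.append((X, y))

w_fed = server_aggregate([client_statistics(X, y) for X, y in clients])

# Centralised reference: pool all rows and solve the same normal equations.
X_all = [x for X, _ in clients for x in X]
y_all = [t for _, y in clients for t in y]
w_central = solve(*client_statistics(X_all, y_all))
```

Because the aggregates are sums over samples, the federated solution coincides with the centralised one up to floating-point error, regardless of how the rows are split across clients. Note that not transmitting raw samples is not by itself a formal privacy guarantee; the sketch makes no claim about the privacy analysis in the paper.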

# 一般化幾何相:数学的側面

Generalised Geometric Phase: Mathematical Aspects ( http://arxiv.org/abs/2312.14522v1 )

ライセンス: Link先を確認

Vivek M. Vyas

(参考訳) 幾何学的位相の概念の作用素への一般化が、最近、純粋に物理的な根拠に基づいて提案された。ここでは、量子系における新しい幾何学的構造を明らかにしつつ、その存在の数学的基礎を与える。任意の観測量の平均を調べると、量子系は異なる光線空間とそれに付随するファイバー束構造を示すことがわかる。一般化された幾何学的位相は、これらのファイバー束上の接続の(非)ホロノミーとして理解される。基礎となる光線空間は一般に擬ケーラー多様体であり、そのシンプレクティック構造が一般化された幾何学的位相として現れる。

An operator generalisation of the notion of geometric phase has been recently proposed purely based on physical grounds. Here we provide a mathematical foundation for its existence, while uncovering new geometrical structures in quantum systems. While probing the average of any observable it is found that a quantum system exhibits different ray spaces and associated fibre bundle structures. The generalised geometric phase is understood as (an)holonomy of a connection over these fibre bundles. The underlying ray spaces in general are found to be pseudo-Kähler manifolds, and their symplectic structure manifests as the generalised geometric phase.

翻訳日:2023-12-25 15:37:52 公開日:2023-12-22

# 量子エラー補正による量子コンピューティングプライバシのチューニング

Tuning Quantum Computing Privacy through Quantum Error Correction ( http://arxiv.org/abs/2312.14521v1 )

ライセンス: Link先を確認

Hui Zhong, Keyi Ju, Manojna Sistla, Xinyue Zhang, Xiaoqi Qin, Xin Fu, Miao Pan

(参考訳) 量子コンピューティングは、大規模で複雑な問題を解決する上で有望なパラダイムである。量子コンピューティングのプライバシを保護するため、量子コンピューティングにおける差分プライバシ(英語版)(qdp)を再定義し、量子コンピューティングによって生成された固有ノイズを収集してqdpを実装するための研究の先駆者となる。しかしながら、そのような実装アプローチは、qdp機構のプライバシー予算を固定し制御不能にする固有のノイズの量によって制限される。本稿では,量子誤り訂正(quantum error correction, qec)技術を用いて,qdpにおけるプライバシー保護レベルを調整しながら,量子コンピューティングの誤りを減らすことを提案する。要するに、複数の単一キュービットゲート回路において、ゲートにQEC演算を適用するかどうかを決定することにより、量子ノイズエラー率を徐々に減少させる。我々はQEC後の一般誤差率とそれに対応するプライバシー予算の新しい計算式を導出した。そして、マルチレベル連結QEC演算を用いて、さらにノイズ低減を実現する。広範な数値シミュレーションを通じて,量子コンピューティングにおけるプライバシ保護の程度を規定する手段としてQECが実現可能であることを示す。

Quantum computing is a promising paradigm for efficiently solving large and high-complexity problems. To protect quantum computing privacy, pioneering research efforts proposed to redefine differential privacy (DP) in quantum computing, i.e., quantum differential privacy (QDP), and harvest the inherent noise generated by quantum computing to implement QDP. However, such an implementation approach is limited by the amount of inherent noise, which makes the privacy budget of the QDP mechanism fixed and uncontrollable. To address this issue, in this paper, we propose to leverage quantum error correction (QEC) techniques to reduce quantum computing errors, while tuning the privacy protection levels in QDP. In short, we gradually decrease the quantum noise error rate by deciding whether to apply QEC operations on each gate in a circuit of multiple single-qubit gates. We have derived a new calculation formula for the general error rate and corresponding privacy budgets after QEC operation. Then, we expand to achieve further noise reduction using multi-level concatenated QEC operation. Through extensive numerical simulations, we demonstrate that QEC is a feasible way to regulate the degree of privacy protection in quantum computing.

翻訳日:2023-12-25 15:37:43 公開日:2023-12-22
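
The tuning knob described in the abstract, applying QEC to some gates and not to others, can be sketched numerically. The paper's own error-rate formula and its mapping from error rate to privacy budget are not reproduced here. As a stand-in, the sketch assumes the textbook concatenated-code scaling $p_L = p_{\mathrm{th}} (p / p_{\mathrm{th}})^{2^L}$ (valid for $p$ below the threshold $p_{\mathrm{th}}$) and independent gate errors; the function names and numeric values are hypothetical.

```python
def concatenated_error_rate(p, p_threshold, levels):
    """Textbook scaling p_L = p_th * (p / p_th) ** (2 ** L) for L levels of concatenation."""
    return p_threshold * (p / p_threshold) ** (2 ** levels)

def circuit_error_rate(gate_errors):
    """Probability that at least one gate fails, assuming independent gate errors."""
    ok = 1.0
    for p in gate_errors:
        ok *= 1.0 - p
    return 1.0 - ok

def tune(p, p_threshold, n_gates, levels=1):
    """Overall error rate when QEC protects the first k of n_gates gates, for k = 0..n_gates."""
    protected = concatenated_error_rate(p, p_threshold, levels)
    return [circuit_error_rate([protected] * k + [p] * (n_gates - k))
            for k in range(n_gates + 1)]

# Assumed values: physical error 1e-3 per gate, threshold 1e-2, five single-qubit gates.
rates = tune(p=1e-3, p_threshold=1e-2, n_gates=5)
for k, r in enumerate(rates):
    print(k, r)
```

Each additional protected gate lowers the overall error rate by a small step, and raising `levels` lowers it further. This gives a discrete dial over the residual noise, which is the quantity the abstract proposes to use for regulating the privacy level.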

# ニューロン分類のための置換不変エンコーダを用いた統合学習神経骨格と脳回路トポロジー

Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification ( http://arxiv.org/abs/2312.14518v1 )

ライセンス: Link先を確認

Minghui Liao, Guojia Wan, Bo Du

(参考訳) 神経系内のニューロンの種類を特定することは、脳コネクトミクスの解析や神経疾患の研究において重要な役割を果たす。しかし、ニューロンの解剖学的、生理学的、分子的特性を利用する方法は、効率が比較的低くコストもかかる。脳組織の電子顕微鏡イメージングと解析技術の進歩により、ニューロンの高分解能な形態と接続の情報からなる全脳コネクトームを得られるようになった。しかし、そのようなデータに基づいて自動ニューロン分類を行うモデルはほとんどない。本稿では、スケルトンから得られるニューロンの形態情報と、神経回路から得られるニューロン間のトポロジ情報を組み合わせたフレームワークNeuNetを提案する。具体的には、NeuNetはSkeleton Encoder、Connectome Encoder、Readout Layerという3つのコンポーネントで構成される。Skeleton Encoderは、神経スケルトンの点データに対する1次元畳み込みによりニューロンの局所情報をボトムアップに統合し、Connectome Encoderはグラフニューラルネットワークを用いて神経回路のトポロジ情報を捉え、Readout Layerはこれら2つの情報を融合して分類結果を出力する。ヒト大脳皮質とショウジョウバエ脳のボリューム電子顕微鏡(VEM)画像から、ニューロン分類タスクのための2つの新しいデータセットを再処理して公開する。これら2つのデータセットでの実験により、それぞれ精度0.9169と0.9363でモデルの有効性が示された。コードとデータはhttps://github.com/WHUminghui/NeuNetで入手できる。

Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, classifying neurons by their anatomical, physiological, or molecular characteristics is relatively inefficient and costly. With advances in electron microscopy imaging and analysis techniques for brain tissue, we are able to obtain whole-brain connectomes consisting of high-resolution neuronal morphology and connectivity information. However, few models have been built on such data for automated neuron classification. In this paper, we propose NeuNet, a framework that combines morphological information of neurons obtained from their skeletons with topological information between neurons obtained from the neural circuit. Specifically, NeuNet consists of three components: the Skeleton Encoder, the Connectome Encoder, and the Readout Layer. The Skeleton Encoder integrates the local information of neurons in a bottom-up manner, applying a one-dimensional convolution to the point data of the neural skeleton; the Connectome Encoder uses a graph neural network to capture the topological information of the neural circuit; finally, the Readout Layer fuses these two kinds of information and outputs classification results. We reprocess and release two new datasets for the neuron classification task from volume electron microscopy (VEM) images of human brain cortex and Drosophila brain. Experiments on these two datasets demonstrate the effectiveness of our model, with accuracies of 0.9169 and 0.9363, respectively. Code and data are available at: https://github.com/WHUminghui/NeuNet.

翻訳日:2023-12-25 15:37:24 公開日:2023-12-22

# 微分可能DSPとスペクトル最適輸送を用いた教師なし高調波パラメータ推定

Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport ( http://arxiv.org/abs/2312.14507v1 )

ライセンス: Link先を確認

Bernardo Torres (S2A, IDS, LTCI), Geoffroy Peeters (S2A, IDS, LTCI), Gaël Richard (S2A, IDS, LTCI)

(参考訳) ニューラルオーディオ信号処理では、ピッチコンディショニングがシンセサイザーの性能向上に使われている。しかし, 音高推定器と合成器の併用は, 標準音高再生損失を用いた場合の課題であり, 外部の音高トラッカーに依存している。そこで本稿では,スペクトルエネルギーの変位を最小化する最適輸送理論に着想を得たスペクトル損失関数を提案する。我々は、調和テンプレートを調和信号に適合させる教師なしの自動符号化タスクを通じて、このアプローチを検証する。軽量エンコーダを用いて高調波の基本周波数と振幅を共同で推定し,可微分高調波合成器を用いて信号を再構成する。提案手法は、ニューラルオーディオアプリケーションにおける教師なしパラメータ推定を改善するための有望な方向を提供する。

In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals. We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.
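
One way to see why a transport-based spectral loss can help pitch estimation is to compare two single-peak spectra. The sketch below computes the one-dimensional Wasserstein-1 distance between normalized magnitude spectra via their cumulative sums. It is a minimal illustration assuming shared, sorted frequency bins; the function name is hypothetical and the paper's actual loss may differ in normalization and transport cost.

```python
from typing import Sequence


def spectral_w1(spec_a: Sequence[float], spec_b: Sequence[float],
                freqs: Sequence[float]) -> float:
    """Wasserstein-1 distance between two magnitude spectra.

    Both spectra live on the same sorted frequency bins `freqs` and are
    normalized to unit mass. In 1-D, W1 equals the integral of the
    absolute difference between the two cumulative distributions.
    """
    if not (len(spec_a) == len(spec_b) == len(freqs)):
        raise ValueError("spectra and frequency grid must have equal length")
    mass_a, mass_b = sum(spec_a), sum(spec_b)
    if mass_a <= 0 or mass_b <= 0:
        raise ValueError("spectra must carry positive total energy")
    cdf_a = cdf_b = 0.0
    distance = 0.0
    for i in range(len(freqs) - 1):
        cdf_a += spec_a[i] / mass_a
        cdf_b += spec_b[i] / mass_b
        # The CDF gap is constant between consecutive bins.
        distance += abs(cdf_a - cdf_b) * (freqs[i + 1] - freqs[i])
    return distance


def single_peak(index: int, size: int) -> list:
    """Toy spectrum with all energy in one bin."""
    return [1.0 if i == index else 0.0 for i in range(size)]


freqs = [10.0 * i for i in range(21)]  # bins at 0, 10, ..., 200 Hz
print(spectral_w1(single_peak(10, 21), single_peak(11, 21), freqs))
print(spectral_w1(single_peak(10, 21), single_peak(15, 21), freqs))
```

The distance grows with how far the peak has to move, whereas a bin-wise squared error would be identical for any two non-overlapping peaks and so gives no direction in which to adjust the estimated frequency.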

翻訳日:2023-12-25 15:36:59 公開日:2023-12-22

# SIG: Prompt-based generation を用いた文学における話者識別

SIG: Speaker Identification in Literature via Prompt-Based Generation ( http://arxiv.org/abs/2312.14590v1 )

ライセンス: Link先を確認

Zhenlin Su, Liyan Xu, Jin Xu, Jiangnan Li, Mingdu Huangfu

(参考訳) 物語における引用の話者を特定することは文学的分析において重要な課題であり、未知の話者に対するドメイン外推論や、周囲の文脈に話者の言及がない非明示的なケースなど、難しいシナリオがある。本研究では,設計したプロンプトテンプレートに基づいてタスクと引用入力を言語化する生成ベースの手法であり,話者識別性能をさらに高める他の補助タスクとも容易に統合できる,簡易かつ効果的な手法SIGを提案する。予測はモデルによる直接生成から得られるか、または各話者候補の生成確率のうち最大のものによって決定される。我々のアプローチ設計に基づき、SIGはドメイン外評価をサポートし、任意の形式の候補入力を受け入れることができるオープンワールド分類パラダイムを実現する。我々は,このタスクの最大のデータセットであるPDNCにおいて,クロスドメイン評価とドメイン内評価の両方を行い,SIGが複雑な設計の従来のベースラインやゼロショットのChatGPTを上回り,特に難しい非明示的なシナリオでは最大17%改善することを示した。別のデータセットWPに関する追加実験は、SIGの有効性をさらに裏付ける。

Identifying speakers of quotations in narratives is an important task in literary analysis, with challenging scenarios including out-of-domain inference for unseen speakers and non-explicit cases where there are no speaker mentions in the surrounding context. In this work, we propose SIG, a simple and effective generation-based method that verbalizes the task and quotation input based on designed prompt templates, which also enables easy integration of other auxiliary tasks that further bolster speaker identification performance. The prediction can either come from direct generation by the model or be determined by the highest generation probability among the speaker candidates. Based on our approach design, SIG supports out-of-domain evaluation and realizes an open-world classification paradigm that can accept any form of candidate input. We perform both cross-domain and in-domain evaluation on PDNC, the largest dataset for this task, where empirical results suggest that SIG outperforms previous baselines with complicated designs, as well as zero-shot ChatGPT, and especially excels in hard non-explicit scenarios, with up to 17% improvement. Additional experiments on another dataset, WP, further corroborate the efficacy of SIG.

翻訳日:2023-12-25 15:32:09 公開日:2023-12-22

# 非デノイジング順時間拡散

Non-Denoising Forward-Time Diffusions ( http://arxiv.org/abs/2312.14589v1 )

ライセンス: Link先を確認

Stefano Peluchetti

(参考訳) 本論文のスコープは拡散過程による生成的モデリングである。このパラダイムの1つのアプローチはSong et al. (2021) の仕事であり、これは所望のデータ分布をターゲットとした拡散過程を構築するのに時間反転の議論に依存している。すべてのデノイジング拡散確率モデルの提案に共通する時間反転の議論は不要であることを示す。拡散ブリッジを適切に混合することにより,所望のデータ分布をターゲットとした拡散過程を得る。結果として得られる輸送は、構成によって正確であり、基盤となる拡散のダイナミクスを選択する際の柔軟性が向上し、新しいトレーニング目的によってニューラルネットワークによって近似することができる。我々は,我々のアプローチと時間反転アプローチに対応するドリフト調整の統一的視点を開発し,この表現を用いて拡散に基づく生成モデルの内部動作を検査する。最後に,空間統計学で一般的なスケーラブルなシミュレーションと推論手法を活用し,基礎となる拡散ダイナミクスにおける完全に因子化された分布を超える。本研究に含まれる方法論の進歩は,拡散過程に基づく生成モデリングの一般的な枠組みの確立に寄与する。

The scope of this paper is generative modeling through diffusion processes. One approach within this paradigm is the work of Song et al. (2021), which relies on a time-reversal argument to construct a diffusion process targeting the desired data distribution. We show that the time-reversal argument, common to all denoising diffusion probabilistic modeling proposals, is not necessary. We obtain diffusion processes targeting the desired data distribution by taking appropriate mixtures of diffusion bridges. The resulting transport is exact by construction, allows for greater flexibility in choosing the dynamics of the underlying diffusion, and can be approximated by means of a neural network via novel training objectives. We develop a unifying view of the drift adjustments corresponding to our approach and to time-reversal approaches, and use this representation to inspect the inner workings of diffusion-based generative models. Finally, we leverage scalable simulation and inference techniques common in spatial statistics to move beyond fully factorial distributions in the underlying diffusion dynamics. The methodological advances contained in this work contribute toward establishing a general framework for generative modeling based on diffusion processes.
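
The phrase "mixtures of diffusion bridges" can be illustrated with the simplest case, a Brownian bridge: draw an endpoint from the data, then simulate a Brownian motion conditioned to hit it. The sketch below is a toy one-dimensional version using the exact conditional Gaussian transition of the bridge. The function name and the toy dataset are hypothetical, and this shows only the construction, not the paper's learned neural approximation.

```python
import math
import random
from typing import List, Optional


def sample_brownian_bridge(x0: float, x1: float, n_steps: int,
                           total_time: float = 1.0, sigma: float = 1.0,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Simulate a Brownian bridge from x0 at time 0 to x1 at `total_time`.

    Each step samples from the exact conditional law
    X_{t+dt} | X_t = x, X_T = x1 ~ N(x + (x1 - x) dt / (T - t),
                                     sigma^2 dt (T - t - dt) / (T - t)).
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    rng = rng or random.Random()
    dt = total_time / n_steps
    path = [x0]
    x = x0
    for k in range(n_steps):
        remaining = (n_steps - k) * dt  # time left before this step
        mean = x + (x1 - x) * dt / remaining
        var = sigma ** 2 * dt * (remaining - dt) / remaining
        x = mean + math.sqrt(max(var, 0.0)) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Mixture of bridges: pick the endpoint from a toy "dataset", then bridge to it.
rng = random.Random(0)
toy_data = [-2.0, 0.5, 3.0]  # hypothetical data points
target = rng.choice(toy_data)
path = sample_brownian_bridge(0.0, target, n_steps=100, rng=rng)
print(target, path[-1])
```

Because every path ends exactly at a data point, the terminal distribution of the mixture matches the empirical data distribution by construction, which is the sense in which the transport is exact.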

翻訳日:2023-12-25 15:31:48 公開日:2023-12-22

# Sparse Pooled Data 問題に対する準最適 & Efficient アルゴリズムについて

On a Near-Optimal & Efficient Algorithm for the Sparse Pooled Data Problem ( http://arxiv.org/abs/2312.14588v1 )

ライセンス: Link先を確認

Max Hahn-Klimroth, Remco van der Hofstad, Noela Müller, Connor Riddlesden

(参考訳) プールデータ問題は、凝縮された測定値から、一連のアイテムの未知のラベルを識別することを要求する。より正確には、$n$ 個のアイテムが与えられたとき、各アイテムが $\{0,1,\ldots,d\}$ のラベルを持ち、それが真値 $\boldsymbol{\sigma}$ によって符号化されていると仮定する。$\boldsymbol{\sigma}$ の非ゼロエントリ数が $\theta \in (0,1)$ に対して $k \sim n^{\theta}$ でスケールするとき、プールデータ問題をスパースと呼ぶ。$\boldsymbol{\sigma}$ に関する情報はプールされた測定値から得られ、各測定値は各ラベルのアイテムがプールにいくつ含まれているかを示す。最も基本的な問題は、できるだけ少数のプールを使用しつつ、高い確率で $\boldsymbol{\sigma}$ を再構築するプーリングスキームを設計することである。問題の変種とその組合せ論的な派生は少なくとも35年間研究されてきた。しかし、ラベルの効率的な推論という現代的な問題の研究は、理論上可能な推論と効率的な推論とで必要となるプールの最小数に、$\log n$ のオーダーの統計的-計算的ギャップがあることを示唆している。本稿では、情報理論的しきい値に非常に近い数のプール上の新しいプーリングスキームに基づく効率的なアルゴリズム \algoname を設計することで、この $\log n$ ギャップが人工的なものか本質的なものかという問題を解決する。

The pooled data problem asks to identify the unknown labels of a set of items from condensed measurements. More precisely, given $n$ items, assume that each item has a label in $\{0,1,\ldots,d\}$, encoded via the ground truth $\boldsymbol{\sigma}$. We call the pooled data problem sparse if the number of non-zero entries of $\boldsymbol{\sigma}$ scales as $k \sim n^{\theta}$ for $\theta \in (0,1)$. The information that is revealed about $\boldsymbol{\sigma}$ comes from pooled measurements, each indicating how many items of each label are contained in the pool. The most basic question is to design a pooling scheme that uses as few pools as possible while reconstructing $\boldsymbol{\sigma}$ with high probability. Variants of the problem and its combinatorial ramifications have been studied for at least 35 years. However, the study of the modern question of efficient inference of the labels has suggested a statistical-to-computational gap of order $\log n$ in the minimum number of pools needed for theoretically possible versus efficient inference. In this article, we resolve the question of whether this $\log n$ gap is artificial or of a fundamental nature by designing an efficient algorithm, called \algoname, based upon a novel pooling scheme on a number of pools very close to the information-theoretic threshold.
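
The measurement model in this abstract is easy to simulate. The sketch below draws a sparse label vector with exactly k non-zero entries and returns, for a given pool, the count of items per label. Random pools of half the items are an assumption made here for illustration; the abstract does not describe the paper's actual pooling scheme, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def make_sparse_labels(n: int, k: int, d: int, rng: random.Random) -> List[int]:
    """Ground truth with labels in {0, ..., d} and exactly k non-zero entries."""
    if not 0 <= k <= n or d < 1:
        raise ValueError("need 0 <= k <= n and d >= 1")
    sigma = [0] * n
    for item in rng.sample(range(n), k):
        sigma[item] = rng.randint(1, d)  # non-zero label, inclusive bounds
    return sigma


def pooled_measurement(sigma: Sequence[int], pool: Sequence[int], d: int) -> List[int]:
    """Count how many items of each label 0..d are contained in the pool."""
    counts = [0] * (d + 1)
    for item in pool:
        counts[sigma[item]] += 1
    return counts


rng = random.Random(7)
n, d, theta = 100, 3, 0.5
k = round(n ** theta)  # sparse regime: k ~ n^theta
sigma = make_sparse_labels(n, k, d, rng)
pool = rng.sample(range(n), n // 2)  # assumed design: a random half of the items
print(pooled_measurement(sigma, pool, d))
```

A reconstruction algorithm sees only the pools and their count vectors, so the question studied in the paper is how few such pools suffice to recover the label vector efficiently.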

翻訳日:2023-12-25 15:31:31 公開日:2023-12-22

# 環境に特有な人々

Environment-Specific People ( http://arxiv.org/abs/2312.14579v1 )

ライセンス: Link先を確認

Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black, Justus Thies

(参考訳) 生成画像合成とフルボディ生成の進歩にもかかわらず、最先端の手法は文脈に依存しず、テキストプロンプトに過度に依存しているか、あるいは単調な背景を持つファッション画像のようなキュレートされたトレーニングデータセットに縛られている。ここでの目標は、特定のシーンに意味的に適切な服装の人々を作ることです。そこで本研究では,既存の「野生内」写真に人物を写実的に塗り替えることのできる,コンテクスト認識フルボディ生成のための新しい手法であるespを提案する。 ESPは、環境写真から抽出され、生成プロセスに統合された2Dポーズおよびコンテキストキューに条件付けされる。当社のモデルは、さまざまな環境をカバーする人々の野生の写真セットを含むデータセットでトレーニングされています。本手法は定量的かつ定性的に分析され,ESPがコンテキストフルボディ生成のタスクにおいて最先端の処理性能を発揮することを示す。

Despite significant progress in generative image synthesis, and full-body generation in particular, state-of-the-art methods are either context-independent, overly reliant on text prompts, or bound to curated training datasets, such as fashion images with monotonous backgrounds. Here, our goal is to generate people in clothing that is semantically appropriate for a given scene. To this end, we present ESP, a novel method for context-aware full-body generation that enables photo-realistic inpainting of people into existing "in-the-wild" photographs. ESP is conditioned on a 2D pose and contextual cues that are extracted from the environment photograph and integrated into the generation process. Our models are trained on a dataset containing a set of in-the-wild photographs of people covering a wide range of different environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state of the art on the task of contextual full-body generation.

翻訳日:2023-12-25 15:31:08 公開日:2023-12-22

# PoseViNet:多視点ポーズ推定と視覚変換器を用いた注意散漫ドライバ動作認識フレームワーク

PoseViNet: Distracted Driver Action Recognition Framework Using Multi-View Pose Estimation and Vision Transformer ( http://arxiv.org/abs/2312.14577v1 )

ライセンス: Link先を確認

Neha Sengar, Indra Kumari, Jihui Lee, Dongsoo Har

(参考訳) 交通事故の主な原因は運転注意障害である。高速道路交通安全局(NHTSA)が実施した調査では、車内メニューとの対話、食事や飲み物の消費、車両の運転中の電話による会話など、運転者の注意をそらす重要な要因となっている。そこで本研究では,マルチビュー・ドライバ・アクション画像を用いたドライバの注意散逸検出手法を提案する。提案手法は,ポーズ推定とアクション推論,すなわち PoseViNet を用いた視覚変換器ベースのフレームワークである。姿勢情報を追加する動機は、トランスフォーマーが重要な機能に集中できるようにすることである。その結果、フレームワークは重要なアクションを特定するのにより適しています。提案するフレームワークは,ドライバの挙動を表すsfd3データセットを用いて,さまざまな最先端モデルと比較する。比較の結果,PoseViNetはこれらのモデルより優れていることがわかった。提案フレームワークは,運転者の行動を表すSynDD1データセットを用いて評価する。その結果、PoseViNetは、難しいデータセットで97.55%の検証精度と90.92%のテスト精度を達成した。

Driver distraction is a principal cause of traffic accidents. According to a study conducted by the National Highway Traffic Safety Administration, activities such as interacting with in-car menus, consuming food or beverages, or holding telephone conversations while operating a vehicle can be significant sources of driver distraction. From this viewpoint, this paper introduces a novel method for detecting driver distraction using multi-view driver action images. The proposed method, PoseViNet, is a vision transformer-based framework with pose estimation and action inference. The motivation for adding posture information is to enable the transformer to focus more on key features. As a result, the framework is more adept at identifying critical actions. The proposed framework is compared with various state-of-the-art models on the SFD3 dataset, which represents 10 driver behaviors, and PoseViNet outperforms these models. The proposed framework is also evaluated on the SynDD1 dataset, which represents 16 driver behaviors. On this challenging dataset, PoseViNet achieves 97.55% validation accuracy and 90.92% testing accuracy.

翻訳日:2023-12-25 15:30:49 公開日:2023-12-22

# MMGPL:グラフプロンプト学習を用いたマルチモーダル医療データ分析

MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning ( http://arxiv.org/abs/2312.14574v1 )

ライセンス: Link先を確認

Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, and Xiaoxiao Li

(参考訳) プロンプト学習は、広範囲の下流タスクに対するマルチモーダル大モデルの微調整において顕著な効果を示した。それにもかかわらず、神経障害の診断に既存の即興学習法を適用することには、2つの問題がある。 (i)既存の方法では、神経イメージングにおいて少数のパッチだけが疾患に関連するにもかかわらず、すべてのパッチを平等に扱うのが一般的である。 (ii)神経障害の理解と診断に不可欠である脳接続ネットワークに内在する構造情報を無視する。そこで本研究では,神経疾患の診断のためのマルチモーダル大規模モデルの微調整過程において,グラフプロンプトを学習することで,新しいプロンプト学習モデルを提案する。具体的には、まずgpt-4を用いて関連する疾患概念を取得し、これらの概念とすべてのパッチ間の意味的類似性を計算する。第2に,各パッチと疾患関連概念間の意味的類似性に応じて,無関係パッチの重みを低減させる。さらに,これらの概念に基づいてトークン間のグラフを構築し,グラフ畳み込みネットワーク層を用いてグラフの構造情報を抽出する。広範にわたる実験により,神経疾患の診断において,最先端の方法と比較して優れた成績が得られ,臨床医が検証した。

Prompt learning has demonstrated impressive efficacy in fine-tuning multimodal large models for a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods to the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are relevant to the disease, and (ii) they ignore the structural information inherent in the brain connection network, which is crucial for understanding and diagnosing neurological disorders. To tackle these issues, we introduce a novel prompt learning model that learns graph prompts during the fine-tuning of multimodal large models for diagnosing neurological disorders. Specifically, we first leverage GPT-4 to obtain relevant disease concepts and compute the semantic similarity between these concepts and all patches. Second, we reduce the weight of irrelevant patches according to the semantic similarity between each patch and the disease-related concepts. Moreover, we construct a graph among tokens based on these concepts and employ a graph convolutional network layer to extract the structural information of the graph, which is used to prompt the pre-trained multimodal large models for diagnosing neurological disorders. Extensive experiments demonstrate that our method achieves superior performance in neurological disorder diagnosis compared with state-of-the-art methods, and the results are validated by clinicians.
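
The patch down-weighting step described above can be sketched with plain cosine similarity. In the toy version below, each patch is scored by its best match among the concept embeddings and the scores are turned into weights with a softmax. Taking the maximum over concepts, using a softmax, and the temperature value are all assumptions made for illustration; the abstract does not specify how the paper aggregates or normalizes similarities.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def patch_weights(patch_embs: Sequence[Sequence[float]],
                  concept_embs: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> List[float]:
    """Softmax weights over patches from their best concept similarity."""
    if not patch_embs or not concept_embs:
        raise ValueError("need at least one patch and one concept")
    scores = [max(cosine(p, c) for c in concept_embs) for p in patch_embs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-D embeddings: patch 0 aligns with the concept, patch 1 does not.
patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
concepts = [[1.0, 0.0]]
print(patch_weights(patches, concepts))
```

Patches that resemble no disease concept receive weights close to zero, so later layers attend mostly to the few relevant regions.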

翻訳日:2023-12-25 15:30:33 公開日:2023-12-22

# Gromov-Wasserstein距離の半正定値緩和

Semidefinite Relaxations of the Gromov-Wasserstein Distance ( http://arxiv.org/abs/2312.14572v1 )

ライセンス: Link先を確認

Junyu Chen, Binh T. Nguyen, Yong Sheng Soh

(参考訳) グロモフ=ワッセルシュタイン(Gromov-Wasserstein, GW)距離は、比較不可能な空間間で対象をマッチングできる最適輸送問題の変種である。その中核では、GW距離は非凸二次計画問題の解として定められており、効率的に解けるかどうかは知られていない。特に、GW距離の既存の解法は局所最適解のみを見つけることができる。本稿では,GW距離の半正定値計画(SDP)緩和を提案する。この緩和は、輸送写像の線形項と二次項を関連付ける制約によって拡張されたGW距離の双対と見なすことができる。我々の緩和は、任意の輸送写像の大域最適解に対する近似比を計算する原理的な方法を提供する。最後に,数値実験により,提案する緩和は大域的最適解とその大域的最適性の証明を頻繁に計算できるという意味で強いことが示唆された。

The Gromov-Wasserstein (GW) distance is a variant of the optimal transport problem that allows one to match objects between incomparable spaces. At its core, the GW distance is specified as the solution of a non-convex quadratic program and is not known to be tractable to solve. In particular, existing solvers for the GW distance are only able to find locally optimal solutions. In this work, we propose a semi-definite programming (SDP) relaxation of the GW distance. The relaxation can be viewed as the dual of the GW distance augmented with constraints that relate the linear and quadratic terms of transportation maps. Our relaxation provides a principled manner to compute the approximation ratio of any transport map to the global optimal solution. Finally, our numerical experiments suggest that the proposed relaxation is strong in that it frequently computes the global optimal solution, together with a proof of global optimality.
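
To show why the GW problem is a quadratic program in the coupling, the sketch below evaluates the commonly used squared-loss GW objective, the sum over i, j, k, l of (D_X[i][k] - D_Y[j][l])^2 * T[i][j] * T[k][l], for a given coupling T. The squared loss is assumed here because it is the standard choice; the abstract does not state the exact cost, and the function name is hypothetical. The code only evaluates the objective and does not implement the paper's SDP relaxation.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def gw_objective(dist_x: Matrix, dist_y: Matrix, coupling: Matrix) -> float:
    """Squared-loss Gromov-Wasserstein objective of a coupling.

    dist_x is n x n, dist_y is m x m, coupling is n x m. The objective is
    quadratic in the coupling, which is the source of non-convexity.
    """
    n, m = len(dist_x), len(dist_y)
    if len(coupling) != n or any(len(row) != m for row in coupling):
        raise ValueError("coupling must be n x m")
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue  # skip pairs that carry no mass
            for k in range(n):
                for l in range(m):
                    gap = dist_x[i][k] - dist_y[j][l]
                    total += gap * gap * coupling[i][j] * coupling[k][l]
    return total


# Three points on a line, compared with themselves.
dist = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
identity = [[1 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
uniform = [[1 / 9] * 3 for _ in range(3)]
print(gw_objective(dist, dist, identity), gw_objective(dist, dist, uniform))
```

The identity coupling preserves all pairwise distances and attains zero, while the uniform product coupling does not. A relaxation that returns a lower bound on this objective lets one certify how close any such coupling is to the global optimum.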

翻訳日:2023-12-25 15:30:06 公開日:2023-12-22

# Data is Moody: プロセスイベントログからデータ修正ルールを発見する

Data is Moody: Discovering Data Modification Rules from Process Event Logs ( http://arxiv.org/abs/2312.14571v1 )

ライセンス: Link先を確認

Marco Bjarne Schuster, Boris Wiegand, Jilles Vreeken

(参考訳) イベントログは、基盤となるビジネスプロセスの振る舞いに関する洞察を得るための強力なソースであるが、既存の作業は主に、イベント属性データを無視しながら、イベントログのアクティビティシーケンス内のパターンを見つけることに焦点を当てている。イベント属性データは、主にイベントの発生とプロセス結果を予測するために使用されるが、その技術状態は、プロセス実行中にイベント属性データがどのように変化するかの簡潔さと解釈可能なルールを無視する。サブグループ発見とルールベースの分類アプローチは、イベントログに存在するシーケンシャルな依存関係をキャプチャする能力に欠けており、プロセスの振る舞いに関する限られた洞察で満足できない結果をもたらす。イベントログが与えられたら、正確で簡潔で解釈可能なif-thenルールを見つけることに興味があります。我々はこの問題をMDL(Minimum Description Length)の原理で定式化し、データについて最も損失のない記述でモデルを選択する。さらに,ルールを効率的に探索するgreedy Moodyアルゴリズムを提案する。合成データと実世界のデータの両方に関する広範な実験により、Moodyはコンパクトで解釈可能なルールを見つけ、正確な発見のためにはほとんどデータを必要としておらず、ノイズに対して堅牢であることを示した。

Although event logs are a powerful source of insight into the behavior of the underlying business process, existing work primarily focuses on finding patterns in the activity sequences of an event log, while ignoring event attribute data. Event attribute data has mostly been used to predict event occurrences and process outcomes, but the state of the art neglects to mine succinct and interpretable rules describing how event attribute data changes during process execution. Subgroup discovery and rule-based classification approaches lack the ability to capture the sequential dependencies present in event logs, and thus lead to unsatisfactory results with limited insight into the process behavior. Given an event log, we are interested in finding accurate yet succinct and interpretable if-then rules describing how the process modifies data. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we choose the model with the best lossless description of the data. Additionally, we propose the greedy Moody algorithm to efficiently search for rules. Through extensive experiments on both synthetic and real-world data, we show that Moody indeed finds compact and interpretable rules, needs little data for accurate discovery, and is robust to noise.

翻訳日:2023-12-25 15:29:51 公開日:2023-12-22

# BSS-Bench: 再現性と効果的なバンド選択検索を目指して

BSS-Bench: Towards Reproducible and Effective Band Selection Search ( http://arxiv.org/abs/2312.14570v1 )

ライセンス: Link先を確認

Wenshuai Xu, Zhenbo Xu

(参考訳) ハイパースペクトルイメージングの欠点(拡張性、高キャプチャ遅延、低空間分解能)を克服し、数百のバンドから少数の代表バンドのみを選択することが、広く適用できるようにする鍵となる技術である。しかしながら、現在のバンド選択(BS)手法は、バンド数、データセット分割、再トレーニング設定など、一貫性のないトレイン/バリデーション設定のため、公正な比較において課題に直面している。本稿では,BS法を簡易かつ再現可能なものにするために,52kのトレーニングを含むBSS-Benchベンチマーク(BSS-Bench)を提案する。 BSS-Benchの開発には1.26kのGPU日を要した。 bss-benchをクエリすることで、bs実験を簡単かつ再現可能とし、検索結果と最良性能とのギャップを測定することができる。 BSS-Benchに基づいて、帯域数、教師なし統計、異なるバックボーンなど、BSに対する様々な要因の影響をさらに議論する。 bss-bench に加えて single combination one shot (scos) と呼ばれる有効な単発bs法を提案する。さらに、SCOSの探索プロセスは柔軟であり、訓練を必要とせず、効率的かつ効果的である。 SCOSは、帯域がはるかに少ない場合でも、複数のタスクにおいて現在のBS法よりも優れていることを示す。私たちのBSS-Benchとコードは補足資料で利用可能で、公開されます。

The key technology to overcome the drawbacks of hyperspectral imaging (expensive, high capture delay, and low spatial resolution) and make it widely applicable is to select only a few representative bands from hundreds of bands. However, current band selection (BS) methods face challenges in fair comparisons due to inconsistent train/validation settings, including the number of bands, dataset splits, and retraining settings. To make BS experiments easy and reproducible, this paper presents the first band selection search benchmark (BSS-Bench), containing 52k training and evaluation records of numerous band combinations (BCs) with different backbones for various hyperspectral analysis tasks. The creation of BSS-Bench required a significant computational effort of 1.26k GPU days. By querying BSS-Bench, BS experiments can be performed easily and reproducibly, and the gap between the searched result and the best achievable performance can be measured. Based on BSS-Bench, we further discuss the impact of various factors on BS, such as the number of bands, unsupervised statistics, and different backbones. In addition to BSS-Bench, we present an effective one-shot BS method called Single Combination One Shot (SCOS), which learns the priority of any BC through one-time training, eliminating the need for repetitive retraining on different BCs. Furthermore, the search process of SCOS is flexible and does not require training, making it efficient and effective. Our extensive evaluations demonstrate that SCOS outperforms current BS methods on multiple tasks, even with far fewer bands. Our BSS-Bench and code are available in the supplementary material and will be made publicly available.

翻訳日:2023-12-25 15:29:31 公開日:2023-12-22

# 正規化フローを用いた新しい音声生成

Creating New Voices using Normalizing Flows ( http://arxiv.org/abs/2312.14569v1 )

ライセンス: Link先を確認

Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa

(参考訳) 現実的で自然な合成音声を作ることは、訓練中に見られなかった音声のアイデンティティにとって依然として大きな課題だ。新たな話者の音声合成への関心が高まっているため,本研究では,学習中に観察された話者から外挿し,未知の話者アイデンティティを作成するために,テキスト音声合成(TTS)と音声変換(VC)モードにおける正規化フローの能力について検討する。まず、TTSとVCのアプローチを作成し、その上で、明瞭度、自然性、話者の類似性、新しい音声を生成する能力の観点から、私たちの方法とベースラインを包括的に評価します。客観的指標と主観的指標の両方を用いて、ゼロショットと新しい音声合成という2つの評価課題でテクニックをベンチマークする。前者のタスクの目標は、未知の声への変換の精度を測定することである。後者の目的は、新しい声を作り出す能力を測定することである。広範な評価により,提案手法はゼロショット音声合成において一貫して最先端の性能を達成し,トレーニングセットにない様々な新しい音声を生成できることが示されている。本研究は,TTSおよびVCモードの総合的な分析と比較とともに,メルスペクトログラムと正規化フローに基づいて新しい音声を合成する最初の試みであると考えている。

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. First, we create an approach for TTS and VC, and then we comprehensively evaluate our methods and baselines in terms of intelligibility, naturalness, speaker similarity, and ability to create new voices. We use both objective and subjective metrics to benchmark our techniques on two evaluation tasks: zero-shot and new voice speech synthesis. The goal of the former task is to measure the precision of the conversion to an unseen voice. The goal of the latter is to measure the ability to create new voices. Extensive evaluations demonstrate that the proposed approach consistently achieves state-of-the-art performance in zero-shot speech synthesis and creates various new voices unobserved in the training set. We consider this work to be the first attempt to synthesize new voices based on mel-spectrograms and normalizing flows, along with a comprehensive analysis and comparison of the TTS and VC modes.

翻訳日:2023-12-25 15:28:41 公開日:2023-12-22

# 異方性勾配雑音下での確率重ボール法の加速収束

Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise ( http://arxiv.org/abs/2312.14567v1 )

ライセンス: Link先を確認

Rui Pan, Yuxing Liu, Xiaoyu Wang, Tong Zhang

(参考訳) 学習率を減衰させるヘビーボール運動量は、深層学習モデルの最適化にSGDとともに広く利用されている。経験的人気とは対照的に、理論的な性質の理解は、特に二次回帰問題に対する標準的な異方性勾配雑音条件下では、まだかなり限られている。ヘビーボール運動量法は加速収束を提供し、大きなバッチ設定でうまく機能すると広く推測されているが、厳密な理論的解析は存在しない。本稿では,二次目的関数に対するステップ減衰スケジューラを用いた確率的ヘビーボール法の非漸近収束境界を異方性勾配雑音条件下で確立することにより,この理論的ギャップを埋める。直接の含意として、ヘビーボール運動量は、確率的分散項に関してほぼ最適な収束速度を保ちながら、SGDのバイアス項に対して$\tilde{\mathcal{O}}(\sqrt{\kappa})$の加速収束を与えることを示した。この組み合わせ効果は、統計ミニマックスレートから対数因子以内の全体的な収束率を意味する。つまり、ヘビーボール運動量を持つSGDは、分散機械学習やフェデレーション学習のような大規模なバッチ設定において有用であり、そこでは反復回数の削減が通信ラウンド数を大幅に減らし、実用上の高速化につながる。

Heavy-ball momentum with decaying learning rates is widely used with SGD for optimizing deep learning models. In contrast to its empirical popularity, the understanding of its theoretical properties is still quite limited, especially under the standard anisotropic gradient noise condition for quadratic regression problems. Although it is widely conjectured that the heavy-ball momentum method can provide accelerated convergence and should work well in large-batch settings, there is no rigorous theoretical analysis. In this paper, we fill this theoretical gap by establishing a non-asymptotic convergence bound for stochastic heavy-ball methods with a step decay scheduler on quadratic objectives, under the anisotropic gradient noise condition. As a direct implication, we show that heavy-ball momentum can provide $\tilde{\mathcal{O}}(\sqrt{\kappa})$ accelerated convergence of the bias term of SGD while still achieving a near-optimal convergence rate with respect to the stochastic variance term. The combined effect implies an overall convergence rate within log factors of the statistical minimax rate. This means SGD with heavy-ball momentum is useful in large-batch settings such as distributed machine learning or federated learning, where a smaller number of iterations can significantly reduce the number of communication rounds, leading to acceleration in practice.
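
The setting analyzed here can be reproduced in a few lines on a diagonal quadratic. The sketch below runs SGD with heavy-ball momentum, w_{t+1} = w_t - lr * g_t + beta * (w_t - w_{t-1}), and halves the learning rate after each stage. The noise model (standard deviation proportional to the square root of the curvature along each coordinate) is an assumption standing in for "anisotropic gradient noise", and all hyperparameter values and function names are hypothetical.

```python
import random
from typing import List, Sequence


def quadratic_loss(eigs: Sequence[float], w: Sequence[float]) -> float:
    """f(w) = 0.5 * sum_i eigs[i] * w[i]^2 for a diagonal quadratic."""
    return 0.5 * sum(lam * x * x for lam, x in zip(eigs, w))


def heavy_ball_sgd(eigs: Sequence[float], w0: Sequence[float], lr0: float,
                   beta: float, n_stages: int, steps_per_stage: int,
                   noise_scale: float, rng: random.Random) -> List[float]:
    """Stochastic heavy ball with a step-decay learning-rate schedule."""
    w = list(w0)
    velocity = [0.0] * len(w)  # equals w_t - w_{t-1}
    lr = lr0
    for _ in range(n_stages):
        for _ in range(steps_per_stage):
            for i, lam in enumerate(eigs):
                # Assumed anisotropic noise: larger along high-curvature axes.
                noise = noise_scale * (lam ** 0.5) * rng.gauss(0.0, 1.0)
                grad = lam * w[i] + noise
                velocity[i] = beta * velocity[i] - lr * grad
                w[i] += velocity[i]
        lr *= 0.5  # step decay between stages
    return w


eigs = [1.0, 0.01]  # condition number kappa = 100
w_init = [1.0, 1.0]
w_final = heavy_ball_sgd(eigs, w_init, lr0=0.5, beta=0.9, n_stages=5,
                         steps_per_stage=200, noise_scale=0.01,
                         rng=random.Random(0))
print(quadratic_loss(eigs, w_init), quadratic_loss(eigs, w_final))
```

Momentum speeds up progress along the flat direction (the bias term), while the decaying step size shrinks the noise floor (the variance term), which mirrors the two terms in the bound described by the abstract.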

翻訳日:2023-12-25 15:28:17 公開日:2023-12-22

# 人間監視の経済学 : ノームとインセンティブがAI労働者のコストとパフォーマンスに与える影響

The Economics of Human Oversight: How Norms and Incentives Affect Costs and Performance of AI Workers ( http://arxiv.org/abs/2312.14565v1 )

ライセンス: Link先を確認

Johann Laux, Fabian Stephany, Alice Liefgreen

(参考訳) aiアプリケーションの世界的急増は、業界を変革させ、既存の雇用の移転と補完をもたらしつつ、新しい雇用機会を生み出している。 AIの人間監督は、人間の労働者がAIモデルと対話して、そのパフォーマンス、安全性、規範的原則の遵守を改善する、新たなタスクである。画像のラベル付けやテキストの注釈を含むデータアノテーションは、データセットの品質がトレーニングされたaiモデルの品質に直接影響を与えるため、人間の監視プロセスとして重要な役割を果たす。したがって、人間の監視作業の効率性は、AI開発者にとって重要な競争上の優位性である。本稿では,データ品質とコストに対する規範設計と金銭的インセンティブの影響に着目し,人間の監視の基礎経済学を考察する。 307データアノテータを含む実験では、様々なタスク指示(ノーム)と金銭インセンティブを持つ6つのグループを調べている。その結果,明確な規則を付した注釈は高い精度を示し,あいまいな基準を14%上回った。同様に、追加の金銭インセンティブを受けるアノテータは、明確な規則とインセンティブの両方で作業するグループで記録された最高精度(87.5%の精度)により、大幅に向上する。しかしながら、両グループはタスク完了により多くの時間を必要としており、標準で作業する人に比べて平均タスク完了時間が31%増加し、インセンティブがない。これらの実証的な発見は、データキュレーションにおけるデータ品質と効率のトレードオフを強調し、標準設計の微妙な影響とAI開発の経済性に対するインセンティブに光を当てている。この論文は、AI技術の経済的、倫理的、法的考察に関する議論に実験的知見を貢献する。

The global surge in AI applications is transforming industries, leading to displacement and complementation of existing jobs, while also giving rise to new employment opportunities. Human oversight of AI is an emerging task in which human workers interact with an AI model to improve its performance, safety, and compliance with normative principles. Data annotation, encompassing the labelling of images or annotating of texts, serves as a critical human oversight process, as the quality of a dataset directly influences the quality of AI models trained on it. Therefore, the efficiency of human oversight work stands as an important competitive advantage for AI developers. This paper delves into the foundational economics of human oversight, with a specific focus on the impact of norm design and monetary incentives on data quality and costs. An experimental study involving 307 data annotators examines six groups with varying task instructions (norms) and monetary incentives. Results reveal that annotators provided with clear rules exhibit higher accuracy rates, outperforming those with vague standards by 14%. Similarly, annotators receiving an additional monetary incentive perform significantly better, with the highest accuracy rate recorded in the group working with both clear rules and incentives (87.5% accuracy). However, both groups require more time to complete tasks, with a 31% increase in average task completion time compared to those working with standards and no incentives. These empirical findings underscore the trade-off between data quality and efficiency in data curation, shedding light on the nuanced impact of norm design and incentives on the economics of AI development. The paper contributes experimental insights to discussions on the economic, ethical, and legal considerations of AI technologies.

翻訳日:2023-12-25 15:27:46 公開日:2023-12-22

# 複数の専門家によるオンラインカバレッジ

Online Covering with Multiple Experts ( http://arxiv.org/abs/2312.14564v1 )

ライセンス: Link先を確認

Enikő Kevi and Kim-Thang Nguyen

(参考訳) 機械学習の予測を用いてオンラインアルゴリズムを設計することは、スケジューリング、キャッシュ、クラスタリング、スキーレンタルなど、様々な実践的なオンライン問題に対する、最悪ケースのパラダイムを超える最近の手法である。従来の学習強化アルゴリズムのアプローチの多くは単一オラクルの予測の統合に重点を置いていたが,我々は複数の専門家を用いたオンラインアルゴリズムの設計を検討する。後知恵での静的なベストエキスパートという一般的なベンチマークを越えるために、新しい動的ベンチマーク(時間とともに変化する予測の線形結合)を提案する。我々は,0-1オンライン最適化問題に対して,専門家数を$K$として$O(\log K)$の性能保証を持つ、新しい動的ベンチマークにおける競合アルゴリズムを提案する。さらに,マルチエキスパートアプローチは,オンラインアルゴリズム研究コミュニティにおける長年の中心的なテーマである,複数のオンラインアルゴリズムをオンラインで組み合わせる方法に関する新たな視点を提供する。

Designing online algorithms with machine learning predictions is a recent technique beyond the worst-case paradigm for various practically relevant online problems (scheduling, caching, clustering, ski rental, etc.). While most previous learning-augmented algorithm approaches focus on integrating the predictions of a single oracle, we study the design of online algorithms with multiple experts. To go beyond the popular benchmark of a static best expert in hindsight, we propose a new dynamic benchmark (linear combinations of predictions that change over time). We present a competitive algorithm in the new dynamic benchmark with a performance guarantee of $O(\log K)$, where $K$ is the number of experts, for 0-1 online optimization problems. Furthermore, our multiple-expert approach provides a new perspective on how to combine several online algorithms in an online manner, a long-standing central subject in the online algorithm research community.

翻訳日:2023-12-25 15:26:55 公開日:2023-12-22

# 無信号交差点における自然運転行動の交通再構築と解析

Traffic Reconstruction and Analysis of Natural Driving Behaviors at Unsignalized Intersections ( http://arxiv.org/abs/2312.14561v1 )

ライセンス: Link先を確認

Supriya Sarker, Bibek Poudel, Michael Villarreal, Weizi Li

(参考訳) 本稿では,手動ビデオデータラベリングとSUMOにおける高度な交通シミュレーションを組み合わせた新しいデータセットを通して,信号のない交差点における交通行動の複雑さについて検討する。この研究では、テネシー州メンフィスにある様々な無信号交差点で、一日の異なる時間帯に交通を記録した。ビデオデータを手動でラベル付けして特定の変数をキャプチャした後,SUMOシミュレーション環境において交通シナリオを再構築した。これらのシミュレーションからの出力データは、車両移動の時間空間図、走行時間の頻度分布、ボトルネック点を特定するための速度-位置プロットを含む包括的な分析を提供した。このアプローチは、トラフィックダイナミクスの理解を深め、効果的なトラフィック管理とインフラ改善のための重要な洞察を提供する。

This paper explores the intricacies of traffic behavior at unsignalized intersections through the lens of a novel dataset, combining manual video data labeling and advanced traffic simulation in SUMO. This research involved recording traffic at various unsignalized intersections in Memphis, TN, during different times of the day. After manually labeling video data to capture specific variables, we reconstructed traffic scenarios in the SUMO simulation environment. The output data from these simulations offered a comprehensive analysis, including time-space diagrams for vehicle movement, travel time frequency distributions, and speed-position plots to identify bottleneck points. This approach enhances our understanding of traffic dynamics, providing crucial insights for effective traffic management and infrastructure improvements.

翻訳日:2023-12-25 15:26:29 公開日:2023-12-22

# Aurora:インストラクション・チューニングによるMistral-8x7Bスパースミキサーのための中国語チャット機能の活性化

Aurora:Activating Chinese chat capability for Mistral-8x7B sparse Mixture-of-Experts through Instruction-Tuning ( http://arxiv.org/abs/2312.14557v1 )

ライセンス: Link先を確認

Rongsheng Wang, Haoming Chen, Ruizhe Zhou, Yaofei Duan, Kunyan Cai, Han Ma, Jiaxi Cui, Jian Li, Patrick Cheong-Iao Pang, Yapeng Wang, Tao Tan

(参考訳) 既存の研究では、機械が生成した命令追従データを利用して大規模言語モデル(LLM)を改良することで、人間が作成した命令を必要とせずに、新しいタスクに対して印象的なゼロショット能力を発揮できることが実証されている。本稿では,Mixtral-8x7B sparse Mixture-of-Experts モデルの中国語会話能力向上を目的として,3つの中国語命令追従データセットを体系的に調査し,前処理し,統合する。この慎重に処理されたデータセットで命令微調整を行うことで、Mixtral-8x7B sparse Mixture-of-Experts モデル "Aurora" の構築に成功した。Auroraの性能を評価するために,広く認知された3つのベンチマークテスト C-Eval, MMLU, CMMLU を利用する。 Mixtral-8x7B sparse Mixture-of-Experts モデルに適用した命令微調整の有効性を実証研究により検証した。この研究は、スパースなMixture-of-Expertsモデルに対する命令微調整の先駆的な試みであり、このモデルアーキテクチャの能力向上における重要なブレークスルーとなる。私たちのコード、データ、モデルは、https://github.com/WangRongsheng/Auroraで公開されています。

Existing research has demonstrated that refining large language models (LLMs) through the utilization of machine-generated instruction-following data empowers these models to exhibit impressive zero-shot capabilities for novel tasks, without requiring human-authored instructions. In this paper, we systematically investigate, preprocess, and integrate three Chinese instruction-following datasets with the aim of enhancing the Chinese conversational capabilities of Mixtral-8x7B sparse Mixture-of-Experts model. Through instruction fine-tuning on this carefully processed dataset, we successfully construct the Mixtral-8x7B sparse Mixture-of-Experts model named "Aurora." To assess the performance of Aurora, we utilize three widely recognized benchmark tests: C-Eval, MMLU, and CMMLU. Empirical studies validate the effectiveness of instruction fine-tuning applied to Mixtral-8x7B sparse Mixture-of-Experts model. This work is pioneering in the execution of instruction fine-tuning on a sparse expert-mixed model, marking a significant breakthrough in enhancing the capabilities of this model architecture. Our code, data and model are publicly available at: https://github.com/WangRongsheng/Aurora

翻訳日:2023-12-25 15:25:50 公開日:2023-12-22

# 仮想アシスタントのセキュリティとプライバシーリスク姿勢の評価

Evaluating the Security and Privacy Risk Postures of Virtual Assistants ( http://arxiv.org/abs/2312.14633v1 )

ライセンス: Link先を確認

Borna Kalhor, Sanchari Das

(参考訳) 仮想アシスタント(VA)は、日々の作業での使いやすさから、近年利用が増えている。その普及にもかかわらず、セキュリティとプライバシーへの影響はいまだによく分かっていない。このギャップに対処するために、Alexa、Braina、Cortana、Google Assistant、Kalliope、Mycroft、Hound、Extremeの8つの広く使われている音声アシスタントのセキュリティとプライバシーの姿勢を評価する調査を行った。私たちは3つの脆弱性テストツール、AndroBugs, RiskInDroid, MobSFを使って、これらのVAのセキュリティとプライバシーを評価した。分析は、コード、アクセス制御、トラッキング、バイナリ分析、機密データの機密性の5つの領域に焦点を当てた。その結果、これらのVAは、SSL証明書を検証しないこと、生のSQLクエリの実行、AESアルゴリズムの弱いモードの使用など、さまざまなセキュリティ脅威に対して脆弱であることが判明した。これらの脆弱性は、悪意のある攻撃者がユーザーの個人情報に不正にアクセスすることを可能にする恐れがある。この研究は、これらの技術に関連するリスクを理解するための第一歩であり、より安全でプライバシーを尊重するVAを開発するための将来の研究の基礎を提供する。

Virtual assistants (VAs) have seen increased use in recent years due to their ease of use for daily tasks. Despite their growing prevalence, their security and privacy implications are still not well understood. To address this gap, we conducted a study to evaluate the security and privacy postures of eight widely used voice assistants: Alexa, Braina, Cortana, Google Assistant, Kalliope, Mycroft, Hound, and Extreme. We used three vulnerability testing tools, AndroBugs, RiskInDroid, and MobSF, to assess the security and privacy of these VAs. Our analysis focused on five areas: code, access control, tracking, binary analysis, and sensitive data confidentiality. The results revealed that these VAs are vulnerable to a range of security threats, including not validating SSL certificates, executing raw SQL queries, and using a weak mode of the AES algorithm. These vulnerabilities could allow malicious actors to gain unauthorized access to users' personal information. This study is a first step toward understanding the risks associated with these technologies and provides a foundation for future research to develop more secure and privacy-respecting VAs.

翻訳日:2023-12-25 15:18:47 公開日:2023-12-22

# メタバース検索を可能にする言語ベースのソリューション

A Language-based solution to enable Metaverse Retrieval ( http://arxiv.org/abs/2312.14630v1 )

ライセンス: Link先を確認

Ali Abdari, Alex Falcon, Giuseppe Serra

(参考訳) 最近、Metaverseはますます魅力的になり、数百万のユーザーが利用可能なバーチャルワールドにアクセスしている。しかし、ユーザが現在の関心に最も合うMetaverseを見つけるには、どうすればよいのか? これまでのところ、検索のプロセスは主に口コミか、あるいはテクノロジー指向のウェブサイトの広告によって行われている。しかし、他のマルチメディアフォーマット(例えばビデオ用youtube)で利用可能な検索エンジンの欠如は、その限界を示している。この制限に対処するため,我々はユーザが求めるメタバースの所望の内容を自然に記述する言語を提案する。第2に,従来の3Dシーンとは違って,Metaverseのシナリオは,シナリオ自体のユーザクエリとの関連性に影響を与える,複数のタイプのマルチメディアを含むことが多いため,より複雑なデータフォーマットを表現する。そこで本研究では,テキストデータとのクロスモーダル関係を考慮しつつ,これらの側面をモデル化することを目的とした,テキスト対メタバース検索と呼ばれる新しいタスクを作成する。我々は,この問題に最初に取り組む人物であるため,マルチメディアコンテンツに富んだ3Dシーンで構成された33000のメタバースのデータセットも収集する。最後に、コントラスト学習に基づくディープラーニングフレームワークの設計と実装を行い、徹底的な実験的なセットアップを実現する。

Recently, the Metaverse is becoming increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do users find the one Metaverse which best fits their current interests? So far, the search process is mostly done by word of mouth, or by advertisement on technology-oriented websites. However, the lack of search engines similar to those available for other multimedia formats (e.g., YouTube for videos) is showing its limitations, since it is often cumbersome to find a Metaverse based on some specific interests using the available methods, while also making it difficult to discover user-created ones which lack strong advertisement. To address this limitation, we propose to use language to naturally describe the desired contents of the Metaverse a user wishes to find. Second, we highlight that, differently from more conventional 3D scenes, Metaverse scenarios represent a more complex data format since they often contain one or more types of multimedia which influence the relevance of the scenario itself to a user query. Therefore, in this work, we create a novel task, called Text-to-Metaverse retrieval, which aims at modeling these aspects while also taking the cross-modal relations with the textual data into account. Since we are the first ones to tackle this problem, we also collect a dataset of 33000 Metaverses, each of which consists of a 3D scene enriched with multimedia content. Finally, we design and implement a deep learning framework based on contrastive learning, resulting in a thorough experimental setup.

翻訳日:2023-12-25 15:18:27 公開日:2023-12-22

# 二層グラフェン量子ドットと高インピーダンスマイクロ波共振器の双極子結合

Dipole coupling of a bilayer graphene quantum dot to a high-impedance microwave resonator ( http://arxiv.org/abs/2312.14629v1 )

ライセンス: Link先を確認

Max J. Ruckriegel, Lisa M. Gächter, David Kealhofer, Mohsen Bahrami Panah, Chuyao Tong, Christoph Adam, Michele Masseroni, Hadrien Duprez, Rebekka Garreis, Kenji Watanabe, Takashi Taniguchi, Andreas Wallraff, Thomas Ihn, Klaus Ensslin, and Wei Wister Huang

(参考訳) 長寿命のスピン状態とバレー状態を保持できる、半導体量子ビットの成熟しつつある材料プラットフォームである二層グラフェンの量子ドットを用いて、回路量子電磁力学 (cQED) を実装する。本装置は、高インピーダンス($Z_\mathrm{r} \approx 1 \mathrm{k{\Omega}}$)超伝導マイクロ波共振器と、グラフェン系ファンデルワールスヘテロ構造において静電的に定義される二重量子ドットとを結合する。サブシステム間の電気双極子結合により、共振器は二重量子ドットの電気感受率を検出でき、そこから電荷安定図を再構成する。 1 ${\mu}\mathrm{s}$の積分時間で信号対雑音比3.5の高感度かつ高速な検出を実現する。電荷-光子相互作用は、分散領域と共鳴領域において、結合による共振器応答の変化を入力出力理論と比較することで定量化し、最大結合強度 $g/2{\pi} = 49.7 \mathrm{MHz}$ を得た。本研究は,ファンデルワールス材料の量子ドットのプローブとしてcQEDを導入し,二層グラフェン量子ドットとのコヒーレントな電荷-光子結合への道を示す。

We implement circuit quantum electrodynamics (cQED) with quantum dots in bilayer graphene, a maturing material platform for semiconductor qubits that can host long-lived spin and valley states. The presented device combines a high-impedance ($Z_\mathrm{r} \approx 1 \mathrm{k{\Omega}}$) superconducting microwave resonator with a double quantum dot electrostatically defined in a graphene-based van der Waals heterostructure. Electric dipole coupling between the subsystems allows the resonator to sense the electric susceptibility of the double quantum dot from which we reconstruct its charge stability diagram. We achieve sensitive and fast detection with a signal-to-noise ratio of 3.5 within 1 ${\mu}\mathrm{s}$ integration time. The charge-photon interaction is quantified in the dispersive and resonant regimes by comparing the coupling-induced change in the resonator response to input-output theory, yielding a maximal coupling strength of $g/2{\pi} = 49.7 \mathrm{MHz}$. Our results introduce cQED as a probe for quantum dots in van der Waals materials and indicate a path toward coherent charge-photon coupling with bilayer graphene quantum dots.

翻訳日:2023-12-25 15:18:04 公開日:2023-12-22

# クロスサイロFederated LearningとAnalyticsによる、より持続可能なエンタープライズデータとアプリケーション管理に向けて

Towards more sustainable enterprise data and application management with cross silo Federated Learning and Analytics ( http://arxiv.org/abs/2312.14628v1 )

ライセンス: Link先を確認

Hongliu Cao

(参考訳) プライバシ保護にコミットする新たな法的要件とポリシーに従うために、複数のクライアント/サイロが中央サーバの調整の下でグローバルモデルを協調的にトレーニングするグローバルスケールで、クロスサイロフェデレーション学習を展開する企業がますます増えている。データ共有と送信の代わりに、クライアントはプライベートなローカルデータと交換モデルのアップデートを使ってモデルをトレーニングする。しかし,関連研究の欠如により,クロスサイロ連関学習の炭素排出への影響についてはほとんど理解されていない。本研究では,モデル学習のみに焦点をあてるのではなく,AI製品ライフサイクル全体にわたって,クロスサイロ・フェデレートラーニングの持続可能性の側面を,集中型手法と比較して分析する。実世界のクロスサイロ・フェデレート・ラーニング・セッティングのためのより包括的な量的コストとCO2排出量推定手法を提案する。第2に,クロスサイロ連関学習と分析によるit企業の持続性とコスト効率の向上を目的とした,新たなデータ・アプリケーション管理システムを提案する。

To comply with new legal requirements and policies committed to privacy protection, more and more companies start to deploy cross-silo Federated Learning at global scale, where several clients/silos collaboratively train a global model under the coordination of a central server. Instead of data sharing and transmission, clients train models using their private local data and exchange model updates. However, there is little understanding of the carbon emission impact of cross silo Federated Learning due to the lack of related works. In this study, we first analyze the sustainability aspect of cross-silo Federated Learning, across the AI product life cycle instead of focusing only on the model training, with the comparison to the centralized method. A more holistic quantitative cost and CO2 emission estimation method for real world cross-silo Federated Learning setting is proposed. Secondly, we propose a novel data and application management system using cross silo Federated Learning and analytics to make IT companies more sustainable and cost effective.

翻訳日:2023-12-25 15:17:41 公開日:2023-12-22

# DSAP:データセットのデモグラフィック比較によるバイアスの分析

DSAP: Analyzing Bias Through Demographic Comparison of Datasets ( http://arxiv.org/abs/2312.14626v1 )

ライセンス: Link先を確認

Iris Dominguez-Catena, Daniel Paternain, Mikel Galar

(参考訳) ここ数年、人工知能システムはますます普及している。残念ながら、これらのシステムは人口統計バイアスを含む人間の意思決定と多くのバイアスを共有することができる。多くの場合、これらのバイアスはトレーニングに使用されるデータまで遡ることができる。これらのバイアスに関する私たちの知識にもかかわらず、さまざまなデータセットのバイアスを比較するだけでなく、それらのバイアスを検出して定量化するための一般的なツールがありません。そこで本研究では,2つのデータセットの人口構成を比較する2段階の手法であるdsap(demographic similarity from auxiliary profile)を提案する。 dsapは、3つの主要なアプリケーションにデプロイできる: データセットをまたがる人口統計学的盲点とバイアス問題を検出し、特徴付けし、単一のデータセットにおけるデータセットの人口統計バイアスを計測し、デプロイシナリオにおけるデータセットの人口統計学的シフトを計測する。 DSAPの重要な特徴は、明示的な人口統計ラベルなしでデータセットを堅牢に分析し、広範囲の状況に対してシンプルで解釈可能な機能を提供することである。提案手法の有用性を示すために,これまで人口統計学的偏見がみつかっていた表情認識タスクについて検討する。 3つのアプリケーションは、異なる特性を持つ20のデータセットのセットで研究される。コードはhttps://github.com/irisdominguez/dsapで入手できる。

In the last few years, Artificial Intelligence systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our knowledge of these biases, we still lack general tools to detect and quantify them, as well as to compare the biases in different datasets. Thus, in this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets. DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios. An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP.
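
A minimal sketch of what "comparing the demographic composition of two datasets" can look like in code. The abstract does not state which similarity measure DSAP uses, so the overlap-of-proportions score below is an assumption, and the group labels are hypothetical stand-ins for whatever an auxiliary demographic model would predict.

```python
from collections import Counter


def demographic_profile(labels):
    """Turn a list of (predicted) group labels into group -> proportion."""
    counts = Counter(labels)
    n = sum(counts.values())
    return {group: c / n for group, c in counts.items()}


def profile_similarity(p, q):
    """Overlap of two profiles: sum over groups of the smaller proportion.

    Ranges from 0 (no shared groups) to 1 (identical composition).
    Assumed measure for illustration; not taken from the paper.
    """
    groups = set(p) | set(q)
    return sum(min(p.get(g, 0.0), q.get(g, 0.0)) for g in groups)
```

Comparing a dataset against an idealized balanced profile gives a single-dataset bias score, and comparing a training set against deployment data gives a shift score, which mirrors the three uses listed in the abstract.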

翻訳日:2023-12-25 15:17:22 公開日:2023-12-22

# 階層型マルチエージェント強化学習による交通ネットワークにおける偽データインジェクション攻撃の評価

Hierarchical Multi-Agent Reinforcement Learning for Assessing False-Data Injection Attacks on Transportation Networks ( http://arxiv.org/abs/2312.14625v1 )

ライセンス: Link先を確認

Taha Eghtesad, Sirui Li, Yevgeniy Vorobeychik, Aron Laszka

(参考訳) ナビゲーションアプリケーションへのドライバーの依存が高まり、交通ネットワークは悪意のある攻撃者によるデータ操作攻撃の影響を受けやすくなった。攻撃者はナビゲーションサービスのデータ収集や処理の脆弱性を利用して偽情報を注入し、ドライバーの経路選択に干渉することができる。このような攻撃は交通渋滞を著しく増加させ、時間と資源のかなりの浪費をもたらし、道路ネットワークに依存する必須サービスを妨害する恐れさえある。このような攻撃による脅威を評価するために,交通ネットワークに対する最悪ケースのデータ注入攻撃を見つけるための計算フレームワークを導入する。まず、特定の道路でドライバーが認識する走行時間を増加させることでドライバーを操作できる攻撃者を想定した敵対モデルを考案する。次に,階層型マルチエージェント強化学習を用いて,データ操作の近似的に最適な攻撃戦略を求める。 Sioux Falls, NDのネットワークトポロジに対する攻撃をシミュレーションすることで,本手法の適用性を実証する。

The increasing reliance of drivers on navigation applications has made transportation networks more susceptible to data-manipulation attacks by malicious actors. Adversaries may exploit vulnerabilities in the data collection or processing of navigation services to inject false information, and to thus interfere with the drivers' route selection. Such attacks can significantly increase traffic congestions, resulting in substantial waste of time and resources, and may even disrupt essential services that rely on road networks. To assess the threat posed by such attacks, we introduce a computational framework to find worst-case data-injection attacks against transportation networks. First, we devise an adversarial model with a threat actor who can manipulate drivers by increasing the travel times that they perceive on certain roads. Then, we employ hierarchical multi-agent reinforcement learning to find an approximate optimal adversarial strategy for data manipulation. We demonstrate the applicability of our approach through simulating attacks on the Sioux Falls, ND network topology.
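
The adversarial model in the abstract (raising the travel time that drivers perceive on certain roads) can be illustrated on a toy road graph. This is a sketch under stated assumptions: the four-node graph, the helper names `shortest_route`, `inject`, and `true_cost`, and the single-edge attack are all hypothetical, and the attack is hand-picked here rather than learned by reinforcement learning.

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra on graph[node] = {neighbor: travel_time}; returns (cost, path)."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, t in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + t, nxt, path + [nxt]))
    return float("inf"), []


def inject(graph, edge, extra):
    """Copy of the graph in which the perceived time on `edge` is inflated."""
    u, v = edge
    perceived = {node: dict(nbrs) for node, nbrs in graph.items()}
    perceived[u][v] += extra
    return perceived


def true_cost(graph, path):
    """Actual travel time of a path, measured on the unmodified graph."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


# Hypothetical network: A-B-D is truly faster than A-C-D.
ROADS = {"A": {"B": 1.0, "C": 2.0}, "B": {"D": 1.0}, "C": {"D": 2.0}, "D": {}}
```

Routing on `inject(ROADS, ("B", "D"), 5.0)` sends the driver via C, and `true_cost` on the original graph shows the real delay this causes; a worst-case attacker would search over such perturbations within some budget.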

翻訳日:2023-12-25 15:17:01 公開日:2023-12-22

# 変形分解生成モデルによるルースフィッティングガーメントアニメーションに向けて

Towards Loose-Fitting Garment Animation via Generative Model of Deformation Decomposition ( http://arxiv.org/abs/2312.14619v1 )

ライセンス: Link先を確認

Yifu Liu, Xiaoxia Li, Zhiling Luo, Wei Zhou

(参考訳) 既存の衣服アニメーションのデータ駆動手法は、通常は線形スキニングによって駆動され、タイトな衣服では有効だが、複雑な変形を伴うゆったりした衣服をうまく扱えない。これらの制約に対処するために, 線形スキニングを直接使用せずに, ゆったりした衣服の変形を効率的にシミュレートする, 変形分解に基づく衣服生成モデルを開発した。具体的には,提案した生成モデルを用いて衣服の生成空間を学習し,復号段階において潜在表現を、ポーズを取り除いた変形衣服と動的オフセットに分離する。衣服の変形を明示的に分解することにより,我々の生成モデルは,標準的な衣服形状に対して複雑なポーズ駆動の変形を生成することができる。さらに,身体運動と衣服の直前の状態を潜在空間に写し,動的な結果を再生成することを学習する。加えて,高周波のシワを学習するために,敵対的学習の設定で詳細強調モジュールを導入する。大規模な実験により,提案手法が最先端のデータ駆動方式を上回ることを示し,結果の定性的・定量的分析を示す。

Existing data-driven methods for garment animation, usually driven by linear skinning, although effective on tight garments, do not handle loose-fitting garments with complex deformations well. To address these limitations, we develop a garment generative model based on deformation decomposition to efficiently simulate loose garment deformation without directly using linear skinning. Specifically, we learn a garment generative space with the proposed generative model, where we decouple the latent representation into unposed deformed garments and dynamic offsets during the decoding stage. With explicit garment deformations decomposition, our generative model is able to generate complex pose-driven deformations on canonical garment shapes. Furthermore, we learn to transfer the body motions and previous state of the garment to the latent space to regenerate dynamic results. In addition, we introduce a detail enhancement module in an adversarial training setup to learn high-frequency wrinkles. We demonstrate our method outperforms state-of-the-art data-driven alternatives through extensive experiments and show qualitative and quantitative analysis of results.

翻訳日:2023-12-25 15:16:43 公開日:2023-12-22

# 非エルミート行列反復による任意緩和速度

Arbitrary relaxation rate under non-Hermitian matrix iterations ( http://arxiv.org/abs/2312.14617v1 )

ライセンス: Link先を確認

Jaš Bensa

(参考訳) ブリックウォール(BW)ランダム量子回路における時間外順序相関(OTOC)を例として,非エルミートな転送行列で伝播する可観測量の指数緩和について検討した。系の大きさに比例する時間までは、可観測量の指数的減衰は、素朴に期待されるように転送行列の2番目に大きい固有値によって決まるわけではなく、一般により遅い。この遅い減衰率は「ファントム固有値」と呼ばれている。一般に、この遅い減衰は転送行列の擬スペクトルの最大値によって与えられるが、減衰率は2番目に大きい固有値と擬スペクトルの最大値の間の任意の値を取りうることを示す。この任意の減衰は、例えば周期境界条件のBW回路におけるOTOCの伝播において観測できる。この現象を探るため,両端で2つのリザーバに結合した1次元のバイアス付きランダムウォークについて検討し,この単純な系もファントム固有値を示すことを証明する。

We study the exponential relaxation of observables, propagated with a non-Hermitian transfer matrix, an example being out-of-time-ordered correlations (OTOC) in brickwall (BW) random quantum circuits. Until a time that scales as the system size, the exponential decay of observables is not usually determined by the second largest eigenvalue of the transfer matrix, as one might naively expect, but is in general slower -- this slower decay rate was dubbed "phantom eigenvalue". Generally, this slower decay is given by the largest value in the pseudospectrum of the transfer matrix; however, we show that the decay rate can be an arbitrary value between the second largest eigenvalue and the largest value in the pseudospectrum. This arbitrary decay can be observed, for example, in the propagation of OTOC in BW circuits with periodic boundary conditions. To explore this phenomenon, we study a 1D biased random walk coupled to two reservoirs at the edges, and prove that this simple system also exhibits phantom eigenvalues.

翻訳日:2023-12-25 15:16:24 公開日:2023-12-22

# 行列投影のための固定点アルゴリズムと量子情報への応用

A fixed-point algorithm for matrix projections with applications in quantum information ( http://arxiv.org/abs/2312.14615v1 )

ライセンス: Link先を確認

Shrigyan Brahmachari, Roberto Rubboli, and Marco Tomamichel

(参考訳) 我々は、ある対称性の下で不変な正定値行列の集合上への、Bures距離に関する行列射影を計算する単純な不動点反復アルゴリズムを開発した。この不動点反復アルゴリズムが、反復回数に対して指数関数的に速く最適解に収束することを証明する。さらに、既製の半正定値計画ソルバーと比較して、数値的にも高速な収束を示す。我々のアルゴリズムは,行列重心の特定の場合において,(Álvarez-Esteban et al., 2016) で最初に導入された不動点反復アルゴリズムを再現する。以前の研究と比較すると、我々の証明は単純な行列不等式のみに基づいており、より一般的で直接的である。最後に,量子資源理論と量子シャノン理論におけるアルゴリズムの応用について述べる。

We develop a simple fixed-point iterative algorithm that computes the matrix projection with respect to the Bures distance on the set of positive definite matrices that are invariant under some symmetry. We prove that the fixed-point iteration algorithm converges exponentially fast to the optimal solution in the number of iterations. Moreover, it numerically shows fast convergence compared to the off-the-shelf semidefinite program solvers. Our algorithm, for the specific case of matrix barycenters, recovers the fixed-point iterative algorithm originally introduced in (Álvarez-Esteban et al., 2016). Compared to previous works, our proof is more general and direct as it is based only on simple matrix inequalities. Finally, we discuss several applications of our algorithm in quantum resource theories and quantum Shannon theory.
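
The barycenter special case mentioned in the abstract can be sketched as follows. This covers only the known fixed-point iteration for Bures barycenters, S ← S^(-1/2) (Σ_i w_i (S^(1/2) A_i S^(1/2))^(1/2))² S^(-1/2), not the paper's general symmetry-constrained projection. It is restricted to 2x2 positive definite matrices so that a closed-form square root can be used; the helper names are illustrative assumptions.

```python
import math


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_sqrt(a):
    """Principal square root of a 2x2 positive definite matrix.

    Uses sqrt(A) = (A + sqrt(det A) I) / sqrt(tr A + 2 sqrt(det A)).
    """
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]


def mat_inv(a):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def bures_barycenter(mats, weights, iters=50):
    """Fixed-point iteration for the weighted Bures barycenter (2x2 only)."""
    s = [[1.0, 0.0], [0.0, 1.0]]  # start from the identity
    for _ in range(iters):
        root = mat_sqrt(s)
        root_inv = mat_inv(root)
        acc = [[0.0, 0.0], [0.0, 0.0]]
        for m, w in zip(mats, weights):
            term = mat_sqrt(mat_mul(mat_mul(root, m), root))
            acc = [[acc[i][j] + w * term[i][j] for j in range(2)]
                   for i in range(2)]
        s = mat_mul(mat_mul(root_inv, mat_mul(acc, acc)), root_inv)
    return s
```

For commuting (for example diagonal) inputs the barycenter is the square of the weighted mean of the square roots, which gives an easy sanity check.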

翻訳日:2023-12-25 15:16:05 公開日:2023-12-22

# 一貫した画像編集のためのチューニングフリーインバージョンエンハンスド制御

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing ( http://arxiv.org/abs/2312.14611v1 )

ライセンス: Link先を確認

Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

(参考訳) 実画像の一貫性のある編集は、アイデンティティや属性を変更することなく、入力画像の主要オブジェクトに対して非剛体な編集(例えば姿勢の変更)を行う必要があるため、難しい作業である。一貫した属性を保証するために、既存の手法の一部は構造的な一貫性のためにモデル全体やテキスト埋め込みを微調整するが、時間がかかり、非剛体な編集を行えない。チューニング不要な手法もあるが、実世界のシナリオではしばしば失敗するDDIM(Denoising Diffusion Implicit Model)再構成の品質によって性能が低下している。本稿では, インバージョン過程の特徴とサンプリング過程の特徴を直接対応付けることで, DDIM再構成の不整合を緩和する, Tuning-free Inversion-enhanced Control (TIC) という新しい手法を提案する。具体的には、本手法は、自己注意層におけるキーおよびバリューの特徴からインバージョン特徴を効果的に取得し、これらのインバージョン特徴によりサンプリング過程を強化することで、正確な再構成と内容の一貫した編集を実現する。また,本手法の適用性を一般的な編集シナリオに拡張するために,インバージョン過程と単純なDDIM編集過程の両方の内容を組み合わせる、マスク誘導型の注意連結戦略を提案する。実験の結果,提案手法は再構成と一貫した編集の両方で従来手法を上回り,様々な設定で印象的な結果が得られることがわかった。

Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non-rigid edits. Other works are tuning-free, but their performances are weakened by the quality of Denoising Diffusion Implicit Model (DDIM) reconstruction, which often fails in real-world scenarios. In this paper, we present a novel approach called Tuning-free Inversion-enhanced Control (TIC), which directly correlates features from the inversion process with those from the sampling process to mitigate the inconsistency in DDIM reconstruction. Specifically, our method effectively obtains inversion features from the key and value features in the self-attention layers, and enhances the sampling process by these inversion features, thus achieving accurate reconstruction and content-consistent editing. To extend the applicability of our method to general editing scenarios, we also propose a mask-guided attention concatenation strategy that combines contents from both the inversion and the naive DDIM editing processes. Experiments show that the proposed method outperforms previous works in reconstruction and consistent editing, and produces impressive results in various settings.

翻訳日:2023-12-25 15:15:52 公開日:2023-12-22

# BLSTMを用いたエンドツーエンド音声認識のための信頼度推定

BLSTM-Based Confidence Estimation for End-to-End Speech Recognition ( http://arxiv.org/abs/2312.14609v1 )

ライセンス: Link先を確認

Atsunori Ogawa, Naohiro Tawara, Takatomo Kano, Marc Delcroix

(参考訳) 自動音声認識(ASR)の仮説における各認識トークン(単語,サブワード,文字など)の信頼度を推定し,誤認識トークンを検出する信頼度推定は,ASRアプリケーションを開発する上で重要な機能である。本研究では,エンド・ツー・エンド(E2E)ASR仮説に対する信頼度推定を行う。最近のE2E ASRシステムは、様々なASRタスクに対して高い性能(例えば、5%前後のトークン誤り率)を示す。このような状況では、ほとんどが正しいトークン列の中から、まれにしか現れない誤りトークンを検出する必要があるため、信頼度推定が困難になる。この不均衡データセット問題に対処するために、クラスバランスを考慮した目的関数で学習した強力な二値(正/誤)系列ラベラーとして、双方向長短期記憶(BLSTM)ベースのモデルを用いる。実験により,複数の種類のASR復号スコアを補助特徴として利用することで,このモデルが高度に不均衡な条件下でも安定して高い信頼度推定性能を示すことを確認した。また,BLSTMベースのモデルが、誤りトークンを大幅に過小評価するTransformerベースの信頼度推定モデルより優れていることを確認した。

Confidence estimation, in which we estimate the reliability of each recognized token (e.g., word, sub-word, and character) in automatic speech recognition (ASR) hypotheses and detect incorrectly recognized tokens, is an important function for developing ASR applications. In this study, we perform confidence estimation for end-to-end (E2E) ASR hypotheses. Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks. In such situations, confidence estimation becomes difficult since we need to detect infrequent incorrect tokens from mostly correct token sequences. To tackle this imbalanced dataset problem, we employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler that is trained with a class balancing objective. We experimentally confirmed that, by utilizing several types of ASR decoding scores as its auxiliary features, the model steadily shows high confidence estimation performance under highly imbalanced settings. We also confirmed that the BLSTM-based model outperforms Transformer-based confidence estimation models, which greatly underestimate incorrect tokens.
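
To make the "class balancing objective" concrete, here is a minimal sketch of an inverse-frequency weighted binary cross-entropy over correct/incorrect token labels. The abstract does not give the exact objective, so this particular weighting is an assumption, and the function name is illustrative.

```python
import math


def class_balanced_bce(probs, labels, eps=1e-12):
    """Weighted binary cross-entropy for token confidence labels.

    probs[i]  : predicted probability that token i is correct
    labels[i] : 1 if token i is correct, 0 if it is an ASR error
    Each class is weighted by n / (2 * class_count), so the rare
    "incorrect" class contributes as much as the frequent "correct" one.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos) if n_pos else 0.0
    w_neg = n / (2.0 * n_neg) if n_neg else 0.0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total -= w_pos * math.log(p)
        else:
            total -= w_neg * math.log(1.0 - p)
    return total / n
```

With roughly 5% incorrect tokens, an unweighted loss is dominated by the correct class; the weighting above makes a single missed error cost about as much as nineteen over-cautious predictions on correct tokens.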

翻訳日:2023-12-25 15:15:28 公開日:2023-12-22

# 発展型偏微分方程式に対処する効率的な離散物理インフォームドニューラルネットワーク

Efficient Discrete Physics-informed Neural Networks for Addressing Evolutionary Partial Differential Equations ( http://arxiv.org/abs/2312.14608v1 )

ライセンス: Link先を確認

Siqi Chen, Bin Shan, Ye Li

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、ディープラーニングを用いて偏微分方程式(PDE)を解く有望な可能性を示している。しかし、PINNは発展型PDE、特に解が時間とともにマルチスケールまたは乱流的な挙動を示す力学系に対して、学習の困難に直面している。 PINNの損失に含まれるすべての時間的特徴が同時に学習されるため、PINNが時間的因果性に反する可能性があることがその理由である。本稿では,時間的因果性を強制するために陰的な時間差分スキームを用い,異なる時間フレームにおけるPDE解のサロゲートとして、空間上のPINNを転移学習によって逐次更新することを提案する。このように発展していくPINNは、隣接する時間フレーム間で小さな更新しか必要とせず、発展方程式の変化する複雑さをよりよく捉えることができる。本手法は, 時間ステップが小さく, 各時間フレームのPINNが十分に学習されている場合に収束することが理論的に証明される。さらに、既存のPINNの定式化が失敗したり非効率であったりする様々なベンチマークに対して、最先端(SOTA)の数値結果を示す。提案手法は,発展型PDEに対するPINN近似の精度を向上させ,効率を4～40倍改善することを示した。

Physics-informed neural networks (PINNs) have shown promising potential for solving partial differential equations (PDEs) using deep learning. However, PINNs face training difficulties for evolutionary PDEs, particularly for dynamical systems whose solutions exhibit multi-scale or turbulent behavior over time. The reason is that PINNs may violate the temporal causality property since all the temporal features in the PINNs loss are trained simultaneously. This paper proposes to use implicit time differencing schemes to enforce temporal causality, and use transfer learning to sequentially update the PINNs in space as surrogates for PDE solutions in different time frames. The evolving PINNs are better able to capture the varying complexities of the evolutionary equations, while only requiring minor updates between adjacent time frames. Our method is theoretically proven to be convergent if the time step is small and each PINN in different time frames is well-trained. In addition, we provide state-of-the-art (SOTA) numerical results for a variety of benchmarks for which existing PINNs formulations may fail or be inefficient. We demonstrate that the proposed method improves the accuracy of PINNs approximation for evolutionary PDEs and improves efficiency by a factor of 4-40x.
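
The role of implicit time differencing can be seen on a scalar stand-in for an evolution equation, u' = -lam * u. In the paper each time step would be solved by a PINN over space; replacing that solve with a closed-form scalar update is a simplifying assumption made only for this sketch, as are the function names.

```python
def implicit_euler_decay(u0, lam, dt, steps):
    """Backward Euler for u' = -lam * u.

    Each step solves u_next = u + dt * (-lam * u_next), i.e.
    u_next = u / (1 + lam * dt), so step n+1 depends only on step n.
    """
    u = u0
    history = [u]
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
        history.append(u)
    return history


def explicit_euler_decay(u0, lam, dt, steps):
    """Forward Euler for the same equation, shown for contrast."""
    u = u0
    history = [u]
    for _ in range(steps):
        u = u * (1.0 - lam * dt)
        history.append(u)
    return history
```

With lam = 50 and dt = 0.1 the implicit update decays monotonically while the explicit one oscillates and blows up, which is one reason implicit schemes tolerate the larger time frames between sequentially trained networks.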

翻訳日:2023-12-25 15:15:03 公開日:2023-12-22

# ChatGPT, Llama, 私のレポートを書いてもらえますか? （ローカル）大規模言語モデルを用いたデジタル鑑識レポート作成支援の実験

ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models ( http://arxiv.org/abs/2312.14607v1 )

ライセンス: Link先を確認

Gaëtan Michelet, Frank Breitinger

(参考訳) 生成AI、特にChatGPTやLlamaのような大規模言語モデル(LLM)は大きく進歩し、デジタルフォレンジックの貴重なツールとなりつつある。初期の研究では調査の文脈におけるChatGPTの可能性が検討されているが、LLMがフォレンジックレポートの作成プロセスをどの程度支援できるかという問題は未解決のままである。この問いに答えるために、本稿はまず、一般化(例えば、レポートの「平均的な構造」を見つけること)を目的としてフォレンジックレポートを調べる。次に,ケーススタディを用いて、フォレンジックレポートの各部分を生成する際のLLMの強みと限界を評価する。この研究は、デジタルフォレンジック調査の重要な側面であるレポート作成の自動化に関する貴重な洞察を提供する。徹底的な校正と修正を組み合わせれば、LLMはレポート作成プロセスにおいて実務者を支援できる可能性があるが、現時点では実務者を置き換えることはできないと結論付ける。

Generative AIs, especially Large Language Models (LLMs) such as ChatGPT or Llama, have advanced significantly, positioning them as valuable tools for digital forensics. While initial studies have explored the potential of ChatGPT in the context of investigations, the question of to what extent LLMs can assist the forensic report writing process remains unresolved. To answer the question, this article first examines forensic reports with the goal of generalization (e.g., finding the `average structure' of a report). We then evaluate the strengths and limitations of LLMs for generating the different parts of the forensic report using a case study. This work thus provides valuable insights into the automation of report writing, a critical facet of digital forensics investigations. We conclude that combined with thorough proofreading and corrections, LLMs may assist practitioners during the report writing process but at this point cannot replace them.

翻訳日:2023-12-25 15:14:38 公開日:2023-12-22

# Transformerベースのサリエンシーマップを用いた説明可能なマルチカメラ3次元物体検出

Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps ( http://arxiv.org/abs/2312.14606v1 )

ライセンス: Link先を確認

Till Beemelmanns, Wassim Zahr, Lutz Eckstein

(参考訳) Vision Transformer(ViT)は、3次元物体検出を含む様々なコンピュータビジョンタスクで最先端の結果を達成している。しかし、エンドツーエンドの実装はViTの説明可能性を低下させる。これは、当局、開発者、ユーザが予測の背後にあるモデルの推論を理解することが重要となる、自動運転のような安全クリティカルなアプリケーションにViTを導入する上での課題となりうる。本稿では,3次元物体検出に使用される、複数のカメラ入力を持つDETR型ViTのサリエンシーマップを生成する新しい手法を提案する。本手法は生のアテンションに基づいており,勾配ベースの手法よりも効率的である。提案手法をnuScenesデータセット上で広範な摂動テストを用いて評価し, 視覚的品質と定量的指標の両面で、他の説明可能性手法よりも優れていることを示す。また,Transformerの異なる層にまたがってアテンションを集約することの重要性を示す。本研究は、ViTのための説明可能なAIの開発に寄与するものであり、AIモデルの内部動作に関する透明性を高めることで、AIアプリケーションへの信頼向上に役立つ。

Vision Transformers (ViTs) have achieved state-of-the-art results on various computer vision tasks, including 3D object detection. However, their end-to-end implementation also makes ViTs less explainable, which can be a challenge for deploying them in safety-critical applications, such as autonomous driving, where it is important for authorities, developers, and users to understand the model's reasoning behind its predictions. In this paper, we propose a novel method for generating saliency maps for a DetR-like ViT with multiple camera inputs used for 3D object detection. Our method is based on the raw attention and is more efficient than gradient-based methods. We evaluate the proposed method on the nuScenes dataset using extensive perturbation tests and show that it outperforms other explainability methods in terms of visual quality and quantitative metrics. We also demonstrate the importance of aggregating attention across different layers of the transformer. Our work contributes to the development of explainable AI for ViTs, which can help increase trust in AI applications by establishing more transparency regarding the inner workings of AI models.

翻訳日:2023-12-25 15:14:20 公開日:2023-12-22

# 拒否する理由? 言語モデルと判断の整合

Reasons to Reject? Aligning Language Models with Judgments ( http://arxiv.org/abs/2312.14591v1 )

ライセンス: Link先を確認

Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

(参考訳) 人間として、私たちは常に仲間と対話し、自然言語の形でフィードバックを受け取る。この言語フィードバックによって、行動を振り返り、適切な振る舞いを維持し、誤りを正すことができる。では、言語フィードバックを使って大規模言語モデル(LLM)を調整(アライメント)できるだろうか。 LLMを報酬や選好データと整合させる以前の研究とは対照的に、言語フィードバック(すなわち判断)という観点からアライメントを体系的に探究する最初の研究を示す。まず,LLMを判断と整合させるために適用できる既存手法を詳細に調査し,これらの手法が判断を十分に活用できないことを明らかにした。判断をより効果的に活用するために,判断に基づくきめ細かな不適切内容の検出と修正を可能にする新しい枠組みであるContrastive Unlikelihood Training (CUT)を提案する。オフラインアライメントの結果は、既存の判断データわずか1317件で、CUT(LLaMA2-13b)が175BのDaVinci003を上回り、AlpacaEvalで最良のベースラインを52.34ポイント上回ることを示している。オンラインアライメントの結果は、CUTがモデル固有の判断データを用いてLLM(LLaMA2-chat-13b)を反復的に調整でき、AlpacaEvalで81.09から91.36ポイントへと着実に性能が向上することを示している。さらに分析の結果,LLMのアライメントにおいて判断は報酬よりも大きな可能性を持ち、今後の研究に値することが示唆された。

As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval. The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.
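
The name Contrastive Unlikelihood Training suggests the generic unlikelihood objective, in which tokens judged inappropriate are penalized with -log(1 - p) while the rest keep the usual -log(p). The sketch below shows only that generic token-level mix; how CUT decides which tokens are inappropriate, and its exact weighting, are not given in the abstract and are left outside the sketch. The function name and the boolean flag format are assumptions.

```python
import math


def mixed_token_loss(token_probs, inappropriate, eps=1e-12):
    """Average per-token loss mixing likelihood and unlikelihood terms.

    token_probs[i]   : model probability of the i-th generated token
    inappropriate[i] : True if that token was flagged (by some external
                       judgment-based procedure) as inappropriate
    """
    total = 0.0
    for p, bad in zip(token_probs, inappropriate):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if bad:
            total += -math.log(1.0 - p)  # push probability down
        else:
            total += -math.log(p)        # push probability up
    return total / len(token_probs)
```

A confident but flagged token (p = 0.9) is costly under this loss, and the cost shrinks as the model lowers that token's probability.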

翻訳日:2023-12-25 15:14:01 公開日:2023-12-22

# MEAOD:オブジェクト検出器に対するモデル抽出攻撃

MEAOD: Model Extraction Attack against Object Detectors ( http://arxiv.org/abs/2312.14677v1 )

ライセンス: Link先を確認

Zeyu Li, Chenghui Shi, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji

(参考訳) さまざまな業界でディープラーニング技術が広く使われるようになり、ディープニューラルネットワークモデルは高い価値を持つようになった。その結果、潜在的な攻撃者にとって魅力的な標的となっている。モデル抽出攻撃、特にクエリベースのモデル抽出攻撃は、攻撃者が被害モデルに匹敵する機能を持つ代替モデルを複製することを可能にし、MLaaSプラットフォームの機密性とセキュリティに重大な脅威を与える。近年、多くの研究が分類モデルに対するモデル抽出攻撃の脅威を調べているが、現実のシナリオでより頻繁に使用される物体検出モデルはあまり注目されていない。本稿では,物体検出モデルに対するクエリベースのモデル抽出攻撃の課題と実現可能性を調査し,MEAODと呼ばれる効果的な攻撃手法を提案する。本手法は、攻撃者が保有するデータセットから能動学習を用いてサンプルを選択し、効率的なクエリデータセットを構築するとともに、オブジェクト数が不十分なカテゴリを補強する。さらに,クエリデータセットのアノテーションを更新することで,抽出の有効性も向上させる。グレーボックスとブラックボックスのシナリオでの実験によれば、10kのクエリ予算という条件下で70%以上の抽出性能を達成した。

The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to the confidentiality and security of MLaaS platforms. While many studies have explored threats of model extraction attacks against classification models in recent years, object detection models, which are more frequently used in real-world scenarios, have received less attention. In this paper, we investigate the challenges and feasibility of query-based model extraction attacks against object detection models and propose an effective attack method called MEAOD. It selects samples from the attacker-possessed dataset to construct an efficient query dataset using active learning and enhances the categories with insufficient objects. We additionally improve the extraction effectiveness by updating the annotations of the query dataset. According to our gray-box and black-box scenarios experiments, we achieve an extraction performance of over 70% under the given condition of a 10k query budget.

翻訳日:2023-12-25 15:07:43 公開日:2023-12-22

# LLMによるテキストからのゼロショット因果グラフ外挿

Zero-shot Causal Graph Extrapolation from Text via LLMs ( http://arxiv.org/abs/2312.14670v1 )

ライセンス: Link先を確認

Alessandro Antonucci, Gregorio Piqué, Marco Zaffalon

(参考訳) 我々は,自然言語から因果関係を推定する大規模言語モデル (LLM) の能力を評価する。従来の自然言語処理やディープラーニング技術と比較して、LLMは(明示的な)トレーニングサンプルを必要とせずにペア関係のベンチマークで競合性能を示す。これにより、反復的なペアワイズクエリを通じて因果グラフを外挿するアプローチを拡張するモチベーションが生まれます。専門家が検証した真正の因果グラフを用いた生物医学論文の抄録のベンチマークを予備分析する。この結果は有望であり、特に分析すべき科学的テキストの量が膨大で、因果関係の記述がしばしば暗黙的である医学領域において、因果推論におけるこの重要なステップへのLLMの採用を支持するものである。

We evaluate the ability of large language models (LLMs) to infer causal relations from natural language. Compared to traditional natural language processing and deep learning techniques, LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples. This motivates us to extend our approach to extrapolating causal graphs through iterated pairwise queries. We perform a preliminary analysis on a benchmark of biomedical abstracts with ground-truth causal graphs validated by experts. The results are promising and support the adoption of LLMs for such a crucial step in causal inference, especially in medical domains, where the amount of scientific text to analyse might be huge, and the causal statements are often implicit.
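The idea of extrapolating a causal graph through iterated pairwise queries can be sketched as follows. This is only an illustration under stated assumptions: `build_causal_graph`, `toy_oracle`, and the variable names are hypothetical, and the oracle is a hard-coded lookup standing in for an LLM call, whose prompt and answer parsing the abstract does not specify.

```python
from itertools import permutations


def build_causal_graph(variables, causes):
    """Query every ordered pair once and keep the edges the oracle accepts."""
    edges = set()
    for a, b in permutations(variables, 2):
        if causes(a, b):  # one pairwise query: "does a cause b?"
            edges.add((a, b))
    return edges


# Hypothetical stand-in for the LLM: a fixed table of accepted relations.
KNOWN = {("smoking", "tar deposits"), ("tar deposits", "lung cancer")}


def toy_oracle(a, b):
    return (a, b) in KNOWN


graph = build_causal_graph(
    ["smoking", "tar deposits", "lung cancer"], toy_oracle
)
```

With n variables this issues n(n-1) queries, so the query budget grows quadratically with the number of candidate variables.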

翻訳日:2023-12-25 15:07:25 公開日:2023-12-22

# モダリティ対応プロンプトを用いたマルチモーダル意図認識のためのトークンレベルコントラスト学習

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition ( http://arxiv.org/abs/2312.14667v1 )

ライセンス: Link先を確認

Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

(参考訳) マルチモーダルな意図認識は,実世界のマルチモーダルなシナリオにおいて,人間の言語や行動を理解する上で重要なタスクを構成する,ユーザの意図を理解するために,表現,身体の動き,発話のトーンといった多様なモダリティを活用することを目的としている。しかしながら、既存の手法の大半は、異なるモダリティ間の潜在的な相関や、非言語的モダリティから意味的特徴を効果的に学習する際の独自の制限を無視している。本稿では,モダリティ・アウェア・プロンプト(tcl-map)を用いたトークンレベルのコントラスト学習手法を提案する。テキストモダリティのための最適なマルチモーダルセマンティクス環境を確立するために、類似性に基づくモダリティアライメントとクロスモダリティアライメントアライメント機構を備えたテキスト、ビデオ、オーディオモダリティの機能を効果的に調整・融合するモダリティ・アウェア・プロンプト・モジュール(map)を開発した。提案するトークンレベルコントラスト学習フレームワーク(TCL)は,モダリティ対応のプロンプトと基底真理ラベルに基づいて,拡張サンプルを構築し,NT-Xent損失をラベルトークンに適用する。特に、TCLは、目的ラベルから導かれる最適なテキスト意味的洞察を利用して、他のモダリティの学習プロセスを導出する。広範な実験により,本手法は最先端手法と比較して著しく改善が得られた。さらに, アブレーション解析により, マルチモーダルプロンプト学習において有意な重要性を持つ手作りプロンプトよりも, モダリティ認識プロンプトが優れていることが示された。コードはhttps://github.com/thuiar/TCL-MAPで公開されている。

Multimodal intent recognition aims to leverage diverse modalities such as expressions, body movements and tone of speech to comprehend users' intent, constituting a critical task for understanding human language and behavior in real-world multimodal scenarios. Nevertheless, the majority of existing methods ignore potential correlations among different modalities and have limitations in effectively learning semantic features from nonverbal modalities. In this paper, we introduce a token-level contrastive learning method with modality-aware prompting (TCL-MAP) to address the above challenges. To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from text, video and audio modalities with similarity-based modality alignment and a cross-modality attention mechanism. Based on the modality-aware prompt and ground truth labels, the proposed token-level contrastive learning framework (TCL) constructs augmented samples and employs the NT-Xent loss on the label token. Specifically, TCL capitalizes on the optimal textual semantic insights derived from intent labels to guide the learning processes of other modalities in return. Extensive experiments show that our method achieves remarkable improvements compared to state-of-the-art methods. Additionally, ablation analyses demonstrate the superiority of the modality-aware prompt over the handcrafted prompt, which holds substantial significance for multimodal prompt learning. The code is released at https://github.com/thuiar/TCL-MAP.
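For readers unfamiliar with the NT-Xent loss mentioned above, here is a minimal pure-Python sketch of the generic loss. It is not the TCL-MAP implementation: the pairing convention (`z[i]` with `z[i + N]`), the temperature of 0.5, and the toy 2-D embeddings are assumptions made for illustration, and the restriction to label tokens is not modeled.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, temperature=0.5):
    """z holds 2N embeddings; z[i] and z[i + N] form a positive pair."""
    n = len(z) // 2
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        # Similarities to every other embedding (self is excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        # -log( exp(s_ij / t) / sum_k exp(s_ik / t) )
        total += log_denominator - cosine(z[i], z[j]) / temperature
    return total / (2 * n)


# Positives point the same way here, so the loss is small.
aligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
# Same vectors, but each positive partner now points elsewhere.
mismatched = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [1.0, 0.1]]
```

The loss drops when each embedding is closer to its positive partner than to the other samples in the batch, which is why `nt_xent(aligned)` comes out lower than `nt_xent(mismatched)`.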

翻訳日:2023-12-25 15:07:13 公開日:2023-12-22

# ボソニックcQEDにおける光-物質相互作用系間のオンデマンドトランスポジション

On-demand transposition across light-matter interaction regimes in bosonic cQED ( http://arxiv.org/abs/2312.14665v1 )

ライセンス: Link先を確認

Fernando Valadares, Ni-Ni Huang, Kyle Chu, Aleksandr Dorogov, Weipin Chua, Kong Lingda, Pengtao Song, Yvonne Y. Gao

(参考訳) 科学とテクノロジーにおける光・物質相互作用の多様な応用は、これらの相互作用が定性的に異なる形で現れることに由来する。ボソニックcQEDは高Q超伝導キャビティの光電場を非線形回路素子に結合させ、その相互作用のリッチなダイナミクスを量子情報処理に利用している。しかし,キャビティコヒーレンスを損なうことなくインタラクションレジームの高速スイッチングを実現することは大きな課題である。本研究は,トランスモンのナノ秒スケールの周波数調整性と,数百マイクロ秒の寿命の共振器を結合した最初の実験である。提案手法は,共振相互作用を用いたキャビティフォック状態の高速生成や,定性的に異なる相互作用系での相互交換トモグラフィ技術,アイドル進化における不必要なキャビティ・トランスモンダイナミクスの抑制など,量子情報処理の新たな機能を実現する。ボソニックcQEDツールキットにフラックスチューナビリティーを導入することで、我々の研究は単一のプラットフォーム内での光-物質相互作用のフル範囲を探索する新しいパラダイムを開拓し、堅牢で汎用的な量子情報処理への有用な新しい経路を提供する。

The diverse applications of light-matter interactions in science and technology stem from the qualitatively distinct ways these interactions manifest, prompting the development of physical platforms that can interchange between regimes on demand. Bosonic cQED employs the light field of high-Q superconducting cavities coupled to non-linear circuit elements, harnessing the rich dynamics of their interaction for quantum information processing. However, implementing fast switching of the interaction regime without deteriorating the cavity coherence is a significant challenge. We present the first experiment to achieve this feat, combining nanosecond-scale frequency tunability of a transmon coupled to a cavity with lifetime of hundreds of microseconds. Our implementation affords a range of new capabilities for quantum information processing; from fast creation of cavity Fock states using resonant interaction and interchanging tomography techniques at qualitatively distinct interaction regimes on the fly, to the suppression of unwanted cavity-transmon dynamics during idle evolution. By bringing flux tunability into the bosonic cQED toolkit, our work opens up a new paradigm to probe the full range of light-matter interaction dynamics within a single platform and provides valuable new pathways towards robust and versatile quantum information processing.

翻訳日:2023-12-25 15:06:39 公開日:2023-12-22

# NeRFアンサンブルを用いた密度不確かさの定量化:データとシーン制約の影響

Density Uncertainty Quantification with NeRF-Ensembles: Impact of Data and Scene Constraints ( http://arxiv.org/abs/2312.14664v1 )

ライセンス: Link先を確認

Miriam Jäger, Steven Landgraf, Boris Jutzi

(参考訳) コンピュータグラフィックス、コンピュータビジョン、フォトグラムメトリーの分野では、Neural Radiance Fields(NeRF)が現在の研究と開発を駆動する主要なトピックである。しかし、NeRF生成した3Dシーンの再現とその後の表面再構成の品質は、ネットワーク出力、特に密度に大きく依存している。この重要な側面については,平均密度とともに密度不確かさ推定を提供するNeRF-Ensemblesの利用を提案する。我々は,低画質画像やポーズなどのデータ制約がトレーニングプロセスの劣化,密度の不確実性の増大,予測密度の低下につながることを示した。高品質な入力データであっても、密度の不確実性は、取得コンステレーション、オクルージョン、材料特性などのシーン制約によって異なる。 NeRF-Ensemblesは不確実性を定量化するツールを提供するだけでなく、2つの有望な利点を示す。単一 NeRF の代わりに NeRF-Ensembles を用いることで、小さな外周を除去し、構造全体の完全性を改善したスムーズな出力が得られる。さらに,密度の不確かさに対するパーセンタイルに基づくしきい値の適用は,後処理において大きな(フォギー)アーティファクトの除去に有効であることが証明された。私たちは3つの異なるデータセットで方法論を実行します。 (i)合成ベンチマークデータセット (ii)実際のベンチマークデータセット (iii)現実的な記録条件とセンサによる実データ。

In the fields of computer graphics, computer vision and photogrammetry, Neural Radiance Fields (NeRFs) are a major topic driving current research and development. However, the quality of NeRF-generated 3D scene reconstructions and subsequent surface reconstructions, heavily relies on the network output, particularly the density. Regarding this critical aspect, we propose to utilize NeRF-Ensembles that provide a density uncertainty estimate alongside the mean density. We demonstrate that data constraints such as low-quality images and poses lead to a degradation of the training process, increased density uncertainty and decreased predicted density. Even with high-quality input data, the density uncertainty varies based on scene constraints such as acquisition constellations, occlusions and material properties. NeRF-Ensembles not only provide a tool for quantifying the uncertainty but exhibit two promising advantages: Enhanced robustness and artifact removal. Through the utilization of NeRF-Ensembles instead of single NeRFs, small outliers are removed, yielding a smoother output with improved completeness of structures. Furthermore, applying percentile-based thresholds on density uncertainty outliers proves to be effective for the removal of large (foggy) artifacts in post-processing. We conduct our methodology on 3 different datasets: (i) synthetic benchmark dataset, (ii) real benchmark dataset, (iii) real data under realistic recording conditions and sensors.
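The two ingredients described above, a per-point mean and uncertainty over ensemble members and a percentile-based threshold on that uncertainty, can be sketched as below. This is an illustration only: the function names, the use of the population standard deviation as the uncertainty measure, the nearest-rank percentile, and the toy density values are all assumptions, not the authors' pipeline.

```python
import math
import statistics


def ensemble_density_stats(densities_per_member):
    """densities_per_member[m][p]: density predicted by member m at point p."""
    per_point = list(zip(*densities_per_member))
    means = [statistics.fmean(values) for values in per_point]
    stds = [statistics.pstdev(values) for values in per_point]
    return means, stds


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def keep_mask(stds, q=80):
    """Keep points whose uncertainty is at or below the q-th percentile."""
    threshold = percentile(stds, q)
    return [s <= threshold for s in stds]


# Three hypothetical ensemble members evaluated at five sample points.
# Point 3 is a "foggy" artifact that only one member predicts.
members = [
    [10.0, 0.1, 5.0, 0.0, 8.0],
    [10.2, 0.1, 5.1, 6.0, 8.1],
    [9.8, 0.1, 4.9, 0.0, 7.9],
]
means, stds = ensemble_density_stats(members)
mask = keep_mask(stds, q=80)
```

In this toy input the members disagree strongly only at point 3, so that point is the one the mask drops.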

翻訳日:2023-12-25 15:06:15 公開日:2023-12-22

# 深部非パラメトリック時系列予測器

Deep Non-Parametric Time Series Forecaster ( http://arxiv.org/abs/2312.14657v1 )

ライセンス: Link先を確認

Syama Sundar Rangapuram, Jan Gasthaus, Lorenzo Stella, Valentin Flunkert, David Salinas, Yuyang Wang, Tim Januschowski

(参考訳) 本稿では,時系列予測のための非パラメトリックベースラインモデルを提案する。従来の予測モデルとは異なり、提案手法は予測分布のパラメトリック形式を仮定せず、学習可能な戦略に従って経験的分布からサンプリングして予測を生成する。これにより、モデルは常に妥当な予測(すなわち観測されたデータ範囲内での予測)を生成することができ、いくつかのデータ分布の数値安定性に苦しむ古典的なモデルと異なり失敗することはない。さらに,提案手法のグローバルバージョンを開発し,複数の時系列にまたがる情報を活用することで,サンプリング戦略を自動的に学習する。実験的な評価は,提案手法がすべてのデータセットに対して合理的かつ一貫した性能を示し,予測ツールボックスで考慮すべき強いベースラインであることを証明している。

This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range) without fail, unlike classical models that suffer from numerical instability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.
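A minimal sketch of forecasting by sampling from the empirical distribution is shown below. The exponentially decaying weights are an assumption chosen for illustration; the abstract only says the sampling strategy is tunable (and learned in the global variant), so `sample_forecast` and its `decay` parameter are hypothetical.

```python
import math
import random


def sample_forecast(history, num_samples=100, decay=0.1, seed=0):
    """Draw forecast samples from past observations, favouring recent ones."""
    rng = random.Random(seed)
    n = len(history)
    # Weight of observation t shrinks with its age (n - 1 - t).
    weights = [math.exp(-decay * (n - 1 - t)) for t in range(n)]
    return rng.choices(history, weights=weights, k=num_samples)


history = [3, 5, 4, 6, 5]
samples = sample_forecast(history)
```

Because every sample is an observed value, the forecast can never leave the observed data range, which is the property the abstract highlights.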

翻訳日:2023-12-25 15:05:50 公開日:2023-12-22

# SAVAE: 変分ベイズオートエンコーダの生存分析への応用

SAVAE: Leveraging the variational Bayes autoencoder for survival analysis ( http://arxiv.org/abs/2312.14651v1 )

ライセンス: Link先を確認

Patricia A. Apellániz and Juan Parras and Santiago Zazo

(参考訳) 多くの医学研究の分野と同様に、生存分析は、複雑な、高次元、異質、不完全、検閲された医療データをモデル化するためのディープラーニング技術の応用への関心が高まっている。現在の手法では、実際には有効でない可能性のあるデータ間の関係を仮定することが多い。そこで本研究では,変分オートエンコーダに基づく新しいアプローチであるsavae(survival analysis variational autoencoder)を提案する。 SAVAEは、生存分析のための調整されたELBO定式化を導入し、共変量と生存時間の様々なパラメトリック分布をサポートすることで、この分野に大きく貢献する。さまざまなメトリクスを一貫して実行し、さまざまな実験を通じて堅牢性と安定性を示す一般的な方法を提供する。提案手法は, 時間とイベント, 検閲, 共変性相互作用, 時間変化リスク関連を効果的に推定する。我々は、ゲノム、臨床、人口統計データを含む多様なデータセットでモデルを検証し、様々なレベルの検閲を行う。このアプローチは、Concordance IndexとIntegrated Brier Scoreで評価されるように、最先端技術と比較して競合性能を示す。 SAVAEはまた、共変量と時間をパラメトリックにモデル化する解釈可能なモデルも提供している。さらに、その生成アーキテクチャは、クラスタリング、データ計算、生存データからの潜時空間推論による合成患者データの生成など、さらなる応用を促進する。

As in many fields of medical research, survival analysis has witnessed a growing interest in the application of deep learning techniques to model complex, high-dimensional, heterogeneous, incomplete, and censored medical data. Current methods often make assumptions about the relations between data that may not be valid in practice. In response, we introduce SAVAE (Survival Analysis Variational Autoencoder), a novel approach based on Variational Autoencoders. SAVAE contributes significantly to the field by introducing a tailored ELBO formulation for survival analysis, supporting various parametric distributions for covariates and survival time (as long as the log-likelihood is differentiable). It offers a general method that consistently performs well on various metrics, demonstrating robustness and stability through different experiments. Our proposal effectively estimates time-to-event, accounting for censoring, covariate interactions, and time-varying risk associations. We validate our model in diverse datasets, including genomic, clinical, and demographic data, with varying levels of censoring. This approach demonstrates competitive performance compared to state-of-the-art techniques, as assessed by the Concordance Index and the Integrated Brier Score. SAVAE also offers an interpretable model that parametrically models covariates and time. Moreover, its generative architecture facilitates further applications such as clustering, data imputation, and the generation of synthetic patient data through latent space inference from survival data.
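The Concordance Index used for evaluation above can be illustrated with a short sketch of Harrell's C-index under right censoring. This is a generic textbook-style version, not SAVAE's evaluation code; the function name and the toy data are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  -- observed time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 0, 1, 1]          # the second subject is censored
good_risks = [0.9, 0.7, 0.5, 0.1]
bad_risks = [0.1, 0.5, 0.7, 0.9]
```

The censored subject never starts a comparable pair but can still be the later member of one, which is how censoring enters the metric. The sketch assumes at least one comparable pair exists.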

翻訳日:2023-12-25 15:05:37 公開日:2023-12-22

# ロバストステレオマッチングのためのグローバルオクルージョンアウェアトランスフォーマ

Global Occlusion-Aware Transformer for Robust Stereo Matching ( http://arxiv.org/abs/2312.14650v1 )

ライセンス: Link先を確認

Zihua Liu, Yizhou Li and Masatoshi Okutomi

(参考訳) 学習に基づくステレオマッチングアルゴリズムによる顕著な進歩にもかかわらず、オクルージョン領域などの悪条件領域における性能は依然としてボトルネックとなっている。受容野が限られているため、既存のCNNベースの手法はこれらの悪条件領域を効果的に扱うのに苦労する。この問題に対処するため,本稿では,視差推定のために長距離依存性とオクルージョンを考慮したグローバルコンテキストを活用する,GOAT(Global Occlusion-Aware Transformer)と呼ばれる新しいアテンションベースのステレオマッチングネットワークを提案する。GOATアーキテクチャでは, 初期視差マップとオクルージョンマスクを並列注意機構を用いて推定する, 並列視差・オクルージョン推定モジュール(PDO)を提案する。オクルージョン領域における視差推定をさらに改善するため,オクルージョンを考慮したグローバル集約モジュール(OGA)を提案する。本モジュールは、オクルージョン領域の焦点範囲内に制限されたグローバル相関を利用して、オクルージョン領域の視差を洗練することを目的としている。 SceneFlow, KITTI 2015, Middleburyなど,いくつかの公開ベンチマークデータセットで広範な実験を行った。その結果,提案するGOATはすべてのベンチマーク,特にオクルージョン領域において優れた性能を示した。

Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, the performance in the ill-conditioned regions, such as the occluded regions, remains a bottleneck. Due to the limited receptive field, existing CNN-based methods struggle to handle these ill-conditioned regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching network called Global Occlusion-Aware Transformer (GOAT) to exploit long-range dependency and occlusion-aware global context for disparity estimation. In the GOAT architecture, a parallel disparity and occlusion estimation module (PDO) is proposed to estimate the initial disparity map and the occlusion mask using a parallel attention mechanism. To further enhance the disparity estimates in the occluded regions, an occlusion-aware global aggregation module (OGA) is proposed. This module aims to refine the disparity in the occluded regions by leveraging restricted global correlation within the focus scope of the occluded areas. Extensive experiments were conducted on several public benchmark datasets including SceneFlow, KITTI 2015, and Middlebury. The results show that the proposed GOAT demonstrates outstanding performance across all benchmarks, particularly in the occluded regions.

翻訳日:2023-12-25 15:05:13 公開日:2023-12-22

# GenAIのためのPub/Subメッセージブローカ

Pub/Sub Message Brokers for GenAI ( http://arxiv.org/abs/2312.14647v1 )

ライセンス: Link先を確認

Alaa Saleh, Susanna Pirttikangas and Lauri Lovén

(参考訳) 今日のデジタル世界では、Large Language Models(LLMs)のようなジェネレーティブ人工知能(GenAI)がますます普及し、多様なアプリケーションにまたがる範囲を広げている。この採用の増加により、データ中心のGenAIモデルに対する需要が大幅に増加し、堅牢なデータ通信インフラの必要性が浮かび上がっている。このニーズの中心はメッセージブローカで、さまざまなシステムコンポーネント内でデータ転送に必要なチャネルとして機能します。この調査は、従来のメッセージブローカと現代のメッセージブローカを総合的に分析することを目的としており、一般的なプラットフォームの比較研究を提供している。本研究は,オープンソースの可用性,統合監視ツール,メッセージ優先順位付け機構,並列処理機能,信頼性,分散とクラスタリング機能,認証プロセス,データ永続化戦略,耐障害性,スケーラビリティなど,数多くの基準を検討する。さらに、各メッセージブローカの設計と運用が課す固有の制約についても検討し、これらの制限が現実世界の適用性を理解する上で重要であることを認識した。そして、これらの洞察を活用して、高度なメッセージブローカフレームワークを提案します -- GenAIアプリケーションの進化する要求を満たすために必要な適応性と堅牢性を設計します。最後に,genaiコンテキストに特化したメッセージブローカ機構の強化について検討し,汎用的なメッセージブローカフレームワークの開発を重要視する。このようなフレームワークは、近い将来、GenAIの動的かつ増大する要求に対処して、迅速な適応を実現することができるだろう。この二元的アプローチを通じて、我々は、GenAIデータ通信の領域における将来のイノベーションとインフラの進歩を導くための基礎的なコンペディションに貢献するつもりです。

In today's digital world, Generative Artificial Intelligence (GenAI) such as Large Language Models (LLMs) is becoming increasingly prevalent, extending its reach across diverse applications. This surge in adoption has sparked a significant increase in demand for data-centric GenAI models, highlighting the necessity for robust data communication infrastructures. Central to this need are message brokers, which serve as essential channels for data transfer within various system components. This survey aims to delve into a comprehensive analysis of traditional and modern message brokers, offering a comparative study of prevalent platforms. Our study considers numerous criteria including, but not limited to, open-source availability, integrated monitoring tools, message prioritization mechanisms, capabilities for parallel processing, reliability, distribution and clustering functionalities, authentication processes, data persistence strategies, fault tolerance, and scalability. Furthermore, we explore the intrinsic constraints that the design and operation of each message broker might impose, recognizing that these limitations are crucial in understanding their real-world applicability. We then leverage these insights to propose a sophisticated message broker framework -- one designed with the adaptability and robustness necessary to meet the evolving requisites of GenAI applications. Finally, this study examines the enhancement of message broker mechanisms specifically for GenAI contexts, emphasizing the criticality of developing a versatile message broker framework. Such a framework would be poised for quick adaptation, catering to the dynamic and growing demands of GenAI in the foreseeable future. Through this dual-pronged approach, we intend to contribute a foundational compendium that can guide future innovations and infrastructural advancements in the realm of GenAI data communication.

翻訳日:2023-12-25 15:04:50 公開日:2023-12-22

# 複数訪問型健康状態推定による患者記録の協調合成

Collaborative Synthesis of Patient Records through Multi-Visit Health State Inference ( http://arxiv.org/abs/2312.14646v1 )

ライセンス: Link先を確認

Hongda Sun, Hongzhan Lin, Rui Yan

(参考訳) 電子健康記録(EHR)は医療における機械学習アプリケーションの基礎となり、実際の患者記録の有用性はプライバシやセキュリティ上の懸念によって制限されることが多い。合成EHR生成は、この制限を補うための追加の視点を提供する。既存のほとんどの手法は、医学的常識に則ったイベントの組み合わせを制御できないEHRデータにおいて、さまざまな種類のイベントを考慮せずに、実際のEHRデータに基づいて新しいレコードを合成する。本稿では,これらの制約に対処するために,協調的EHR合成のためのマルチビジットヘルスステータス推論モデルMSICを提案する。まず、確率的グラフィカルモデルとして合成EHR生成過程を定式化し、潜伏状態のモデル化により様々な種類の事象を密結合する。次に,複数回の訪問シナリオ用に調整された健康状態推定手法を導出し,過去の記録を効果的に活用し,現在および将来の記録を合成する。さらに、各医療イベントにテキスト記述を追加するための医用レポートの作成を提案し、ehrデータを合成するための幅広いアプリケーションを提供する。各訪問で異なる段落を生成するために,複数の生成元のメッセージパッシングを協調して,高品質なレポートを生成するために2相復号戦略を用いるマルチジェネレータ審議フレームワークを組み込んだ。広く使われているベンチマークMIMIC-IIIとMIMIC-IVに関する広範な実験は、MSICがプライバシーリスクを低く保ちながら、合成データの品質に関する最先端の成果を示すものである。

Electronic health records (EHRs) have become the foundation of machine learning applications in healthcare, while the utility of real patient records is often limited by privacy and security concerns. Synthetic EHR generation provides an additional perspective to compensate for this limitation. Most existing methods synthesize new records based on real EHR data, without consideration of different types of events in EHR data, which cannot control the event combinations in line with medical common sense. In this paper, we propose MSIC, a Multi-visit health Status Inference model for Collaborative EHR synthesis to address these limitations. First, we formulate the synthetic EHR generation process as a probabilistic graphical model and tightly connect different types of events by modeling the latent health states. Then, we derive a health state inference method tailored for the multi-visit scenario to effectively utilize previous records to synthesize current and future records. Furthermore, we propose to generate medical reports to add textual descriptions for each medical event, providing broader applications for synthesized EHR data. For generating different paragraphs in each visit, we incorporate a multi-generator deliberation framework to collaborate the message passing of multiple generators and employ a two-phase decoding strategy to generate high-quality reports. Our extensive experiments on the widely used benchmarks, MIMIC-III and MIMIC-IV, demonstrate that MSIC advances state-of-the-art results on the quality of synthetic data while maintaining low privacy risks.

翻訳日:2023-12-25 15:04:22 公開日:2023-12-22

# 結合複素SYK模型の熱力学と動力学

Thermodynamics and dynamics of coupled complex SYK models ( http://arxiv.org/abs/2312.14644v1 )

ライセンス: Link先を確認

Jan C. Louw, Linda M. van Manen, Rishabh Jha

(参考訳) large-$q$ 複素SYKモデルは、様々なブラックホールとも共有されるファンデルワールス(平均場)と同じ普遍性クラスに属することが知られている。同時に、マルダセナ=シェンカー=スタンフォード(MSS)境界も飽和し、最大カオスとなる。この研究は、異なる次数の large-$q$ 複素SYKモデルの結合系に拡張することで、SYK様モデルに共有される普遍性クラスと量子カオスのロバスト性を確立する。本稿では, 相転移を示す熱力学的(臨界指数)特性と, 時間外順序相関関数(OTOC)の計算による動的(リャプノフ指数)特性の詳細な導出を行う。解析の結果, 相互作用強度比による追加のスケーリングパラメータの導入にもかかわらず, 系は単一SYKモデルと同様, 低温で連続相転移を起こすことがわかった。臨界指数は、ファンデルワールス気体や様々なAdSブラックホールと共有されるランダウ・ギンツブルク(平均場)普遍性クラスと一致している。さらに,結合SYK系は低温の large-$q$ 極限において最大カオス状態のままであり,MSS境界に従うことを示す。これは単一の large-$q$ 複素SYKモデルと一致する特徴である。これらの発見は、我々の結合SYK系が量子カオスのMSS境界を飽和させながら、ファンデルワールスや様々なAdSブラックホールと同じ普遍性クラスに属することを示すことで、複雑な量子系における普遍性とカオスに関するより広範な探求の道を開く。

It has been known that the large-$q$ complex SYK model falls under the same universality class as that of van der Waals (mean-field), which is also shared by a variety of black holes. At the same time, it also saturates the Maldacena-Shenker-Stanford (MSS) bound and is thus maximally chaotic. This work establishes the robustness of the shared universality class and quantum chaos for SYK-like models by extending to a system of coupled large-$q$ complex SYK models of different orders. We provide a detailed derivation of thermodynamic (critical exponents) properties observing a phase transition and dynamic (Lyapunov exponent) properties via the out-of-time correlator (OTOC) calculations. Our analysis reveals that, despite the introduction of an additional scaling parameter through interaction strength ratios, the system undergoes a continuous phase transition at low temperatures, similar to that of a single SYK model. The critical exponents align with the Landau-Ginzburg (mean-field) universality class, shared with van der Waals gases and various AdS black holes. Furthermore, we demonstrate that the coupled SYK system remains maximally chaotic in the large-$q$ limit at low temperatures, adhering to the Maldacena-Shenker-Stanford (MSS) bound, a feature consistent with a single large-$q$ complex SYK model. These findings open avenues for broader inquiries into the universality and chaos in complex quantum systems by showing that our coupled SYK system belongs to the same universality class as that of van der Waals and various AdS black holes while saturating the MSS bound of quantum chaos.
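For reference, the MSS bound mentioned above is usually stated as a limit on the Lyapunov exponent $\lambda_L$ that governs the early-time exponential growth of the OTOC at temperature $T$. The form below is the standard statement of the bound, not a formula taken from this abstract; "maximally chaotic" means the inequality holds with equality.

$$
\lambda_L \le \frac{2\pi k_B T}{\hbar}
$$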

翻訳日:2023-12-25 15:03:54 公開日:2023-12-22

# 測定による圧縮フォック状態の生成

Generation of squeezed Fock states by measurement ( http://arxiv.org/abs/2312.14643v1 )

ライセンス: Link先を確認

S. B. Korolev, E. N. Bashmakova, A. K. Tagantsev, T. Yu. Golubeva

(参考訳) 2モードの絡み合ったガウス状態(TMEG)からの1つ以上の光子サブトラクションによる圧縮フォック状態の生成は理論的に対処される。その結果,任意の順序フォック状態が生成可能であることを示し,tmeg状態のパラメータに課してそのような生成を保証すべき条件を得た。我々はこの条件が満たされる体制を普遍的解決体制と呼んだ。その結果, 任意のTMEG状態からの1光子サブトラクションにより, 第1圧縮Fock状態の生成が引き続き可能となるように, 上記条件は冗長であることがわかった。同時に、最初の圧縮されたフォック状態生成の最大生成確率は、普遍解状態に対応する。本研究では,ビームスプリッタと制御Z演算を用いた圧縮フォック状態の生成に関する記述に,上記の結果を適用した。最大確率でスクイズドフォック状態を得るために必要な,これらの設定パラメータと入力スクイズド状態のパラメータを推定した。

The generation of squeezed Fock states by subtracting one or more photons from a two-mode entangled Gaussian (TMEG) state is addressed theoretically. We show that a Fock state of arbitrary order can be generated this way, and we obtain a condition that must be imposed on the parameters of the TMEG state to guarantee such generation. We call the regime in which this condition is satisfied the universal solution regime. We show that, for the first squeezed Fock state, the above condition is redundant: the first squeezed Fock state can still be generated by subtracting a single photon from an arbitrary TMEG state. At the same time, the maximum generation probability of the first squeezed Fock state corresponds to the universal solution regime. We apply these results to describe the generation of squeezed Fock states using a beam splitter and a Controlled-Z operation. We estimate the parameters of such setups and of the input squeezed states that are necessary to obtain squeezed Fock states with the maximum probability.

翻訳日:2023-12-25 15:03:24 公開日:2023-12-22

# オーバーザ・エアフェデレーション学習におけるエネルギー効率と分布ロバスト性のバランス

Balancing Energy Efficiency and Distributional Robustness in Over-the-Air Federated Learning ( http://arxiv.org/abs/2312.14638v1 )

ライセンス: Link先を確認

Mohamed Badi, Chaouki Ben Issaid, Anis Elgabli and Mehdi Bennis

(参考訳) ワイヤレスエッジデバイスの増加により、エネルギー、帯域幅、レイテンシ、データの不均一性に関する課題が拡大した。これらの課題は分散学習のボトルネックになっている。これらの問題に対処するため,無線上計算(over-the-air computation, AirComp)を用いた分布ロバストな連合学習(FL)におけるエネルギー効率を保証する新しい手法を提案する。本研究では,エネルギー効率とロバスト性を効果的にバランスさせるために,エネルギー効率に配慮した決定論的手法と,分布ロバスト性に配慮した確率論的手法の2つの相補的な洞察を統合する新しいクライアント選択手法を導入する。シミュレーションの結果,提案アルゴリズムはロバスト性とエネルギー効率の両面でベースラインよりも優れた性能を示し,ベースラインと比べて3倍以上の省エネを実現している。

The growing number of wireless edge devices has magnified challenges concerning energy, bandwidth, latency, and data heterogeneity. These challenges have become bottlenecks for distributed learning. To address these issues, this paper presents a novel approach that ensures energy efficiency for distributionally robust federated learning (FL) with over-the-air computation (AirComp). In this context, to effectively balance robustness with energy efficiency, we introduce a novel client selection method that integrates two complementary insights: a deterministic one that is designed for energy efficiency, and a probabilistic one designed for distributional robustness. Simulation results underscore the efficacy of the proposed algorithm, revealing its superior performance compared to baselines from both robustness and energy efficiency perspectives, achieving more than 3-fold energy savings compared to the considered baselines.

翻訳日:2023-12-25 15:03:09 公開日:2023-12-22

# ニューラルフローマップ上の流体シミュレーション

Fluid Simulation on Neural Flow Maps ( http://arxiv.org/abs/2312.14635v1 )

ライセンス: Link先を確認

Yitong Deng, Hong-Xing Yu, Diyang Zhang, Jiajun Wu, and Bo Zhu

(参考訳) 本稿では,流れ図の理論に基づく流体シミュレーションにより,暗黙的ニューラル表現の新たなパラダイムをブリッジする新しいシミュレーション手法であるニューラル・フロー・マップを導入し,流体現象の最先端のシミュレーションを実現する。重なり合う,多解像度,空間的にスパースグリッドのピラミッドで小さなニューラルネットワークを融合させ,長期時空間速度場を高精度にコンパクトに表現する,新しいハイブリッドニューラルネットワーク表現(Spatially Sparse Neural Fields, SSNF)を考案する。このニューラル・ベロシティ・バッファを手元に,長期的な双方向フローマップとそのヤコビアンを機械的に対称的に計算し,既存の解に対する劇的な精度向上を図る。これらの長距離双方向フローマップは、低い散逸で高いアドベクション精度を実現し、複雑な渦構造を示す高忠実な非圧縮性フローシミュレーションを容易にする。本研究は, 跳躍渦, 衝突渦, 渦再接続, 移動障害物からの渦発生, 密度差など, 様々な困難なシミュレーションシナリオにおいて, 神経流体シミュレーションの有効性を実証する。実例では, エネルギー保存, 視覚の複雑さ, 実験観察への順守, 詳細な渦構造保存の観点から, 既存の手法よりも高い性能を示す。

We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena. We devise a novel hybrid neural field representation, Spatially Sparse Neural Fields (SSNF), which fuses small neural networks with a pyramid of overlapping, multi-resolution, and spatially sparse grids, to compactly represent long-term spatiotemporal velocity fields at high accuracy. With this neural velocity buffer in hand, we compute long-term, bidirectional flow maps and their Jacobians in a mechanistically symmetric manner, to facilitate drastic accuracy improvement over existing solutions. These long-range, bidirectional flow maps enable high advection accuracy with low dissipation, which in turn facilitates high-fidelity incompressible flow simulations that manifest intricate vortical structures. We demonstrate the efficacy of our neural fluid simulation in a variety of challenging simulation scenarios, including leapfrogging vortices, colliding vortices, vortex reconnections, as well as vortex generation from moving obstacles and density differences. Our examples show increased performance over existing methods in terms of energy conservation, visual complexity, adherence to experimental observations, and preservation of detailed vortical structures.

翻訳日:2023-12-25 15:02:52 公開日:2023-12-22

# 説明可能・説明不能ロボットとのインタラクションにおけるマルチモーダルコミュニケーションパターンのマイニング

Mining multi-modal communication patterns in interaction with explainable and non-explainable robots ( http://arxiv.org/abs/2312.14634v1 )

ライセンス: Link先を確認

Suna Bensch and Amanda Eriksson

(参考訳) 説明可能で説明不能なロボットと対話する人間のインタラクションパターンについて検討する。説明不能なロボットは、説明可能なロボットとは対照的に、動作や非動作を説明せず、インタラクション中に他のフィードバックも与えないロボットである。 20人の人間が説明可能なpepperロボットか説明不能なpepperロボットのいずれかにボード上のオブジェクトを移動させるように指示したボードゲーム中に、人間の行動を記録し分析した。ビデオの転写と注釈は、アソシエーションルールマイニングのためのトランザクションに変換された。アソシエーション・ルールは、ロボットと人間の相互作用におけるコミュニケーションパターンを発見し、最も興味深いルールは、通常の2乗テストでもテストされた。統計的に有意な結果は、男性と説明不能なロボットと女性と説明可能なロボットの間に強い相関関係があり、人間がロボットのモダリティの一部を反映しているということである。また,人間のインタラクションパターンの文脈化が重要であり,関連ルールを調査ツールとして活用することが重要であることも示唆した。これらの結果は,人間の行動に適応するロボットの設計において重要である。

We investigate interaction patterns for humans interacting with explainable and non-explainable robots. Non-explainable robots are here robots that do not explain their actions or non-actions, neither do they give any other feedback during interaction, in contrast to explainable robots. We video recorded and analyzed human behavior during a board game, where 20 humans verbally instructed either an explainable or non-explainable Pepper robot to move objects on the board. The transcriptions and annotations of the videos were transformed into transactions for association rule mining. Association rules discovered communication patterns in the interaction between the robots and the humans, and the most interesting rules were also tested with regular chi-square tests. Some statistically significant results are that there is a strong correlation between men and non-explainable robots and women and explainable robots, and that humans mirror some of the robot's modality. Our results also show that it is important to contextualize human interaction patterns, and that this can be easily done using association rules as an investigative tool. The presented results are important when designing robots that should adapt their behavior to become understandable for the interacting humans.
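The association rule mining step described above scores a rule "antecedent implies consequent" over a list of transactions. A minimal sketch of the three usual scores follows. The annotation labels and transactions are invented for illustration and are not data from the study, and `rule_metrics` is a hypothetical helper, not the authors' tooling.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n                      # how often the rule applies
    confidence = n_both / n_antecedent        # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)    # above 1: positive association
    return support, confidence, lift


# Each transaction is the set of annotations observed in one exchange.
transactions = [
    {"robot_explains", "human_nods"},
    {"robot_explains", "human_nods", "human_speaks"},
    {"robot_silent", "human_speaks"},
    {"robot_explains", "human_speaks"},
]
support, confidence, lift = rule_metrics(
    transactions, {"robot_explains"}, {"human_nods"}
)
```

A lift above 1 only flags a candidate pattern, which is why a separate significance test such as the chi-square test mentioned in the abstract is still needed.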

翻訳日:2023-12-25 15:02:26 公開日:2023-12-22

# 金融システム設計のためのテキスト-SQL翻訳の強化

Enhancing Text-to-SQL Translation for Financial System Design ( http://arxiv.org/abs/2312.14725v1 )

ライセンス: Link先を確認

Yewei Song, Saad Ezzini, Xunzhu Tang, Cedric Lothritz, Jacques Klein, Tegawendé Bissyandé, Andrey Boytsov, Ulrick Ble, Anne Goujon

(参考訳) 自然言語質問をSQLクエリに変換するタスクであるText-to-SQLは、さまざまなビジネスプロセスの一部である。その自動化は新たな課題であり、ソフトウェア実践者が自然言語を使ってリレーショナルデータベースとシームレスに対話できるようにし、ビジネスニーズとソフトウェア能力のギャップを埋める。本稿では,様々なNLPタスクの最先端技術を実現したLarge Language Models (LLMs)について考察する。具体的には、テキストからSQLまでのパフォーマンス、評価手法、および入力最適化(プロンプトなど)をベンチマークする。本稿では,SQLクエリ間の類似性を適切に測定するための2つの新しい指標を提案する。全体としては,テキストからsqlへのタスクで適切なllmを選択する方法など,さまざまな調査結果をコミュニティと共有しています。さらに、木ベースの編集距離が、生成したSQLクエリとText2SQLアプローチのベンチマークのオラクルとの類似性を評価するための信頼性の高い指標であることを示す。このメトリクスは、研究者が事前の作業で生成されたクエリを実行するなど、計算コストのかかる実験を行う必要がなくなるため、重要である。本研究は、金融ドメインのユースケースを実装し、text2sqlシステムの進歩と、このドメインでの実用化に寄与する。

Text-to-SQL, the task of translating natural language questions into SQL queries, is part of various business processes. Its automation, which is an emerging challenge, will empower software practitioners to seamlessly interact with relational databases using natural language, thereby bridging the gap between business needs and software capabilities. In this paper, we consider Large Language Models (LLMs), which have achieved state-of-the-art results on various NLP tasks. Specifically, we benchmark Text-to-SQL performance, evaluation methodologies, and input optimization (e.g., prompting). In light of our empirical observations, we propose two novel metrics designed to adequately measure the similarity between SQL queries. Overall, we share various findings with the community, notably on how to select the right LLM for Text-to-SQL tasks. We further demonstrate that a tree-based edit distance constitutes a reliable metric for assessing the similarity between generated SQL queries and the oracle when benchmarking Text2SQL approaches. This metric is important because it relieves researchers of the need to perform computationally expensive experiments, such as executing generated queries as done in prior work. Our work implements financial domain use cases and therefore contributes to the advancement of Text2SQL systems and their practical adoption in this domain.

翻訳日:2023-12-25 14:55:48 公開日:2023-12-22
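The abstract above relies on a tree-based edit distance between SQL queries. The sketch below shows a generic ordered tree edit distance with unit costs, computed by the standard forest recursion with memoization. It is an illustration only, not the metric defined in the paper: the hand-built query trees and the helper names (`tree`, `forest_distance`, `tree_edit_distance`) are assumptions, and a real pipeline would obtain the trees from a SQL parser.

```python
from functools import lru_cache


def tree(label, *children):
    """A tree is a hashable pair (label, tuple_of_child_trees)."""
    return (label, tuple(children))


def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)          # insert everything that remains in f2
    if not f2:
        return size(f1)          # delete everything that remains in f1
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + c1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + c2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(c1, c2)
             + forest_distance(f1[:-1], f2[:-1])
             + (l1 != l2))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Hypothetical query trees for: SELECT name FROM users WHERE age > 30
gold = tree("SELECT",
            tree("COLUMNS", tree("name")),
            tree("FROM", tree("users")),
            tree("WHERE", tree(">", tree("age"), tree("30"))))
# Same query with a different constant (one relabel).
wrong_constant = tree("SELECT",
                      tree("COLUMNS", tree("name")),
                      tree("FROM", tree("users")),
                      tree("WHERE", tree(">", tree("age"), tree("40"))))
# Same query with an extra selected column (one insertion).
extra_column = tree("SELECT",
                    tree("COLUMNS", tree("name"), tree("email")),
                    tree("FROM", tree("users")),
                    tree("WHERE", tree(">", tree("age"), tree("30"))))
```

Because the comparison is purely structural, it needs no database and no query execution, which is the practical advantage the abstract points to.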

# 離散選択モデルにおける画像:多モード入力におけるデータ同型対応

Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs ( http://arxiv.org/abs/2312.14724v1 )

ライセンス: Link先を確認

Brian Sifringer, Alexandre Alahi

(参考訳) 本稿では,dcm(離散選択モデリング)と機械学習の交点について検討し,dcmの実用機能への画像データの統合とそのモデル解釈性への影響について考察する。本稿では,DCMフレームワーク内の従来の表型入力と同型情報を共有する高次元画像データの埋め込み結果について検討する。ニューラルネットワーク(NN)コンポーネントは、共起が存在するときの画像から表層変数表現を学習し、複製することにより、DCMパラメータの解釈可能性を向上させる。我々は,冗長な情報を分離するためのアーキテクチャ設計調整と,ソース情報マスキングとインパインティングによる同型情報緩和の2つの手法を提案する。半合成データセットを用いて行った実験により, 設計上の変更が不決定性を示す一方で, データソースの直接緩和はDCMの解釈可能なパラメータの整合性を維持する上で, より効果的な戦略であることが示された。本稿は,実世界における知見の適用可能性について考察し,複雑なデータモダリティを結合したハイブリッドモデリングにおける今後の研究の意義について考察する。 MITのモラルマシンデータセットを用いて表と画像データの整合性を完全に制御し、Learning Multinomial Logit(L-MNL)フレームワークをデプロイすることにより、両方の入力を選択モデルにマージする。

This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning, focusing on the integration of image data into DCM's utility functions and its impact on model interpretability. We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework. Our study reveals that neural network (NN) components learn and replicate tabular variable representations from images when co-occurrences exist, thereby compromising the interpretability of DCM parameters. We propose and benchmark two methodologies to address this challenge: architectural design adjustments to segregate redundant information, and isomorphic information mitigation through source information masking and inpainting. Our experiments, conducted on a semi-synthetic dataset, demonstrate that while architectural modifications prove inconclusive, direct mitigation at the data source proves to be a more effective strategy for maintaining the integrity of DCM's interpretable parameters. The paper concludes with insights into the applicability of our findings in real-world settings and discusses the implications for future research in hybrid modeling that combines complex data modalities. Full control of tabular and image data congruence is attained by using the MIT Moral Machine dataset, and both inputs are merged into a choice model by deploying the Learning Multinomial Logit (L-MNL) framework.

翻訳日:2023-12-25 14:55:27 公開日:2023-12-22

# gerrymandering平面グラフ

Gerrymandering Planar Graphs ( http://arxiv.org/abs/2312.14721v1 )

ライセンス: Link先を確認

Jack Dippel, Max Dupré la Tour, April Niu, Adrian Vetta

(参考訳) 地図再帰問題 (gerrymandering) の計算複雑性について検討する。数学的には、選挙地区設計者 (gerrymanderer) は、重み付きグラフを$k$連結成分 (districts) に分割し、その候補 (party) ができるだけ多くの地区で勝利する。先行研究は主に、グラフがパスまたはツリーである特別なケースに関するものである。私たちの焦点は、グラフが平面である現実的なケースに関するものです。我々は、候補数と$\lambda$が定数であり、頂点重み(投票重み)が多項式有界であるとき、ジェリーマンディング問題は$\lambda$-outerplanar graphsの多項式時間で解けることを証明した。対照的に、問題は2つの候補でさえ一般平面グラフにおいてNP完全である。これは、gerrymandering平面グラフの近似アルゴリズムの研究を動機付ける。しかし、候補数が大きければ、ゲリーマンデラーが1つの地区に勝てない場合と、ゲリーマンデラーが少なくとも1つの地区に勝てる場合とを区別することは困難である。これは即時、 P=NP でない限り、再制限問題は平面グラフの多項式時間では適用できないことを意味する。この結論は、優れた近似アルゴリズムの設計のターミナルであるように見えるが、そうではない。ゲリーマンデラーが勝つことができる範囲の最大数が極端に小さい場合にのみ適用されるため、近似可能性の境界は回避できる。実際、固定数の候補に対して、我々の主な結果は、最適値が十分大きな定数であれば、未重み付き平面グラフを再配置するための定数係数近似アルゴリズムが存在することである。

We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our focus concerns the realistic case where the graph is planar. We prove that the gerrymandering problem is solvable in polynomial time in $\lambda$-outerplanar graphs, when the number of candidates and $\lambda$ are constants and the vertex weights (voting weights) are polynomially bounded. In contrast, the problem is NP-complete in general planar graphs even with just two candidates. This motivates the study of approximation algorithms for gerrymandering planar graphs. However, when the number of candidates is large, we prove it is hard to distinguish between instances where the gerrymanderer cannot win a single district and instances where the gerrymanderer can win at least one district. This immediately implies that the redistricting problem is inapproximable in polynomial time in planar graphs, unless P=NP. This conclusion appears terminal for the design of good approximation algorithms -- but it is not. The inapproximability bound can be circumvented as it only applies when the maximum number of districts the gerrymanderer can win is extremely small, say one. Indeed, for a fixed number of candidates, our main result is that there is a constant factor approximation algorithm for redistricting unweighted planar graphs, provided the optimal value is a large enough constant.

翻訳日:2023-12-25 14:55:02 公開日:2023-12-22
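The abstract above mentions that earlier work handled the special case where the graph is a path. As a minimal sketch of that simplest setting (two candidates, contiguous districts on a path), the dynamic program below maximizes the number of districts won. It is not the algorithm from the paper; the function name and the vote data are hypothetical.

```python
def max_districts_won(votes, k):
    """votes[i] = (ours, theirs) at vertex i of a path.

    Split the path into exactly k contiguous districts and return the
    maximum number of districts in which `ours` has a strict majority.
    Returns float('-inf') if no such split exists (for example k > n).
    """
    n = len(votes)
    neg_inf = float("-inf")

    # Prefix sums so that any district total is an O(1) lookup.
    ours, theirs = [0], [0]
    for a, b in votes:
        ours.append(ours[-1] + a)
        theirs.append(theirs[-1] + b)

    def wins(i, j):
        """1 if the district made of vertices i..j-1 is won, else 0."""
        return int(ours[j] - ours[i] > theirs[j] - theirs[i])

    # best[j][d]: most wins when the first j vertices form d districts.
    best = [[neg_inf] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for j in range(1, n + 1):
        for d in range(1, min(j, k) + 1):
            for i in range(d - 1, j):
                if best[i][d - 1] != neg_inf:
                    best[j][d] = max(best[j][d], best[i][d - 1] + wins(i, j))
    return best[n][k]


# Hypothetical path with four vertices.
votes = [(3, 1), (1, 2), (1, 2), (3, 1)]
```

With two districts, splitting after the second vertex gives 4 votes against 3 on both sides, so both districts are won, whereas a single district can be won only once.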

# 静止ボソニックモードのディジタルホモダインとヘテロダイン検出

Digital homodyne and heterodyne detection for stationary bosonic modes ( http://arxiv.org/abs/2312.14720v1 )

ライセンス: Link先を確認

Ingrid Strandberg, Axel Eriksson, Baptiste Royer, Mikael Kervinen, Simone Gasparinetti

(参考訳) ホモ・ヘテロダイン検出は伝搬電磁場を測定する基本的な技術である。しかし、これらの技法をキャビティに閉じ込められた定常場に適用することは困難である。この課題を克服するために,空洞と相互作用する2段階システムの間接的測定を繰り返すことを提案する。提案手法が単一ショットレベルでのホモ・ヘテロダイン検出の測定統計を忠実に再現できることを数値的に示す。このスキームは、回路量子電磁力学を含む様々な物理アーキテクチャで実装することができる。量子検証プロトコルを含む線形検出を必要とする量子アルゴリズムを定常モードで実装する方法について検討した。

Homo- and heterodyne detection are fundamental techniques for measuring propagating electromagnetic fields. However, applying these techniques to stationary fields confined in cavities poses a challenge. As a way to overcome this challenge, we propose to use repeated indirect measurements of a two-level system interacting with the cavity. We demonstrate numerically that the proposed measurement scheme faithfully reproduces measurement statistics of homo- or heterodyne detection at the single-shot level. The scheme can be implemented in various physical architectures, including circuit quantum electrodynamics. Our results pave the way to the implementation of quantum algorithms requiring linear detection, including quantum verification protocols, in stationary modes.

翻訳日:2023-12-25 14:54:32 公開日:2023-12-22

# Rydbergイオンを捕捉した三部量子ラビモデル

Tripartite quantum Rabi model with trapped Rydberg ions ( http://arxiv.org/abs/2312.14718v1 )

ライセンス: Link先を確認

Thomas J. Hamlyn, Chi Zhang, Igor Lesanovsky, and Weibin Li

(参考訳) ボソニックモードがスピン-スピン相互作用を通じて2つのスピン-1/2粒子に同時に結合する三成分量子ラビモデル(tqrm)について検討し、スピン-スピン-ボーソンカップリング--二成分スピン-ボーソンカップリングを特徴とする従来の量子ラビモデルから脱却する。 tqrmの対称性は、スピン状態間のエネルギー差を表すデチューニングパラメータに依存する。ゼロデチューニングにおいて、パリティ対称性はTQRMを量子ラビモデルに還元することができる。 3部結合強度が増加するにつれて、基底状態における超ラジカル遷移が予測される。非ゼロデチューニングでは、トータルスピンはTQRMの唯一の保存量として現れる。 3部結合が非ゼロである限り、基底状態において超放射能が優位であることがわかった。固有スペクトルが得られたTQRMのブラックG関数を解析的に導出する。 TQRMは、TQRM内で必要となる三部結合と単体相互作用が自然に存在する、リドバーグイオン量子シミュレータで実現可能である。

We investigate a tripartite quantum Rabi model (TQRM) wherein a bosonic mode concurrently couples to two spin-1/2 particles through a spin-spin interaction, resulting in a spin-spin-boson coupling--a departure from conventional quantum Rabi models featuring bipartite spin-boson couplings. The symmetries of the TQRM depend on the detuning parameter, representing the energy difference between the spin states. At zero detuning, a parity symmetry renders the TQRM reducible to a quantum Rabi model. A subradiant to superradiant transition in the ground state is predicted as the tripartite coupling strength increases. For non-zero detuning, the total spin emerges as the sole conserved quantity in the TQRM. It is found that superradiance prevails in the ground state as long as the tripartite coupling remains non-zero. We derive the Braak G-function of the TQRM analytically, with which the eigenspectra are obtained. The TQRM can be realized in a viable trapped Rydberg ion quantum simulator where the required tripartite couplings and single-body interactions in the TQRM are naturally present.

翻訳日:2023-12-25 14:54:22 公開日:2023-12-22

# 逆転送多目的最適化

Inverse Transfer Multiobjective Optimization ( http://arxiv.org/abs/2312.14713v1 )

ライセンス: Link先を確認

Jiao Liu, Abhishek Gupta, and Yew-Soon Ong

(参考訳) 転送最適化により、関連するソースタスクからの経験的事前情報を活用することで、ターゲットタスクのデータ効率の最適化が可能になる。これは、厳密な評価予算の下で一連のトレードオフソリューションを求める多目的最適化設定において特に有用である。本稿では,多目的最適化における逆移動の概念を紹介する。逆伝達は、目的空間のパフォーマンスベクトルをタスク固有の決定空間における集団探索分布にマッピングするために確率的逆モデルを用いることで際立っている。このアイデアに基づいて,InvTrEMO(Inverse Transfer Multiobjective Evolutionary Optimizer)を提案する。 invtremoの重要な特徴は、意思決定空間がタスク間で正確に一致していない場合でも、多くのアプリケーション領域で広く使われている共通の客観的関数を利用する能力である。これにより、invTrEMOは異種ソースタスクからの情報をユニークかつ効果的に利用することができる。さらに、invTrEMOは、高精度の逆モデルを重要な副産物として提供し、ユーザの好みに基づいて、オンデマンドで調整されたソリューションを生成する。多目的および多目的ベンチマーク問題に関する実証研究は、実例研究と同様に、最先端の進化的およびベイズ最適化アルゴリズムと比較して、invTrEMOの高速収束率とモデリング精度を示す。 invTrEMOのソースコードはhttps://github.com/LiuJ-2023/invTrEMOで公開されている。

Transfer optimization enables data-efficient optimization of a target task by leveraging experiential priors from related source tasks. This is especially useful in multiobjective optimization settings where a set of trade-off solutions is sought under tight evaluation budgets. In this paper, we introduce a novel concept of inverse transfer in multiobjective optimization. Inverse transfer stands out by employing probabilistic inverse models to map performance vectors in the objective space to population search distributions in task-specific decision space, facilitating knowledge transfer through objective space unification. Building upon this idea, we introduce the first Inverse Transfer Multiobjective Evolutionary Optimizer (invTrEMO). A key highlight of invTrEMO is its ability to harness the common objective functions prevalent in many application areas, even when decision spaces do not precisely align between tasks. This allows invTrEMO to uniquely and effectively utilize information from heterogeneous source tasks as well. Furthermore, invTrEMO yields high-precision inverse models as a significant byproduct, enabling the generation of tailored solutions on-demand based on user preferences. Empirical studies on multi- and many-objective benchmark problems, as well as a practical case study, showcase the faster convergence rate and modelling accuracy of the invTrEMO relative to state-of-the-art evolutionary and Bayesian optimization algorithms. The source code of the invTrEMO is made available at https://github.com/LiuJ-2023/invTrEMO.

翻訳日:2023-12-25 14:54:01 公開日:2023-12-22

# 機械はロバスト、プライベート、効率的に学習できるのか?

Can Machines Learn Robustly, Privately, and Efficiently? ( http://arxiv.org/abs/2312.14712v1 )

ライセンス: Link先を確認

Youssef Allouah, Rachid Guerraoui, and John Stephan

(参考訳) 機械学習(ML)アプリケーションの成功は、膨大なデータセットと分散アーキテクチャに依存し、成長するにつれて、MLの課題が提示される。データがセンシティブな情報を含む実世界のシナリオでは、データ中毒やハードウェア障害といった問題が一般的である。プライバシと堅牢性の確保は、公共生活におけるMLの普及に不可欠である。本稿では,分散アーキテクチャにおけるこれらの目的達成に伴うコストについて検討する。分散MLにおけるプライバシとロバスト性の意味を概説し、それらを分離して効率的に達成する方法を明らかにする。しかし、これらの目的の統合は計算効率において顕著な妥協をもたらすと我々は主張する。この複雑なバランスを掘り下げて、MLアプリケーションにおけるプライバシ、堅牢性、計算効率の課題と解決策を探求します。

The success of machine learning (ML) applications relies on vast datasets and distributed architectures, which, as they grow, present challenges for ML. In real-world scenarios, where data often contains sensitive information, issues like data poisoning and hardware failures are common. Ensuring privacy and robustness is vital for the broad adoption of ML in public life. This paper examines the costs associated with achieving these objectives in distributed architectures. We give an overview of the meanings of privacy and robustness in distributed ML, and clarify how they can be achieved efficiently in isolation. However, we contend that the integration of these objectives entails a notable compromise in computational efficiency. We delve into this intricate balance, exploring the challenges and solutions for privacy, robustness, and computational efficiency in ML applications.

翻訳日:2023-12-25 14:53:40 公開日:2023-12-22

# 極性認知デノジングを用いた感覚伝達におけるスタイルコンテンツトレードオフのバランス

Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising ( http://arxiv.org/abs/2312.14708v1 )

ライセンス: Link先を確認

Sourabrata Mukherjee, Zdeněk Kasner, Ondřej Dušek

(参考訳) テキストの感情伝達は、感情に依存しないコンテンツを保持しながら、文章の感情の極性を反転させることを目的としている。現在のモデルでは感情の変化は良好であるが, 翻訳文のコンテンツ保存は不十分である。本稿では,生成されたテキストの感情属性を正確に制御し,コンテンツの保存とスタイル・コンテンツのトレードオフのバランスを図る,極性認識に基づく感情伝達モデルを提案する。提案手法は,共有エンコーダを用いた表現学習と感情特異的デコーダを用いた感情制御生成の2つの段階からなる。実験結果から,本手法はコンテンツ保存の面では最先端ベースラインを上回っており,スタイル転送精度とフラレンシーの面では競争力を維持していることが示された。

Text sentiment transfer aims to flip the sentiment polarity of a sentence (positive to negative or vice versa) while preserving its sentiment-independent content. Although current models show good results at changing the sentiment, content preservation in transferred sentences is insufficient. In this paper, we present a sentiment transfer model based on polarity-aware denoising, which accurately controls the sentiment attributes in generated text, preserving the content to a great extent and helping to balance the style-content trade-off. Our proposed model is structured around two key stages in the sentiment transfer process: better representation learning using a shared encoder and sentiment-controlled generation using separate sentiment-specific decoders. Empirical results show that our method outperforms state-of-the-art baselines in terms of content preservation while staying competitive in terms of style transfer accuracy and fluency.

翻訳日:2023-12-25 14:53:25 公開日:2023-12-22

# bonnbeetclouds3d: 実地条件下でのサトウキビ植物のポイントクラウドに基づくオルガンレベル表現型化に向けたデータセット

BonnBeetClouds3D: A Dataset Towards Point Cloud-based Organ-level Phenotyping of Sugar Beet Plants under Field Conditions ( http://arxiv.org/abs/2312.14706v1 )

ライセンス: Link先を確認

Elias Marks, Jonas Bömer, Federico Magistri, Anurag Sah, Jens Behley, Cyrill Stachniss

(参考訳) 農業生産は今後数十年間、気候変動と持続可能性の必要性によって深刻な課題に直面しており、環境への影響を減らしている。自律型無人航空機(uavs)による作物の監視と、新鮮でレジリエントな作物品種の育成を組み合わせることで、ロボットによる非化学除草によるフィールドマネジメントの進歩は、これらの課題に対処するのに役立つ。表現型化と呼ばれる植物形質の分析は、植物の育種に不可欠な活動であるが、大量の手作業が伴う。本稿では,精密表現に必要とされる臓器の微細な形状解析の課題に対処する。この領域における実世界のデータの可利用性は比較的低いため、48種の植物種を含む実育種試験の高精細度画像をuavで取得し、形態学的および外観の多様性を網羅する新しいデータセットを提案する。これにより、異なる多様体にうまく一般化する自律表現型へのアプローチの開発が可能になる。複数視点からの高分解能画像の重ね合わせに基づいて,photogrammetric dense point clouds を計算し,先端および基部として植物,葉,塩分点の詳細な高精度な点ラベルを提供する。さらに,ドイツ連邦植物多様性局の専門家による実生植物における表現型形質の測定を行い,セグメンテーションやキーポイント検出だけでなく,下流のタスクにも新たなアプローチの評価が可能となった。提供されたラベル付きポイントクラウドは、細粒度植物分析を可能にし、自動表現型化アプローチの開発のさらなる進展を支援するとともに、表面再構成、ポイントクラウド完成、ポイントクラウドの意味解釈に関するさらなる研究を可能にする。

Agricultural production will face severe challenges in the coming decades, induced by climate change and by the need for sustainability, that is, for reducing its impact on the environment. Advancements in field management through non-chemical weeding by robots, in combination with monitoring of crops by autonomous unmanned aerial vehicles (UAVs) and breeding of novel and more resilient crop varieties, are helpful to address these challenges. The analysis of plant traits, called phenotyping, is an essential activity in plant breeding; however, it involves a great amount of manual labor. With this paper, we address the problem of automatic fine-grained organ-level geometric analysis needed for precision phenotyping. As real-world data in this domain is relatively scarce, we propose a novel dataset that was acquired using UAVs capturing high-resolution images of a real breeding trial containing 48 plant varieties and therefore covering great morphological and appearance diversity. This enables the development of approaches for autonomous phenotyping that generalize well to different varieties. Based on overlapping high-resolution images from multiple viewing angles, we compute photogrammetric dense point clouds and provide detailed and accurate point-wise labels for plants, leaves, and salient points such as the tip and the base. Additionally, we include measurements of phenotypic traits performed by experts from the German Federal Plant Variety Office on the real plants, allowing the evaluation of new approaches not only on segmentation and keypoint detection but also directly on the downstream tasks. The provided labeled point clouds enable fine-grained plant analysis and support further progress in the development of automatic phenotyping approaches, but also enable further research in surface reconstruction, point cloud completion, and semantic interpretation of point clouds.

翻訳日:2023-12-25 14:53:10 公開日:2023-12-22

# SCUNet++:Swin-UNetとCNN Bottleneckハイブリッドアーキテクチャを併用した肺塞栓CT画像分割の評価

SCUNet++: Assessment of Pulmonary Embolism CT Image Segmentation Leveraging Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection ( http://arxiv.org/abs/2312.14705v1 )

ライセンス: Link先を確認

Yifei Chen, Binfeng Zou, Zhaoxin Guo, Yiyu Huang, Yifan Huang, Feiwei Qin, Qinhai Li, Changmiao Wang

(参考訳) 肺塞栓症 (PE) は右室肥大と重症症例の不全につながる肺疾患であり, 重症度は心筋梗塞と突然死のみに次いで2位である。肺動脈CT血管造影(CTPA)は,PEの診断法として広く用いられている。しかし,PE検出は画像技術の限界により臨床実践の課題を呈する。 CTPAはPEに似たノイズを発生させるため、その存在の確認に時間を要し、過剰診断につながりやすい。さらに,従来のPEのセグメンテーション法では,PE CT画像の特徴の階層構造,局所的および大域的空間的特徴を十分に考慮できない。本稿では,SCUNet++ (Swin Conv UNet++) と呼ばれる自動PEセグメンテーション手法を提案する。この方法は、エンコーダとデコーダの間の複数の融合密なスキップ接続を内蔵し、Swin Transformerをエンコーダとして利用する。そして、デコーダサブネットワークの様々なスケールの特徴を融合させ、Swin-UNetや他の最先端の手法における必然的なダウンサンプリングによる空間的情報損失を補償し、上記の問題を解決する。本稿では,この手法の理論的解析を行い,公開されているPE CT画像データセットであるFUMPEおよびCAD-PE上で検証する。実験の結果,提案手法はFUMPEデータセットではDice類似係数(DSC)83.47%,Hausdorff距離95パーセンタイル(HD95)3.83,CAD-PEデータセットではDSC 83.42%,HD95 5.10を達成できた。これらの結果から,本手法はPEセグメンテーションタスクにおいて高い性能を示し,PEの自動セグメンテーションの精度を高め,臨床医に強力な診断ツールを提供する可能性が示唆された。我々のソースコードと新しいFUMPEデータセットは https://github.com/JustlfC03/SCUNet-plusplus で入手できる。

Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noise similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Moreover, traditional PE segmentation methods cannot fully consider the hierarchical structure of features or the local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder. It also fuses features of different scales in the decoder subnetwork to compensate for the spatial information loss caused by the inevitable downsampling in Swin-UNet and other state-of-the-art methods, effectively addressing the above problem. We provide a detailed theoretical analysis of this method and validate it on the publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.

翻訳日:2023-12-25 14:52:38 公開日:2023-12-22
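The abstract above reports results as a Dice similarity coefficient (DSC). For readers unfamiliar with the metric, here is a minimal sketch on flattened binary masks, using the standard definition DSC = 2|A ∩ B| / (|A| + |B|). The function name and the toy masks are illustrative and are not taken from the SCUNet++ code base.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient of two flattened binary masks.

    Both arguments are equal-length sequences of 0/1 (or False/True).
    Two empty masks are treated as a perfect match.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: two predicted voxels, one of which is correct.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

A DSC of 83.47% therefore means that twice the overlap between prediction and ground truth amounts to about 83% of their combined size.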

# 高精度SDEモデリングのための時間変化正規化フロー

Time-changed normalizing flows for accurate SDE modeling ( http://arxiv.org/abs/2312.14698v1 )

ライセンス: Link先を確認

Naoufal El Bekri and Lucas Drumetz and Franck Vermet

(参考訳) 生成パラダイムは、機械学習とディープラーニングモデルにおいてますます重要になっている。一般的な生成モデルには正規化フローがあり、これは微分同相変換を通じて基底分布を変換することで正確な精度推定を可能にする。時間分解フローを扱うための正規化フローフレームワークの拡張は、時系列、確率過程、神経確率微分方程式(sdes)をモデル化する強力なツールである動的正規化フローをもたらした。本研究では,ガウス過程の多種多様な族を構成するブラウン運動の時間的変形に基づく,時間変化正規化流れ(tcnf)の新たな変種を提案する。このアプローチにより、よく知られたOrnstein-Uhlenbeckプロセスなど、他の方法ではモデル化できないいくつかのSDEを効果的にモデル化し、事前の方法論を一般化し、結果の改善と推論と予測能力の向上につながる。

The generative paradigm has become increasingly important in machine learning and deep learning models. Among popular generative models are normalizing flows, which enable exact likelihood estimation by transforming a base distribution through diffeomorphic transformations. Extending the normalizing flow framework to handle time-indexed flows gave rise to dynamic normalizing flows, a powerful tool to model time series, stochastic processes, and neural stochastic differential equations (SDEs). In this work, we propose a novel variant of dynamic normalizing flows, a Time Changed Normalizing Flow (TCNF), based on time deformation of a Brownian motion, which constitutes a versatile and extensive family of Gaussian processes. This approach enables us to effectively model some SDEs that cannot be modeled otherwise, including standard ones such as the well-known Ornstein-Uhlenbeck process, and generalizes prior methodologies, leading to improved results and better inference and prediction capability.

翻訳日:2023-12-25 14:52:00 公開日:2023-12-22
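The abstract above builds on time deformation of a Brownian motion and names the Ornstein-Uhlenbeck (OU) process as an example. The sketch below illustrates the classical representation of a zero-mean-reverting OU process as a rescaled, time-changed Brownian motion, X_t = e^{-θt}(x_0 + σ W_{τ(t)}) with τ(t) = (e^{2θt} - 1)/(2θ). It shows only this textbook identity, not the TCNF model itself; all function names are assumptions.

```python
import math
import random


def time_change(theta, t):
    """Deterministic clock tau(t) = (exp(2*theta*t) - 1) / (2*theta)."""
    return (math.exp(2.0 * theta * t) - 1.0) / (2.0 * theta)


def ou_variance(theta, sigma, t):
    """Variance of X_t implied by the time-changed representation."""
    return sigma ** 2 * math.exp(-2.0 * theta * t) * time_change(theta, t)


def sample_ou_path(x0, theta, sigma, times, rng):
    """Sample X at increasing positive `times` via a time-changed Brownian motion."""
    path = []
    w, tau_prev = 0.0, 0.0
    for t in times:
        tau = time_change(theta, t)
        # Brownian increment over the deformed time interval [tau_prev, tau].
        w += rng.gauss(0.0, math.sqrt(tau - tau_prev))
        tau_prev = tau
        path.append(math.exp(-theta * t) * (x0 + sigma * w))
    return path


# Monte Carlo check of the variance at t = 1 (theta = sigma = 1, x0 = 0).
rng = random.Random(0)
samples = [sample_ou_path(0.0, 1.0, 1.0, [1.0], rng)[-1] for _ in range(20000)]
empirical_variance = sum(x * x for x in samples) / len(samples)
```

For large t the variance tends to σ²/(2θ), the stationary variance of the OU process, which is a quick sanity check on the clock τ(t).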

# Pola4All:偏光解析のための偏光応用とオープンソースツールキットの調査

Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization ( http://arxiv.org/abs/2312.14697v1 )

ライセンス: Link先を確認

Joaquin Rodriguez, Lew-Fock-Chong Lew-Yan-Voon, Renato Martins, Olivier Morel

(参考訳) 光の偏光情報は、物体の素材の種類、ポーズ、形状など、コンピュータビジョンやシーン理解タスクのための豊富な手がかりを提供することができる。新しい安価な偏光センサーの出現に伴い、この画像モダリティは、ポーズ推定、3D再構成、水中ナビゲーション、深度推定といった問題を解決するために、広く一般に利用されるようになった。しかし、この感性モダリティの使用に関するいくつかの制限や、偏光画像を分析するための標準や公開ツールの欠如について観察する。さらに、偏光カメラメーカーは通常、カメラと通信するための取得ツールを提供しているが、偏光情報を利用する処理アルゴリズムはめったにない。本稿では、偏光イメージングを含む最近の応用の進歩を概観し、視覚の偏光に関する最近の進歩とロボットの知覚タスクに関する包括的調査を含む。また、既存のマイクログリッド偏光カメラのほとんどからの情報と通信し、処理するための共通標準を提供する、完全なソフトウェアツールキットも紹介する。このツールキットは、このモダリティのためにいくつかの画像処理アルゴリズムを実装しており、githubで公開されている。

Polarization information of the light can provide rich cues for computer vision and scene understanding tasks, such as the type of material, pose, and shape of the objects. With the advent of new and cheap polarimetric sensors, this imaging modality is becoming accessible to a wider public for solving problems such as pose estimation, 3D reconstruction, underwater navigation, and depth estimation. However, we observe several limitations regarding the usage of this sensorial modality, as well as a lack of standards and publicly available tools to analyze polarization images. Furthermore, although polarization camera manufacturers usually provide acquisition tools to interface with their cameras, they rarely include processing algorithms that make use of the polarization information. In this paper, we review recent advances in applications that involve polarization imaging, including a comprehensive survey of recent advances on polarization for vision and robotics perception tasks. We also introduce a complete software toolkit that provides common standards to communicate with and process information from most of the existing micro-grid polarization cameras on the market. The toolkit also implements several image processing algorithms for this modality, and it is publicly available on GitHub: https://github.com/vibot-lab/Pola4all_JEI_2023.

翻訳日:2023-12-25 14:51:44 公開日:2023-12-22
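The abstract above concerns micro-grid polarization cameras, which measure intensity behind polarizers oriented at 0°, 45°, 90° and 135°. As a minimal sketch of the standard per-pixel computation, the following derives the linear Stokes parameters, the degree of linear polarization (DoLP) and the angle of linear polarization (AoLP). The function names are illustrative and are not the Pola4All API.

```python
import math


def stokes_from_microgrid(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-filtered intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 deg vs. -45 deg
    return s0, s1, s2


def dolp_aolp(s0, s1, s2):
    """Degree and angle (radians) of linear polarization."""
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


# Ideal, fully polarized light at 0 deg: Malus's law gives 1, 0.5, 0, 0.5.
horizontal = dolp_aolp(*stokes_from_microgrid(1.0, 0.5, 0.0, 0.5))
# Ideal, fully polarized light at 45 deg.
diagonal = dolp_aolp(*stokes_from_microgrid(0.5, 1.0, 0.5, 0.0))
```

In practice these formulas are applied after demosaicing the micro-grid, and sensor noise keeps the measured DoLP below 1 even for strongly polarized scenes.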

# オペレーター学習への数学的ガイド

A Mathematical Guide to Operator Learning ( http://arxiv.org/abs/2312.14688v1 )

ライセンス: Link先を確認

Nicolas Boullé and Alex Townsend

(参考訳) 演算子学習は、基礎となる力学系や偏微分方程式(PDE)の性質をデータから発見することを目的としている。ここでは、演算子学習のステップバイステップガイドを示す。演算子学習に適した問題の種類とPDEを説明し、様々なニューラルネットワークアーキテクチャについて議論し、数値PDEソルバを効果的に活用する方法を説明する。また、トレーニングデータの作成と管理、最適化の実施方法についてアドバイスします。数値線形代数の視点から動機づけることで,演算子学習における様々なニューラルネットワークアーキテクチャの背景にある直感を提供する。

Operator learning aims to discover properties of an underlying dynamical system or partial differential equation (PDE) from data. Here, we present a step-by-step guide to operator learning. We explain the types of problems and PDEs amenable to operator learning, discuss various neural network architectures, and explain how to employ numerical PDE solvers effectively. We also give advice on how to create and manage training data and conduct optimization. We offer intuition behind the various neural network architectures employed in operator learning by motivating them from the point of view of numerical linear algebra.

翻訳日:2023-12-25 14:51:23 公開日:2023-12-22

# カーネルの不均一性は自然画像表現のスパース性を改善する

Kernel Heterogeneity Improves Sparseness of Natural Images Representations ( http://arxiv.org/abs/2312.14685v1 )

ライセンス: Link先を確認

Hugo J. Ladret, Christian Casanova, Laurent Udo Perrinet

(参考訳) 生物学的ニューラルネットワークと人工ニューラルネットワークの両方が本質的にその性能と運用コストのバランスをとり、計算能力のバランスをとる。通常、効率的なニューロモルフィックニューラルネットワークは、入力の冗長性と次元性を減少させる表現を学ぶものである。これは例えば、スパースコーディングで達成され、自然画像から派生したスパース表現は、入力特徴のサンプリングとそれらの特徴の分散の両方において、異質な表現をもたらす。そこで本研究では,自然画像の構造,特に指向性特徴と対応するスパース符号の関連性を検討した。その結果,複数レベルの分散に散在する入力特徴の表現により,スパースコードのスパース性やレジリエンスが大幅に向上し,復元性能が向上した。これはモデル入力の構造を反響させ、自然画像の不均質なアレエータ構造を考慮できる。自然画像からの学習核は近似表現と密度表現のバランスをとることによって異種性を生み出し、すべての再構成指標を改善する。畳み込みスパース符号化アルゴリズムで用いられるカーネルの不均質性のパラメータ制御を用いて、不均質性がスパース性を強調し、均質性が表現の粒度を改善することを示した。より広い文脈では、これらの符号化戦略は深層畳み込みニューラルネットワークへの入力として機能する。このような分散符号化されたスパース画像データセットは計算効率を向上し、自然的および変動的な入力構造を利用するカーネルの不均一性の利点を強調し、ニューロモルフィックハードウェアのスループットを向上させることができる。

Both biological and artificial neural networks inherently balance their performance with their operational cost, which balances their computational abilities. Typically, an efficient neuromorphic neural network is one that learns representations that reduce the redundancies and dimensionality of its input. This is for instance achieved in sparse coding, and sparse representations derived from natural images yield representations that are heterogeneous, both in their sampling of input features and in the variance of those features. Here, we investigated the connection between natural images' structure, particularly oriented features, and their corresponding sparse codes. We showed that representations of input features scattered across multiple levels of variance substantially improve the sparseness and resilience of sparse codes, at the cost of reconstruction performance. This echoes the structure of the model's input, making it possible to account for the heterogeneously aleatoric structures of natural images. We demonstrate that learning kernels from natural images produces heterogeneity by balancing between approximate and dense representations, which improves all reconstruction metrics. Using a parametrized control of the heterogeneity of the kernels used by a convolutional sparse coding algorithm, we show that heterogeneity emphasizes sparseness, while homogeneity improves representation granularity. In a broader context, these encoding strategies can serve as inputs to deep convolutional neural networks. We prove that such variance-encoded sparse image datasets enhance computational efficiency, emphasizing the benefits of kernel heterogeneity to leverage naturalistic and variant input structures and possible applications to improve the throughput of neuromorphic hardware.

翻訳日:2023-12-25 14:51:13 公開日:2023-12-22

# 工学的正規微分方程式を分類アルゴリズム(EODECA):徹底的な特徴付けと試験

Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing ( http://arxiv.org/abs/2312.14681v1 )

ライセンス: Link先を確認

Raffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi, Lorenzo Giambagli, Duccio Fanelli

(参考訳) EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) は、機械学習と動的システム理論の共通部分における新しいアプローチであり、分類タスクのためのユニークなフレームワークである[1]。本手法は, 常微分方程式 (ODE) を用いて, 複雑な分類課題を効率的に処理する力学系構造を特徴とする。論文は、EODECAの動的特性を考察し、ランダムな摂動に対するレジリエンスと、さまざまな分類シナリオにおける堅牢なパフォーマンスを強調した。特に、EODECAの設計には、安定したアトラクタをフェーズ空間に埋め込む機能があり、信頼性を高め、可逆的なダイナミクスを可能にする。本稿では,作業 [1] を拡張し,Euler の離散化スキームを用いて包括的解析を行う。特に,EODECAの性能を5つの異なる分類問題で評価し,適応性と効率性を検討した。さらに, MNIST と Fashion MNIST データセットに対する EODECA の有効性を実証し, それぞれ $98.06\%$ と $88.21\%$ という印象的な精度を示した。これらの結果は多層パーセプトロン(MLP)に匹敵するものであり、複雑なデータ処理タスクにおけるEODECAの可能性を示している。我々は、モデルの学習の過程をさらに探求し、学習前後の両方における進化を評価し、安定なアトラクタに向かう能力を強調する。また,EODECAの可逆性についても検討し,意思決定過程と内部動作に光を当てた。本稿では、機械学習アルゴリズムと動的システム方法論のギャップを埋め、より透明で堅牢な機械学習パラダイムに向けた重要なステップを示す。

EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1] and employing an Euler discretization scheme. In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre- and post-training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.

翻訳日:2023-12-25 14:50:44 公開日:2023-12-22
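The abstract above describes classification by letting an ODE, discretized with an Euler scheme, carry an input towards one of several stable attractors. The toy sketch below illustrates only that general idea on a one-dimensional double-well system, dx/dt = x - x³, whose stable attractors are x = +1 and x = -1. The equation, step size and function names are assumptions chosen for illustration and are not EODECA's engineered dynamics.

```python
def euler_flow(x0, dt=0.1, steps=200):
    """Integrate dx/dt = x - x**3 with the explicit Euler scheme."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x


def classify(x0):
    """Label an input by the attractor its trajectory settles near.

    The unstable fixed point x0 = 0 never leaves 0 and falls into class -1.
    """
    return 1 if euler_flow(x0) > 0 else -1


final_positive = euler_flow(0.3)   # settles near the attractor at +1
final_negative = euler_flow(-2.0)  # settles near the attractor at -1
```

Near each attractor the Euler map contracts by a factor of 1 + dt(1 - 3) = 0.8 per step, so small perturbations of the input decay and the assigned class does not change.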

# 複合パイプラインにおける進化的自動機械学習と構造感度解析の統合

Integration Of Evolutionary Automated Machine Learning With Structural Sensitivity Analysis For Composite Pipelines ( http://arxiv.org/abs/2312.14770v1 )

ライセンス: Link先を確認

Nikolay O. Nikitin, Maiia Pinchuk, Valerii Pokrovskii, Peter Shevchenko, Andrey Getmanov, Yaroslav Aksenkin, Ilia Revin, Andrey Stebenkov, Ekaterina Poslavskaya, Anna V. Kalyuzhnaya

(参考訳) 自動機械学習(AutoML)システムは、所定の機械学習問題に対するエンドツーエンドソリューションを提案し、固定パイプラインか柔軟なパイプラインを生成する。固定パイプラインはタスクに依存しない構造であり、その一般的な構成はデータに関係なく同じである。対照的に、柔軟なパイプラインの構造は入力によって異なり、個々のタスクに適切に調整される。しかし、柔軟なパイプラインは構造的に過度に複雑になり、説明性に乏しい。本稿では,フレキシブルな解のロバスト性と解釈性を高める感度解析を取り入れ,フレキシブルなパイプラインの負の点を補償するevosa手法を提案する。 EVOSAは、パイプライングラフ上のエッジやノードの正および負の影響を定量的に推定し、この情報を進化的AutoMLオプティマイザに供給する。 evosaの正しさと効率性は表式,マルチモーダル,コンピュータビジョンのタスクで検証され,提案手法の一般化が示唆された。

Automated machine learning (AutoML) systems propose an end-to-end solution to a given machine learning problem, creating either fixed or flexible pipelines. Fixed pipelines are task-independent constructs: their general composition remains the same, regardless of the data. In contrast, the structure of flexible pipelines varies depending on the input, making them finely tailored to individual tasks. However, flexible pipelines can be structurally overcomplicated and have poor explainability. We propose the EVOSA approach that compensates for the negative points of flexible pipelines by incorporating a sensitivity analysis which increases the robustness and interpretability of the flexible solutions. EVOSA quantitatively estimates the positive and negative impact of an edge or a node on a pipeline graph, and feeds this information to the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA were validated in tabular, multimodal and computer vision tasks, suggesting generalizability of the proposed approach across domains.

翻訳日:2023-12-25 14:43:08 公開日:2023-12-22

# Large Language Model (LLM) Bias Index -- LLMBI

Large Language Model (LLM) Bias Index -- LLMBI ( http://arxiv.org/abs/2312.14769v1 )

ライセンス: Link先を確認

Abiodun Finbarrs Oketunji, Muhammad Anas, Deepthi Saina

(参考訳) LLMBI(Large Language Model Bias Index)は、GPT-4のような大規模言語モデル(LLM)に固有のバイアスを定量化し、対処するための先駆的なアプローチである。多様な分野におけるLLMの普及と影響を認識している。本研究は,モデル応答を歪める可能性のあるバイアスを系統的に測定し緩和する新しい計量 LLMBI を導入する。年齢,性別,人種的偏見を含むがこれらに限らない,多次元の偏見を取り入れた複合スコアリングシステムを用いたLLMBIの定式化を行った。このメトリクスを運用するには, LLM応答の収集と注釈付け, バイアス検出のための洗練された自然言語処理(NLP)技術の適用, 特殊な数学的公式による LLMBI スコアの計算を含む多段階的なプロセスに携わる。この公式は、様々なバイアス次元の重み付け平均値、データセットの多様性の欠陥に対するペナルティ、感情バイアスに対する補正を統合する。 OpenAIのAPIからの応答を用いた実証分析では,バイアス検出の代表的な方法として,高度な感情分析を採用している。この研究は、LLMがテキスト生成において印象的な能力を示す一方で、異なる次元にまたがる様々なバイアスを示すことを明らかにしている。 LLMBIは、モデルと時間とともにバイアスを比較するための定量尺度を提供し、LLMの公平性と信頼性を高める上で、システムエンジニア、研究者、規制当局にとって重要なツールを提供する。偏見のない人間のような反応を模倣するLLMの可能性を強調している。さらに、社会規範や倫理基準の進化に合わせて、そのようなモデルを継続的に監視し、再検討する必要性を強調している。

The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4. We recognise the increasing prevalence and impact of LLMs across diverse sectors. This research introduces a novel metric, LLMBI, to systematically measure and mitigate biases potentially skewing model responses. We formulated LLMBI using a composite scoring system incorporating multiple dimensions of bias, including but not limited to age, gender, and racial biases. To operationalise this metric, we engaged in a multi-step process involving collecting and annotating LLM responses, applying sophisticated Natural Language Processing (NLP) techniques for bias detection, and computing the LLMBI score through a specially crafted mathematical formula. The formula integrates weighted averages of various bias dimensions, a penalty for dataset diversity deficiencies, and a correction for sentiment biases. Our empirical analysis, conducted using responses from OpenAI's API, employs advanced sentiment analysis as a representative method for bias detection. The research reveals LLMs, whilst demonstrating impressive capabilities in text generation, exhibit varying degrees of bias across different dimensions. LLMBI provides a quantifiable measure to compare biases across models and over time, offering a vital tool for systems engineers, researchers and regulators in enhancing the fairness and reliability of LLMs. It highlights the potential of LLMs in mimicking unbiased human-like responses. Additionally, it underscores the necessity of continuously monitoring and recalibrating such models to align with evolving societal norms and ethical standards.
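
The abstract names three ingredients of the score (a weighted average over bias dimensions, a dataset-diversity penalty, and a sentiment correction) but does not give the formula. The sketch below combines them additively purely as an assumption; the function name `llmbi_score`, the parameters `penalty_weight` and `sentiment_weight`, and the example numbers are hypothetical and are not taken from the paper.

```python
def llmbi_score(bias_scores, weights, diversity, sentiment_bias,
                penalty_weight=1.0, sentiment_weight=1.0):
    """Hypothetical composite bias score (assumed additive form).

    bias_scores    -- dict: bias dimension -> detected bias in [0, 1]
    weights        -- dict: bias dimension -> non-negative weight
    diversity      -- dataset diversity in [0, 1]; 1 means fully diverse
    sentiment_bias -- signed sentiment correction term
    """
    total_weight = sum(weights[d] for d in bias_scores)
    # Weighted average over the bias dimensions.
    weighted_avg = sum(weights[d] * bias_scores[d] for d in bias_scores) / total_weight
    # Penalty grows as the dataset becomes less diverse.
    diversity_penalty = penalty_weight * (1.0 - diversity)
    return weighted_avg + diversity_penalty + sentiment_weight * sentiment_bias


if __name__ == "__main__":
    scores = {"age": 0.2, "gender": 0.4, "race": 0.6}   # made-up inputs
    equal_weights = {"age": 1.0, "gender": 1.0, "race": 1.0}
    print(llmbi_score(scores, equal_weights, diversity=0.9, sentiment_bias=0.05))
```

Keeping the three terms separate makes it easy to see which component drives a change when scores are compared across models or over time.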

翻訳日:2023-12-25 14:42:51 公開日:2023-12-22

# 量子ビオレント緩和条件について

On the Conditions for a Quantum Violent Relaxation ( http://arxiv.org/abs/2312.14768v1 )

ライセンス: Link先を確認

Giachetti Guido and Defenu Nicolò

(参考訳) 一般に、古典的な完全連結系は激しい緩和を受けることが知られている。この現象は、熱力学的極限において長時間ダイナミクスが平均場効果に支配されるにもかかわらず、観測量が有限の時間スケールで定常な非熱的値に緩和することを指す。ここでは,熱力学的極限における2体の全対全相互作用を持つ一般的な多体系のダイナミクスを調べることで,「量子」の激しい緩和を解析する。激しい緩和が起こるためには,平均場有効ハミルトニアンのスペクトルが非常に特殊な条件を満たす必要があることを示す。これらの条件はほとんど満たされないため,「量子」の激しい緩和は古典的な場合に比べてまれにしか観測されない。我々の予測はスピンモデルの研究によって検証され、このモデルは結合の値に応じて、激しい緩和と一般的な前熱化相の間の遷移を示す。また, 量子ハミルトニアン平均場モデルのスピン版を解析し, 激しい緩和を示さないことを示す。最後に,古典極限において激しい緩和の描像がどのように回復するかを論じる。これらの結果は、平均場領域においても量子効果がダイナミクスにかなり劇的な影響を与えることを示しており、光と物質が結合した系のより深い理解への道を開く。

In general, classical fully-connected systems are known to undergo violent relaxation. This phenomenon refers to the relaxation of observables to stationary, non-thermal values on a finite timescale, despite their long-time dynamics being dominated by mean-field effects in the thermodynamic limit. Here, we analyze "quantum" violent relaxation by studying the dynamics of generic many-body systems with two-body, all-to-all interactions in the thermodynamic limit. We show that, for violent relaxation to occur, very specific conditions on the spectrum of the mean-field effective Hamiltonian have to be met. These conditions are hardly ever met, so "quantum" violent relaxation is observed rarely compared with its classical counterpart. Our predictions are validated by the study of a spin model which, depending on the value of the coupling, shows a transition between violent relaxation and a generic prethermal phase. We also analyze a spin version of the quantum Hamiltonian-Mean-Field model, which is shown not to exhibit violent relaxation. Finally, we discuss how the violent-relaxation picture re-emerges in the classical limit. Our results demonstrate how, even in the mean-field regime, quantum effects have a rather dramatic impact on the dynamics, paving the way to a better understanding of light-matter coupled systems.

翻訳日:2023-12-25 14:42:22 公開日:2023-12-22

# 拡張された潜在マルチビューサブスペースクラスタリング

Enhanced Latent Multi-view Subspace Clustering ( http://arxiv.org/abs/2312.14763v1 )

ライセンス: Link先を確認

Long Shi, Lei Cao, Jun Wang, Badong Chen

(参考訳) 潜在マルチビューサブスペースクラスタリングは、望ましいクラスタリング性能を持つことが示されている。しかし、元の潜在表現法は、データ行列を複数のビューから次元方向に沿って単一の行列に垂直に結合し、潜在表現行列を復元し、不完全な情報回復をもたらす可能性がある。本稿では,潜在空間表現を完全に回復するために,拡張潜在多視点サブスペースクラスタリング(ELMSC)法を提案する。 ELMSC法は、マルチビューデータの表現を強化する拡張データマトリックスを構築することを含む。具体的には、様々なビューから拡張マトリックスのブロック対角位置へデータ行列を積み重ねて補完情報を利用する。一方、非ブロック対角エントリは、異なるビュー間の類似性に基づいて構成され、一貫した情報をキャプチャする。さらに,拡張自己表現行列の非対角ブロックに対するスパース正規化を適用し,一貫性情報の冗長な計算を回避する。最後に,ADMM(Alternating Direction Method of Multipliers)の枠組みに基づく新しい反復アルゴリズムを開発し,ELMSCの最適化問題を解く。実世界のデータセットに関する広範囲な実験により,提案するELMSCが,いくつかの最先端のマルチビュークラスタリング手法よりも高いクラスタリング性能を実現することを実証した。

Latent multi-view subspace clustering has been demonstrated to have desirable clustering performance. However, the original latent representation method vertically concatenates the data matrices from multiple views into a single matrix along the direction of dimensionality to recover the latent representation matrix, which may result in incomplete information recovery. To fully recover the latent space representation, in this paper we propose an Enhanced Latent Multi-view Subspace Clustering (ELMSC) method. The ELMSC method involves constructing an augmented data matrix that enhances the representation of multi-view data. Specifically, we stack the data matrices from various views into the block-diagonal locations of the augmented matrix to exploit the complementary information. Meanwhile, the non-block-diagonal entries are composed based on the similarity between different views to capture the consistent information. In addition, we enforce a sparse regularization for the non-diagonal blocks of the augmented self-representation matrix to avoid redundant calculations of consistency information. Finally, a novel iterative algorithm based on the framework of Alternating Direction Method of Multipliers (ADMM) is developed to solve the optimization problem for ELMSC. Extensive experiments on real-world datasets demonstrate that our proposed ELMSC is able to achieve higher clustering performance than some state-of-the-art multi-view clustering methods.
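
The block-diagonal stacking step described above can be sketched in a few lines. This is only an illustration of the layout: the helper name `block_diagonal` is hypothetical, and the off-diagonal blocks are left as zeros because the abstract does not specify how the view-similarity entries are computed.

```python
def block_diagonal(views):
    """Place each view's data matrix on the block diagonal of one large matrix.

    views -- list of matrices (lists of rows); view v has shape d_v x n_v.
    Off-diagonal blocks are zero placeholders here; in ELMSC they would be
    filled from inter-view similarity, which this sketch does not attempt.
    """
    total_cols = sum(len(view[0]) for view in views)
    augmented = []
    col_offset = 0
    for view in views:
        n_cols = len(view[0])
        for row in view:
            left = [0.0] * col_offset
            right = [0.0] * (total_cols - col_offset - n_cols)
            augmented.append(left + list(row) + right)
        col_offset += n_cols
    return augmented


if __name__ == "__main__":
    view_a = [[1.0, 2.0], [3.0, 4.0]]   # 2 features x 2 samples
    view_b = [[5.0, 6.0]]               # 1 feature  x 2 samples
    for row in block_diagonal([view_a, view_b]):
        print(row)
```

Compared with plain vertical concatenation, this layout keeps each view in its own block, leaving room for separate cross-view terms.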

翻訳日:2023-12-25 14:41:58 公開日:2023-12-22

# 自己閉量子軌道からの幾何相に対するアクションフォーマリズム

Action formalism for geometric phases from self-closing quantum trajectories ( http://arxiv.org/abs/2312.14760v1 )

ライセンス: Link先を確認

Dominic Shea and Alessandro Romito

(参考訳) 測定を受けると、量子系は確率的量子軌道に沿って進化し、最終的な射影計測においてポスト選択によって観測可能な幾何学的位相を自然に備えることができる。軌道を後選択して閉ループを形成すると、幾何相は測定強度によって駆動される位相遷移を行う。本稿では,単一量子ビット系の連続ガウス測度によって誘導される自閉軌跡の部分集合の幾何学的位相について検討する。動作法を用いて稀な自閉事象を解析できる確率経路積分を用いて定式化を開発し,測定誘起幾何位相を組み込む。測定強度パラメータの関数として,最も可能性の高い軌道の幾何位相が自己閉軌道の位相遷移を行うことを示す。さらに、最も可能性の高い自己閉軌道近傍におけるガウス補正は、全量子軌道の数値シミュレーションの結果と一致して、遷移点を定量的に変化させる。

When subject to measurements, quantum systems evolve along stochastic quantum trajectories that can be naturally equipped with a geometric phase observable via a post-selection in a final projective measurement. When post-selecting the trajectories to form a closed loop, the geometric phase undergoes a topological transition driven by the measurement strength. Here, we study the geometric phase of a subset of self-closing trajectories induced by a continuous Gaussian measurement of a single-qubit system. We utilize a stochastic path integral that enables the analysis of rare self-closing events using action methods and develop the formalism to incorporate the measurement-induced geometric phase therein. We show that the geometric phase of the most likely trajectories undergoes a topological transition for self-closing trajectories as a function of the measurement strength parameter. Moreover, the inclusion of Gaussian corrections in the vicinity of the most probable self-closing trajectory quantitatively changes the transition point in agreement with results from numerical simulations of the full set of quantum trajectories.

翻訳日:2023-12-25 14:41:37 公開日:2023-12-22

# グラフ学習における信号フィルタリングのための拡散マップ

Diffusion Maps for Signal Filtering in Graph Learning ( http://arxiv.org/abs/2312.14758v1 )

ライセンス: Link先を確認

Todd Hildebrant

(参考訳) 本稿では,グラフ信号の基底構造を理解するために,グラフシフト演算子としての拡散マップを提案する。本研究は,マルコフ変動最小化問題に対する拡散マップ生成フィルタを用いたグラフ学習の改善を評価する。本稿では,合成温度センサデータと実世界の温度センサデータを用いた実例を通して,本手法の有効性を示す。これらの例は、拡散マップグラフ信号モデルと他のよく使われるグラフ信号演算子を比較する。その結果、複雑な非ユークリッドデータ構造の分析と理解に新たなアプローチが得られた。

This paper explores the application of diffusion maps as graph shift operators in understanding the underlying geometry of graph signals. The study evaluates the improvements in graph learning when applying diffusion-map-generated filters to the Markov Variation minimization problem. The paper showcases the effectiveness of this approach through examples involving synthetically generated and real-world temperature sensor data. These examples also compare the diffusion map graph signal model with other commonly used graph signal operators. The results provide new approaches for the analysis and understanding of complex, non-Euclidean data structures.
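
As a rough illustration of using a diffusion operator as a graph shift, the sketch below builds a row-normalized Gaussian kernel (the simplest diffusion-map construction, without the optional density normalization) and applies it to a signal. The bandwidth `epsilon`, the sample points, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math


def diffusion_operator(points, epsilon=1.0):
    """Row-stochastic Markov matrix P = D^{-1} K from a Gaussian kernel K."""
    n = len(points)
    kernel = [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / epsilon)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Normalize each row so that it sums to one (a random-walk transition matrix).
    return [[k / sum(row) for k in row] for row in kernel]


def shift(operator, signal):
    """Apply the operator to a graph signal: one diffusion step."""
    return [sum(p * s for p, s in zip(row, signal)) for row in operator]


if __name__ == "__main__":
    sensors = [(0.0,), (1.0,), (3.0,)]       # made-up sensor positions
    temperatures = [20.0, 21.0, 30.0]        # made-up readings
    P = diffusion_operator(sensors)
    print(shift(P, temperatures))
```

One shift replaces each value by a kernel-weighted average of its neighbours, so a constant signal is left unchanged and sharp differences are smoothed.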

翻訳日:2023-12-25 14:41:22 公開日:2023-12-22

# ジョセフソン接合における散逸性量子相転移の欠如:理論

Absence of a dissipative quantum phase transition in Josephson junctions: Theory ( http://arxiv.org/abs/2312.14754v1 )

ライセンス: Link先を確認

Carles Altimiras, Daniel Esteve, Çağlar Girit, Hélène le Sueur, Philippe Joyez

(参考訳) 抵抗シャントジョセフソン接合(RSJ)の縮約密度行列を,ファインマン・ヴァーノンの影響汎関数に基づく厳密な数値スキームである虚時間の確率的リウヴィル方程式法を用いて求める。調べたすべてのパラメータにおいて、シャントされた接合は同じ接合のシャントなしの場合よりも超伝導性が強いことが分かる。長い間RSJで起こると信じられてきたシュミットの超伝導-絶縁体量子相転移の痕跡は見つからなかった。この研究は、実験的な観測に基づいてMuraniらが2020年に導いた同様の結論を理論的に裏付けている。従来の研究における絶縁接合の予測は、紫外カットオフのないオーミック環境を考慮していたことに起因することを明らかにする。

We obtain the reduced density matrix of a resistively shunted Josephson junction (RSJ), using the stochastic Liouville equation method in imaginary time - an exact numerical scheme based on the Feynman-Vernon influence functional. For all parameters looked at, we find a shunted junction is more superconducting than the same unshunted junction. We find no trace of Schmid's superconducting-insulating quantum phase transition long believed to occur in the RSJ. This work confirms theoretically a similar conclusion drawn in 2020 by Murani et al., based on experimental observations. We reveal that predictions of an insulating junction in previous works were due to considering Ohmic environments with no UV cutoff.

翻訳日:2023-12-25 14:41:16 公開日:2023-12-22

# ダウンロードファウンデーションモデルのアクセシブルな微調整の危険性

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models ( http://arxiv.org/abs/2312.14751v1 )

ライセンス: Link先を確認

Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger

(参考訳) ダウンロード可能なアクセス \citep{solaiman_gradient_2023} として知られる、事前学習済み基盤モデルの重みの公開は、事前学習という法外なコストをかけずに微調整を行うことを可能にする。私たちの研究は、ダウンロード可能なモデルの微調整がますますアクセスしやすくなることで、ハザードが増大する可能性があると主張している。まず,微調整のアクセシビリティ向上に関する研究を強調する。議論は、(A)微調整の計算コストを削減する研究と、(B)より多くのアクター間でそのコストを共有する能力を向上させる研究に分けて行う。第2に,よりアクセスしやすい微調整手法は,悪質な使用を促進し,潜在的に危険な能力を持つモデルの監視を困難にすることで,ハザードを増大させる可能性があると論じる。第3に,潜在的な緩和策と,よりアクセスしやすい微調整の利点について考察する。ハザードに関してかなりの不確実性が残っていることを踏まえ,緩和策の開発が急務であることを強調して結論とする。

Public release of the weights of pretrained foundation models, otherwise known as downloadable access \citep{solaiman_gradient_2023}, enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.

翻訳日:2023-12-25 14:41:02 公開日:2023-12-22

# 異常検出から自動ログラベリングへの進歩と先駆的根本原因解析

Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis ( http://arxiv.org/abs/2312.14748v1 )

ライセンス: Link先を確認

Thorsten Wittkopp, Alexander Acker, Odej Kao

(参考訳) AIOpsの領域は、AIとMLの力でITの世界を変えつつある。ラベル付きデータに制限があるにもかかわらず、教師付きモデルは、特にディープラーニング環境でラベルを活用することの重要性を強調している。本研究は,ログ異常に対する分類法を導入し,ラベリング課題を軽減するための自動データラベリングを検討することで,この分野を強化する。さらに、多様な異常検出技術の可能性と、その特定の異常タイプとの整合について調査する。しかし、この探査は異常検出では停止しない。この研究は、根本原因分析が異常検出に続く未来を予見し、異常の根本原因を解明する。この未知の領域は、ITシステム管理に革命をもたらす大きな可能性を秘めている。本論文は, 異常検出と自動ラベル付けの理解を深め, 形質転換根本原因分析の段階を設定する。これらの進歩は、よりレジリエントなITシステムを約束し、継続的に進化する技術的状況において、運用効率とユーザ満足度を高めます。

The realm of AIOps is transforming IT landscapes with the power of AI and ML. Despite the challenge of limited labeled data, supervised models show promise, emphasizing the importance of leveraging labels for training, especially in deep learning contexts. This study enhances the field by introducing a taxonomy for log anomalies and exploring automated data labeling to mitigate labeling challenges. It goes further by investigating the potential of diverse anomaly detection techniques and their alignment with specific anomaly types. However, the exploration does not stop at anomaly detection. The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies. This uncharted territory holds immense potential for revolutionizing IT systems management. In essence, this paper enriches our understanding of anomaly detection and automated labeling, and sets the stage for transformative root cause analysis. Together, these advances promise more resilient IT systems, elevating operational efficiency and user satisfaction in an ever-evolving technological landscape.

翻訳日:2023-12-25 14:40:46 公開日:2023-12-22

# 1次元弾性波シミュレーションのための量子計算概念

A quantum computing concept for 1-D elastic wave simulation ( http://arxiv.org/abs/2312.14747v1 )

ライセンス: Link先を確認

Malte Schade, Cyrill Boesch, Vaclav Hapla, Andreas Fichtner

(参考訳) 量子コンピューティングは、少なくとも一部のアプリケーションでは、従来のスーパーコンピュータでは提供できないスピードアップを約束しているため、近年かなりの注目を集めている。既存の量子コンピュータは、多くの場合、重要な問題を解決するには小さすぎるが、その将来的なドメイン科学への影響はすでに検討されている。この文脈内では、理論的な定式化と実量子コンピュータへの実装という、2つの要素を持つ異種媒体における1次元弾性波伝播の量子コンピューティングの概念を示す。この手法は有限差分近似に基づいており、続いて離散弾性波動方程式をスパース性を保存する変換によってSchrödinger方程式に変換し、ゲートベースの量子コンピュータ上で直接シミュレートすることができる。誤差のない量子シミュレータの実装は、我々のアプローチを検証し、実量子コンピュータ IBM Brisbane 上の小さな問題による数値実験の基礎を形成する。後者は、誤りのないバージョンと定性的に一致するが、量子デコヒーレンスとノイズ効果によって汚染されるシミュレーション結果を生成する。連続バージョンによるSchrödinger方程式への離散変換を補完することで、スペクトル要素法のような他の空間離散化スキームによる有限差分を置き換えることができる。誤差補正量子チップの出現を予測した結果,本手法と質量ばね解析の類似性から,量子コンピューティングの手法は,古典的計算機のシミュレーションよりも指数関数的に高速に動作する波動場シミュレーションに繋がる可能性が示唆された。

Quantum computing has attracted considerable attention in recent years because it promises speed-ups that conventional supercomputers cannot offer, at least for some applications. Though existing quantum computers are, in most cases, still too small to solve significant problems, their future impact on domain sciences is already being explored now. Within this context, we present a quantum computing concept for 1-D elastic wave propagation in heterogeneous media with two components: a theoretical formulation and an implementation on a real quantum computer. The method rests on a finite-difference approximation, followed by a sparsity-preserving transformation of the discrete elastic wave equation to a Schrödinger equation, which can be simulated directly on a gate-based quantum computer. An implementation on an error-free quantum simulator verifies our approach and forms the basis of numerical experiments with small problems on the real quantum computer IBM Brisbane. The latter produce simulation results that qualitatively agree with the error-free version but are contaminated by quantum decoherence and noise effects. Complementing the discrete transformation to the Schrödinger equation by a continuous version allows the replacement of finite differences by other spatial discretisation schemes, such as the spectral-element method. Anticipating the emergence of error-corrected quantum chips, an analogy between our method and analyses of coupled mass-spring systems suggests that our quantum computing approach may lead to wave field simulations that run exponentially faster than simulations on classical computers.

翻訳日:2023-12-25 14:40:29 公開日:2023-12-22

# ESBMC v7.4: インターバルの力を活用する

ESBMC v7.4: Harnessing the Power of Intervals ( http://arxiv.org/abs/2312.14746v1 )

ライセンス: Link先を確認

Rafael Menezes, Mohannad Aldughaim, Bruno Farias, Xianzhiyu Li, Edoardo Manino, Fedor Shmarov, Kunjian Song, Franz Brauße, Mikhail R. Gadelha, Norbert Tihanyi, Konstantin Korovin, Lucas C. Cordeiro

(参考訳) ESBMCはモデル検査のための多くの最先端技術を実装している。本稿では,これまでサポートされていなかったプログラムやプロパティの検証結果を得ることを可能にする,新機能および改良された機能について報告する。 ESBMCは、プログラム内の式に対する新しい静的区間解析を用いて検証性能を向上させる。これにはブール値と整数に対する区間ベースの推論、前向きおよび後ろ向きのコントラクタ、そして頻出するシングルトン区間に関する特定の最適化が含まれる。他の関連する改善は、並行プログラムの検証、およびいくつかの操作モデル(内部のものに加え、pthreadやC数学ライブラリなどのライブラリのもの)に関するものである。拡張されたメモリ安全性解析により、依然として到達可能とみなされるメモリリークの追跡が可能になった。

ESBMC implements many state-of-the-art techniques for model checking. We report on new and improved features that allow us to obtain verification results for previously unsupported programs and properties. ESBMC employs a new static interval analysis of expressions in programs to increase verification performance. This includes interval-based reasoning over booleans and integers, forward and backward contractors, and particular optimizations related to singleton intervals because of their ubiquity. Other relevant improvements concern the verification of concurrent programs, as well as several operational models, internal ones, and also those of libraries such as pthread and the C mathematics library. An extended memory safety analysis now allows tracking of memory leaks that are considered still reachable.
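
To give a flavour of interval-based reasoning over integers, the sketch below defines a tiny interval type and a contractor that tightens bounds under the constraint `x <= y`. It is an illustrative toy and not ESBMC's implementation; the class `Interval` and the function `contract_less_equal` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer interval [lo, hi]; empty when lo > hi."""
    lo: int
    hi: int

    def __add__(self, other):
        # Sum of two intervals: add the lower bounds and the upper bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def is_empty(self):
        return self.lo > self.hi

    def is_singleton(self):
        # A singleton interval pins the variable to one concrete value.
        return self.lo == self.hi


def contract_less_equal(x, y):
    """Tighten x and y so that only values compatible with x <= y remain."""
    new_x = Interval(x.lo, min(x.hi, y.hi))   # x cannot exceed y's upper bound
    new_y = Interval(max(y.lo, x.lo), y.hi)   # y cannot fall below x's lower bound
    return new_x, new_y


if __name__ == "__main__":
    print(Interval(1, 2) + Interval(3, 4))
    print(contract_less_equal(Interval(0, 10), Interval(3, 5)))
    print(contract_less_equal(Interval(8, 9), Interval(0, 5)))  # both become empty
```

If a contracted interval comes out empty, the constraint is unsatisfiable on that path; if it comes out as a singleton, the variable can be treated as a constant.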

翻訳日:2023-12-25 14:39:58 公開日:2023-12-22

# 捕捉イオン量子コンピュータにおける静電相互作用エネルギーの推定

Estimation of electrostatic interaction energies on a trapped-ion quantum computer ( http://arxiv.org/abs/2312.14739v1 )

ライセンス: Link先を確認

Pauline J. Ollitrault, Matthias Loipersberger, Robert M. Parrish, Alexander Erhard, Christine Maier, Christian Sommer, Juris Ulmanis, Thomas Monz, Christian Gogolin, Christofer S. Tautermann, Gian-Luca R. Anselmetti, Matthias Degroote, Nikolaj Moll, Raffaele Santagati, Michael Streif

(参考訳) トラップイオン量子コンピュータを用いた静電相互作用エネルギーのハードウェア実装について述べる。計算系として,一酸化窒素還元酵素 (NOR) を触媒とした$\mathrm{NO}$から$\mathrm{N}_2\mathrm{O}$への還元に着目した。量子コンピュータは、NOR活性空間内で近似基底状態を生成するために使用される。必要な1粒子密度行列を効率的に測定するために,回路長を延ばさずに,フェルミオン基底回転を量子回路に組み込む。計算の基礎における測定は、古典的コンピュータ上の静電相互作用エネルギーを計算する入力として使用される。実験結果は, ハードウェアノイズにもかかわらず, 化学的精度で静電相互作用エネルギーを求めるため, 同じ回路の古典的なノイズレスシミュレーションと強く一致した。この研究は、相互作用エネルギーのような特定の観測対象に適したアルゴリズムは、単純な超分子的アプローチでは個々の基底状態エネルギーよりもはるかに少ない量子資源を必要とすることを示している。

We present the first hardware implementation of electrostatic interaction energies using a trapped-ion quantum computer. As a test system for our computation, we focus on the reduction of $\mathrm{NO}$ to $\mathrm{N}_2\mathrm{O}$ catalyzed by a nitric oxide reductase (NOR). The quantum computer is used to generate an approximate ground state within the NOR active space. To efficiently measure the necessary one-particle density matrices, we incorporate fermionic basis rotations into the quantum circuit without extending the circuit length, laying the groundwork for further efficient measurement routines using factorizations. Measurements in the computational basis are then used as inputs for computing the electrostatic interaction energies on a classical computer. Our experimental results strongly agree with classical noiseless simulations of the same circuits, finding electrostatic interaction energies within chemical accuracy despite hardware noise. This work shows that algorithms tailored to specific observables of interest, such as interaction energies, may require significantly fewer quantum resources than individual ground state energies would in the straightforward supermolecular approach.

翻訳日:2023-12-25 14:39:47 公開日:2023-12-22

# Combinatoryカテゴリー文法を用いた対話文の計算意味と評価ベンチマーク

Computational Semantics and Evaluation Benchmark for Interrogative Sentences via Combinatory Categorial Grammar ( http://arxiv.org/abs/2312.14737v1 )

ライセンス: Link先を確認

Hayate Funakura, Koji Mineshima

(参考訳) 本稿では, Combinatory Categorial Grammar (CCG) の枠組みの中で,多種多様な極性質問に対する構成意味論について述べる。提案する分析の説明力を評価するために,質問文の意味性を評価するための質問応答データセットQSEMを提案する。我々は既存のCCGパーサを用いて分析を行い、データセットを用いて評価を行う。評価の結果,QSEMに含まれるサンプルの約半数に対して,CCG木を用いた注釈付きデータと意味表現が得られた。さらに,CCGの理論的能力と既存のCCGパーサの能力の相違についても論じる。

We present a compositional semantics for various types of polar questions and wh-questions within the framework of Combinatory Categorial Grammar (CCG). To assess the explanatory power of our proposed analysis, we introduce a question-answering dataset QSEM specifically designed to evaluate the semantics of interrogative sentences. We implement our analysis using existing CCG parsers and conduct evaluations using the dataset. Through the evaluation, we have obtained annotated data with CCG trees and semantic representations for about half of the samples included in QSEM. Furthermore, we discuss the discrepancy between the theoretical capacity of CCG and the capabilities of existing CCG parsers.

翻訳日:2023-12-25 14:39:27 公開日:2023-12-22

# メタプロンプトを用いた視覚知覚のための高調波拡散モデル

Harnessing Diffusion Models for Visual Perception with Meta Prompts ( http://arxiv.org/abs/2312.14733v1 )

ライセンス: Link先を確認

Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

(参考訳) 視覚モデルの生成的前訓練の問題は、長年の余波として続いている。現在,テキスト・ツー・イメージ(t2i)拡散モデルは,テキスト入力にマッチする高精細な画像を生成するための優れた習熟度を示す。拡散モデルを使用して視覚的知覚タスクに取り組むことができるか? 本稿では,視覚知覚タスクにおける拡散モデルを利用した簡易かつ効果的なスキームを提案する。我々の重要な洞察は、学習可能な埋め込み(メタプロンプト)を事前訓練された拡散モデルに導入し、知覚のための適切な特徴を抽出することである。メタプロンプトの効果は2倍である。まず、T2Iモデルのテキスト埋め込みを直接置き換えることで、特徴抽出中にタスク関連機能を活性化することができる。第二に、抽出された機能を再配置して、モデルがタスクの最も関連する機能に集中することを保証するために使用される。さらに,拡散モデルの性質をフル活用し,より強力な視覚的特徴をもたらす再帰的改善訓練戦略を設計する。様々なベンチマークにわたる大規模な実験により、我々のアプローチの有効性が検証された。提案手法は,NYU深度V2およびKITTIの深度推定タスクとCityScapesのセマンティックセグメンテーションタスクにおいて,新たな性能記録を実現する。同時に,提案手法は,ade20kにおける意味セグメンテーションやcocoデータセットにおけるポーズ推定に匹敵する結果を得るとともに,そのロバスト性と汎用性を示す。

The issue of generative pretraining for vision models has persisted as a long-standing conundrum. At present, the text-to-image (T2I) diffusion model demonstrates remarkable proficiency in generating high-definition images matching textual inputs, a feat made possible through its pre-training on large-scale image-text pairs. This leads to a natural inquiry: can diffusion models be utilized to tackle visual perception tasks? In this paper, we propose a simple yet effective scheme to harness a diffusion model for visual perception tasks. Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception. The effect of meta prompts is twofold. First, as a direct replacement of the text embeddings in the T2I models, they can activate task-relevant features during feature extraction. Second, they are used to re-arrange the extracted features to ensure that the model focuses on the most pertinent features for the task at hand. Additionally, we design a recurrent refinement training strategy that fully leverages the property of diffusion models, thereby yielding stronger visual features. Extensive experiments across various benchmarks validate the effectiveness of our approach. Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes. Concurrently, the proposed method attains results comparable to the current state-of-the-art in semantic segmentation on ADE20K and pose estimation on COCO datasets, further exemplifying its robustness and versatility.

翻訳日:2023-12-25 14:39:16 公開日:2023-12-22

# 北エフハニカムモデルにおける高密度渦格子の有効モデル

Effective models for dense vortex lattices in the Kitaev honeycomb model ( http://arxiv.org/abs/2312.14729v1 )

ライセンス: Link先を確認

David J. Alspaugh, Jean-Noël Fuchs, Anna Ritz-Zwilling and Julien Vidal

(参考訳) 北エフハニカムモデルにおいて,高密度渦構成のための低エネルギー有効モデルを導入する。具体的には,渦フリープラーペットが渦フル背景に対して三角形格子を形成する渦の構成を考える。渦密度によって、これらの「二重」構成は、翻訳と反転対称性によって分類された2つの族のいずれかに属する。時間反転対称性破断項の関数として、ある族は偶数チャーン数を拡張されたギャップレス位相で割ったガッピング位相を示し、もう一方は偶数または奇数チャーン数を持つガッピング位相を臨界点で割った。我々は,各家系に有効なモデルを構築し,これらのモデルのパラメータを状態の積分密度に適合させて決定し,キタエフハニカムモデルのエネルギースペクトルとチャーン数を再現する。また、位相図を導き、これらのモデルの妥当性を決定する。

We introduce low-energy effective models for dense configurations of vortices in the Kitaev honeycomb model. Specifically, we consider configurations of vortices in which vortex-free plaquettes form triangular lattices against a vortex-full background. Depending on the vortex density, these "dual" configurations belong to either one of two families classified by translation and inversion symmetry. As a function of a time-reversal symmetry breaking term, one family exhibits gapped phases with even Chern numbers separated by extended gapless phases, while the other exhibits gapped phases with even or odd Chern numbers, separated by critical points. We construct an effective model for each family, determine the parameters of these models by fitting the integrated density of states, and reproduce energy spectra and Chern numbers of the Kitaev honeycomb model. We also derive phase diagrams and determine these models' validity.

翻訳日:2023-12-25 14:38:47 公開日:2023-12-22

# ストロンチウム量子ガス顕微鏡

A strontium quantum-gas microscope ( http://arxiv.org/abs/2312.14818v1 )

ライセンス: Link先を確認

Sandra Buob, Jonatan Höschele, Vasiliy Makhalov, Antonio Rubio-Abadal, Leticia Tarruell

(参考訳) 量子ガス顕微鏡の開発は、量子縮退多体系を単一原子レベルで探索する新しい方法をもたらした。これまで、これらの装置のほとんどはアルカリ原子に焦点を合わせてきた。アルカリ土類元素への量子ガス顕微鏡の拡張は、量子シミュレーションの分野にSU(N)対称フェルミオン同位体や超狭線幅の光学遷移のような新しいツールを提供する。ここでは,ハバード領域の光格子における$^{84}$Srボソン量子ガスのサイト分解イメージングを実証する。量子ガスは2次元の面内格子とライトシートポテンシャルによって閉じ込められ、これらはストロンチウムのクロックマジック波長813.4nmで動作する。高い空間分解能を与える線幅の広い461nm遷移を用いて蛍光イメージングを実現する。同時に、線幅の狭い689nmの異重項間遷移線で引力型シシュフォス冷却を行う。蛍光画像から原子占有を再構成し,94%を超えるイメージング忠実度を得た。最後に,Bose-Hubbard領域における$^{84}$Sr超流動を実現する。位相コヒーレンスのプローブである膨張時の干渉パターンを、単一原子分解能で観察した。ストロンチウム量子ガス顕微鏡は、散逸のあるハバード模型、原子配列の量子光学、およびSU(N)フェルミオンを微視的レベルで研究するための新しいプラットフォームを提供する。

The development of quantum-gas microscopes has brought novel ways of probing quantum degenerate many-body systems at the single-atom level. Until now, most of these setups have focused on alkali atoms. Expanding quantum-gas microscopy to alkaline-earth elements will provide new tools, such as SU(N)-symmetric fermionic isotopes or ultranarrow optical transitions, to the field of quantum simulation. Here, we demonstrate the site-resolved imaging of a $^{84}$Sr bosonic quantum gas in a Hubbard-regime optical lattice. The quantum gas is confined by a two-dimensional in-plane lattice and a light-sheet potential, which operate at the strontium clock-magic wavelength of 813.4 nm. We realize fluorescence imaging using the broad 461 nm transition, which provides high spatial resolution. Simultaneously, we perform attractive Sisyphus cooling with the narrow 689 nm intercombination line. We reconstruct the atomic occupation from the fluorescence images, obtaining imaging fidelities above 94%. Finally, we realize a $^{84}$Sr superfluid in the Bose-Hubbard regime. We observe its interference pattern upon expansion, a probe of phase coherence, with single-atom resolution. Our strontium quantum-gas microscope provides a new platform to study dissipative Hubbard models, quantum optics in atomic arrays, and SU(N) fermions at the microscopic level.

翻訳日:2023-12-25 14:31:08 公開日:2023-12-22

# PARDINUS:オートエンコーダに基づく写真追跡空白画像の削除を弱めに監視

PARDINUS: Weakly supervised discarding of photo-trapping empty images based on autoencoders ( http://arxiv.org/abs/2312.14812v1 )

ライセンス: Link先を確認

David de la Rosa, Antonio J Rivera, María J del Jesus, Francisco Charte

(参考訳) 写真撮影カメラは野生生物の監視に広く利用されている。これらのカメラは、動きが検出されたときに写真を撮り、動物が現れる像を捉えます。これらの画像の大部分は空で、画像には野生生物は現れない。画像のフィルタリングは、生物学者の手作業で何時間もかかるので、簡単な作業ではない。したがって、このタスクの自動化には顕著な関心がある。空のフォトトラッピング画像の自動破棄は、機械学習の分野ではまだオープンフィールドである。既存のソリューションは、トレーニングフェーズで画像のアノテーションを必要とする最先端の教師付き畳み込みニューラルネットワークに依存することが多い。 PARDINUS (Weakly suPervised discARDINg of photo-trapping empty image based on aUtoencoderS) は、弱教師付き学習の基礎の上に構築され、この手法が更なるラベル付け作業を必要とする他の完全教師付き手法に等しいか、超えていることを証明している。

Photo-trapping cameras are widely employed for wildlife monitoring. Those cameras take photographs when motion is detected to capture images where animals appear. A significant portion of these images are empty: no wildlife appears in them. Filtering out those images is not a trivial task since it requires hours of manual work from biologists. Therefore, there is a notable interest in automating this task. Automatic discarding of empty photo-trapping images is still an open field in the area of Machine Learning. Existing solutions often rely on state-of-the-art supervised convolutional neural networks that require the annotation of the images in the training phase. PARDINUS (Weakly suPervised discARDINg of photo-trapping empty images based on aUtoencoderS) is constructed on the foundation of weakly supervised learning and proves that this approach equals or even surpasses other fully supervised methods that require further labeling work.

翻訳日:2023-12-25 14:30:49 公開日:2023-12-22

# レースカーのロック差分を正確に制御する三輪車モデル

A Tricycle Model to Accurately Control an Autonomous Racecar with Locked Differential ( http://arxiv.org/abs/2312.14808v1 )

ライセンス: Link先を確認

Ayoub Raji, Nicola Musiu, Alessandro Toschi, Francesco Prignoli, Eugenio Mascaro, Pietro Musso, Francesco Amerotti, Alexander Liniger, Silvio Sorrentino, Marko Bertogna

(参考訳) 本稿では,自律オープンホイールレースカーの側方ダイナミクスに対するロックド・ディファレンシャルの効果をモデル化する新しい定式化法を提案する。このモデルはモデル予測コントローラで使用されており、マイクロステップ離散化アプローチを用いてダイナミクスを正確に線形化し、リアルタイム実装に適した予測を生成する。モデルの安定性解析と,オフライン軌道生成パイプライン,オンライン局所速度プロファイルプランナ,低レベル縦型コントローラを含む全体計画制御スキームの概要について述べる。横道追跡の改善は、モンツァF1レーストラックでの最初のインディ自律チャレンジイベントでダララ AV-21で生産された予備的な実験結果で実証された。タイヤリミットに近い動作を行う場合の解の有効性を実証する高忠実度シミュレータにおいて, 最終調整およびチューニングを行った。

In this paper, we present a novel formulation to model the effects of a locked differential on the lateral dynamics of an autonomous open-wheel racecar. The model is used in a Model Predictive Controller in which we included a micro-steps discretization approach to accurately linearize the dynamics and produce a prediction suitable for real-time implementation. The stability analysis of the model is presented, as well as a brief description of the overall planning and control scheme which includes an offline trajectory generation pipeline, an online local speed profile planner, and a low-level longitudinal controller. An improvement of the lateral path tracking is demonstrated in preliminary experimental results that have been produced on a Dallara AV-21 during the first Indy Autonomous Challenge event on the Monza F1 racetrack. Final adjustments and tuning have been performed in a high-fidelity simulator demonstrating the effectiveness of the solution when performing close to the tire limits.

翻訳日:2023-12-25 14:30:31 公開日:2023-12-22

# 量子コンピューティングの幾何学

The Geometry of Quantum Computing ( http://arxiv.org/abs/2312.14807v1 )

ライセンス: Link先を確認

E. Ercolessi, R. Fioresi, T. Weber

(参考訳) 本稿では,いくつかの量子計算問題の幾何学的モデリングについて概説する。この用語の簡単な導入の後、量子情報幾何学とZX-計算に焦点を合わせ、量子コンピューティング問題と量子群、すなわちホップ代数との接続を確立する。

In this expository paper we present a brief introduction to the geometrical modeling of some quantum computing problems. After a brief introduction to establish the terminology, we focus on quantum information geometry and ZX-calculus, establishing a connection between quantum computing questions and quantum groups, i.e. Hopf algebras.

翻訳日:2023-12-25 14:30:15 公開日:2023-12-22

# 海洋生物音響データに対する信号対雑音比が生成的対立ネットワークに及ぼす影響

The Effects of Signal-to-Noise Ratio on Generative Adversarial Networks Applied to Marine Bioacoustic Data ( http://arxiv.org/abs/2312.14806v1 )

ライセンス: Link先を確認

Georgia Atkinson, Nick Wright, A. Stephen McGough and Per Berggren

(参考訳) 近年,海洋生物音響分野におけるデータセットの補足にgans(generative adversarial network)が用いられている。これはデータ収集コスト、データの分散性、前処理支援などの要因によって引き起こされる。海洋生物音響データの顕著な課題の1つは、GANのような深層学習技術を適用する際に困難を呈する低信号-雑音比(SNR)である。本研究では,SNRがGAN演奏に与える影響について検討し,GAN演奏に対する3つの評価手法について検討し,特にWaveGANにおけるSNRの効果について興味深い結果を得た。

In recent years, generative adversarial networks (GANs) have been used to supplement datasets within the field of marine bioacoustics. This is driven by factors such as the cost of collecting data, data sparsity, and the need to aid preprocessing. One notable challenge with marine bioacoustic data is the low signal-to-noise ratio (SNR), which poses difficulties when applying deep learning techniques such as GANs. This work investigates the effect SNR has on audio-based GAN performance and examines three different evaluation methodologies for GAN performance, yielding interesting results on the effects of SNR on GANs, specifically WaveGAN.

翻訳日:2023-12-25 14:30:10 公開日:2023-12-22
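The abstract above turns on the signal-to-noise ratio, so a small worked example of that quantity may help. This is a minimal sketch, not the authors' pipeline: the tone, the noise, and the helper names (`power`, `snr_db`, `scale_noise_to_snr`) are all assumptions made for illustration. It computes SNR in decibels and rescales a noise track so that a mixture has a chosen SNR.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_db):
    """Rescale noise so that its power sits target_db below the signal power."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [gain * n for n in noise]


# Hypothetical toy data: a 50 Hz tone sampled at 1 kHz plus Gaussian noise.
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 50.0 * t / 1000.0) for t in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Build a low-SNR mixture (-5 dB means the noise is stronger than the signal).
low_snr_noise = scale_noise_to_snr(signal, noise, target_db=-5.0)
mixture = [s + n for s, n in zip(signal, low_snr_noise)]
measured = snr_db(signal, low_snr_noise)
```

Sweeping `target_db` over a range of values is one simple way to produce training sets at controlled noise levels.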

# 自由空間結合トラップイオンを有する量子リピータノード

Quantum repeater node with free-space coupled trapped ions ( http://arxiv.org/abs/2312.14805v1 )

ライセンス: Link先を確認

Max Bergerhoff, Omar Elshehy, Stephan Kucera, Matthias Kreis, and Jürgen Eschner

(参考訳) 量子中継セル(quantum repeater cell)は、直接伝送におけるファイバー損失が避けられないため、距離制限を克服できる量子ネットワークの基本構成要素である。我々は、量子記憶として働く同じトラップにおいて、2つの自由空間結合$^{40}$ca$^+$イオンに基づく量子リピータセルの実装を実証する。本研究では, 個々のイオンからの単一光子の放出を制御し, 原子光子と光子光子の絡み合いの非同期発生を実証する。我々は,生成率のスケーリングと忠実性について考察する。

The quantum repeater cell is a basic building block for a quantum network, as it allows one to overcome the distance limitations due to unavoidable fiber loss in direct transmission. We demonstrate the implementation of a quantum repeater cell, based on two free-space coupled $^{40}$Ca$^+$ ions in the same trap that act as quantum memories. We demonstrate the asynchronous generation of atom-photon and photon-photon entanglement by controlled emission of single photons from the individually addressed ions and entanglement swapping. We discuss the fidelity as well as the scaling of the generated rate.

翻訳日:2023-12-25 14:29:57 公開日:2023-12-22

# 大きな言語モデルを使って株式を宣伝する

Use large language models to promote equity ( http://arxiv.org/abs/2312.14804v1 )

ライセンス: Link先を確認

Emma Pierson, Divya Shanmugam, Rajiv Movva, Jon Kleinberg, Monica Agrawal, Mark Dredze, Kadija Ferryman, Judy Wawira Gichoya, Dan Jurafsky, Pang Wei Koh, Karen Levy, Sendhil Mullainathan, Ziad Obermeyer, Harini Suresh, Keyon Vafa

(参考訳) 大きな言語モデル(LLM)の進歩は、彼らの社会的影響に対する関心の爆発を引き起こした。ソーシャルエクイティへの影響に関する議論の多くは、"どのようにllmが偏り、どのようにバイアスを軽減できるか"というような質問に焦点をあてて、警告的あるいは否定的になっている。 AIが一般的に、特にLLMがバイアスを封じ込めている方法は、十分に文書化されている。しかし、同じように重要で議論の少ない、機会に焦点を絞ったカウンターポイントは、"llmが株式を促進できる有望なアプリケーションは何でしょうか? LLMがより公平な世界を実現するためには、バイアスや障害モードに対して防御を行うだけでは十分ではありません。エクイティエンハンティング(エクイティエンハンシング)のユースケースに積極的に適用することで、過小評価されたグループに対する機会を増やし、社会的差別を減らすことも必要です。 aiの影響を決定する選択肢はたくさんありますし、パイプラインの非常に早い段階での基本的な選択は、aiを適用すべき問題です。パイプラインでのみ焦点を合わせれば -- LLMが本質的に電力を消費するユースケースを促進することで、より公平になる -- 、その影響を公平に導く重要な機会を逃すことになるでしょう。本稿では,リスクと注意点を明確に保ちつつ,新たに可能な4つの研究指針を提示することで,株式を促進することへのllmの新たな可能性について強調する。

Advances in large language models (LLMs) have driven an explosion of interest about their societal impacts. Much of the discourse around how they will impact social equity has been cautionary or negative, focusing on questions like "how might LLMs be biased and how would we mitigate those biases?" This is a vital discussion: the ways in which AI generally, and LLMs specifically, can entrench biases have been well-documented. But equally vital, and much less discussed, is the more opportunity-focused counterpoint: "what promising applications do LLMs enable that could promote equity?" If LLMs are to enable a more equitable world, it is not enough just to play defense against their biases and failure modes. We must also go on offense, applying them positively to equity-enhancing use cases to increase opportunities for underserved groups and reduce societal discrimination. There are many choices which determine the impact of AI, and a fundamental choice very early in the pipeline is the problems we choose to apply it to. If we focus only later in the pipeline -- making LLMs marginally more fair as they facilitate use cases which intrinsically entrench power -- we will miss an important opportunity to guide them to equitable impacts. Here, we highlight the emerging potential of LLMs to promote equity by presenting four newly possible, promising research directions, while keeping risks and cautionary points in clear view.

翻訳日:2023-12-25 14:29:49 公開日:2023-12-22

# 複雑なデータ検索のためのセマンティックパーシング:リレーショナルデータベースへのノンコードアクセスのためのクエリプラン対SQLのターゲット

Semantic Parsing for Complex Data Retrieval: Targeting Query Plans vs. SQL for No-Code Access to Relational Databases ( http://arxiv.org/abs/2312.14798v1 )

ライセンス: Link先を確認

Ben Eyal, Amir Bachar, Ophir Haroche, Michael Elhadad

(参考訳) 大きな言語モデル(LLM)は、与えられたデータベーススキーマに基づいて自然言語の質問からSQLクエリを生成するタスクであるtext-to-SQLの進歩を加速させた。 SQLの宣言的な性質にもかかわらず、それは引き続き複雑なプログラミング言語である。本稿では,より単純な構文と複雑なクエリのモジュール仕様を備えた代替クエリ言語の可能性を検討する。目的は、現代のニューラルセマンティックパーシングアーキテクチャによってより容易に学習できるクエリ言語を作成すると同時に、対話型クエリプランアシスタントによって生成されたクエリプランの有効性をよりよく評価することである。提案されている代替クエリ言語はQuery Plan Language (QPL)と呼ばれる。モジュール式として設計されており、sql common table expression (cte) の制限された形式に変換できる。 qplの目的は、ユーザが自然言語で質問を表現できるだけでなく、検証しやすいターゲット言語を提供することによって、非プログラマが複雑なデータ検索にアクセスできるようにすることである。本稿は、QPLのモジュラリティが複雑なクエリプランを構成的に生成する上で、ニューラルネットワークLLMのメリットを実証する。これには質問分解戦略と計画段階が含まれる。我々は、QPLに変換されたSpiderテキスト-SQLデータセットのバージョンの実験を行う。 qplプログラムの階層構造により,クエリの複雑さを自然に測定できる。この評価に基づき、複雑な合成クエリ上で既存のテキスト-SQLシステムの低精度を同定する。複雑なクエリの課題に対して,微調整 LLM と様々なプロンプト戦略を用いて,反復的かつユーザ制御的な方法で対処する方法を提案する。

Large Language Models (LLMs) have spurred progress in text-to-SQL, the task of generating SQL queries from natural language questions based on a given database schema. Despite the declarative nature of SQL, it continues to be a complex programming language. In this paper, we investigate the potential of an alternative query language with simpler syntax and modular specification of complex queries. The purpose is to create a query language that can be learned more easily by modern neural semantic parsing architectures while also enabling non-programmers to better assess the validity of the query plans produced by an interactive query plan assistant. The proposed alternative query language is called Query Plan Language (QPL). It is designed to be modular and can be translated into a restricted form of SQL Common Table Expressions (CTEs). The aim of QPL is to make complex data retrieval accessible to non-programmers by allowing users to express their questions in natural language while also providing an easier-to-verify target language. The paper demonstrates how neural LLMs can benefit from QPL's modularity to generate complex query plans in a compositional manner. This involves a question decomposition strategy and a planning stage. We conduct experiments on a version of the Spider text-to-SQL dataset that has been converted to QPL. The hierarchical structure of QPL programs enables us to measure query complexity naturally. Based on this assessment, we identify the low accuracy of existing text-to-SQL systems on complex compositional queries. We present ways to address the challenge of complex queries in an iterative, user-controlled manner, using fine-tuned LLMs and a variety of prompting strategies in a compositional manner.

翻訳日:2023-12-25 14:29:27 公開日:2023-12-22
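The abstract above states that QPL programs can be translated into a restricted form of SQL Common Table Expressions. The source does not show QPL syntax, so the following is only a sketch of the target side: a question broken into small named steps, each written as one CTE. The schema, the data, and the question are hypothetical and exist only to make the example runnable with Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical toy schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 'Spain'), (2, 'Bo', 'France'), (3, 'Cy', 'Spain');
    INSERT INTO concert VALUES (10, 1, 2014), (11, 1, 2015), (12, 2, 2014), (13, 3, 2016);
    """
)

# Question: "Which singers gave more than one concert, and how many?"
# Each CTE is one small step that can be read and checked on its own.
query = """
WITH concerts_per_singer AS (          -- step 1: aggregate concerts by singer
    SELECT singer_id, COUNT(*) AS n_concerts
    FROM concert
    GROUP BY singer_id
),
frequent AS (                          -- step 2: filter the aggregate
    SELECT singer_id, n_concerts
    FROM concerts_per_singer
    WHERE n_concerts > 1
)
SELECT s.name, f.n_concerts            -- step 3: join back to get names
FROM frequent AS f
JOIN singer AS s ON s.singer_id = f.singer_id
ORDER BY s.name;
"""

rows = conn.execute(query).fetchall()
```

Because every step has a name and a single purpose, a reader who does not write SQL can still inspect the intermediate tables one at a time, which is the kind of verifiability the abstract argues for.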

# マルチコストシナリオにおけるサポートベクトルマシンについて

On support vector machines under a multiple-cost scenario ( http://arxiv.org/abs/2312.14795v1 )

ライセンス: Link先を確認

Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo

(参考訳) Support Vector Machine(SVM)はバイナリ分類において強力なツールであり、優れた誤分類率を持つことで知られている。一方、医学診断、チャーン、詐欺予測などの現実世界の分類問題の多くは、異なるクラスで異なる可能性のある誤分類コストを伴っている。しかし、そのような誤分類コストに対して正確な値を提供することは困難であり、許容できる誤分類率を識別することがより容易である。本稿では,問題定式化に性能制約を組み込むことで,誤分類コストを考慮した新しいSVMモデルを提案する。具体的には,最大辺が与えられた閾値以下の誤分類率を有する超平面を求める。そのような最大辺超平面は、線形制約と整数変数を持つ二次凸問題を解くことによって得られる。報告された数値的経験から、我々のモデルは、あるクラスにおける誤分類率(おそらく他のクラスにおける誤分類率の増加による)をユーザが制御でき、実行時間の観点からも実現可能であることを示す。

Support Vector Machine (SVM) is a powerful tool in binary classification, known to attain excellent misclassification rates. On the other hand, many real-world classification problems, such as those found in medical diagnosis, churn or fraud prediction, involve misclassification costs which may differ between classes. However, it may be hard for the user to provide precise values for such misclassification costs, whereas it may be much easier to identify acceptable misclassification rates. In this paper we propose a novel SVM model in which misclassification costs are considered by incorporating performance constraints in the problem formulation. Specifically, our aim is to seek the hyperplane with maximal margin yielding misclassification rates below given threshold values. Such a maximal margin hyperplane is obtained by solving a quadratic convex problem with linear constraints and integer variables. The reported numerical experience shows that our model gives the user control over the misclassification rates in one class (possibly at the expense of an increase in misclassification rates for the other class) and is feasible in terms of running times.

翻訳日:2023-12-25 14:29:04 公開日:2023-12-22
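The performance constraints described in the abstract above can be illustrated without solving the optimization problem itself. The sketch below does not implement the paper's mixed-integer quadratic model; it only evaluates two candidate hyperplanes against per-class error thresholds. The data, thresholds, and function names are hypothetical.

```python
def misclassification_rates(w, b, points, labels):
    """Per-class error rates of the linear classifier sign(w.x + b)."""
    errors = {+1: 0, -1: 0}
    counts = {+1: 0, -1: 0}
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        predicted = 1 if score >= 0 else -1
        counts[y] += 1
        if predicted != y:
            errors[y] += 1
    return {c: errors[c] / counts[c] for c in counts}


def satisfies_constraints(rates, max_rates):
    """True if every class error rate is at or below its threshold."""
    return all(rates[c] <= max_rates[c] for c in max_rates)


# Hypothetical 1-D toy data in which missing a positive is the costly error.
points = [(-2.0,), (-0.6,), (0.5,), (1.0,), (2.0,), (-0.5,)]
labels = [-1, -1, -1, +1, +1, +1]
thresholds = {+1: 0.0, -1: 0.7}   # no missed positives; up to 70% errors on negatives

candidate_a = ((1.0,), 0.0)       # decision boundary at x = 0
candidate_b = ((1.0,), 0.75)      # decision boundary shifted to x = -0.75

rates_a = misclassification_rates(*candidate_a, points, labels)
rates_b = misclassification_rates(*candidate_b, points, labels)
```

Shifting the boundary removes every error on the positive class but raises the error rate on the negative class, which is the trade-off the abstract mentions. A solver for the actual model would search over hyperplanes for the widest margin among those that pass the check.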

# euオンラインプラットフォームのソフトウェアドキュメンテーションにおけるランキング透明性の遵守に関する実証的研究

An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms ( http://arxiv.org/abs/2312.14794v1 )

ライセンス: Link先を確認

Francesco Sovrano, Michaël Lognoul, Alberto Bacchelli

(参考訳) 欧州連合(eu)のプラットフォーム・ツー・ビジネス(p2b)規制の遵守は、オンラインプラットフォームでは困難であり、当局にとってコンプライアンスの評価は困難である。これは部分的には、ランキングの透明性に関する情報(ソフトウェアドキュメントなど)を評価する自動化ツールの欠如によるものだ。私たちの研究はこの問題に2つの方法で取り組む。まず、主要な6つのプラットフォーム(Amazon、Bing、Booking、Google、Tripadvisor、Yahoo)のコンプライアンスを実証的に評価し、ドキュメントにかなりの違いがあることを明らかにする。第2に,ChatGPTと情報検索技術に基づく自動コンプライアンス評価ツールの導入とテストを行う。これらのツールは人的判断に対して評価され、コンプライアンス評価のための信頼できるプロキシとして有望な結果を示す。今回の発見は、規制遵守の強化に寄与し、これらのプラットフォームにおけるビジネス格差を含む不平等の低減を目指す国連持続可能な開発目標10.3に適合する可能性がある。

Compliance with the European Union's Platform-to-Business (P2B) Regulation is challenging for online platforms, and assessing their compliance can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. These tools are evaluated against human judgments, showing promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms.

翻訳日:2023-12-25 14:28:44 公開日:2023-12-22

# 速度-歪み-知覚-分類トレードオフ:逆領域GANによる連成音源符号化と変調

The Rate-Distortion-Perception-Classification Tradeoff: Joint Source Coding and Modulation via Inverse-Domain GANs ( http://arxiv.org/abs/2312.14792v1 )

ライセンス: Link先を確認

Junli Fang, João F. C. Mota, Baoshan Lu, Weicheng Zhang, Xuemin Hong

(参考訳) jscm(joint source coding and modulation)フレームワークは、データから自動的に学習できるディープラーニングの最近の開発によって実現され、エンドツーエンドで最高の圧縮符号と変調スキームが実現されている。本稿では,jscmシナリオにおいて,チャネルレート,歪み,知覚,分類精度との間に厳密なトレードオフが存在することを示す。次に,そのトレードオフをナビゲートする2つの画像圧縮手法を提案する。inverse-domain generative adversarial network (id-gan)と,id-ganの性能に関する洞察を提示するよりシンプルでヒューリスティックな手法である。実験の結果は理論的な結果と相関するだけでなく,提案したID-GANアルゴリズムは従来の分離手法や最近の深層JSCMアーキテクチャと比較してシステム性能を著しく向上することを示した。

The joint source coding and modulation (JSCM) framework was enabled by recent developments in deep learning, which allow the best compression codes and modulation schemes to be learned automatically from data in an end-to-end fashion. In this paper, we show the existence of a strict tradeoff between channel rate, distortion, perception, and classification accuracy in a JSCM scenario. We then propose two image compression methods to navigate that tradeoff: an inverse-domain generative adversarial network (ID-GAN), which achieves extreme compression, and a simpler, heuristic method that reveals insights about the performance of ID-GAN. Experimental results not only corroborate the theoretical findings, but also demonstrate that the proposed ID-GAN algorithm significantly improves system performance compared to traditional separation-based methods and recent deep JSCM architectures.

翻訳日:2023-12-25 14:28:26 公開日:2023-12-22

# 固有値探索と勾配降下のための改良量子アルゴリズム

Improved Quantum Algorithms for Eigenvalues Finding and Gradient Descent ( http://arxiv.org/abs/2312.14786v1 )

ライセンス: Link先を確認

Nhat A. Nghiem and Tzu-Chieh Wei

(参考訳) ブロック符号化は、最近開発された量子アルゴリズムの統一フレームワークを形成する量子信号処理において重要な要素である。探索、振幅推定、ハミルトニアンシミュレーションなどいくつかの問題において、リソース利用の単純化と最適化のために、量子信号処理の能力はこれらを超え、新しい量子アルゴリズムを考案する未解決の可能性を提供する。本稿では,前述した2つの量子アルゴリズムである最大固有値推定と量子勾配降下を実質的に拡張するためにブロック符号化を利用する。高度な手順を含む以前の研究とは異なり、ユニタリブロックエンコーディングを用いて、これらの新しい量子アルゴリズムは、基本操作であっても、元のアルゴリズムに存在する主要なスケーリング要因を排除できることを示しています。これにより、複雑な計算問題に驚くほどの効率で対処できるより効率的な量子アルゴリズムが得られる。さらに,提案手法を,行列反転や複数の固有値推定など,異なる文脈に拡張する方法を示す。

Block encoding is a key ingredient in the recently developed quantum signal processing that forms a unifying framework for quantum algorithms. Initially showcased for simplifying and optimizing resource utilization in several problems, such as searching, amplitude estimation, and Hamiltonian simulation, the capabilities of the quantum signal processing go beyond these and offer untapped potential for devising new quantum algorithms. In this article, we utilize block encoding to substantially enhance two previously proposed quantum algorithms: largest eigenvalue estimation and quantum gradient descent. Unlike previous works that involve sophisticated procedures, our findings, using the unitary block encoding, demonstrate that even with elementary operations, these new quantum algorithms can eliminate major scaling factors present in their original counterparts. This yields much more efficient quantum algorithms capable of tackling complex computational problems with remarkable efficiency. Furthermore, we show how to extend our proposed method to different contexts, including matrix inversion and multiple eigenvalues estimation.

翻訳日:2023-12-25 14:28:08 公開日:2023-12-22

# ロボットソフトウェア開発のためのROSパッケージ検索 : 知識グラフに基づくアプローチ

ROS package search for robot software development: a knowledge graph-based approach ( http://arxiv.org/abs/2312.14781v1 )

ライセンス: Link先を確認

Shuo Wang, Xinjun Mao, Shuo Yang, Menghan Wu, Zhang Zhang

(参考訳) ROS(Robot Operating System)パッケージは、ロボットソフトウェア開発で効果的に再利用できるソフトウェアアーティファクトの一種として、ますます人気が高まっている。実際、利用可能な大量のパッケージからソフトウェアの機能要件によく適合する適切なROSパッケージを見つけることは、現在の検索方法を用いた非自明なタスクである。 ROSパッケージの従来の検索手法は、ロボットタスクに関連するキーワードを汎用検索エンジンやコードホスティングプラットフォームに入力して、潜在的に適切なROSパッケージのほぼ全ての結果を得る。しかし,タスク関連キーワードがROSパッケージの機能と正確に一致しないため,これらの検索手法の精度は比較的低い。本稿では, ROSパッケージの検索精度を向上させるために, セマンティックレベルの ROS Package Knowledge Graph (RPKG) を利用した, セマンティックベースの検索手法を提案する。まず、RPKGを構築するために、ROSパッケージのテキスト記述のデータセットから意味概念を抽出するために多次元特徴抽出技術を用いる。このプロセスから抽出されたセマンティックな特徴は、かなりの数のエンティティと関係をもたらします。その後、ロボットドメイン固有の小さなコーパスを作成し、さらに事前訓練された言語モデルBERT-ROSを作成し、抽出した特徴のセマンティクスを効果的に表現する埋め込みを生成する。これらの埋め込みは、RPKG内のROSパッケージ検索プロセスにおいて、意味レベルの理解と比較を促進する上で重要な役割を果たす。次に,従来のキーワード検索法よりも正確なrosパッケージを検索するユーザ検索クエリから,複数の特徴の重み付き類似性を取り入れた,新しい意味マッチングに基づく検索アルゴリズムを提案する。

ROS (Robot Operating System) packages have become increasingly popular as a type of software artifact that can be effectively reused in robotic software development. Indeed, finding suitable ROS packages that closely match the software's functional requirements from the vast number of available packages is a nontrivial task using current search methods. The traditional search methods for ROS packages often involve inputting keywords related to robotic tasks into general-purpose search engines or code hosting platforms to obtain approximate results of all potentially suitable ROS packages. However, the accuracy of these search methods remains relatively low because the task-related keywords may not precisely match the functionalities offered by the ROS packages. To improve the search accuracy of ROS packages, this paper presents a novel semantic-based search approach that relies on the semantic-level ROS Package Knowledge Graph (RPKG) to automatically retrieve the most suitable ROS packages. Firstly, to construct the RPKG, we employ multi-dimensional feature extraction techniques to extract semantic concepts from the dataset of ROS package text descriptions. The semantic features extracted from this process result in a substantial number of entities and relationships. Subsequently, we create a robot domain-specific small corpus and further fine-tune a pre-trained language model, BERT-ROS, to generate embeddings that effectively represent the semantics of the extracted features. These embeddings play a crucial role in facilitating semantic-level understanding and comparisons during the ROS package search process within the RPKG. Secondly, we introduce a novel semantic matching-based search algorithm that incorporates the weighted similarities of multiple features from user search queries, which searches out more accurate ROS packages than the traditional keyword search method.

翻訳日:2023-12-25 14:27:51 公開日:2023-12-22
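The abstract above describes a search algorithm that combines weighted similarities of multiple features from the user's query. The paper's actual features, weights, and embeddings (BERT-ROS) are not given here, so the sketch below uses hypothetical feature names, hand-written 3-dimensional vectors, and made-up package names purely to show the ranking idea.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weighted_score(query, package, weights):
    """Weighted sum of per-feature cosine similarities."""
    return sum(w * cosine(query[f], package[f]) for f, w in weights.items())


def rank_packages(query, packages, weights):
    """Package names sorted from best to worst match."""
    scored = [(weighted_score(query, feats, weights), name)
              for name, feats in packages.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical feature names, weights, and embeddings, invented for illustration.
weights = {"function": 0.6, "sensor": 0.3, "robot": 0.1}
query = {"function": [1.0, 0.0, 0.0], "sensor": [0.0, 1.0, 0.0], "robot": [0.0, 0.0, 1.0]}
packages = {
    "pkg_mapping": {"function": [0.9, 0.1, 0.0], "sensor": [0.0, 1.0, 0.0], "robot": [0.0, 0.0, 1.0]},
    "pkg_camera": {"function": [0.0, 1.0, 0.0], "sensor": [0.0, 1.0, 0.0], "robot": [0.0, 0.0, 1.0]},
    "pkg_arm": {"function": [0.0, 0.0, 1.0], "sensor": [1.0, 0.0, 0.0], "robot": [0.0, 1.0, 0.0]},
}

ranking = rank_packages(query, packages, weights)
```

Scoring each feature separately lets a package that matches the requested function outrank one that only shares the same sensor or robot, something a single bag of keywords cannot express.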

# 局所密度構造を用いた画像から画像への変換GANの圧縮

Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold ( http://arxiv.org/abs/2312.14776v1 )

ライセンス: Link先を確認

Alireza Ganjdanesh, Shangqian Gao, Hirad Alipanah, Heng Huang

(参考訳) generative adversarial networks (gans) は、画像から画像への変換のための複雑なデータ分布のモデリングにおいて顕著な成功を示している。それでも、彼らの高い計算要求は、エッジデバイスのような実践的なシナリオへの展開を禁止している。既存のGAN圧縮法は主に知識蒸留や畳み込み分類器の刈り取り技術に依存している。したがって、彼らは GAN の臨界特性、すなわちその学習多様体上の局所密度構造を無視している。そこで,新たな視点からgan圧縮にアプローチし,prunedモデルに学習多様体上の元のパラメータ重モデルの密度構造を保存するよう明示的に促す。原生成器の学習多様体を生成試料周辺の局所近傍に分割することにより,prunedモデルの目的を達成する。そこで我々は,カーネル密度推定法に類似した各近傍の局所密度構造を保存するために,プルーニングモデルを定式化する新しいプルーニング目標を提案する。また, 判別器と生成器を2つのプルーニング剤でプルーニングする協調プルーニングスキームを開発した。我々は,対応するモデルのアーキテクチャを決定する際に,ピアのフィードバックを交換することで,ジェネレータと識別器の相互作用を捉えるエージェントを設計する。このような設計により, プルーニング法は高性能サブネットワークを効率よく見つけることができ, プルーニング時のベースラインと比較して, ジェネレータと判別器のバランスをより効率的に維持できる。画像変換GANモデルであるPix2PixとCycleGANについて,様々なベンチマークデータセットとアーキテクチャを用いて実験を行った。

Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation. Still, their high computational demands prohibit their deployment in practical scenarios like edge devices. Existing GAN compression methods mainly rely on knowledge distillation or convolutional classifiers' pruning techniques. Thus, they neglect the critical characteristic of GANs: their local density structure over their learned manifold. Accordingly, we approach GAN compression from a new perspective by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold. We facilitate this objective for the pruned model by partitioning the learned manifold of the original generator into local neighborhoods around its generated samples. Then, we propose a novel pruning objective to regularize the pruned model to preserve the local density structure over each neighborhood, resembling the kernel density estimation method. Also, we develop a collaborative pruning scheme in which the discriminator and generator are pruned by two pruning agents. We design the agents to capture interactions between the generator and discriminator by exchanging their peer's feedback when determining corresponding models' architectures. Thanks to such a design, our pruning method can efficiently find performant sub-networks and can maintain the balance between the generator and discriminator more effectively compared to baselines during pruning, thereby showing more stable pruning dynamics. Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.

翻訳日:2023-12-25 14:27:22 公開日:2023-12-22

# クロスエイジおよびクロスサイトドメインシフトが新生児および新生児脳の深層学習に基づく白質繊維推定に及ぼす影響

Cross-Age and Cross-Site Domain Shift Impacts on Deep Learning-Based White Matter Fiber Estimation in Newborn and Baby Brains ( http://arxiv.org/abs/2312.14773v1 )

ライセンス: Link先を確認

Rizhong Lin, Ali Gholipour, Jean-Philippe Thiran, Davood Karimi, Hamza Kebiri and Meritxell Bach Cuadra

(参考訳) 深層学習モデルでは拡散磁気共鳴イメージングデータから組織微細構造を推定できることが示されている。しかし、これらのモデルは、異なるスキャナーやプロトコルからのデータや、様々な年齢でスキャンされた幼児や子供の発達した脳など、固有の変異のあるデータに適用された場合、ドメインシフトの課題に直面している。データ調和や成人脳の領域適応など、これらの課題に対処するいくつかの手法が提案されている。しかし、これらの手法は乳幼児の急速に発達する脳における繊維配向分布関数の推定には未解明のままである。本研究では,201人の新生児と165人の赤ちゃんの2つのコホート間の年齢効果とドメインシフトについて,モーメント法と微調整戦略を用いて詳細に検討した。以上の結果から,新生児と比較して乳児の微構造発達の変動が深層学習モデルのクロスエイジング性能に直接影響することが示唆された。また,少数の対象領域サンプルがドメインシフト問題を著しく軽減できることを実証した。

Deep learning models have shown great promise in estimating tissue microstructure from limited diffusion magnetic resonance imaging data. However, these models face domain shift challenges when test and train data are from different scanners and protocols, or when the models are applied to data with inherent variations such as the developing brains of infants and children scanned at various ages. Several techniques have been proposed to address some of these challenges, such as data harmonization or domain adaptation in the adult brain. However, those techniques remain unexplored for the estimation of fiber orientation distribution functions in the rapidly developing brains of infants. In this work, we extensively investigate the age effect and domain shift within and across two different cohorts of 201 newborns and 165 babies using the Method of Moments and fine-tuning strategies. Our results show that reduced variations in the microstructural development of babies in comparison to newborns directly impact the deep learning models' cross-age performance. We also demonstrate that a small number of target domain samples can significantly mitigate domain shift problems.

翻訳日:2023-12-25 14:26:54 公開日:2023-12-22

# 乱流: コードのための命令調整型大規模言語モデルの体系的および自動テスト

Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code ( http://arxiv.org/abs/2312.14856v1 )

ライセンス: Link先を確認

Shahin Honarvar, Mark van der Wilk, Alastair Donaldson

(参考訳) 本稿では,新しいベンチマークである乱流を用いて,命令調整型大規模言語モデル(LLM)のコード生成における正確性と堅牢性を体系的に評価する手法を提案する。 turbulence は、自然言語 $\textit{question templates}$ の大規模なセットで構成されており、それぞれがプログラミングの問題であり、様々な形式で問うことができるようにパラメータ化されている。各質問テンプレートには関連する$\textit{test oracle}$があり、llmによって返されるコードソリューションが正しいかどうかを判断する。したがって、単一の質問テンプレートから LLM に $\textit{neighbourhood}$ と非常に似たプログラミング質問を問うことができ、各質問に対して返された結果の正しさを評価することができる。例えば、$\textit{anomalies}$, LLMが近隣で$\textit{almost all}$を正しく解決するが、特定のパラメータのインスタンス化には失敗する。我々は,OpenAI,Cohere,Metaの5つのLLMに対して,それぞれ2つの温度構成で実験を行った。以上の結果から, 乱流はLLM推論能力のギャップを明らかにすることができることがわかった。 LLMが近隣の問題を解決することができるが、近隣全体の問題を解決するために一般化することができないケースを体系的に識別することによって、我々の手法は$\textit{robustness}$問題をハイライトするのに効果的である。我々は、llmが間違ったコード結果を返す際に犯す誤りの種類に光を当てるデータと例を示します。

We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence. Turbulence consists of a large set of natural language $\textit{question templates}$, each of which is a programming problem, parameterised so that it can be asked in many different forms. Each question template has an associated $\textit{test oracle}$ that judges whether a code solution returned by an LLM is correct. Thus, from a single question template, it is possible to ask an LLM a $\textit{neighbourhood}$ of very similar programming questions, and assess the correctness of the result returned for each question. This allows gaps in an LLM's code generation abilities to be identified, including $\textit{anomalies}$ where the LLM correctly solves $\textit{almost all}$ questions in a neighbourhood but fails for particular parameter instantiations. We present experiments against five LLMs from OpenAI, Cohere and Meta, each at two temperature configurations. Our findings show that, across the board, Turbulence is able to reveal gaps in LLM reasoning ability. This goes beyond merely highlighting that LLMs sometimes produce wrong code (which is no surprise): by systematically identifying cases where LLMs are able to solve some problems in a neighbourhood but do not manage to generalise to solve the whole neighbourhood, our method is effective at highlighting $\textit{robustness}$ issues. We present data and examples that shed light on the kinds of mistakes that LLMs make when they return incorrect code results.

翻訳日:2023-12-25 14:19:37 公開日:2023-12-22
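The question-template idea in the abstract above can be sketched in a few lines. This is not the Turbulence benchmark code: the template wording, the oracle, and the stand-in "model" are hypothetical. In particular, `fake_llm` is a stub with a deliberately injected bug, used only so that the example runs without calling any real model.

```python
import random


def make_question(n):
    """Instantiate a hypothetical question template for parameter n."""
    return (
        f"Write a function `solve(xs)` that returns the sum of the {n} "
        f"largest integers in the list xs."
    )


def oracle(candidate, n, trials=50, seed=0):
    """Test oracle: compare a candidate with a reference on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Lists are always longer than n so that "largest" and "smallest" differ.
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(n + 1, n + 10))]
        expected = sum(sorted(xs, reverse=True)[:n])
        if candidate(xs) != expected:
            return False
    return True


def neighbourhood_results(solution_for, params):
    """Map each parameter value to whether the returned solution passes."""
    return {n: oracle(solution_for(n), n) for n in params}


def fake_llm(n):
    """Stand-in for a model: correct except for an injected bug at n == 3."""
    if n == 3:
        return lambda xs: sum(sorted(xs)[:n])            # wrong: sums the smallest
    return lambda xs: sum(sorted(xs, reverse=True)[:n])  # correct


questions = [make_question(n) for n in range(1, 6)]
results = neighbourhood_results(fake_llm, params=range(1, 6))
failures = [n for n, ok in results.items() if not ok]
```

A neighbourhood in which nearly every instance passes and one parameter value fails is the kind of anomaly the abstract describes. A real harness would send each entry of `questions` to a model and run the returned code in a sandbox.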

# TACO:アルゴリズムによるCOde生成データセットのトピック

TACO: Topics in Algorithmic COde generation dataset ( http://arxiv.org/abs/2312.14852v1 )

ライセンス: Link先を確認

Rongao Li (1 and 2), Jie Fu (1), Bo-Wen Zhang (1), Tao Huang (2), Zhihong Sun (2), Chen Lyu (2), Guang Liu (1), Zhi Jin (3), Ge Li (3) ((1) Beijing Academy of Artificial Intelligence, (2) School of Information Science and Engineering, Shandong Normal University, China, (3) Key Lab of HCST (PKU), MOE, SCS, Peking University, China)

(参考訳) 我々は,オープンソースの大規模コード生成データセットであるtacoを紹介し,アルゴリズムの光学に重点を置いて,コード生成モデルの分野でより困難なトレーニングデータセットと評価ベンチマークを提供する。 TACOには、現実のプログラミングシナリオにおける問題理解と推論能力を向上または評価する、より難しい競合レベルのプログラミング質問が含まれている。トレーニングとテストセットには25433と1000のコーディング問題があり、最大155万の多様な解答がある。さらに、各TACO問題には、タスクトピック、アルゴリズム、プログラミングスキル、難易度といったいくつかのきめ細かいラベルが含まれており、コード生成モデルのトレーニングと評価をより正確に参照している。データセットと評価スクリプトはHugging Face Hub(https://huggingface.co/datasets/BAAI/TACO)とGithub(https://github.com/FlagOpen/TACO)で入手できる。

We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the topics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem understanding and reasoning abilities in real-world programming scenarios. The training and test sets contain 25,433 and 1,000 coding problems, respectively, as well as up to 1.55 million diverse solution answers. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and GitHub (https://github.com/FlagOpen/TACO).

翻訳日:2023-12-25 14:19:06 公開日:2023-12-22

# kemeny定数を用いた最適マルコフ鎖分割のためのグラフニューラルネットワークの大規模トレーディング

Large Scale Training of Graph Neural Networks for Optimal Markov-Chain Partitioning Using the Kemeny Constant ( http://arxiv.org/abs/2312.14847v1 )

ライセンス: Link先を確認

Sam Alexander Martino, João Morado, Chenghao Li, Zhenghao Lu, Edina Rosta

(参考訳) 従来のクラスタリングアルゴリズムは、グラフ内の複雑な関係を捉え、任意のクラスタリング基準に一般化するのに苦労することが多い。グラフデータの表現を学習する強力なフレームワークとしてのグラフニューラルネットワーク(GNN)の出現は、その問題を解決するための新しいアプローチを提供する。これまでの研究は、GNNが様々な基準を用いてパーティショニングを提案できることを示したが、これらのアプローチはまだマルコフ連鎖や運動ネットワークに拡張されていない。これらは分子システムの研究で頻繁に発生し、特に生化学的モデリングのコミュニティに興味を持つ。本稿では,マルコフ連鎖のグラフ分割問題に対処するために,複数のgnnベースのアーキテクチャを提案する。このアプローチは、提案されたパーティショニングがケメニー定数をどの程度変更するかを最小化することを目的としている。本稿では,エンコーダデコーダアーキテクチャを用いて,リニアレイヤを持つGraphSAGEベースのGNNが,このコンテキストにおいてより大きく,より表現力に富んだアテンションベースモデルよりも優れていることを示す。概念実証として,まずランダムに連結されたグラフをクラスタ化する手法を実証する。また、運動ネットワークとして1次元自由エネルギープロファイルに対応する線形鎖構造を用いる。その後,分子動力学から得られたデータセットを用いた実験により,本手法の有効性を示す。本手法の性能をpcca+などの他の分割手法と比較する。本稿では,特徴量選択とハイパーパラメータ選択の重要性を検討し,gnnの大規模並列学習のための汎用的戦略を提案する。

Traditional clustering algorithms often struggle to capture the complex relationships within graphs and generalise to arbitrary clustering criteria. The emergence of graph neural networks (GNNs) as a powerful framework for learning representations of graph data provides new approaches to solving the problem. Previous work has shown GNNs to be capable of proposing partitionings using a variety of criteria; however, these approaches have not yet been extended to work on Markov chains or kinetic networks. These arise frequently in the study of molecular systems and are of particular interest to the biochemical modelling community. In this work, we propose several GNN-based architectures to tackle the graph partitioning problem for Markov chains described as kinetic networks. This approach aims to minimize how much a proposed partitioning changes the Kemeny constant. We propose using an encoder-decoder architecture and show how simple GraphSAGE-based GNNs with linear layers can outperform much larger and more expressive attention-based models in this context. As a proof of concept, we first demonstrate the method's ability to cluster randomly connected graphs. We also use a linear chain architecture corresponding to a 1D free energy profile as our kinetic network. Subsequently, we demonstrate the effectiveness of our method through experiments on a data set derived from molecular dynamics. We compare the performance of our method to other partitioning techniques such as PCCA+. We explore the importance of feature and hyperparameter selection and propose a general strategy for large-scale parallel training of GNNs for discovering optimal graph partitionings.

翻訳日:2023-12-25 14:18:48 公開日:2023-12-22
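Since the training objective in the abstract above is built on the Kemeny constant, a direct computation for a small chain may be useful. The sketch below is not the authors' code and does not reproduce their partitioning loss. It assumes an irreducible, aperiodic chain and the convention K = trace(Z) - 1, where Z = (I - P + 1π^T)^(-1) is the fundamental matrix and π the stationary distribution. The example chain is hypothetical.

```python
def stationary_distribution(P, iterations=10_000):
    """Approximate the stationary row vector pi by repeating pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kemeny_constant(P):
    """K = trace(Z) - 1, with Z the inverse of (I - P + 1 pi^T)."""
    n = len(P)
    pi = stationary_distribution(P)
    M = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    Z = invert(M)
    return sum(Z[i][i] for i in range(n)) - 1.0


# Hypothetical two-state chain with switching probabilities a = 0.3 and b = 0.2.
# For two states the closed form is K = 1 / (a + b).
P = [[0.7, 0.3],
     [0.2, 0.8]]
K = kemeny_constant(P)
```

Under this convention K equals the sum of 1 / (1 - λ) over the eigenvalues λ of P other than 1, so slow processes (eigenvalues close to 1) dominate it. That may be why the change in K is a natural measure of how much kinetic information a partitioning loses.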

# Rydberg tweezerアレイにおける不均衡ホッピングを伴う2成分Bose-Hubbardモデルのシミュレーション

Simulating a two component Bose-Hubbard model with imbalanced hopping in a Rydberg tweezer array ( http://arxiv.org/abs/2312.14846v1 )

ライセンス: Link先を確認

Y. Zhang, A. Gaddie, H-V. Do, G. W. Biedermann, R. J. Lewis-Swan

(参考訳) 中性原子の光学トウェザーアレイは、相互作用の範囲とハミルトニアンによって量子シミュレーションのための汎用的なプラットフォームを提供する。本稿では,共振双極子相互作用を特徴とする多層Rydberg原子配列を用いた2成分Bose-Hubbardモデルを提案する。 bose-hubbardモデルの局所ヒルベルト空間を符号化するために使用できる状態の多様性は、各成分の相対ホッピング率の制御とスピンフリップホッピングの実現を可能にする。数値シミュレーションを用いて、多レベルリドバーグ原子がモデルの多様な非平衡クエンチダイナミクスを探求する機会を与えることを示す。例えば、有効スピンの緩和時間スケールと荷電自由度を分離し、ハードコアボソン相互作用に起因する動的制約により、2つの成分の有効ホッピング速度が大きく異なる場合の緩やかな緩和の仕組みを観察する。本稿では,最新のrydberg tweezer配列で提案を実現する技術的詳細について述べる。

Optical tweezer arrays of neutral atoms provide a versatile platform for quantum simulation due to the range of interactions and Hamiltonians that can be realized and explored. We propose to simulate a two-component Bose-Hubbard model with power-law hopping using arrays of multilevel Rydberg atoms featuring resonant dipolar interactions. The diversity of states that can be used to encode the local Hilbert space of the Bose-Hubbard model enables control of the relative hopping rate of each component and even the realization of spin-flip hopping. We use numerical simulations to show how multilevel Rydberg atoms provide an opportunity to explore the diverse non-equilibrium quench dynamics of the model. For example, we demonstrate a separation of the relaxation timescales of effective spin and charge degrees of freedom, and observe regimes of slow relaxation when the effective hopping rates of the two components are vastly different due to dynamical constraints arising from hardcore boson interactions. We discuss the technical details of realizing our proposal in state-of-the-art Rydberg tweezer arrays.

翻訳日:2023-12-25 14:18:28 公開日:2023-12-22

# メタファー翻訳の精神医学への応用について

On the Use of Metaphor Translation in Psychiatry ( http://arxiv.org/abs/2312.14845v1 )

ライセンス: Link先を確認

Lois Wong

(参考訳) 英語能力に限界がある個人にメンタルヘルスを提供すること(LEP)は、精神医学において迫る問題である。精神科医療の訓練を受けた人の大多数は英語話者であるため、LEP患者に与えられるメンタルヘルスケアの質は英語話者に提供されるものよりも著しく低い。メンタルヘルスケアの提供は、患者と医療提供者の間のコミュニケーションと理解に焦点を合わせており、物理的な医療の領域よりもはるかに多く、英語話者は、LEPのメタファのような比喩的な言語を理解できないことが多い。したがって、フィギュラティブ言語翻訳は、公平な精神医学的ケアを提供するのに有用である。現在、メタファーは精神的な問題に苦しむ個人を識別し、それらの個人が自分の経験を理解し、伝達するのを手助けすることの両方において最重要であることが示されている。そこで本稿は,精神医学領域における機械翻訳の可能性を調査し,既存の機械の伝達可能性やメタファー翻訳研究のさらなる研究の必要性を明らかにすることを目的とする。

Providing mental healthcare to individuals with limited English proficiency (LEP) remains a pressing problem within psychiatry. Because the majority of individuals trained in providing psychiatric care are English speakers, the quality of mental healthcare given to LEP patients is significantly lower than that provided for English speakers. The provision of mental healthcare is contingent on communication and understanding between the patient and healthcare provider, much more so than in the realm of physical healthcare, and English speakers are often unable to comprehend figurative language such as metaphors used by LEPs. Hence, Figurative Language Translation is invaluable to providing equitable psychiatric care. Now, metaphor has been shown to be paramount in both identifying individuals struggling with mental problems and helping those individuals understand and communicate their experiences. Therefore, this paper aims to survey the potential of Machine Translation for providing equitable psychiatric healthcare and highlights the need for further research on the transferability of existing machine and metaphor translation research in the domain of psychiatry.

翻訳日:2023-12-25 14:18:09 公開日:2023-12-22

# SusDevOps: ソフトウェアエンジニアリングの第一原則に持続可能性を促進する

SusDevOps: Promoting Sustainability to a First Principle in Software Engineering ( http://arxiv.org/abs/2312.14843v1 )

ライセンス: Link先を確認

Istvan David

(参考訳) 持続性は現代のソフトウェアシステムの重要な特性になりつつある。持続可能なソフトウェア工学に関する知識は大幅に増えていますが、ソフトウェアデリバリライフサイクル内でサステナビリティ関連の活動を行うエンドツーエンドのフレームワークは欠落しています。この記事では、DevOpsコンテキストにおける第一原則への持続可能性を促進するSusDevOpsフレームワークを提案する。ソフトウェア開発スタートアップ企業を事例として,SusDevOpsのライフサイクルフェーズとテクニックを実演する。

Sustainability is becoming a key property of modern software systems. While there is a substantial and growing body of knowledge on engineering sustainable software, end-to-end frameworks that situate sustainability-related activities within the software delivery lifecycle are missing. In this article, we propose the SusDevOps framework that promotes sustainability to a first principle within a DevOps context. We demonstrate the lifecycle phases and techniques of SusDevOps through the case of a software development startup company.

翻訳日:2023-12-25 14:17:52 公開日:2023-12-22

# 巡回セールスマン問題に対するラグランジュ乗数の学習

Learning Lagrangian Multipliers for the Travelling Salesman Problem ( http://arxiv.org/abs/2312.14836v1 )

ライセンス: Link先を確認

Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, Louis-Martin Rousseau

(参考訳) ラグランジュ緩和は、最適化問題の制約を緩和するために用いられる汎用的な数学的手法であり、双対境界を生成して実行可能解の最適性を証明したり、制約プログラミングにおける効率的なプロパゲータ(重み付き回路制約など)を設計したりすることを可能にする。しかし、ラグランジュ乗数を導出する従来の過程(例えば劣勾配法)はしばしば計算コストが高く、大規模あるいは時間制約の厳しい問題に対する実用性が制限される。そこで本研究では、グラフニューラルネットワークの能力を利用して問題構造を活用し、精度の高いラグランジュ乗数を効率的に生成することを目指す、教師なし学習手法を提案する。この手法を、巡回セールスマン問題に対するよく知られたHeld-Karpラグランジュ緩和に適用する。中心となる考え方は、精度の高いラグランジュ乗数を予測し、それをHeld-Karp緩和境界を生成するためのウォームスタートとして用いることである。得られた境界は、分枝限定法によるフィルタリング処理を強化するために用いられる。実行可能解の発見に主眼を置く既存文献の多くとは対照的に、本手法は双対側で動作し、学習が最適性の証明も加速できることを示す。最大200都市のインスタンスを対象に、メトリック巡回セールスマン問題のさまざまな分布について実験を行う。その結果、本手法は重み付き回路グローバル制約のフィルタリングレベルを改善し、タイムアウトまでに解けなかったインスタンスの最適性ギャップを2分の1に削減し、解けたインスタンスの実行時間を10%削減できることが示された。

Lagrangian relaxation is a versatile mathematical technique employed to relax constraints in an optimization problem, enabling the generation of dual bounds to prove the optimality of feasible solutions and the design of efficient propagators in constraint programming (such as the weighted circuit constraint). However, the conventional process of deriving Lagrangian multipliers (e.g., using subgradient methods) is often computationally intensive, limiting its practicality for large-scale or time-sensitive problems. To address this challenge, we propose an innovative unsupervised learning approach that harnesses the capabilities of graph neural networks to exploit the problem structure, aiming to generate accurate Lagrangian multipliers efficiently. We apply this technique to the well-known Held-Karp Lagrangian relaxation for the travelling salesman problem. The core idea is to predict accurate Lagrangian multipliers and to employ them as a warm start for generating Held-Karp relaxation bounds. These bounds are subsequently utilized to enhance the filtering process carried out by branch-and-bound algorithms. In contrast to much of the existing literature, which primarily focuses on finding feasible solutions, our approach operates on the dual side, demonstrating that learning can also accelerate the proof of optimality. We conduct experiments across various distributions of the metric travelling salesman problem, considering instances with up to 200 cities. The results illustrate that our approach can improve the filtering level of the weighted circuit global constraint, reduce the optimality gap by a factor of two for unsolved instances up to a timeout, and reduce the execution time for solved instances by 10%.
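The Held-Karp relaxation mentioned above can be sketched in a few lines. The code below is a minimal illustration, assuming a symmetric cost matrix with at least three cities: it computes the minimum 1-tree under penalised costs $c_{ij} + \theta_i + \theta_j$, turns it into the lower bound $w(\theta) - 2\sum_i \theta_i$, and improves the multipliers with a plain subgradient step. The function names, the step-size schedule, and the `theta` warm-start argument are assumptions for illustration; the learned multiplier predictor from the abstract is not shown.

```python
import math


def min_one_tree(cost, theta):
    """Minimum 1-tree under penalised costs c_ij + theta_i + theta_j.

    Returns (penalised weight, node degrees). Node 0 is the special node.
    """
    n = len(cost)

    def w(i, j):
        return cost[i][j] + theta[i] + theta[j]

    degree = [0] * n
    total = 0.0
    # Prim's algorithm on nodes 1..n-1, grown from node 1.
    best = {v: (w(1, v), 1) for v in range(2, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        weight, parent = best.pop(v)
        total += weight
        degree[v] += 1
        degree[parent] += 1
        for u in best:
            if w(v, u) < best[u][0]:
                best[u] = (w(v, u), v)
    # Attach node 0 through its two cheapest penalised edges.
    for v in sorted(range(1, n), key=lambda u: w(0, u))[:2]:
        total += w(0, v)
        degree[0] += 1
        degree[v] += 1
    return total, degree


def held_karp_bound(cost, theta=None, iterations=100, step=1.0):
    """Best 1-tree lower bound found by subgradient ascent, optionally warm-started."""
    n = len(cost)
    theta = list(theta) if theta is not None else [0.0] * n
    best_bound, best_theta = -math.inf, list(theta)
    for k in range(iterations):
        total, degree = min_one_tree(cost, theta)
        bound = total - 2.0 * sum(theta)
        if bound > best_bound:
            best_bound, best_theta = bound, list(theta)
        if all(d == 2 for d in degree):
            break  # the 1-tree is a tour, so the bound is tight
        # Subgradient step: raise penalties on over-connected nodes, lower them elsewhere.
        theta = [t + step / (k + 1) * (d - 2) for t, d in zip(theta, degree)]
    return best_bound, best_theta
```

Any multiplier vector yields a valid lower bound, so passing predicted multipliers as `theta` can only change how quickly a good bound is reached, not its validity.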

翻訳日:2023-12-25 14:17:44 公開日:2023-12-22

# リッチ中国語記述に基づくプロトタイプガイドによる人物検索

Prototype-Guided Text-based Person Search based on Rich Chinese Descriptions ( http://arxiv.org/abs/2312.14834v1 )

ライセンス: Link先を確認

Ziqiang Wu, Bingpeng Ma

(参考訳) テキストベース人物検索は,人物検出とテキストベース人物検索の統一課題と見なすことができる,未カットシーン画像からの問合せテキストに基づいて,対象人物のローカライズと識別を同時に行うことを目的としている。本研究では,広く利用されている人物検索データセットPRWに基づく大規模ベンチマークデータセットPRW-TPS-CNを提案する。私たちのデータセットには47,102の文が含まれています。これらのテキストは上から下までの人物像を正確に記述しており、これは自然な記述順序に従っている。また、より包括的な評価のために、私たちのデータセットに中国語と英語の記述も提供します。これらの特徴はデータセットをより適用しやすくします。個人検出とテキストに基づく人物検索の不整合を軽減するために,PRW-TPS-CNデータセットのリッチテキストを活用する。本研究では,複数のテキストをテキストプロトタイプとして集約して,人物の顕著なテキスト特徴を維持することを提案する。全体のプロトタイプは画像アテンションマップを生成し、テキストベースの人物検索の低下を引き起こす検出ミスアライメントを解消する。これにより、人物検出とテキストに基づく人物検索との矛盾が軽減される。 PRW-TPS-CNデータセットについて広範な実験を行った。実験の結果, PRW-TPS-CNデータセットの有効性と, 提案手法の最先端性能が示された。

Text-based person search aims to simultaneously localize and identify the target person based on query text from uncropped scene images, which can be regarded as a unified task combining person detection and text-based person retrieval. In this work, we propose a large-scale benchmark dataset named PRW-TPS-CN based on the widely used person search dataset PRW. Our dataset contains 47,102 sentences, which means it carries considerably more information than existing datasets. These texts precisely describe the person images from top to bottom, which is in line with the natural description order. We also provide both Chinese and English descriptions in our dataset for more comprehensive evaluation. These characteristics make our dataset more applicable. To alleviate the inconsistency between person detection and text-based person retrieval, we take advantage of the rich texts in the PRW-TPS-CN dataset. We propose to aggregate multiple texts as text prototypes to maintain the prominent text features of a person, which can better reflect the whole character of a person. The overall prototypes are used to generate an image attention map that eliminates the detection misalignment responsible for degraded text-based person retrieval. Thus, the inconsistency between person detection and text-based person retrieval is largely alleviated. We conduct extensive experiments on the PRW-TPS-CN dataset. The experimental results show the PRW-TPS-CN dataset's effectiveness and the state-of-the-art performance of our approach.

翻訳日:2023-12-25 14:17:15 公開日:2023-12-22

# 電界波の夢:拡散モデルを用いた心臓励起波の生成モデル

Dreaming of Electrical Waves: Generative Modeling of Cardiac Excitation Waves using Diffusion Models ( http://arxiv.org/abs/2312.14830v1 )

ライセンス: Link先を確認

Tanish Baranwal, Jan Lebert, Jan Christoph

(参考訳) 心臓の電気波は、心房細動や心室細動などの生命を脅かす不整脈の際に、回転する渦巻波(スパイラル波)またはスクロール波を形成する。波のダイナミクスは通常、興奮性媒質中の反応拡散ダイナミクスを記述する連立偏微分方程式を用いてモデル化される。近年、物理系および生物系における時空間パターンを生成する代替手段として、データ駆動型の生成モデリングが登場している。本稿では、心筋組織における電気波パターンの生成モデリングのために、デノイジング拡散確率モデルを検討する。無条件および条件付きの生成タスクでそのような波パターンを生成できるよう、シミュレーションで得た電気波パターンを用いて拡散モデルを訓練した。例えば、表面の2次元計測から3次元の波のダイナミクスを再構成するといったインペインティングタスクや、パラメータ固有のダイナミクスの時間発展と生成を検討した。拡散モデルで生成した解を生物物理モデルで得られた解と比較して特徴づけた結果、拡散モデルはスパイラル波とスクロール波のダイナミクスを非常によく再現することを学習し、心筋組織における興奮波のモデリングのための代替的なデータ駆動型アプローチとなり得ることがわかった。例えば、波の分裂(wavebreak)を誘発するためのペーシングプロトコルを適用しなくても、心室細動(VF)ダイナミクスを瞬時に開始できることがわかった。VFダイナミクスは任意の心室形状で生成でき、時間発展させることができる。しかし、制約が不十分な場合、拡散モデルは波パターンを「幻覚」することもわかった。これらの制限にもかかわらず、拡散モデルは心臓不整脈の研究や診断において多くの応用可能性を持つ、興味深く強力なツールである。

Electrical waves in the heart form rotating spiral or scroll waves during life-threatening arrhythmias such as atrial or ventricular fibrillation. The wave dynamics are typically modeled using coupled partial differential equations, which describe reaction-diffusion dynamics in excitable media. More recently, data-driven generative modeling has emerged as an alternative to generate spatio-temporal patterns in physical and biological systems. Here, we explore denoising diffusion probabilistic models for the generative modeling of electrical wave patterns in cardiac tissue. We trained diffusion models with simulated electrical wave patterns to be able to generate such wave patterns in unconditional and conditional generation tasks. For instance, we explored inpainting tasks, such as reconstructing three-dimensional wave dynamics from superficial two-dimensional measurements, and evolving and generating parameter-specific dynamics. We characterized and compared the diffusion-generated solutions to solutions obtained with biophysical models and found that diffusion models learn to replicate spiral and scroll wave dynamics so well that they could serve as an alternative data-driven approach for the modeling of excitation waves in cardiac tissue. For instance, we found that it is possible to initiate ventricular fibrillation (VF) dynamics instantaneously without having to apply pacing protocols in order to induce wavebreak. The VF dynamics can be created in arbitrary ventricular geometries and can be evolved over time. However, we also found that diffusion models "hallucinate" wave patterns when given insufficient constraints. Regardless of these limitations, diffusion models are an interesting and powerful tool with many potential applications in cardiac arrhythmia research and diagnostics.

翻訳日:2023-12-25 14:16:51 公開日:2023-12-22

# Plan, Posture and Go: オープンワールドテキスト・ツー・モーション・ジェネレーションを目指して

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation ( http://arxiv.org/abs/2312.14828v1 )

ライセンス: Link先を確認

Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong

(参考訳) 従来のテキストからモーションへの生成手法は、通常、限られたテキストとモーションのペアで訓練されるため、オープンワールドのシナリオへの一般化が難しい。CLIPモデルを用いてモーション空間とテキスト空間を整列させ、自然言語によるモーション記述からのモーション生成を目指す研究もある。しかし、それらは依然として、限定的で非現実的なその場での動作しか生成できない。これらの問題に対処するため、モーションプランナー、posture-diffuser、go-diffuserの3つのモジュールからなる、PRO-Motionという分割統治型のフレームワークを提案する。モーションプランナーは、大規模言語モデル(LLM)に、目標とする動作の主要な姿勢を記述する一連のスクリプトを生成するよう指示する。自然言語とは異なり、スクリプトは非常に単純なテキストテンプレートに従って、あらゆる可能な姿勢を記述できる。これにより、スクリプトを姿勢に変換するposture-diffuserの複雑さが大幅に低減され、オープンワールド生成への道が開ける。最後に、別の拡散モデルとして実装されたgo-diffuserが、すべての姿勢に対する全身の並進と回転を推定し、現実的な動作を生み出す。実験の結果、本手法が他の手法より優れていること、および「深い喜びを感じている」といった複雑なオープンワールドのプロンプトから多様で現実的な動作を生成できることが示された。プロジェクトページは https://moonsliu.github.io/Pro-Motion で公開されている。

Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP model to align the motion space and the text space, aiming to enable motion generation from natural language motion descriptions. However, they are still limited to generating a narrow range of unrealistic in-place motions. To address these issues, we present a divide-and-conquer framework named PRO-Motion, which consists of three modules: a motion planner, a posture-diffuser, and a go-diffuser. The motion planner instructs Large Language Models (LLMs) to generate a sequence of scripts describing the key postures in the target motion. Differing from natural languages, the scripts can describe all possible postures following very simple text templates. This significantly reduces the complexity of the posture-diffuser, which transforms a script to a posture, paving the way for open-world generation. Finally, the go-diffuser, implemented as another diffusion model, estimates whole-body translations and rotations for all postures, resulting in realistic motions. Experimental results show the superiority of our method over its counterparts and demonstrate its capability of generating diverse and realistic motions from complex open-world prompts such as "Experiencing a profound sense of joy". The project page is available at https://moonsliu.github.io/Pro-Motion.

翻訳日:2023-12-25 14:16:25 公開日:2023-12-22

# 量子実時間発展のためのテンソル繰り込み群法

Tensor Renormalization Group Methods for Quantum Real-time Evolution ( http://arxiv.org/abs/2312.14825v1 )

ライセンス: Link先を確認

Michael Hite and Yannick Meurice

(参考訳) 格子ゲージ理論における実時間発展のab-initio計算は、非常に興味深い応用であるが、計算の難解な側面を提示している。ユークリッド時間格子場理論の文脈で開発されたテンソル再正規化群法は, トロタライズ展開作用素のリアルタイム計算に応用できることを示す。本稿では,各種観測器の切断手順の最適化について検討する。この数値解法を1次元量子イジングモデルに適用し,順序相の外部横場を用いて計算を行い,$n_{s}=4$および8サイトの普遍量子計算と比較する。

Ab-initio calculations of real-time evolution for lattice gauge theory have very interesting potential applications but present challenging computational aspects. We show that tensor renormalization group methods developed in the context of Euclidean-time lattice field theory can be applied to calculation of Trotterized evolution operators at real time. We discuss the optimization of truncation procedures for various observables. We apply the numerical methods to the 1D Quantum Ising Model with an external transverse field in the ordered phase and compare with universal quantum computing for $N_{s}=4$ and 8 sites.

翻訳日:2023-12-25 14:16:00 公開日:2023-12-22

# 検査・保守計画のための信念フリーDRLとMCTSの検討

An investigation of belief-free DRL and MCTS for inspection and maintenance planning ( http://arxiv.org/abs/2312.14824v1 )

ライセンス: Link先を確認

Daniel Koutas, Elizabeth Bismut, Daniel Straub

(参考訳) 本稿では、検査・保守(I&M)計画で現れるような、不確実性の下での逐次決定過程のための新しい深層強化学習(DRL)アーキテクチャを提案する。I&M計画向けの他のDRLアルゴリズムとは異なり、提案する+RQNアーキテクチャは信念状態の計算を不要とし、誤差を含む観測を直接扱う。このアルゴリズムを、劣化する単一コンポーネントシステムの基本的なI&M計画問題に適用する。さらに、I&M問題に対するモンテカルロ木探索(MCTS)の性能を調べ、+RQNと比較する。この比較には、2つの手法で得られた方策の統計的分析と、信念空間におけるそれらの可視化が含まれる。

We propose a novel Deep Reinforcement Learning (DRL) architecture for sequential decision processes under uncertainty, as encountered in inspection and maintenance (I&M) planning. Unlike other DRL algorithms for I&M planning, the proposed +RQN architecture dispenses with computing the belief state and directly handles erroneous observations instead. We apply the algorithm to a basic I&M planning problem for a one-component system subject to deterioration. In addition, we investigate the performance of Monte Carlo tree search for the I&M problem and compare it to the +RQN. The comparison includes a statistical analysis of the two methods' resulting policies, as well as their visualization in the belief space.

翻訳日:2023-12-25 14:15:50 公開日:2023-12-22

# 部分データからの極性双対と量子共分散行列の再構成

Polar Duality and the Reconstruction of Quantum Covariance Matrices from Partial Data ( http://arxiv.org/abs/2312.14823v1 )

ライセンス: Link先を確認

Maurice A. de Gosson

(参考訳) 先行研究で導入されたラグランジアンとシンプレクティック極双対性の概念を用いて,量子共分散行列の再構成の問題に対処する。我々は、パウリの再構成問題を非自明に一般化するガウス量子状態に適用し、そのような状態の簡単なトモグラフィー的特徴を述べる。

We address the problem of the reconstruction of quantum covariance matrices using the notion of Lagrangian and symplectic polar duality introduced in previous work. We apply our constructions to Gaussian quantum states which leads to a non-trivial generalization of Pauli's reconstruction problem and we state a simple tomographic characterization of such states.

翻訳日:2023-12-25 14:15:37 公開日:2023-12-22

# 最適輸送による自己注意の正則性の理解

Understanding the Regularity of Self-Attention with Optimal Transport ( http://arxiv.org/abs/2312.14820v1 )

ライセンス: Link先を確認

Valérie Castin, Pierre Ablin, Gabriel Peyré

(参考訳) トランスフォーマーとそのマルチヘッドアテンション機構は、幅広い領域で最先端モデルを上回ることで、わずか数年で機械学習の状況を一変させた。しかし、理論的な観点から見たその頑健性についてはほとんどわかっていない。本研究では、ニューラルネットワークの頑健性を攻撃手法に依存せずに測る尺度となる、自己注意の局所リプシッツ定数を調べることでこの問題に取り組む。入力をWasserstein距離を備えた確率測度とみなす、測度論的な枠組みを採用する。これにより、注意を無限長の入力へ一般化し、コンパクト集合上での自己注意のリプシッツ定数の上界と下界を導出できる。下界は先行結果を大幅に改善し、コンパクト集合の半径に対して指数関数よりも速く増大するため、入力空間に追加の制約を課さずに頑健性保証を得る可能性は排除される。また、局所リプシッツ定数が大きい測度は、典型的には少数のディラック測度から成り、質量分布が非常に不均衡であることも示される。最後に、トークン数を変化させる摂動の下での自己注意の安定性を解析する。これは測度論的枠組みでは自然な問いである。特に、一部の入力に対しては、トークンを複製してから摂動する攻撃のほうが、単にトークンを動かす攻撃よりも効率的であることを示す。この現象を質量分割(mass splitting)と呼ぶ。

Transformers and their multi-head attention mechanism have completely changed the machine learning landscape in just a few years, by outperforming state-of-the-art models in a wide range of domains. Still, little is known about their robustness from a theoretical perspective. We tackle this problem by studying the local Lipschitz constant of self-attention, which provides an attack-agnostic way of measuring the robustness of a neural network. We adopt a measure-theoretic framework, by viewing inputs as probability measures equipped with the Wasserstein distance. This allows us to generalize attention to inputs of infinite length, and to derive an upper bound and a lower bound on the Lipschitz constant of self-attention on compact sets. The lower bound significantly improves prior results, and grows more than exponentially with the radius of the compact set, which rules out the possibility of obtaining robustness guarantees without any additional constraint on the input space. Our results also point out that measures with a high local Lipschitz constant are typically made of a few Diracs, with a very unbalanced distribution of mass. Finally, we analyze the stability of self-attention under perturbations that change the number of tokens, which appears to be a natural question in the measure-theoretic framework. In particular, we show that for some inputs, attacks that duplicate tokens before perturbing them are more efficient than attacks that simply move tokens. We call this phenomenon mass splitting.
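The measure-theoretic view described above can be made concrete with a small sketch. Softmax self-attention depends on the tokens only through their empirical measure, so duplicating every token leaves the per-token outputs unchanged; this is what makes "duplicate, then perturb" a meaningful class of perturbations. The code assumes single-head attention with identity query, key, and value maps, a simplification chosen for brevity rather than the paper's setting.

```python
import math


def self_attention(tokens, scale=1.0):
    """Single-head softmax self-attention with identity query/key/value maps."""
    outputs = []
    for x in tokens:
        scores = [scale * sum(a * b for a, b in zip(x, y)) for y in tokens]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        norm = sum(weights)
        dim = len(x)
        outputs.append(
            [sum(w * y[d] for w, y in zip(weights, tokens)) / norm for d in range(dim)]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.2]]
single = self_attention(tokens)
doubled = self_attention(tokens + tokens)  # same empirical measure, twice the tokens
```

Here `doubled[:3]` matches `single` up to floating-point error, because duplication doubles both the softmax numerator and its normaliser.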

翻訳日:2023-12-25 14:15:30 公開日:2023-12-22

# 光制御単一分子によるフォノン寿命の増大

Enhanced phonon lifetimes with optically controlled single molecules ( http://arxiv.org/abs/2312.14819v1 )

ライセンス: Link先を確認

Victor Ceban and Mihai A. Macovei

(参考訳) 有機結晶からなるメカニカル共振器に埋め込まれた単一分子のフォノンダイナミクスについて検討した。システム全体が、バッドキャビティ限界内の光共振器に配置される。分子集団の光制御がフォノンダイナミクスに影響を及ぼすことが判明した。長寿命フォノンは遷移周波数の変調によって分子の崩壊ダイナミクスを減速させるときに得られる。実験結果は,他の2レベルエミッタと機械共振器を用いたオプティメカルセットアップにも有効である。

We have investigated the phonon dynamics of a single molecule embedded in a mechanical resonator made of an organic crystal. The whole system is placed in an optical resonator within the bad cavity limit. We have found that the optical control of the molecular population affects the phonon dynamics. Long-lived phonons are obtained when slowing down the decay dynamics of the molecule via modulation of the transition frequency. The discussed results are also valid for optomechanical setups based on other types of two-level emitters and mechanical resonators.

翻訳日:2023-12-25 14:15:06 公開日:2023-12-22

# FAST: ブラックボックス生成モデルにおける弱いアンラーニングのための特徴認識型類似度しきい値処理

FAST: Feature Aware Similarity Thresholding for Weak Unlearning in Black-Box Generative Models ( http://arxiv.org/abs/2312.14895v1 )

ライセンス: Link先を確認

Subhodip Panda, Prathosh AP

(参考訳) プライバシーへの懸念の高まりと規制枠組みへの準拠の必要性を背景に、深層生成モデルの規制が重視されるようになっており、これらのモデルに対する精密な制御機構の必要性が際立っている。この緊急性は、生成モデルが不快、攻撃的、あるいは潜在的に有害なコンテンツを含む出力を生成する事例によって特に強調される。これに対し、事前訓練済みモデルから特定の知識を選択的に忘却させたり、望ましくないデータ部分集合の影響を取り除いたりする手法として、機械アンラーニング(machine unlearning)が登場した。しかし、現代の機械アンラーニング手法は通常、アンラーニング時にモデルパラメータやアーキテクチャの詳細へアクセスできることを前提としており、これは常に可能とは限らない。多くの下流タスクにおいて、これらのモデルは事前訓練済みパラメータ、アーキテクチャ、訓練データにアクセスできないブラックボックスシステムとして機能する。このような状況では、望ましくない出力をフィルタリングすることが現実的な代替手段となる。本研究の主な目的は2つある。第一に、フィルタリングとアンラーニングの関係を明らかにすること、第二に、ブラックボックスシステムとして特徴づけられるモデルが生成する望ましくない出力の表示を抑える手法を定式化することである。本研究の理論的分析は、ブラックボックスモデルの文脈において、フィルタリングが弱いアンラーニングの一形態とみなせることを示す。提案手法であるFeature Aware Similarity Thresholding (FAST)は、潜在空間において不要な特徴の表現を体系的に符号化することで、望ましくない出力を効果的に抑制する。

The heightened emphasis on the regulation of deep generative models, propelled by escalating concerns pertaining to privacy and compliance with regulatory frameworks, underscores the imperative need for precise control mechanisms over these models. This urgency is particularly underscored by instances in which generative models generate outputs that encompass objectionable, offensive, or potentially injurious content. In response, machine unlearning has emerged to selectively forget specific knowledge or remove the influence of undesirable data subsets from pre-trained models. However, modern machine unlearning approaches typically assume access to model parameters and architectural details during unlearning, which is not always feasible. In a multitude of downstream tasks, these models function as black-box systems, with inaccessible pre-trained parameters, architectures, and training data. In such scenarios, the possibility of filtering undesired outputs becomes a practical alternative. The primary goal of this study is twofold: first, to elucidate the relationship between filtering and unlearning processes, and second, to formulate a methodology aimed at mitigating the display of undesirable outputs generated from models characterized as black-box systems. Theoretical analysis in this study demonstrates that, in the context of black-box models, filtering can be seen as a form of weak unlearning. Our proposed Feature Aware Similarity Thresholding (FAST) method effectively suppresses undesired outputs by systematically encoding the representation of unwanted features in the latent space.

翻訳日:2023-12-25 14:09:17 公開日:2023-12-22

# DRStageNet: 眼底画像からの糖尿病網膜症ステージングのための深層学習

DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images ( http://arxiv.org/abs/2312.14891v1 )

ライセンス: Link先を確認

Yevgeniy Men, Jonathan Fhima, Leo Anthony Celi, Lucas Zago Ribeiro, Luis Filipe Nakayama, Joachim A. Behar

(参考訳) 糖尿病網膜症(DR)は糖尿病の合併症として多くみられ、視力喪失の重大なリスクを伴う。視覚障害を抑えるには早期の発見が不可欠である。近年、デジタル眼底画像(DFI)からDRの病期を判定するアルゴリズムが提案されている。しかし、モデルを訓練したソースドメインと、モデルを展開するターゲットドメインとの間の分布シフトのために、モデルはしばしば一般化に失敗する。ソースドメインとターゲットドメインの台(サポート)が完全には重ならない場合に、よくみられ、かつ特に難しいシフトが生じる。本研究では、この課題を軽減するために設計した深層学習モデルDRStageNetを紹介する。さまざまな患者属性、民族、地理的出自、併存疾患を網羅する、合計93,534枚のDFIからなる7つの公開データセットを使用した。自己教師あり学習による事前訓練済みビジョントランスフォーマーであるDINOv2をファインチューニングし、一般化性能を高めるために複数ソースドメインによるファインチューニング戦略を実装した。最近発表された基盤モデルを含む2つの最先端ベンチマークと比較し、本手法の優位性を示す。高解像度の説明可能性ヒートマップを提供するために、grad-rollout法を回帰タスクに適応させた。誤差解析の結果、主要な誤りの59%で参照ラベルが誤っていた。DRStageNetはURL[原稿の受理後に公開]でアクセス可能となる。

Diabetic retinopathy (DR) is a prevalent complication of diabetes associated with a significant risk of vision loss. Timely identification is critical to curb vision impairment. Algorithms for DR staging from digital fundus images (DFIs) have been recently proposed. However, models often fail to generalize due to distribution shifts between the source domain on which the model was trained and the target domain where it is deployed. A common and particularly challenging shift is often encountered when the source- and target-domain supports do not fully overlap. In this research, we introduce DRStageNet, a deep learning model designed to mitigate this challenge. We used seven publicly available datasets, comprising a total of 93,534 DFIs that cover a variety of patient demographics, ethnicities, geographic origins and comorbidities. We fine-tune DINOv2, a pretrained self-supervised vision transformer, and implement a multi-source domain fine-tuning strategy to enhance generalization performance. We benchmark our method against two state-of-the-art baselines, including a recently published foundation model, and demonstrate its superiority. We adapted the grad-rollout method to our regression task in order to provide high-resolution explainability heatmaps. The error analysis showed that 59% of the main errors had incorrect reference labels. DRStageNet is accessible at URL [upon acceptance of the manuscript].

翻訳日:2023-12-25 14:08:50 公開日:2023-12-22

# NPHardEval: 複雑性クラスによる大規模言語モデルの推論能力の動的ベンチマーク

NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes ( http://arxiv.org/abs/2312.14890v1 )

ライセンス: Link先を確認

Lizhou Fan, Wenyue Hua, Lingyao Li, Haoyang Ling, Yongfeng Zhang, Libby Hemphill

(参考訳) 複雑な推論能力は、現在のLLMの最も重要な特徴の1つであり、複雑な意思決定タスクにおいて重要な役割を果たすために利用されてきた。したがって,LLMの推論能力を評価するために,大規模言語モデル (LLM) の推論能力に関する多くのベンチマークが確立されている。しかし、現在のベンチマークはLLMが達成できる推論能力の全範囲を厳格に評価する上で不十分である。これらのベンチマークは公開アクセス可能で静的であるため、モデルが特定のベンチマークメトリクスに対する応答を調整できる可能性があり、その結果、パフォーマンスが増大する。これらの制限に対処するため、我々の研究は NPHardEval という新しいベンチマークを導入した。このベンチマークは、900のアルゴリズム質問の範囲でLLMの推論能力を評価し、NP-Hard複雑性クラスまで拡張するように設計されている。これらの質問は、NPハード複雑性クラス以下の幅広い複雑性クラスを表現するために慎重に選ばれ、LLMの推論能力の厳密な測度を提供する。本研究では,LLMにおける推論の現況に光を当て,複雑なクラス間でのLLMの性能の比較を通して,客観的かつ厳密な視点を提供する。さらに、このベンチマークは動的更新メカニズムで設計されており、データポイントは毎月更新される。このような定期的な更新は、ベンチマークに過剰に適合するllmのリスクを緩和し、より正確で信頼性の高い推論能力の評価を促進する上で、重要な役割を果たす。 NPHardEvalのベンチマークデータセットとコードはhttps://github.com/casmlab/NPHardEvalで公開されている。

Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequate in offering a rigorous evaluation of the full extent of reasoning abilities that LLMs are capable of achieving. They are also prone to the risk of overfitting, as these benchmarks, being publicly accessible and static, allow models to potentially tailor their responses to specific benchmark metrics, thereby inflating their performance. Addressing these limitations, our research introduces a new benchmark, named NPHardEval. This benchmark is designed to evaluate the reasoning abilities of LLMs across a broad spectrum of 900 algorithmic questions, extending up to the NP-Hard complexity class. These questions are meticulously chosen to represent a wide range of complexity classes below the NP-hard complexity class, offering a rigorous measure of the reasoning ability of LLMs. Through this study, we shed light on the current state of reasoning in LLMs, providing an objective and rigorous perspective through the comparison of LLMs' performance across complexity classes. Moreover, this benchmark is designed with a dynamic update mechanism, where the datapoints are refreshed on a monthly basis. Such regular updates play a crucial role in mitigating the risk of LLMs overfitting to the benchmark, promoting a more accurate and reliable assessment of their reasoning capabilities. The benchmark dataset and code of NPHardEval are available at https://github.com/casmlab/NPHardEval.

翻訳日:2023-12-25 14:08:30 公開日:2023-12-22

# 非個人データと個人データからのレート最適分類について

On rate-optimal classification from non-private and from private data ( http://arxiv.org/abs/2312.14889v1 )

ライセンス: Link先を確認

Balázs Csanád Csáji, László Györfi, Ambrus Tamás

(参考訳) 本稿では、古典的な分類問題を、プライバシー制約を課したうえで再検討する。このような制約の下では、生データ$(X_1,Y_1),\ldots,(X_n,Y_n)$を直接観測することはできず、すべての分類器は、適切な局所差分プライバシー機構によるランダム化された出力の関数となる。統計家はこのプライバシー機構の形式を自由に選ぶことができ、ここでは各特徴ベクトル$X_i$の位置の離散化とそのラベル$Y_i$に、ラプラス分布に従うノイズを加える。分類規則は、よく研究されている分割分類規則のプライバシー保護版である。標準的なリプシッツ条件とマージン条件に加えて新たな特性量を導入し、それによって非プライベートデータとプライベートデータの両方について、分類誤り確率の正確な収束レートを計算する。

In this paper we revisit the classical problem of classification, but impose privacy constraints. Under such constraints, the raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed, and all classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. The statistician is free to choose the form of this privacy mechanism, and here we add Laplace distributed noise to a discretisation of the location of each feature vector $X_i$ and to its label $Y_i$. The classification rule is the privatized version of the well-studied partitioning classification rule. In addition to the standard Lipschitz and margin conditions, a novel characteristic is introduced, by which the exact rate of convergence of the classification error probability is calculated, both for non-private and private data.
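A privatised partitioning rule of the kind described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's mechanism: features lie in $[0, 1)$, labels are $\pm 1$, the partition is a uniform grid, and each user releases only label-weighted cell indicators with Laplace noise added. The function names and the `scale` parameter are hypothetical, and no particular privacy level is claimed for a given noise scale.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential variables."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(x, y, num_cells, scale, rng):
    """One user's report: noisy label-weighted cell indicators for x in [0, 1), y in {-1, +1}."""
    cell = min(int(x * num_cells), num_cells - 1)
    return [(y if j == cell else 0.0) + laplace(scale, rng) for j in range(num_cells)]


def fit(reports):
    """Aggregate the noisy reports into one signed sum per cell."""
    num_cells = len(reports[0])
    return [sum(report[j] for report in reports) for j in range(num_cells)]


def classify(x, cell_sums):
    """Predict the sign of the aggregated noisy label sum in the cell containing x."""
    num_cells = len(cell_sums)
    cell = min(int(x * num_cells), num_cells - 1)
    return 1 if cell_sums[cell] >= 0 else -1
```

The statistician only ever sees the output of `privatize`. Because the noise averages out over many users while the signal in each cell grows, the sign of each cell sum can still be recovered when the sample is large enough.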

翻訳日:2023-12-25 14:07:44 公開日:2023-12-22

# 共分散カーネルからのガウス過程のサンプルパス規則性

Sample Path Regularity of Gaussian Processes from the Covariance Kernel ( http://arxiv.org/abs/2312.14886v1 )

ライセンス: Link先を確認

Nathaël Da Costa, Marvin Pförtner, Lancelot Da Costa, Philipp Hennig

(参考訳) ガウス過程(GP)は、関数空間上の確率分布を定義するための最も一般的な形式である。GPの応用は無数にあるが、GPのサンプルパス、すなわちGPが確率測度を定義する関数空間についての包括的な理解は欠けている。実際には、GPは確率測度によってではなく、平均関数と共分散カーネルによって構成される。本稿では、対応するGPのサンプルパスが所与の正則性を持つための、共分散カーネルに対する必要十分条件を与える。Hölder正則性の枠組みを用いるのは、特に簡明な条件が得られ、定常GPおよび等方的GPの場合にはさらに単純になるためである。そして、この結果により、Matérn GPなど機械学習応用でよく用いられるGPのサンプルパス正則性について、新規かつ非常にタイトな特徴づけが得られることを示す。

Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of GP sample paths, i.e. the function spaces over which they define a probability measure, is lacking. In practice, GPs are not constructed through a probability measure, but instead through a mean function and a covariance kernel. In this paper we provide necessary and sufficient conditions on the covariance kernel for the sample paths of the corresponding GP to attain a given regularity. We use the framework of Hölder regularity as it grants us particularly straightforward conditions, which simplify further in the cases of stationary and isotropic GPs. We then demonstrate that our results allow for novel and unusually tight characterisations of the sample path regularities of the GPs commonly used in machine learning applications, such as the Matérn GPs.

翻訳日:2023-12-25 14:07:21 公開日:2023-12-22

# ランジュバン拡散を用いた多様体上のサンプリングと推定

Sampling and estimation on manifolds using the Langevin diffusion ( http://arxiv.org/abs/2312.14882v1 )

ライセンス: Link先を確認

Karthik Bharath, Alexander Lewis, Akash Sharma, Michael V Tretyakov

(参考訳) コンパクトリーマン多様体上で内在的に定義され、不変測度 $d\mu_\phi \propto e^{-\phi} \mathrm{dvol}_g $ を持つランジュバン拡散の離散化を用いたサンプリングと推定について、誤差境界を導出する。離散化されたマルコフ過程に基づく$\mu_\phi $の線形汎関数の推定量として、単一の軌道に基づく時間平均推定量と、複数の独立な軌道に基づくアンサンブル平均推定量の2つを考える。$\phi$に名目的な水準の滑らかさ以上の制約を課すことなく、両推定量のバイアスと分散について、離散化ステップサイズに関する1次の誤差境界を導出する。誤差の次数はユークリッド空間および平坦空間における最適レートと一致し、不変測度$\mu_\phi$と離散化されたマルコフ過程の定常測度との間の距離に関する1次の境界を導く。証明手法は2つの偏微分方程式とランジュバン拡散に対応する作用素の半群との関係を利用しており、その一般性から、ランジュバン拡散に関連するより一般的なクラスのサンプリングアルゴリズムの研究にも適用できる。非コンパクト多様体の場合へ解析を拡張するための条件について議論する。正曲率および負曲率の多様体上の、対数凹分布とそうでない分布を用いた数値例により、導出した境界を例証し、サンプリングアルゴリズムの実用性を示す。

Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $d\mu_\phi \propto e^{-\phi} \mathrm{dvol}_g $ on a compact Riemannian manifold. Two estimators of linear functionals of $\mu_\phi $ based on the discretized Markov process are considered: a time-averaging estimator based on a single trajectory and an ensemble-averaging estimator based on multiple independent trajectories. Imposing no restrictions beyond a nominal level of smoothness on $\phi$, first-order error bounds, in discretization step size, on the bias and variances of both estimators are derived. The order of error matches the optimal rate in Euclidean and flat spaces, and leads to a first-order bound on distance between the invariant measure $\mu_\phi$ and a stationary measure of the discretized Markov process. Generality of the proof techniques, which exploit links between two partial differential equations and the semigroup of operators corresponding to the Langevin diffusion, renders them amenable to the study of a more general class of sampling algorithms related to the Langevin diffusion. Conditions for extending analysis to the case of non-compact manifolds are discussed. Numerical illustrations with distributions, log-concave and otherwise, on the manifolds of positive and negative curvature illustrate the derived bounds and demonstrate practical utility of the sampling algorithm.

翻訳日:2023-12-25 14:06:31 公開日:2023-12-22

# SutraNets: 時系列・確率予測のためのサブシリーズ自動回帰ネットワーク

SutraNets: Sub-series Autoregressive Networks for Long-Sequence, Probabilistic Forecasting ( http://arxiv.org/abs/2312.14880v1 )

ライセンス: Link先を確認

Shane Bergsma, Timothy Zeyl, Lei Guo

(参考訳) 本稿では,長系列時系列のニューラル確率予測のための新しい手法であるSutraNetsを提案する。SutraNetsは自己回帰生成モデルを用いて、長い系列の尤度を条件付き確率の積に分解する。長いシーケンスを生成する場合、ほとんどの自己回帰的アプローチは有害なエラー蓄積と長距離依存関係のモデリングにおける課題に苦しむ。SutraNetsは、長い単変量予測を低周波サブシリーズに対する多変量予測として扱う。自己回帰は時間とサブシリーズをまたいで進行し、コヒーレントな多変量(そして、それゆえ高周波単変量)出力を保証する。サブシリーズは少ないステップで生成できるため、SutraNetsはエラー蓄積や信号経路距離を効果的に削減する。 SutraNetsは、サブシリーズの数を変化させた場合や、基礎となるシーケンスモデルの深さと幅をスケールアップした場合を含め、6つの実世界のデータセットにおいて競合する代替手法よりも予測精度を大幅に向上させることがわかった。

We propose SutraNets, a novel method for neural probabilistic forecasting of long-sequence time series. SutraNets use an autoregressive generative model to factorize the likelihood of long sequences into products of conditional probabilities. When generating long sequences, most autoregressive approaches suffer from harmful error accumulation, as well as challenges in modeling long-distance dependencies. SutraNets treat long, univariate prediction as multivariate prediction over lower-frequency sub-series. Autoregression proceeds across time and across sub-series in order to ensure coherent multivariate (and, hence, high-frequency univariate) outputs. Since sub-series can be generated using fewer steps, SutraNets effectively reduce error accumulation and signal path distances. We find SutraNets to significantly improve forecasting accuracy over competitive alternatives on six real-world datasets, including when we vary the number of sub-series and scale up the depth and width of the underlying sequence models.
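
The idea of treating one long series as several lower-frequency sub-series can be sketched as follows. This assumes the simplest strided split, where sub-series `offset` holds every `k`-th value starting at that offset; the ordering and conditioning scheme actually used in the paper may differ, and the function names are hypothetical.

```python
def split_into_subseries(series, k):
    """Split a univariate series into k lower-frequency sub-series by striding."""
    if len(series) % k != 0:
        raise ValueError("series length must be a multiple of k in this sketch")
    return [series[offset::k] for offset in range(k)]


def interleave_subseries(subseries):
    """Inverse of split_into_subseries: weave the sub-series back into one series."""
    return [value for step in zip(*subseries) for value in step]


if __name__ == "__main__":
    hourly = list(range(12))
    parts = split_into_subseries(hourly, 3)
    print(parts)
    print(interleave_subseries(parts))
```

Each sub-series has length N/k, so a model that generates one sub-series autoregressively needs N/k steps for it rather than N, which is the sense in which fewer steps are involved.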

翻訳日:2023-12-25 14:05:40 公開日:2023-12-22

# Pangu-Agent:構造化推論による微調整可能なジェネリストエージェント

Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning ( http://arxiv.org/abs/2312.14878v1 )

ライセンス: Link先を確認

Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang

(参考訳) 人工知能(AI)エージェントを作成するための重要な方法は強化学習(RL)である。しかし、認識を行動にマッピングするスタンドアロンのRLポリシーの構築は、主に複数のタスクにまたがる汎用性の欠如と、大量のトレーニングデータの必要性など、深刻な問題に直面している。主な原因は、政策を策定する際、事前情報を知覚行動サイクルに効果的に統合できないことである。大規模言語モデル(LLM)は、クロスドメイン知識をAIエージェントに組み込む基本的な方法として登場したが、特定の決定問題に対する重要な学習と適応は欠如している。本稿では、構造化推論をAIエージェントのポリシーに統合し学習するための一般的なフレームワークモデルを提案する。私たちの方法論は、人間の脳にあるモジュラリティによって動機付けられています。このフレームワークは、内在的および外在的関数の構築を利用して、推論構造に関する以前の理解を追加する。また、認知プロセスのモジュール構造と一致して、すべてのモジュールや関数内でモデルを学習する適応能力も提供する。フレームワークの詳細を説明し、他のAIパイプラインや既存のフレームワークと比較する。本稿では,本手法の有効性を示す実験を取り上げ,実用的応用について検討する。この結果から,組織的推論や事前知識が組み込まれている場合,AIエージェントの動作と適応性が向上することが示唆された。これにより、レジリエントで一般的なaiエージェントシステムへのドアが開く。

A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information into the perception-action cycle when devising the policy. Large language models (LLMs) emerged as a fundamental way to incorporate cross-domain knowledge into AI agents but lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework utilises the construction of intrinsic and extrinsic functions to add previous understandings of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in-depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.

翻訳日:2023-12-25 14:05:22 公開日:2023-12-22

# 社会的選択理論を用いた大規模言語モデルからのロバスト知識抽出

Robust Knowledge Extraction from Large Language Models using Social Choice Theory ( http://arxiv.org/abs/2312.14877v1 )

ライセンス: Link先を確認

Nico Potyka, Yuqicheng Zhu, Yunjie He, Evgeny Kharlamov, Steffen Staab

(参考訳) 大規模言語モデル(llm)は、会話エージェント、クリエイティブライティング、テキストの改善、一般的なクエリ応答など、幅広いアプリケーションをサポートする可能性がある。しかし、ランダムに答えを生成し、答えは通常堅牢ではないため、医学のような高レベルのドメインでのクエリ応答には適していない。 LLMクエリのロバスト性を改善するために,ランク付けクエリを繰り返し使用し,ソーシャル選択理論の手法を用いてクエリを集約する手法を提案する。医学的診断や障害診断などの診断環境におけるランキングクエリについて検討し、文献からの部分ボルダ選択関数が複数のクエリ結果のマージにどのように適用できるかについて議論する。我々は、我々の設定におけるいくつかの興味深い特性について論じ、我々のアプローチの堅牢性を実証的に評価する。

Large language models (LLMs) have the potential to support a wide range of applications such as conversational agents, creative writing, text improvement, and general query answering. However, they are ill-suited for query answering in high-stakes domains like medicine because they generate answers at random and their answers are typically not robust: even the same query can result in different answers when prompted multiple times. To improve the robustness of LLM queries, we propose using ranking queries repeatedly and aggregating the results using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.
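
Merging repeated ranking answers can be illustrated with a plain Borda-style count. The sketch below is not the Partial Borda Choice function from the paper; it assumes a simple scoring rule in which a candidate earns one point for each candidate placed below it in the same ranking, and candidates missing from a ranking earn nothing from it. The diagnosis labels and the function name are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several (possibly partial) rankings into one consensus ranking.

    Each ranking lists candidates from most to least likely. A candidate earns
    one point for every candidate placed below it in the same ranking.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] += len(ranking) - 1 - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


if __name__ == "__main__":
    repeated_answers = [
        ["flu", "cold", "covid"],
        ["cold", "flu", "covid"],
        ["flu", "covid"],  # a partial ranking that omits one candidate
    ]
    print(borda_aggregate(repeated_answers))
```

Because every answer contributes to the totals, a single divergent response shifts the consensus less than it would if only one query were issued.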

翻訳日:2023-12-25 14:04:58 公開日:2023-12-22

# 進化プログラム合成によるマルチグリッド手法設計の自動化

Automating the Design of Multigrid Methods with Evolutionary Program Synthesis ( http://arxiv.org/abs/2312.14875v1 )

ライセンス: Link先を確認

Jonas Schmitt

(参考訳) 自然の最も基本的な法則の多くは偏微分方程式(PDE)として定式化することができる。したがって、これらの方程式を理解することは近代科学と工学の多くの分野において非常に重要である。しかし、多くのPDEの一般解は不明であるため、これらの方程式の効率的な近似解は人類の最大の課題の一つである。マルチグリッドはPDEを数値的に解く最も効果的な方法の1つであるが、多くの場合、効率的もしくは少なくとも動作するマルチグリッドソルバの設計はオープンな問題である。この論文は、進化プログラム合成手法である文法誘導型遺伝的プログラミングが、高い効率性と一般化を達成する前例のない構造のマルチグリッド法を発見できることを証明している。そこで我々は,同じマルチグリッド型ソルバを内部構造を適応させることなく,異なるサイズの問題に適用することが可能な,記号的に操作可能な形式言語におけるマルチグリッドメソッドの自動生成を実現する,新しい文脈自由文法を開発した。効率的なマルチグリッド手法の自動設計をプログラム合成タスクとして扱うことで、異なる平滑化と粗いグリッド補正ステップの組み合わせを含む、マルチグリッド操作の新しいシーケンスを離散化階層の各レベルで見つけることができる。このアプローチの実現可能性を証明するため,PythonフレームワークであるEvoStencilsの形で実装されている。この実装は、pythonオブジェクトの有向非巡回グラフの形でマルチグリッドメソッドのアルゴリズムシーケンスを表現することから、コード生成フレームワークexastencilsと進化的計算ライブラリdeapの機能を使った自動生成と最適化までの全ステップを含んでいる。

Many of the most fundamental laws of nature can be formulated as partial differential equations (PDEs). Understanding these equations is, therefore, of exceptional importance for many branches of modern science and engineering. However, since the general solution of many PDEs is unknown, the efficient approximate solution of these equations is one of humanity's greatest challenges. While multigrid represents one of the most effective methods for solving PDEs numerically, in many cases, the design of an efficient or at least working multigrid solver is an open problem. This thesis demonstrates that grammar-guided genetic programming, an evolutionary program synthesis technique, can discover multigrid methods of unprecedented structure that achieve a high degree of efficiency and generalization. For this purpose, we develop a novel context-free grammar that enables the automated generation of multigrid methods in a symbolically-manipulable formal language, based on which we can apply the same multigrid-based solver to problems of different sizes without having to adapt its internal structure. Treating the automated design of an efficient multigrid method as a program synthesis task allows us to find novel sequences of multigrid operations, including the combination of different smoothing and coarse-grid correction steps on each level of the discretization hierarchy. To prove the feasibility of this approach, we present its implementation in the form of the Python framework EvoStencils, which is freely available as open-source software. This implementation comprises all steps from representing the algorithmic sequence of a multigrid method in the form of a directed acyclic graph of Python objects to its automatic generation and optimization using the capabilities of the code generation framework ExaStencils and the evolutionary computation library DEAP.

翻訳日:2023-12-25 14:04:43 公開日:2023-12-22

# BrainVis: 画像再構成による脳と視覚信号の橋渡しを探索する

BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction ( http://arxiv.org/abs/2312.14871v1 )

ライセンス: Link先を確認

Honghao Fu, Zhiqi Shen, Jing Jih Chin, Hao Wang

(参考訳) 脳信号からの視覚刺激の分析と再構成は、人間の視覚系の理解を効果的に進める。しかし、脳波信号は複雑であり、大量のノイズを含んでいる。これは、脳波埋め込みを細かな意味情報と整合させることの難しさや、トレーニングのために追加の大規模な自己収集データセットに依存することなど、脳波からの視覚刺激再構成の既存の作業に実質的な制限をもたらす。これらの課題に対処するために、BrainVisと呼ばれる新しいアプローチを提案する。まず,脳波信号を様々な単位に分割し,学習難易度を高めるため,脳波の時間領域特性を自己監督的に取得する手法を提案する。さらに,脳波の表現性を高めるために周波数領域機能を利用することも提案する。次に,脳波の時間-周波数埋め込みとCLIP空間の粗いセマンティクスと微粒なセマンティクスの補間を同時に調整し,一次視覚成分の強調と相互アライメントの困難さを低減する。最後に,カスケード拡散モデルを用いて画像の再構成を行う。提案したBrainVisは,意味的忠実度復元と生成品質の両面で,芸術の状態を上回ります。特に、トレーニングデータスケールを以前の作業の10%に削減しました。

Analyzing and reconstructing visual stimuli from brain signals effectively advances understanding of the human visual system. However, EEG signals are complex and contain a large amount of noise. This leads to substantial limitations in existing work on visual stimuli reconstruction from EEG, such as difficulties in aligning EEG embeddings with fine-grained semantic information and a heavy reliance on additional large self-collected datasets for training. To address these challenges, we propose a novel approach called BrainVis. Firstly, we divide the EEG signals into various units and apply a self-supervised approach to them to obtain EEG time-domain features, in an attempt to ease the training difficulty. Additionally, we propose to utilize frequency-domain features to enhance the EEG representations. Then, we simultaneously align EEG time-frequency embeddings with the interpolation of the coarse and fine-grained semantics in the CLIP space, to highlight the primary visual components and reduce the cross-modal alignment difficulty. Finally, we adopt cascaded diffusion models to reconstruct images. Our proposed BrainVis outperforms the state of the art in both semantic fidelity reconstruction and generation quality. Notably, we reduce the training data scale to 10% of that used in previous work.

翻訳日:2023-12-25 14:04:14 公開日:2023-12-22

# 財務報告の数値推論

Numerical Reasoning for Financial Reports ( http://arxiv.org/abs/2312.14870v1 )

ライセンス: Link先を確認

Abhinav Arun and Ashish Dhiman and Mehul Soni and Yibei Hu

(参考訳) 財務報告は、会社の運用に関する重要な洞察を提供するが、一般的に30〜40ページに及ぶ広範な報告書は、ダイナミックマーケットにおける迅速な意思決定の課題を提起している。この問題に対処するために、我々は、ユーザからの質問に基づいてこれらのレポートから重要な指標と運用メトリクスを抽出するために、微調整されたLarge Language Models (LLMs)を活用した。我々は、重要なデータを見つける方法を考案し、FinQAデータセットを利用してLlama-2 7BとT5モデルの両方を微調整し、質問応答をカスタマイズした。我々は,数値推論と計算における競合精度である最終数値解のベースラインに匹敵する結果を得た。

Financial reports offer critical insights into a company's operations, yet their extensive length, typically spanning 30 to 40 pages, poses challenges for swift decision-making in dynamic markets. To address this, we leveraged fine-tuned Large Language Models (LLMs) to distill key indicators and operational metrics from these reports based on questions from the user. We devised a method to locate critical data and leveraged the FinQA dataset to fine-tune both Llama-2 7B and T5 models for customized question answering. We achieved results comparable to the baseline on the final numerical answer, with competitive accuracy in numerical reasoning and calculation.

翻訳日:2023-12-25 14:03:52 公開日:2023-12-22

# 時空間線形:普遍的多変量時系列予測に向けて

Spatiotemporal-Linear: Towards Universal Multivariate Time Series Forecasting ( http://arxiv.org/abs/2312.14869v1 )

ライセンス: Link先を確認

Aiyinsi Zuo, Haixi Zhang, Zirui Li, Ce Zheng

(参考訳) 複雑な多変量時系列予測(TSF)の分野において、一般的なテクニックは、トランスフォーマーベースの設計からリカレントニューラルネットワークまで、複雑なディープラーニングアーキテクチャに依存することが多い。しかし、近年の知見から、単純な線形モデルは多様なデータセットの洗練された構成を克服できることが示唆されている。これらのモデルは、観測を複数の将来の時間ステップに直接マッピングし、反復的多段階予測における誤差蓄積を最小限にする。しかし、これらのモデルはデータに空間的および時間的情報を組み込むことができず、洞察に富んだ予測を導くパターンや依存関係を捉えるのに重要である。この監視は、特に特定のシーケンス長とデータセット条件下でのパフォーマンスボトルネックを招き、その普遍的な適用を妨げます。これに対して,STL(SpatioTemporal-Linear)フレームワークを提案する。 STLは、Linearベースのアーキテクチャを拡張するために、時間組込みと空間インフォームドのバイパスをシームレスに統合する。これらの余分なルートはデータに対するより堅牢で洗練された回帰を提供し、特に観測量に制限があり、依存関係をキャプチャする単純な線形レイヤの容量が減少する。実証的な証拠は、STLの長所を強調し、さまざまな観測期間と予測期間とデータセットにわたって、線形とトランスフォーマーのベンチマークを上回っている。このような堅牢性は、トラフィックの軌跡やまれな疾患の進行予測などを含む、さまざまな応用分野にまたがる適合性を強調する。この談話を通じて、深層学習技術を用いた多変量時系列予測において、STLの特異な能力がより一般的なパラダイムとなることを検証するだけでなく、普遍的なアプリケーションのためのデータスカース予測シナリオに取り組む必要性も強調する。コードは利用可能になる。

Within the field of complicated multivariate time series forecasting (TSF), popular techniques frequently rely on intricate deep learning architectures, ranging from transformer-based designs to recurrent neural networks. However, recent findings suggest that simple Linear models can surpass sophisticated constructs on diverse datasets. These models directly map observation to multiple future time steps, thereby minimizing error accumulation in iterative multi-step prediction. Yet, these models fail to incorporate spatial and temporal information within the data, which is critical for capturing patterns and dependencies that drive insightful predictions. This oversight often leads to performance bottlenecks, especially under specific sequence lengths and dataset conditions, preventing their universal application. In response, we introduce the SpatioTemporal-Linear (STL) framework. STL seamlessly integrates time-embedded and spatially-informed bypasses to augment the Linear-based architecture. These extra routes offer a more robust and refined regression to the data, particularly when the amount of observation is limited and the capacity of simple linear layers to capture dependencies declines. Empirical evidence highlights STL's prowess, outpacing both Linear and Transformer benchmarks across varied observation and prediction durations and datasets. Such robustness accentuates its suitability across a spectrum of applications, including but not limited to, traffic trajectory and rare disease progression forecasting. Through this discourse, we not only validate the STL's distinctive capacities to become a more general paradigm in multivariate time-series prediction using deep-learning techniques but also stress the need to tackle data-scarce prediction scenarios for universal application. Code will be made available.

翻訳日:2023-12-25 14:03:42 公開日:2023-12-22

# VIEScore: 条件付き画像合成評価のための説明可能なメトリクスを目指して

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation ( http://arxiv.org/abs/2312.14867v1 )

ライセンス: Link先を確認

Max Ku and Dongfu Jiang and Cong Wei and Xiang Yue and Wenhu Chen

(参考訳) 条件付き画像生成研究の急速に進歩する分野では、様々なモデルの性能と能力を効果的に評価する上で、限定的な説明可能性などの課題がある。本稿では、条件付き画像生成タスクを評価するための視覚指示誘導説明可能なメトリクスVIESCOREを紹介する。 VIESCOREは、Multimodal Large Language Models(MLLM)の一般的な知識をバックボーンとして活用し、トレーニングや微調整を必要としない。条件付き画像タスクにおいて,VIESCOREを7つの重要なタスクで評価した結果,(1)VIESCORE(GPT4-v)は人間と0.3のスピアマン相関を高い精度で達成し,その相関は0.45であることがわかった。 2) VIESCORE (オープンソースMLLM) は合成画像の評価において GPT-4v よりも著しく弱い。 (3)VIESCOREは、生成タスクにおける人間の評価と同等に相関するが、編集タスクでは困難である。これらの結果から,VIESCOREは画像合成タスクの評価において,人間の判断に取って代わる大きな可能性を秘めていると考えられる。

In the rapidly advancing field of conditional image generation research, challenges such as limited explainability make it difficult to effectively evaluate the performance and capabilities of various models. This paper introduces VIESCORE, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation task. VIESCORE leverages general knowledge from Multimodal Large Language Models (MLLMs) as the backbone and does not require training or fine-tuning. We evaluate VIESCORE on seven prominent conditional image generation tasks and find: (1) VIESCORE (GPT-4v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45. (2) VIESCORE (with open-source MLLMs) is significantly weaker than GPT-4v in evaluating synthetic images. (3) VIESCORE achieves a correlation on par with human ratings in the generation tasks but struggles in editing tasks. With these results, we believe VIESCORE shows great potential to replace human judges in evaluating image synthesis tasks.
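
The Spearman correlation quoted above is the Pearson correlation of rank vectors. As a reminder of how such a figure is computed, here is a minimal standard-library sketch; the scores in the usage example are made up for illustration and are not data from the paper, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    metric_scores = [1, 2, 3, 4, 5]  # illustrative automatic scores
    human_scores = [2, 1, 4, 3, 5]   # illustrative human ratings
    print(spearman(metric_scores, human_scores))
```

Only the ordering of the scores matters here, so a metric can correlate well with human raters even when its absolute scale differs from theirs.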

翻訳日:2023-12-25 14:03:11 公開日:2023-12-22

# YAYI 2: 多言語オープンソース大規模言語モデル

YAYI 2: Multilingual Open-Source Large Language Models ( http://arxiv.org/abs/2312.14862v1 )

ライセンス: Link先を確認

Yin Luo, Qingchao Kong, Nan Xu, Jia Cao, Bao Hao, Baoyu Qu, Bo Chen, Chao Zhu, Chenyang Zhao, Donglei Zhang, Fan Feng, Feifei Zhao, Hailong Sun, Hanxuan Yang, Haojun Pan, Hongyu Liu, Jianbin Guo, Jiangtao Du, Jingyi Wang, Junfeng Li, Lei Sun, Liduo Liu, Lifeng Dong, Lili Liu, Lin Wang, Liwen Zhang, Minzheng Wang, Pin Wang, Ping Yu, Qingxiao Li, Rui Yan, Rui Zou, Ruiqun Li, Taiwen Huang, Xiaodong Wang, Xiaofei Wu, Xin Peng, Xina Zhang, Xing Fang, Xinglin Xiao, Yanni Hao, Yao Dong, Yigang Wang, Ying Liu, Yongyu Jiang, Yungan Wang, Yuqi Wang, Zhangsheng Wang, Zhaoxin Yu, Zhen Luo, Wenji Mao, Lei Wang, Dajun Zeng

(参考訳) 自然言語処理の最近の進歩として、大規模言語モデル(llm)は多くの実世界のタスクで人間レベルの言語理解と生成能力を達成し、人工知能への潜在的な道だと見なされている。 LLMの研究をより促進するために、Llama 2 や Falcon など多くのオープンソース LLM が最近提案され、プロプライエタリなモデルに匹敵するパフォーマンスを得た。しかし、これらのモデルは主に英語のシナリオ用に設計されており、中国の文脈ではパフォーマンスが悪い。本稿では,300億のパラメータを持つベースモデルとチャットモデルを含むYAYI 2を提案する。 YAYI 2は、トレーニング済みのデータ処理パイプラインによってフィルタされた2.65兆のトークンを含む多言語コーパス上で、スクラッチから事前トレーニングされる。ベースモデルは、数百万の指示による教師付き微調整と、人間のフィードバックからの強化学習によって、人間の価値と整合する。 MMLUやCMMLUのような複数のベンチマークでの大規模な実験は、提案されたYAYI 2が他の同様のサイズのオープンソースモデルより優れていることを一貫して証明している。

As the latest advancements in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks, and have even been regarded as a potential path to artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and have achieved performance comparable to proprietary models. However, these models are primarily designed for English scenarios and exhibit poor performance in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similarly sized open-source models.

翻訳日:2023-12-25 14:02:50 公開日:2023-12-22

# MACS: マスコンディショニングされた3Dハンドと物体の動き合成

MACS: Mass Conditioned 3D Hand and Object Motion Synthesis ( http://arxiv.org/abs/2312.14929v1 )

ライセンス: Link先を確認

Soshi Shimada, Franziska Mueller, Jan Bednarik, Bardia Doosti, Bernd Bickel, Danhang Tang, Vladislav Golyanik, Jonathan Taylor, Christian Theobalt, Thabo Beeler

(参考訳) 質量のような物体の物理的性質は、我々の手でそれを操作する方法に大きな影響を与える。驚くべきことに、これまでの3Dモーション合成の研究では、この側面は無視されている。本研究は, 合成した3次元の手と物体の動きの自然性を改善するために, 最初のMAss Conditioned 3Dハンド・オブジェクトモーション合成手法であるMACSを提案する。提案手法はカスケード拡散モデルに基づき,物体質量と相互作用型に応じて妥当に調整された相互作用を生成する。 MACSはまた、手動で描画された3Dオブジェクトの軌跡を入力として受け入れ、オブジェクトの質量によって条件付けられた自然な3Dハンドモーションを合成する。この柔軟性により、MLタスク用の合成トレーニングデータの生成、グラフィックワークフロー用のハンドの高速アニメーション、コンピュータゲーム用のキャラクターインタラクションの生成など、さまざまなダウンストリームアプリケーションにMACSを使用することができる。我々は,MACSが訓練中に見られなかった補間および外挿された物体の質量に合理的に一般化するのに,小規模データセットで十分であることを実験的に示す。さらにMACSは,表面接触合成モデルであるConNetが生成するマスコンディショニングコンタクトラベルにより,未知の物体に対する適度な一般化を示す。総合的なユーザ調査により、合成された3Dハンドオブジェクトの相互作用は、極めて妥当でリアルであることが確認された。

The physical properties of an object, such as mass, significantly affect how we manipulate it with our hands. Surprisingly, this aspect has so far been neglected in prior work on 3D motion synthesis. To improve the naturalness of the synthesized 3D hand-object motions, this work proposes MACS, the first MAss Conditioned 3D hand and object motion Synthesis approach. Our approach is based on cascaded diffusion models and generates interactions that plausibly adjust based on the object mass and interaction type. MACS also accepts a manually drawn 3D object trajectory as input and synthesizes the natural 3D hand motions conditioned by the object mass. This flexibility enables MACS to be used for various downstream applications, such as generating synthetic training data for ML tasks, fast animation of hands for graphics workflows, and generating character interactions for computer games. We show experimentally that a small-scale dataset is sufficient for MACS to reasonably generalize across interpolated and extrapolated object masses unseen during training. Furthermore, MACS shows moderate generalization to unseen objects, thanks to the mass-conditioned contact labels generated by our surface contact synthesis model ConNet. Our comprehensive user study confirms that the synthesized 3D hand-object interactions are highly plausible and realistic.

翻訳日:2023-12-25 13:55:54 公開日:2023-12-22

# 人のフィードバックからの強化学習に関する調査

A Survey of Reinforcement Learning from Human Feedback ( http://arxiv.org/abs/2312.14925v1 )

ライセンス: Link先を確認

Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier

(参考訳) 人間からのフィードバックからの強化学習(RLHF)は、工学的な報酬関数に頼るのではなく、人間のフィードバックから学習する強化学習(RL)の一種である。プレファレンスベース強化学習(pbrl)の関連設定に関する先行研究に基づき、人工知能と人間とコンピュータの相互作用の交差点に位置する。この位置付けは、知的システムのパフォーマンスと適応性を高めるとともに、目的と人間の価値の整合性を向上させるための有望な道を提供する。 LLM(Large Language Models)のトレーニングは、RLHFが人間の目的に向けたモデルの能力をターゲットにする決定的な役割を担った近年において、この可能性を著しく証明している。本稿では、RLHFの基礎を概観し、機械エージェントと人間の入力の間の複雑なダイナミクスを探求する。近年, LLM の RLHF に焦点が当てられているが,本調査では多種多様な応用, 広範にわたる影響について, より広い視点で検討している。我々は,rlhfを支える基本原理を考察し,アルゴリズムと人間のフィードバックの共生関係を考察し,この分野の主要な研究動向について考察した。本稿は,RLHF研究の現況を合成することによって,この急成長する研究分野の包括的理解を研究者や実践者に提供することを目的とする。

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of Large Language Models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in targeting the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between machine agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.

翻訳日:2023-12-25 13:55:31 公開日:2023-12-22

# 前向きアルゴリズムによる畳み込みニューラルネットワークの学習

Training Convolutional Neural Networks with the Forward-Forward algorithm ( http://arxiv.org/abs/2312.14924v1 )

ライセンス: Link先を確認

Riccardo Scodellaro, Ajinkya Kulkarni, Frauke Alves, Matthias Schröter

(参考訳) 最近のディープニューラルネットワークによる画像解析の成功は、畳み込みニューラルネットワーク(CNN)によってほぼ完全に達成されている。これらのcnnのトレーニングは、実際にはすべてのディープニューラルネットワークアーキテクチャにおいて、ネットワークの出力と望ましい結果を比較するバックプロパゲーションアルゴリズムを使用しており、ネットワークの重み付けを望ましい結果に向けてチューニングするために差が使用される。 2022年のプレプリントで、Geoffrey Hinton氏は、望ましい結果とネットワークの入力時のイメージを渡す別のトレーニング方法を提案した。このフォーワードフォワード(FF)アルゴリズムは、現在まで完全に接続されたネットワークでしか使われていない。本稿では,FFパラダイムをCNNに拡張する方法について述べる。新たな空間拡張ラベル法を特徴とするff学習cnnは,mnist手書き文字データセット上で99.0%の分類精度を実現する。提案アルゴリズムの性能に異なるハイパーパラメータがどう影響するかを示し、標準バックプロパゲーション手法を用いてトレーニングしたCNNと比較する。さらに、クラスアクティベーションマップを用いて、FFアルゴリズムによってどの種類の機能が学習されるかを調べる。

The recent successes in analyzing images with deep neural networks are almost exclusively achieved with Convolutional Neural Networks (CNNs). The training of these CNNs, and in fact of all deep neural network architectures, uses the backpropagation algorithm, where the output of the network is compared with the desired result and the difference is then used to tune the weights of the network towards the desired outcome. In a 2022 preprint, Geoffrey Hinton suggested an alternative way of training which passes the desired results together with the images at the input of the network. This so-called Forward-Forward (FF) algorithm has up to now only been used in fully connected networks. In this paper, we show how the FF paradigm can be extended to CNNs. Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.0% on the MNIST hand-written digits dataset. We show how different hyperparameters affect the performance of the proposed algorithm and compare the results with those of a CNN trained with the standard backpropagation approach. Furthermore, we use Class Activation Maps to investigate which types of features are learnt by the FF algorithm.

翻訳日:2023-12-25 13:55:07 公開日:2023-12-22

# Fast-NTK:大規模モデルのためのパラメータ効率の良い未学習

Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models ( http://arxiv.org/abs/2312.14923v1 )

ライセンス: Link先を確認

Guihong Li, Hsiang Hsu, Chun-Fu Chen, and Radu Marculescu

(参考訳) 機械学習の急速な成長により、ユーザはデータの削除を要求できる‘忘れられる権利’のような立法活動が加速した。これに対して ``machine unlearning'' では,スクラッチから再トレーニングを必要とせずに,不要なデータの選択的削除を提案する。 neural-tangent-kernel-based (ntk-based)アンラーニング手法は性能に優れているが、特に大規模モデルやデータセットでは計算の複雑さが著しい。このアルゴリズムは,CNNの細調整バッチ正規化層や視覚変換器の視覚的プロンプトなどのパラメータ効率の高い微調整手法を取り入れることで,計算複雑性を大幅に低減する。実験結果から,より大規模なニューラルネットワークやデータセット(88mパラメータ,5kイメージなど)に対するスケーラビリティが,より小さなケース(例えば8mパラメータ,500イメージ)向けに設計された従来のフルモデルntkベースのアプローチの限界を上回っていることが示された。特に当社のアプローチは,retainセットのみをリトレーニングする従来の方法に匹敵するパフォーマンスを維持しています。これにより、ディープニューラルネットワークにおける実践的でスケーラブルなNTKベースのアンラーニングが可能になる。

The rapid growth of machine learning has spurred legislative initiatives such as the "Right to be Forgotten," allowing users to request data removal. In response, "machine unlearning" proposes the selective removal of unwanted data without the need for retraining from scratch. While the Neural-Tangent-Kernel-based (NTK-based) unlearning method excels in performance, it suffers from significant computational complexity, especially for large-scale models and datasets. Our work introduces "Fast-NTK," a novel NTK-based unlearning algorithm that significantly reduces the computational complexity by incorporating parameter-efficient fine-tuning methods, such as fine-tuning batch normalization layers in a CNN or visual prompts in a vision transformer. Our experimental results demonstrate scalability to much larger neural networks and datasets (e.g., 88M parameters; 5k images), surpassing the limitations of previous full-model NTK-based approaches designed for smaller cases (e.g., 8M parameters; 500 images). Notably, our approach maintains a performance comparable to the traditional method of retraining on the retain set alone. Fast-NTK can thus enable practical and scalable NTK-based unlearning in deep neural networks.

翻訳日:2023-12-25 13:54:47 公開日:2023-12-22

# 高次統計から効率的に学ぶ:仮説テスト、ランダム特徴、ニューラルネットワーク

Learning from higher-order statistics, efficiently: hypothesis tests, random features, and neural networks ( http://arxiv.org/abs/2312.14922v1 )

ライセンス: Link先を確認

Eszter Székely, Lorenzo Bardone, Federica Gerace, Sebastian Goldt

(参考訳) ニューラルネットワークは高次元データセットにおける統計的パターンの発見に優れる。実際、3つ以上の変数間の非ガウス相関を定量化する高次累積は、ニューラルネットワークの性能にとって特に重要である。しかし、高次累積から特徴を抽出するニューラルネットワークはどの程度効率的か? 我々はこの問題をスパイク累積モデルで研究する。このモデルでは、統計学者は$d$次元入力の次数$p\ge 4$の累積から特権的な方向または「スパイク」を復元する必要がある。まず,スパイク累積モデルからの入力と等方的ガウス入力を区別するために必要となるサンプル数$n$を解析することにより,スパイク回復の基本的な統計的限界と計算的限界を特徴付ける。統計的識別可能性には$n\gtrsim d$サンプルが必要であるのに対し、多項式時間で2つの分布を区別するには、幅広い種類のアルゴリズム、すなわち低次予想でカバーされているものに対して$n \gtrsim d^2$サンプルが必要である。これらの結果は,この問題に大きな統計と計算のギャップが存在することを示唆している。数値実験により、ニューラルネットワークは2つの分布を二次的なサンプル複雑性で区別することを学ぶのに対し、ランダムな特徴のような"怠慢"な手法は、この領域ではランダムな推測よりも優れていないことが示されている。その結果、ニューラルネットワークはスパイク累積モデルにおける高次相関から情報を効率的に抽出し、ニューラルネットワークが必要とするデータ量と高次累積モデルから学習するためのランダム特徴のギャップを明らかにする。

Neural networks excel at discovering statistical patterns in high-dimensional data sets. In practice, higher-order cumulants, which quantify the non-Gaussian correlations between three or more variables, are particularly important for the performance of neural networks. But how efficient are neural networks at extracting features from higher-order cumulants? We study this question in the spiked cumulant model, where the statistician needs to recover a privileged direction or "spike" from the order-$p\ge 4$ cumulants of $d$-dimensional inputs. We first characterise the fundamental statistical and computational limits of recovering the spike by analysing the number of samples $n$ required to strongly distinguish between inputs from the spiked cumulant model and isotropic Gaussian inputs. We find that statistical distinguishability requires $n\gtrsim d$ samples, while distinguishing the two distributions in polynomial time requires $n \gtrsim d^2$ samples for a wide class of algorithms, i.e. those covered by the low-degree conjecture. These results suggest the existence of a wide statistical-to-computational gap in this problem. Numerical experiments show that neural networks learn to distinguish the two distributions with quadratic sample complexity, while "lazy" methods like random features are not better than random guessing in this regime. Our results show that neural networks extract information from higher-order correlations in the spiked cumulant model efficiently, and reveal a large gap in the amount of data required by neural networks and random features to learn from higher-order cumulants.

翻訳日:2023-12-25 13:54:23 公開日:2023-12-22

# イネ表現型データのための新しいサンプルクラスタリングアルゴリズム

A Novel Sampled Clustering Algorithm for Rice Phenotypic Data ( http://arxiv.org/abs/2312.14920v1 )

ライセンス: Link先を確認

Mithun Singh, Kapil Ahuja, Milind B. Ratnaparkhe

(参考訳) 植物種のフェノタイプ(または物理的)特性は、一般的にクラスタリングに使用される。最近の研究の一つ(Shastri et al. (2021))では、確率的サンプリング(ピボットサンプリング)とスペクトル的クラスタリングアルゴリズムを用いてダイズ種を分類した。これらの手法は、低コストで高精度なクラスタリングを得るために使われた。本研究では,初期のアルゴリズムをイネの群落に拡張する。基本アルゴリズムを3つの方法で改善する。まず,スペクトルクラスタリングにおける類似性行列を構築する新しい関数を提案する。一般に、自然指数関数はこの目的のために用いられる。スペクトルグラフ理論とチーガーの不等式に基づき、代わりに基本"a"指数関数を用いることを提案する。これはクラスタリングに好適な類似性行列スペクトルを与え、固有値解析によってサポートする。第二に、スペクトルクラスタリングで類似性行列を構築するために使われる関数は、以前固定因子(グローバルスケーリングと呼ばれる)でスケールされた。 Zelnik-Manor と Perona (2004) のアイデアに基づいて、行列要素(局所スケーリングと呼ばれる)によって変化する因子を使い、よりうまく機能する。第三に、重要なサンプリングアルゴリズムにおけるspecieの包含確率を計算するために、我々は以前、specieの特性値がそれぞれの基底値からどれだけ離れているか(すべての種で計算されている)を捉えた偏差の概念を用いていた。基本値を見つけるために、以前は最大関数が使われていた。現在では中央値関数を使っており、より直感的です。我々はこの選択を統計分析を用いて支持する。 1865種のイネについての実験を行い、シルエット値の観点から、我々の新しいサンプリングスペクトルクラスタリングは階層クラスタリングよりも61%優れていることを実証した。また,新しいアルゴリズムは,サンプリングによる階層的クラスタリングよりもかなり高速である。

Phenotypic (or physical) characteristics of plant species are commonly used to perform clustering. In one of our recent works (Shastri et al. (2021)), we used a probabilistically sampled (using pivotal sampling) and spectrally clustered algorithm to group soybean species. These techniques were used to obtain highly accurate clusterings at a reduced cost. In this work, we extend the earlier algorithm to cluster rice species. We improve the base algorithm in three ways. First, we propose a new function to build the similarity matrix in Spectral Clustering. Commonly, a natural exponential function is used for this purpose. Based on spectral graph theory and the associated Cheeger's inequality, we propose using a base-"a" exponential function instead. This gives a similarity matrix spectrum favorable for clustering, which we support via an eigenvalue analysis. Second, the function used to build the similarity matrix in Spectral Clustering was earlier scaled with a fixed factor (called global scaling). Based on the idea of Zelnik-Manor and Perona (2004), we now use a factor that varies with matrix elements (called local scaling) and works better. Third, to compute the inclusion probability of a species in the pivotal sampling algorithm, we had earlier used the notion of deviation, which captured how far a species' characteristic values were from their respective base values (computed over all species). A maximum function was previously used to find the base values. We now use a median function, which is more intuitive. We support this choice using a statistical analysis. With experiments on 1865 rice species, we demonstrate that in terms of silhouette values, our new Sampled Spectral Clustering is 61% better than Hierarchical Clustering (currently prevalent). Also, our new algorithm is significantly faster than Hierarchical Clustering due to the involved sampling.
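
The first two changes (a base-"a" exponential and local scaling) can be sketched together. The abstract does not give the exact formula, so the sketch assumes entries of the form $a^{-d_{ij}^2/(\sigma_i \sigma_j)}$, where $\sigma_i$ is the distance from point $i$ to its $k$-th nearest neighbour, following the local-scaling idea of Zelnik-Manor and Perona. The function name, the default `k`, and the zero diagonal are choices made here for illustration.

```python
import math


def local_scaling_similarity(points, base=math.e, k=2):
    """Similarity matrix with entries base ** (-d_ij^2 / (sigma_i * sigma_j)).

    sigma_i is the distance from point i to its k-th nearest neighbour
    (requires k < len(points) and no k+1 coincident points, so sigma_i > 0).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the zero self-distance, so index k is the k-th neighbour.
    sigma = [sorted(row)[k] for row in dist]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal is left at zero
                sim[i][j] = base ** (-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
    return sim


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    matrix = local_scaling_similarity(samples, base=2.0, k=2)
    print(matrix[0][1], matrix[0][3])
```

With `base=math.e` this reduces to the usual natural-exponential similarity; changing the base rescales every exponent by the same factor, which alters the spectrum of the matrix without changing which pairs are most similar.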

翻訳日:2023-12-25 13:53:54 公開日:2023-12-22

# Lift-Attend-Splat:トランスフォーマーを用いたバードアイビューカメラライダー融合

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers ( http://arxiv.org/abs/2312.14919v1 )

ライセンス: Link先を確認

James Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, and Romain Mueller

(参考訳) 補完的なセンサモダリティの組み合わせは、自動運転(AD)のような安全クリティカルなロボティクスアプリケーションに堅牢な認識を提供するために不可欠である。近年の最先端のAD用カメラ・ライダー融合法は単眼深度推定に依存しているが、これはライダーの深度情報を直接利用する場合に比べて非常に難しいタスクである。ここでは、このアプローチが期待どおりに深度を活用していないことを見いだし、深度推定を単純に改善しても物体検出性能は向上せず、さらに驚くべきことに、深度推定を完全に取り除いても物体検出性能は劣化しないことを示す。これは、単眼深度への依存が、カメラ・ライダー融合において不要なアーキテクチャ上のボトルネックとなりうることを示唆している。そこで本研究では、単眼深度推定を完全にバイパスし、単純な注意機構を用いて鳥瞰図(bird's-eye-view)グリッド上でカメラとライダーの特徴を選択・融合する新しい融合手法を提案する。提案モデルは、ライダー特徴の利用可能性に応じてカメラ特徴の利用を調節でき、単眼深度推定に依存するベースラインよりも、nuScenesデータセット上で優れた3D物体検出を実現することを示す。

Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation, which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.

翻訳日:2023-12-25 13:53:28 公開日:2023-12-22

# PoseGen: NeRFで3DのPoseデータセットを生成する学習

PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF ( http://arxiv.org/abs/2312.14915v1 )

ライセンス: Link先を確認

Mohsen Gholami, Rabab Ward, Z. Jane Wang

(参考訳) 本稿では、Neural Radiance Fields (NeRF) を用いて3次元人体ポーズデータセットを生成するエンドツーエンドのフレームワークを提案する。公開データセットは一般に、人間のポーズやカメラ視点の多様性が限られており、その主な原因は3次元人体ポーズデータの収集に多くのリソースを要することにある。その結果、公開データセットで訓練されたポーズ推定器は、未知の分布外サンプルに適用すると性能が著しく低下する。先行研究では、2D-3Dのポーズペアを生成したり、大量のランダムデータをレンダリングしたりすることで、公開データセットを拡張することが提案された。このようなアプローチは、画像レンダリングを考慮していないか、事前学習済みモデルにとって最適でないデータセットをもたらす。本稿では、与えられた事前学習済みポーズ推定器からのフィードバック損失を用いてデータセット(人間の3Dポーズと画像)の生成を学習する PoseGen を提案する。先行技術とは対照的に、生成されるデータは事前学習済みモデルの頑健性を改善するように最適化される。PoseGen の目的は、与えられた事前学習済みモデルの予測誤差を最大化するデータ分布を学習することである。学習されたデータ分布は事前学習済みモデルにとっての分布外(OOD)サンプルを含むため、その分布からサンプリングしたデータで事前学習済みモデルをさらに微調整すると、モデルの汎化性能が向上する。これは、3次元人体データ生成にNeRFを用いることを提案する最初の研究である。NeRFはデータ駆動であり、人間の3Dスキャンを必要としない。したがって、データ生成にNeRFを使うことは、手軽なユーザ固有のデータ生成に向けた新しい方向性である。広範な実験により、提案するPoseGenは、2つのベースラインモデル(SPINとHybrIK)を4つのデータセット上で平均6%(相対値)改善することを示した。

This paper proposes an end-to-end framework for generating 3D human pose datasets using Neural Radiance Fields (NeRF). Public datasets generally have limited diversity in terms of human poses and camera viewpoints, largely due to the resource-intensive nature of collecting 3D human pose data. As a result, pose estimators trained on public datasets significantly underperform when applied to unseen out-of-distribution samples. Previous works proposed augmenting public datasets by generating 2D-3D pose pairs or rendering a large amount of random data. Such approaches either overlook image rendering or result in suboptimal datasets for pre-trained models. Here we propose PoseGen, which learns to generate a dataset (human 3D poses and images) with a feedback loss from a given pre-trained pose estimator. In contrast to prior art, our generated data is optimized to improve the robustness of the pre-trained model. The objective of PoseGen is to learn a distribution of data that maximizes the prediction error of a given pre-trained model. As the learned data distribution contains OOD samples of the pre-trained model, sampling data from such a distribution for further fine-tuning a pre-trained model improves the generalizability of the model. This is the first work that proposes NeRFs for 3D human data generation. NeRFs are data-driven and do not require 3D scans of humans. Therefore, using NeRF for data generation is a new direction for convenient user-specific data generation. Our extensive experiments show that the proposed PoseGen improves two baseline models (SPIN and HybrIK) on four datasets with an average 6% relative improvement.

翻訳日:2023-12-25 13:53:07 公開日:2023-12-22

# 置換不変量子回路

Permutation-invariant quantum circuits ( http://arxiv.org/abs/2312.14909v1 )

ライセンス: Link先を確認

Maximilian Balthasar Mansky, Santiago Londoño Castillo, Victor Ramos Puigvert, Claudia Linnhoff-Popien

(参考訳) 物理的対称性を問題の記述に組み込むことで、パラメータ数と計算複雑性を削減できる。本稿では、最も制限の強い離散対称性である置換対称性を量子回路に組み込む方法を示す。置換対称性は、他のすべての離散群の超群である。我々は、置換を量子ビット上の $\operatorname{SWAP}$ 操作と同一視する。対称性を対応するリー代数へ拡張することに基づき、指数写像によって量子回路要素を構成する方法を示す。これにより、置換群の対称性を量子回路のアンザッツに容易に組み込むことができる。パラメータ数のスケーリングは $\mathcal{O}(n^3)$ であり、一般の場合よりもかなり小さく、対称性が量子計算の適用性を制限することを示唆している。また、既存の回路を修正して置換対称性の下で不変にする方法も示す。

The implementation of physical symmetries into problem descriptions allows for the reduction of parameters and computational complexity. We show the integration of the permutation symmetry as the most restrictive discrete symmetry into quantum circuits. The permutation symmetry is the supergroup of all other discrete groups. We identify the permutation with a $\operatorname{SWAP}$ operation on the qubits. Based on the extension of the symmetry into the corresponding Lie algebra, quantum circuit element construction is shown via exponentiation. This allows for ready integration of the permutation group symmetry into quantum circuit ansatzes. The scaling of the number of parameters is found to be $\mathcal{O}(n^3)$, significantly lower than the general case and an indication that symmetry restricts the applicability of quantum computing. We also show how to adapt existing circuits to be invariant under a permutation symmetry by modification.
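The identification of a permutation with a SWAP operation can be made concrete on two qubits. The sketch below is an illustration under stated assumptions, not the paper's construction: the helper names (`kron`, `matmul`, `is_swap_invariant`) are hypothetical, and it only checks whether a candidate generator is unchanged when conjugated by SWAP. A gate obtained by exponentiating such a generator inherits the same invariance.

```python
# Single-qubit operators as integer matrices, so comparisons below are exact.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

# SWAP exchanges the basis states |01> and |10>.
SWAP = [[1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def is_swap_invariant(generator):
    """True if SWAP * G * SWAP == G, i.e. G is symmetric under qubit exchange."""
    return matmul(SWAP, matmul(generator, SWAP)) == generator


symmetric_sum = add(kron(X, I2), kron(I2, X))  # X on both qubits, summed
zz_coupling = kron(Z, Z)                        # symmetric two-qubit coupling
single_x = kron(X, I2)                          # acts on the first qubit only
```

Here `symmetric_sum` and `zz_coupling` pass the check, while `single_x` does not, because SWAP maps it to `kron(I2, X)`.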

翻訳日:2023-12-25 13:52:42 公開日:2023-12-22

# 擬エルミート系の量子化

Quantization of pseudo-hermitian systems ( http://arxiv.org/abs/2312.14906v1 )

ライセンス: Link先を確認

M.C. Baldiotti, R. Fresneda

(参考訳) この研究は、\cite{baldiotti2021} を任意次元のグラスマン代数へ一般化したものである。ここでは、非エルミート量子力学に焦点を当てた擬古典理論の共変量子化スキームを提案する。この量子化は、正準的に関連する擬古典理論を、任意次元における等価な量子的実現に写す。この形式論をハイゼンベルク相互作用を持つ2つの結合スピンの問題に適用する。

This work is a generalization of \cite{baldiotti2021} to Grassmann algebras of arbitrary dimensions. Here we present a covariant quantization scheme for pseudoclassical theories focused on non-hermitian quantum mechanics. The quantization maps canonically related pseudoclassical theories to equivalent quantum realizations in arbitrary dimensions. We apply the formalism to the problem of two coupled spins with Heisenberg interaction.

翻訳日:2023-12-25 13:52:31 公開日:2023-12-22

# 科学応用のための量子アルゴリズム

Quantum algorithms for scientific applications ( http://arxiv.org/abs/2312.14904v1 )

ライセンス: Link先を確認

R. Au-Yeung and B. Camino and O. Rathore and V. Kendon

(参考訳) 量子コンピューティングは、様々なアプリケーション分野の計算能力の次のステップを提供すると約束している。本稿では,実世界の応用において真の量子優位性を達成するために必要な量子ハイプとブレークスルーの背後にある科学を考察する。ハイパフォーマンスコンピューティング(HPC)に最も影響を与える可能性のある分野には、量子システムのシミュレーション、最適化、機械学習などがある。我々は、HPCの現在の科学・工学的利用のかなりの部分を占める材料シミュレーションと計算流体力学の例を引用する。潜在的な課題は、量子デバイスのための古典的なデータのエンコーディングとデコード、および古典プロセッサと量子プロセッサ間のクロック速度のミスマッチである。現在の古典的手法への控えめな量子拡張でさえも、気象予報、工学、航空宇宙、薬物設計、持続可能な開発のための「緑」素材の実現など、広範囲に及ぶ影響をもたらすだろう。これは計算科学、工学、量子コンピューティングのコミュニティの協力による多大な努力を必要とする。

Quantum computing promises to provide the next step up in computational power for diverse application areas. In this review, we examine the science behind the quantum hype and breakthroughs required to achieve true quantum advantage in real-world applications. Areas that are likely to have the greatest impact on high performance computing (HPC) include simulation of quantum systems, optimisation, and machine learning. We draw our examples from materials simulations and computational fluid dynamics which account for a large fraction of current scientific and engineering use of HPC. Potential challenges include encoding and decoding classical data for quantum devices, and mismatched clock speeds between classical and quantum processors. Even a modest quantum enhancement to current classical techniques would have far-reaching impacts in areas such as weather forecasting, engineering, aerospace, drug design, and realising "green" materials for sustainable development. This requires significant effort from the computational science, engineering and quantum computing communities working together.

翻訳日:2023-12-25 13:52:27 公開日:2023-12-22

# 二部混合分離可能状態を用いたアンシラ支援プロセストモグラフィ

Ancilla-Assisted Process Tomography with Bipartiete Mixed Separable States ( http://arxiv.org/abs/2312.14901v1 )

ライセンス: Link先を確認

Zhuoran Bao, Daniel F. V. James

(参考訳) アンシラ支援プロセストモグラフィ(AAPT: ancilla-assisted process tomography)の実施には、システム状態と補助状態との間のエンタングルメントが厳密には必要でないことが示されている。その代わりに、システム・アンシラ状態が忠実(faithful)であることだけが要求され、これは状態を表すある行列の可逆性と同値である。しかし、小さな誤差増幅をもたらす忠実な状態と、より大きな誤差増幅をもたらす忠実な状態とを区別することは難しい。2量子ビットのシステム・アンシラ状態に限定して、この可逆性の問題を、2量子ビットの相関を分類する sinisterness の概念に結びつける理論的解析を示す。sinisterness を用いることで、誤差増幅が最小となる、忠実な2量子ビット混合分離可能状態をすべて構成する方法を与える。最大エンタングル状態は最小の誤差増幅を与える一方、分離可能なヴェルナー状態は最大エンタングル状態よりも大きく、かつ不均一な誤差増幅を生じることを示す。それでも、分離可能なヴェルナー状態または等方的状態の反転による誤差増幅は、混合分離可能状態で達成できる最良のものである。

It has been shown that entanglement between the system state and the ancillary state is not a strict requirement for performing ancilla-assisted process tomography (AAPT). Instead, it only requires that the system-ancilla state be faithful, which is equivalent to the invertibility of a certain matrix representing the state. However, it is difficult to distinguish between a faithful state that brings small error amplification and one that produces larger error amplification. Restricted to two-qubit system-ancilla states, we present a theoretical analysis to connect the invertibility problem to the concept of sinisterness, which classifies the correlation of two qubits. Using sinisterness, we provide a way of constructing all two-qubit faithful mixed separable states with the smallest error amplification. We show that the maximally entangled states provide the smallest error amplification, while the separable Werner states produce an uneven error amplification larger than that of the maximally entangled states. Nevertheless, the error amplification due to inverting the separable Werner states or isotropic states is the best any mixed separable state can do.
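The link between faithfulness and matrix invertibility can be illustrated numerically. The sketch below is an assumption-laden example, not the paper's analysis: it takes the "matrix representing the state" to be the Pauli-basis correlation matrix `T[i][j] = Tr[rho (sigma_i ⊗ sigma_j)]`, which is one common choice, and evaluates it for a two-qubit Werner state with mixing parameter `p`. The helper names are hypothetical.

```python
# Pauli basis: identity, X, Y, Z.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def werner_state(p):
    """p * |Psi-><Psi-| + (1 - p) * I/4 in the basis |00>, |01>, |10>, |11>."""
    singlet = [[0, 0, 0, 0],
               [0, 0.5, -0.5, 0],
               [0, -0.5, 0.5, 0],
               [0, 0, 0, 0]]
    return [[p * singlet[r][c] + ((1 - p) / 4 if r == c else 0.0)
             for c in range(4)] for r in range(4)]


def pauli_correlation_matrix(rho):
    """Return the 4x4 real matrix T with T[i][j] = Tr[rho (sigma_i ⊗ sigma_j)]."""
    t = []
    for a in PAULIS:
        row = []
        for b in PAULIS:
            m = kron(a, b)
            trace = sum(rho[r][c] * m[c][r] for r in range(4) for c in range(4))
            row.append(trace.real)  # the trace is real for Hermitian rho
        t.append(row)
    return t
```

For a Werner state the matrix is diagonal with entries `1, -p, -p, -p`, so it is invertible whenever `p != 0`, including the separable range `p <= 1/3`. Inverting it scales three of the four components by `1/p`, which gives a rough feel for why the amplification is uneven and grows as `p` shrinks; the paper's precise error measure may differ.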

翻訳日:2023-12-25 13:52:12 公開日:2023-12-22

# キャリブレーションノイズ源を用いた低ノイズ極低温マイクロ波増幅器の特性評価

Low-noise cryogenic microwave amplifier characterization with a calibrated noise source ( http://arxiv.org/abs/2312.14900v1 )

ライセンス: Link先を確認

M. Malnou, T. F. Q. Larson, J. D. Teufel, F. Lecocq and J. Aumentado

(参考訳) パラメトリック増幅器は超伝導量子コンピューティングの主力となっているが、これらのデバイスの研究開発は、一貫性がなく、時に誤解を招くノイズ性能評価手法によって妨げられてきた。ノイズ特性評価の背景にある概念は一見単純だが、測定においても解釈・解析においても、誤りを犯しやすい箇所が数多くある。本稿では、ノイズ性能評価の基礎と、パワーハンドリング能力に制限のあるパラメトリック増幅器において生じる特有の問題について述べる。高電子移動度トランジスタ増幅器、ジョセフソン進行波パラメトリック増幅器、ジョセフソンパラメトリック増幅器の3つの具体例を用いて、これらの問題を説明する。50-$\Omega$ ショットノイズトンネル接合(SNTJ)を広帯域ノイズ源として使用することを重視し、極低温増幅器の特性評価におけるその有用性を示す。これらの実用的な例は、損失の役割と、パラメトリック増幅器の追加の「アイドラー」入力モードの役割を浮き彫りにする。

Parametric amplifiers have become a workhorse in superconducting quantum computing; however, research and development of these devices has been hampered by inconsistent, and sometimes misleading, noise performance characterization methodologies. The concepts behind noise characterization are deceptively simple, and there are many places where one can make mistakes, either in measurement or in interpretation and analysis. In this article we cover the basics of noise performance characterization, and the special problems it presents in parametric amplifiers with limited power handling capability. We illustrate the issues with three specific examples: a high-electron-mobility transistor amplifier, a Josephson traveling-wave parametric amplifier, and a Josephson parametric amplifier. We emphasize the use of a 50-$\Omega$ shot noise tunnel junction (SNTJ) as a broadband noise source, demonstrating its utility for cryogenic amplifier characterization. These practical examples highlight the role of loss as well as the additional parametric amplifier 'idler' input mode.

翻訳日:2023-12-25 13:51:52 公開日:2023-12-22

# バグレポートから関連するテスト入力を抽出した自動テストケース生成

Enriching Automatic Test Case Generation by Extracting Relevant Test Inputs from Bug Reports ( http://arxiv.org/abs/2312.14898v1 )

ライセンス: Link先を確認

Wendkûuni C. Ouédraogo, Laura Plein, Kader Kaboré, Andrew Habib, Jacques Klein, David Lo, Tegawendé F. Bissyandé

(参考訳) ソフトウェアの品質は、そのソフトウェアに課されるテストの品質に大きく依存する。したがって、バグ検出のためのテストを書くことは不可欠である。しかし、手作業で行うと時間がかかる。そのため、テストケース生成の自動化は、ソフトウェア工学コミュニティにおける活発な研究領域となっている。ほとんどのアプローチはユニットテストの生成に重点を置いてきた。残念ながら、現在の取り組みでは関連性の高い入力が生成されないことが多く、自動生成テストの有効性が制限される。テスト入力の関連性を改善するために、自動テスト生成ツールに供給できる入力値を特定するためにバグレポートを探索する手法 \name を提案する。本研究では、\name を用いてバグレポートから抽出した入力を使い、EvoSuite でテストケースを生成した場合の性能を調査する。評価は Defects4J ベンチマークで行う。Defects4J プロジェクトについて、\name は正規表現を用いた場合に関連する入力の 68.68% を、正規表現を用いない場合に 50.21% を抽出できた。さらに、全プロジェクトにわたって行(Line)カバレッジと命令(Instruction)カバレッジを向上させる可能性が示された。全体として、ベースラインでは検出されなかった45件のバグの検出につながる関連入力の収集に成功した。

The quality of software is highly dependent on the quality of the tests it is submitted to. Writing tests for bug detection is thus essential. However, it is time-consuming when done manually. Automating test case generation has therefore been an exciting research area in the software engineering community. Most approaches have been focused on generating unit tests. Unfortunately, current efforts often do not lead to the generation of relevant inputs, which limits the efficiency of automatically generated tests. Towards improving the relevance of test inputs, we present \name, a technique for exploring bug reports to identify input values that can be fed to automatic test generation tools. In this work, we investigate the performance of using inputs extracted from bug reports with \name to generate test cases with EvoSuite. The evaluation is performed on the Defects4J benchmark. For Defects4J projects, our study has shown that \name successfully extracted 68.68% of relevant inputs when using regular expressions in its approach versus 50.21% of relevant inputs without regular expressions. Further, our study has shown the potential to improve the Line and Instruction Coverage across all projects. Overall, we successfully collected relevant inputs that led to the detection of 45 bugs that were previously undetected by the baseline.
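A regular-expression pass over a bug report, of the kind the abstract mentions, might look like the sketch below. It is purely illustrative: the patterns, the function name `extract_candidate_inputs`, and the sample report text are hypothetical and are not taken from the paper's tool.

```python
import re

# Double-quoted string literals, allowing backslash escapes inside the quotes.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')
# Standalone integers or decimals, optionally negative, not glued to identifiers.
NUMBER_LITERAL = re.compile(r'(?<![\w.])-?\d+(?:\.\d+)?(?!\w)')


def extract_candidate_inputs(report_text):
    """Return string and numeric literals found in free-form bug report text."""
    strings = STRING_LITERAL.findall(report_text)
    # Blank out quoted strings first so digits inside them are not counted twice.
    remainder = STRING_LITERAL.sub(" ", report_text)
    numbers = NUMBER_LITERAL.findall(remainder)
    return strings + numbers


sample_report = (
    'createNumber("0x80000000") throws NumberFormatException; '
    'it also fails for -1.5 and for 42.'
)
candidates = extract_candidate_inputs(sample_report)
```

The extracted literals would then be offered to a test generator as seed values; how a given tool consumes such seeds is outside the scope of this sketch.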

翻訳日:2023-12-25 13:51:34 公開日:2023-12-22

# 強い反ヘッブ可塑性はネットワークアトラクタ景観の凸性を変化させる

Strong anti-Hebbian plasticity alters the convexity of network attractor landscapes ( http://arxiv.org/abs/2312.14896v1 )

ライセンス: Link先を確認

Lulu Gong, Xudong Chen, ShiNung Ching

(参考訳) 本稿では、ペアワイズ学習則の存在下でのリカレントニューラルネットワークについて検討する。特に、このようなネットワークのアトラクタ景観が、学習の強さと性質(ヘッブ型か反ヘッブ型か)の関数としてどのように変化するかに関心がある。これは、こうした学習則が大規模最適化問題を媒介する能力に関係しうる。形式的な解析を通じて、ヘッブ型学習から反ヘッブ型学習への移行が、ネットワークのアトラクタ景観の凸性を破壊するピッチフォーク分岐をもたらすことを示す。より大規模な設定では、このことは反ヘッブ可塑性が複数の安定平衡をもたらすことを意味し、その効果は相互接続点、すなわち「チョーク」ポイントにおいて特に大きくなりうる。さらに、アトラクタ景観は、速い学習率よりも遅い学習率に対して敏感である。これらの結果は、異なるペアワイズ可塑性則によって符号化できる目的関数の種類についての洞察を与える。

In this paper, we study recurrent neural networks in the presence of pairwise learning rules. We are specifically interested in how the attractor landscapes of such networks become altered as a function of the strength and nature (Hebbian vs. anti-Hebbian) of learning, which may have a bearing on the ability of such rules to mediate large-scale optimization problems. Through formal analysis, we show that a transition from Hebbian to anti-Hebbian learning brings about a pitchfork bifurcation that destroys convexity in the network attractor landscape. In larger-scale settings, this implies that anti-Hebbian plasticity will bring about multiple stable equilibria, and such effects may be outsized at interconnection or 'choke' points. Furthermore, attractor landscapes are more sensitive to slower learning rates than faster ones. These results provide insight into the types of objective functions that can be encoded via different pairwise plasticity rules.
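How a pitchfork bifurcation destroys convexity can be seen in the textbook one-dimensional normal form dx/dt = mu*x - x**3, whose potential is V(x) = -mu*x**2/2 + x**4/4. The sketch below uses that normal form only; the parameter `mu` is an abstract bifurcation parameter and is an assumption here, not the paper's network model or its learning-rule parameters.

```python
import math


def potential_curvature(x, mu):
    """Second derivative of V(x) = -mu*x**2/2 + x**4/4 at x."""
    return -mu + 3 * x * x


def stable_equilibria(mu):
    """Stable fixed points of dx/dt = mu*x - x**3.

    For mu <= 0 the origin is the only (stable) equilibrium and V is convex.
    For mu > 0 the origin becomes unstable and two stable branches appear.
    """
    if mu <= 0:
        return [0.0]
    root = math.sqrt(mu)
    return [-root, root]
```

For `mu <= 0` the curvature is non-negative everywhere, so the landscape is convex with a single attractor. Once `mu > 0` the curvature at the origin turns negative and two stable equilibria appear, which mirrors the qualitative change the abstract describes.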

翻訳日:2023-12-25 13:51:11 公開日:2023-12-22

PDF登録状況（公開日: 20231222）