Fugu-MT: Translations of arXiv Papers

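This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.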

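The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).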

The papers listed here have a publication date of 2023-11-04.

Title	Authors	Abstract	Publication date / translation date
# Towards Universal Atomic Composability: A Formal Model for Multi-Rollup Environments on Ethereum ( http://arxiv.org/abs/2311.00422v2 ) License: see the linked page	Dipankar Sarkar	In the rapidly evolving domain of distributed ledger technology, scalability and interoperability have become paramount challenges for both academic and industry sectors. In this paper, we introduce a comprehensive formal model to address atomic composability across multiple rollups on Ethereum. The proposed model incorporates mechanisms like buffering, dependency management, concurrency control, and the groundbreaking zero-knowledge proofs. Moreover, we evaluate its practical repercussions, strengths, and weaknesses, ensuring resilience against manipulative or erroneous actions. The application of the proposed model to shared sequencers and other existing solutions accentuates its versatility and universality.	Translated: 2024-03-25 13:55:39 Published: 2023-11-04
# OverHear: Headphone based Multi-sensor Keystroke Inference ( http://arxiv.org/abs/2311.02288v1 ) License: see the linked page	Raveen Wijewickrama, Maryam Abbasihafshejani, Anindya Maiti, Murtuza Jadliwala	Headphones, traditionally limited to audio playback, have evolved to integrate sensors like high-definition microphones and accelerometers. While these advancements enhance user experience, they also introduce potential eavesdropping vulnerabilities, with keystroke inference being our concern in this work. To validate this threat, we developed OverHear, a keystroke inference framework that leverages both acoustic and accelerometer data from headphones. The accelerometer data, while not sufficiently detailed for individual keystroke identification, aids in clustering key presses by hand position. Concurrently, the acoustic data undergoes analysis to extract Mel Frequency Cepstral Coefficients (MFCC), aiding in distinguishing between different keystrokes. These features feed into machine learning models for keystroke prediction, with results further refined via dictionary-based word prediction methods. In our experimental setup, we tested various keyboard types under different environmental conditions. We were able to achieve top-5 key prediction accuracy of around 80% for mechanical keyboards and around 60% for membrane keyboards, with top-100 word prediction accuracies over 70% for all keyboard types. The results highlight the effectiveness and limitations of our approach in the context of real-world scenarios.	Translated: 2024-03-25 13:45:54 Published: 2023-11-04
# Bounded and Unbiased Composite Differential Privacy ( http://arxiv.org/abs/2311.02324v1 ) License: see the linked page	Kai Zhang, Yanjun Zhang, Ruoxi Sun, Pei-Wei Tsai, Muneeb Ul Hassan, Xin Yuan, Minhui Xue, Jinjun Chen	The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between any two neighboring databases. However, traditional differentially private mechanisms tend to produce unbounded outputs in order to achieve maximum disturbance range, which is not always in line with real-world applications. Existing solutions attempt to address this issue by employing post-processing or truncation techniques to restrict the output results, but at the cost of introducing bias issues. In this paper, we propose a novel differentially private mechanism which uses a composite probability density function to generate bounded and unbiased outputs for any numerical input data. The composition consists of an activation function and a base function, providing users with the flexibility to define the functions according to the DP constraints. We also develop an optimization algorithm that enables the iterative search for the optimal hyper-parameter setting without the need for repeated experiments, which prevents additional privacy overhead. Furthermore, we evaluate the utility of the proposed mechanism by assessing the variance of the composite probability density function and introducing two alternative metrics that are simpler to compute than variance estimation. Our extensive evaluation on three benchmark datasets demonstrates consistent and significant improvement over the traditional Laplace and Gaussian mechanisms. The proposed bounded and unbiased composite differentially private mechanism will underpin the broader DP arsenal and foster future privacy-preserving studies.	Translated: 2024-03-25 13:45:54 Published: 2023-11-04
# NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation ( http://arxiv.org/abs/2311.02331v1 ) License: see the linked page	Shaofei Li, Feng Dong, Xusheng Xiao, Haoyu Wang, Fei Shao, Jiedong Chen, Yao Guo, Xiangqun Chen, Ding Li	Advanced Persistent Threats (APT) attacks have plagued modern enterprises, causing significant financial losses. To counter these attacks, researchers propose techniques that capture the complex and stealthy scenarios of APT attacks by using provenance graphs to model system entities and their dependencies. Particularly, to accelerate attack detection and reduce financial losses, online provenance-based detection systems that detect and investigate APT attacks under the constraints of timeliness and limited resources are in dire need. Unfortunately, existing online systems usually sacrifice detection granularity to reduce computational complexity and produce provenance graphs with more than 100,000 nodes, posing challenges for security admins to interpret the detection results. In this paper, we design and implement NodLink, the first online detection system that maintains high detection accuracy without sacrificing detection granularity. Our insight is that the APT attack detection process in online provenance-based detection systems can be modeled as a Steiner Tree Problem (STP), which has efficient online approximation algorithms that recover concise attack-related provenance graphs with a theoretically bounded error. To utilize STP approximation algorithm frameworks for APT attack detection, we propose a novel design of in-memory cache, an efficient attack screening method, and a new STP approximation algorithm that is more efficient than the conventional one in APT attack detection while maintaining the same complexity. We evaluate NodLink in a production environment. The open-world experiment shows that NodLink outperforms two state-of-the-art (SOTA) online provenance analysis systems by achieving magnitudes higher detection and investigation accuracy while having the same or higher throughput.	Translated: 2024-03-25 13:45:54 Published: 2023-11-04
# Software in P2P way: a software model without central software and enabling any software to join or leave freely ( http://arxiv.org/abs/2311.02351v1 ) License: see the linked page	Hong Su	The P2P model encompasses a network of equal peers, whether in hardware or software, operating autonomously without central control, allowing individual peer failure while ensuring high availability. Nevertheless, current P2P technologies primarily focus on hardware-level resilience, often referred to as P2P networks, which do not safeguard against software failures. This paper introduces a pioneering Peer-to-Peer (P2P) software model aimed at enhancing software-level high availability. Diverging from prevalent hardware-centric P2P technologies, this model accentuates the decentralized nature of various software components, or "software peers," which function independently, enabling seamless network entry and exit without relying on central software. The model's collaborative approach cultivates a network topology with multiple autonomous processing paths, ensuring continuous operation through dynamic task allocation in a distributed manner. By surpassing the limitations of traditional redundancy methods, this P2P model provides an adaptive and scalable solution for achieving robust availability. Validation results underscore the model's effectiveness in enhancing the probabilities of successful task processing while ensuring high availability.	Translated: 2024-03-25 13:45:54 Published: 2023-11-04
# Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs ( http://arxiv.org/abs/2311.16140v1 ) License: see the linked page	Fei He, Zhiyuan Yang, Mingyue Gao, Biplab Poudel, Newgin Sam Ebin Sam Dhas, Rajan Gyawali, Ashwin Dhakal, Jianlin Cheng, Dong Xu	Cryo-electron microscopy (cryo-EM) remains pivotal in structural biology, yet the task of protein particle picking, integral for 3D protein structure construction, is laden with manual inefficiencies. While recent AI tools such as Topaz and crYOLO are advancing the field, they do not fully address the challenges of cryo-EM images, including low contrast, complex shapes, and heterogeneous conformations. This study explored prompt-based learning to adapt the state-of-the-art image segmentation foundation model Segment Anything Model (SAM) for cryo-EM. This focus was driven by the desire to optimize model performance with a small number of labeled data without altering pre-trained parameters, aiming for a balance between adaptability and foundational knowledge retention. Through trials with three prompt-based learning strategies, namely head prompt, prefix prompt, and encoder prompt, we observed enhanced performance and reduced computational requirements compared to the fine-tuning approach. This work not only highlights the potential of prompting SAM in protein identification from cryo-EM micrographs but also suggests its broader promise in biomedical image segmentation and object detection.	Translated: 2023-12-03 13:17:00 Published: 2023-11-04
# Differentiating patients with obstructive sleep apnea from healthy controls based on heart rate - blood pressure coupling quantified by entropy-based indices ( http://arxiv.org/abs/2311.10752v1 ) License: see the linked page	Paweł Pilarczyk, Grzegorz Graff, José M. Amigó, Katarzyna Tessmer, Krzysztof Narkiewicz, Beata Graff	We introduce an entropy-based classification method for pairs of sequences (ECPS) for quantifying mutual dependencies in heart rate and beat-to-beat blood pressure recordings. The purpose of the method is to build a classifier for data in which each item consists of the two intertwined data series taken for each subject. The method is based on ordinal patterns, and uses entropy-like indices. Machine learning is used to select a subset of indices most suitable for our classification problem in order to build an optimal yet simple model for distinguishing between patients suffering from obstructive sleep apnea and a control group.	Translated: 2023-11-27 00:46:23 Published: 2023-11-04
# Mobile Internet Quality Estimation using Self-Tuning Kernel Regression ( http://arxiv.org/abs/2311.05641v1 ) License: see the linked page	Hanyang Jiang, Henry Shaowu Yuchi, Elizabeth Belding, Ellen Zegura, Yao Xie	Modeling and estimation for spatial data are ubiquitous in real life, frequently appearing in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale internet-quality open datasets from Ookla. We look into estimating mobile (cellular) internet quality at the scale of a state in the United States. In particular, we aim to conduct estimation based on highly imbalanced data: most of the samples are concentrated in limited areas, while very few are available in the rest, posing significant challenges to modeling efforts. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance in this problem. Through comparative experimentation on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions, with the potential to be applied in other applications.	Translated: 2023-11-19 14:29:10 Published: 2023-11-04
# Quantum Neural Networks for Power Flow Analysis ( http://arxiv.org/abs/2311.06293v1 ) License: see the linked page	Zeynab Kaseb, Matthias Moller, Giorgio Tosti Balducci, Peter Palensky, Pedro P. Vergara	This paper explores the potential application of quantum and hybrid quantum-classical neural networks in power flow analysis. Experiments are conducted using two small-size datasets based on the IEEE 4-bus and 33-bus test systems. A systematic performance comparison is also conducted among quantum, hybrid quantum-classical, and classical neural networks. The comparison is based on (i) generalization ability, (ii) robustness, (iii) training dataset size needed, (iv) training error, (v) training computational time, and (vi) training process stability. The results show that the developed quantum-classical neural network outperforms both quantum and classical neural networks, and hence can improve deep learning-based power flow analysis in the noisy-intermediate-scale quantum (NISQ) era.	Translated: 2023-11-19 14:14:51 Published: 2023-11-04
# WD3: Taming the Estimation Bias in Deep Reinforcement Learning ( http://arxiv.org/abs/2006.12622v2 ) License: see the linked page	Qiang He, Xinwen Hou	The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, which could lead to suboptimal policies. To address this issue, TD3 takes the minimum value between a pair of critics. In this paper, we show that the TD3 algorithm introduces underestimation bias under mild assumptions. To obtain a more precise estimation for the value function, we unify these two opposites and propose a novel algorithm, Weighted Delayed Deep Deterministic Policy Gradient (WD3), which can eliminate the estimation bias and further improve the performance by weighting a pair of critics. To demonstrate the effectiveness of WD3, we compare the learning process of the value function between DDPG, TD3, and WD3. The results verify that our algorithm does eliminate the estimation error of value functions. Furthermore, we evaluate our algorithm on the continuous control tasks. We observe that in each test task, WD3 consistently outperforms, or at the very least matches, the state-of-the-art algorithms. Our code is available at https://sites.google.com/view/ictai20-wd3/.	Translated: 2023-11-09 20:51:29 Published: 2023-11-04
# A Practical Large-Scale Roadside Multi-View Multi-Sensor Spatial Synchronization Framework for Intelligent Transportation Systems ( http://arxiv.org/abs/2311.04231v1 ) License: see the linked page	Yong Li, Zhiguo Zhao, Yunli Chen, Rui Tian	Spatial synchronization in roadside scenarios is essential for integrating data from multiple sensors at different locations. Current methods using cascading spatial transformation (CST) often lead to cumulative errors in large-scale deployments. Manual camera calibration is insufficient and requires extensive manual work, and existing methods are limited to controlled or single-view scenarios. To address these challenges, our research introduces a parallel spatial transformation (PST)-based framework for large-scale, multi-view, multi-sensor scenarios. PST parallelizes sensor coordinate system transformation, reducing cumulative errors. We incorporate deep learning for precise roadside monocular global localization, reducing manual work. Additionally, we use geolocation cues and an optimization algorithm for improved synchronization accuracy. Our framework has been tested in real-world scenarios, outperforming CST-based methods. It significantly enhances large-scale roadside multi-perspective, multi-sensor spatial synchronization, reducing deployment costs.	Translated: 2023-11-09 18:18:46 Published: 2023-11-04
# RMT: Retentive Networks Meet Vision Transformers ( http://arxiv.org/abs/2309.11523v4 ) License: see the linked page	Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu and Ran He	Retentive Network first emerged in the domain of NLP and immediately gained widespread attention due to its remarkable performance. A significant portion of its impressive capabilities stems from its explicit decay mechanism, which incorporates valuable prior knowledge. However, this explicit decay is unidirectional and one-dimensional, making it unsuitable for the bidirectional, two-dimensional modeling required in image-based tasks. To solve this, we propose a bidirectional, two-dimensional form of explicit decay specifically designed for vision models to introduce distance-related prior knowledge. Besides, unlike language models, the vision backbones use the same parallel form during training and inference. If this parallel form is replaced with recurrent or chunk-wise recurrent form, the parallelism of the model will be significantly disrupted, resulting in extremely slow inference speed. So we discard the two additional inference modes present in the original RetNet, retaining only the parallel form. Specifically, we incorporate bidirectional, two-dimensional explicit decay into the Self-Attention to form Retentive Self-Attention (ReSA). Furthermore, to reduce the complexity of global modeling, we decompose ReSA along the two axes of the image. Building upon ReSA, we construct RMT, a strong vision backbone. Abundant experiments have demonstrated that our RMT exhibits exceptional performance across various computer vision tasks. For example, RMT achieves 84.1% Top1-acc on ImageNet-1k using merely 4.5G FLOPs. To the best of our knowledge, among all models, RMT achieves the highest Top1-acc when models are of similar size and trained with the same strategy. Moreover, RMT significantly outperforms existing vision backbones in downstream tasks. Code will be released at https://github.com/qhfan/RMT.	Translated: 2023-11-08 19:06:53 Published: 2023-11-04
# ARNIQA: Learning Distortion Manifold for Image Quality Assessment ( http://arxiv.org/abs/2310.14918v2 ) License: see the linked page	Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo	No-Reference Image Quality Assessment (NR-IQA) aims to develop methods to measure image quality in alignment with human perception without the need for a high-quality reference image. In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner. First, we introduce an image degradation model that randomly composes ordered sequences of consecutively applied distortions. In this way, we can synthetically degrade images with a large variety of degradation patterns. Second, we propose to train our model by maximizing the similarity between the representations of patches of different images distorted equally, despite varying content. Therefore, images degraded in the same manner correspond to neighboring positions within the distortion manifold. Finally, we map the image representations to the quality scores with a simple linear regressor, thus without fine-tuning the encoder weights. The experiments show that our approach achieves state-of-the-art performance on several datasets. In addition, ARNIQA demonstrates improved data efficiency, generalization capabilities, and robustness compared to competing methods. The code and the model are publicly available at https://github.com/miccunifi/ARNIQA.	Translated: 2023-11-08 18:53:47 Published: 2023-11-04
# ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection ( http://arxiv.org/abs/2311.00729v2 ) License: see the linked page	Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le	Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning with closed-set setting on large training data, recent zero-shot TAD methods showcase the promising open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods have limitations on how to properly construct the strong relationship between two interdependent tasks of localization and classification and adapt ViL model to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter, a CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.	Translated: 2023-11-08 18:41:24 Published: 2023-11-04
# FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge ( http://arxiv.org/abs/2311.03390v1 ) License: see the linked page	Azzam Alhussain and Mingjie Lin	Accelerating Human Action Recognition (HAR) efficiently for real-time surveillance and robotic systems on edge chips remains a challenging research field, given its high computational and memory requirements. This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture. Our network accelerator was trained on UCF101 and UCF24 datasets and implemented on edge SoC-FPGA. Our development uses partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off. We also fused all convolutional, batch-norm, and ReLU operations into a single homogeneous layer and utilized the Lucas-Kanade motion flow method to enable a high parallelism accelerator design and optimized on-chip engine computing. Furthermore, our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104, which is 1.7x - 1.9x higher than the prior research. Lastly, the designed framework was benchmarked against several hardware chips for higher throughput and performance measurements and is now available as an open-source project on GitHub for training and implementation on edge platforms.	Translated: 2023-11-08 18:27:43 Published: 2023-11-04
# Learning Disentangled Speech Representations ( http://arxiv.org/abs/2311.03389v1 ) License: see the linked page	Yusuf Brima, Ulf Krumnack, Simone Pika and Gunther Heidemann	Disentangled representation learning from speech remains limited despite its importance in many application domains. A key challenge is the lack of speech datasets with known generative factors to evaluate methods. This paper proposes SynSpeech: a novel synthetic speech dataset with ground truth factors enabling research on disentangling speech representations. We plan to present a comprehensive study evaluating supervised techniques using established supervised disentanglement metrics. This benchmark dataset and framework address the gap in the rigorous evaluation of state-of-the-art disentangled speech representation learning methods. Our findings will provide insights to advance this underexplored area and enable more robust speech representations.	Translated: 2023-11-08 18:27:19 Published: 2023-11-04
# 強化学習を用いたグループミッションにおけるエージェントのエラー源の自動同定 Using reinforcement learning to autonomously identify sources of error for agents in group missions ( http://arxiv.org/abs/2107.09232v4 ) ライセンス: Link先を確認	Keishu Utimula, Ken-taro Hayaschi, Trevor J. Bihl, Kenta Hongo, Ryo Maezono	(参考訳) エージェントが任務を実行するために群がると、いくつかのエージェントは、コマンドベースから観察されるように、しばしば突然の失敗を示す。一般に、コマンドベースとエージェント間の通信のみに依存することで、アクチュエータ($h_a$)やセンサ($h_s$)によって障害が発生するかどうかを判断するのは困難である。言い換えると、我々は対応する変位を$h_a$ で検出するが、$h_s$ では検出しない。本研究では,人工知能が自律的に行動計画 $\boldsymbol{g}$ を作成できるかどうかについて考察した。一般的に、$\boldsymbol{g}$に対する期待された応答は、採用されている仮説に依るので、その違いは $D(\boldsymbol{g})$ で示され、$D\left(\boldsymbol{g}\right)$ を使用して原因を特定できる。例えば、$D(\boldsymbol{g})$を最大化する$\boldsymbol{g}^*$は、このタスクに適したアクションプランであるが、$D(\boldsymbol{g})$は、他のエージェントとの衝突のような稀なイベントにおいて非ゼロとなり、ほとんどのスウォームアクション$\boldsymbol{g}$は$D(\boldsymbol{g})=0$となるため、従来の勾配法を用いて達成することは困難である。言い換えると、$\boldsymbol{g}$ の空間のほとんど全体で $D(\boldsymbol{g})$ の勾配がゼロであり、勾配法は適用されない。そこで我々は,Qテーブル強化学習を用いた行動計画を立てた。意外なことに、強化学習によって生成された最適なアクションプランは、他のエージェントと失敗したエージェントを連携させることで問題を特定するための人間的なソリューションを示しました。この簡単なプロトタイプを用いて,障害原因を特定できる自律的行動計画にQテーブル強化学習手法を適用する可能性を実証した。 When agents swarm to execute a mission, some of them frequently exhibit sudden failure, as observed from the command base. It is generally difficult to determine whether a failure is caused by actuators (hypothesis, $h_a$) or sensors (hypothesis, $h_s$) by solely relying on the communication between the command base and concerning agent. However, by instigating collusion between the agents, the cause of failure can be identified; in other words, we expect to detect corresponding displacements for $h_a$ but not for $h_s$. In this study, we considered the question as to whether artificial intelligence can autonomously generate an action plan $\boldsymbol{g}$ to pinpoint the cause as aforedescribed. 
Because the expected response to $\boldsymbol{g}$ generally depends upon the adopted hypothesis [let the difference be denoted by $D(\boldsymbol{g})$], a formulation that uses $D\left(\boldsymbol{g}\right)$ to pinpoint the cause can be made. Although a $\boldsymbol{g}^*$ that maximizes $D(\boldsymbol{g})$ would be a suitable action plan for this task, such an optimization is difficult to achieve using the conventional gradient method, as $D(\boldsymbol{g})$ becomes nonzero in rare events such as collisions with other agents, and most swarm actions $\boldsymbol{g}$ give $D(\boldsymbol{g})=0$. In other words, throughout almost the entire space of $\boldsymbol{g}$, $D(\boldsymbol{g})$ has zero gradient, and the gradient method is not applicable. To overcome this problem, we formulated an action plan using Q-table reinforcement learning. Surprisingly, the optimal action plan generated via reinforcement learning presented a human-like solution to pinpoint the problem by colliding other agents with the failed agent. Using this simple prototype, we demonstrated the potential of applying Q-table reinforcement learning methods to plan autonomous actions to pinpoint the causes of failure.	翻訳日:2023-11-08 02:16:21 公開日:2023-11-04
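The abstract above argues that gradient methods fail because the objective is zero almost everywhere, and that Q-table reinforcement learning copes with this. The sketch below shows one standard tabular Q-learning update and how a single rare reward propagates back to an earlier state. The state names, action names, and hyperparameters are hypothetical; the paper's actual state space and reward definition are not reproduced here.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One standard tabular Q-learning step; returns the updated Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

actions = ["left", "right"]
q = defaultdict(float)  # every unseen (state, action) pair starts at 0.0

# A rare, informative event (for example a collision) yields the only reward.
value_near = q_update(q, "s1", "right", 1.0, "goal", actions)

# A zero-reward step still learns something, because it bootstraps from s1.
value_far = q_update(q, "s0", "right", 0.0, "s1", actions)
```

Here `value_near` becomes 0.5 and `value_far` becomes 0.225, so the earlier state acquires a nonzero value even though its own transition carried no reward. This bootstrapping is what lets a table-based learner make progress where a gradient of the raw objective would be zero.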
# 有限体上のランダム原始多項式生成のための量子加速アルゴリズム Quantum-accelerated algorithms for generating random primitive polynomials over finite fields ( http://arxiv.org/abs/2203.12884v3 ) ライセンス: Link先を確認	Shan Huang, Hua-Lei Yin, Zeng-Bing Chen, Shengjun Wu	(参考訳) 有限体上の原始多項式は、古典的擬似ランダム数生成、符号化理論、ポスト量子暗号など、コンピュータ科学の様々な領域において重要である。それでも、有限体上のランダム原始多項式を生成するための効率的な古典的アルゴリズムの追求は今も続いている課題である。本稿では,この問題をハイブリッド量子古典アルゴリズムを用いて効率的に解く方法を示し,それらを実装するための特定の量子回路の設計について述べる。本研究は,多種多様な量子通信および計算応用におけるランダムプリミティブ多項式の高速かつリアルタイムな生成方法である。 Primitive polynomials over finite fields are crucial for various domains of computer science, including classical pseudo-random number generation, coding theory and post-quantum cryptography. Nevertheless, the pursuit of an efficient classical algorithm for generating random primitive polynomials over finite fields remains an ongoing challenge. In this paper, we show how to solve this problem efficiently through hybrid quantum-classical algorithms, and designs of the specific quantum circuits to implement them are also presented. Our research paves the way for the rapid and real-time generation of random primitive polynomials in diverse quantum communication and computation applications.	翻訳日:2023-11-08 02:09:10 公開日:2023-11-04
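For readers unfamiliar with the term, a polynomial of degree n over GF(2) is primitive when x generates the full multiplicative group of size 2^n - 1 modulo that polynomial. The sketch below is a brute-force classical check and rejection sampler, practical only for small degrees; it is meant to illustrate the definition and is not the hybrid quantum-classical algorithm of the paper. Polynomials are encoded as integers whose bits are the coefficients, and both function names are hypothetical.

```python
import random

def is_primitive_gf2(poly):
    """Return True if poly (bitmask of coefficients) is primitive over GF(2)."""
    n = poly.bit_length() - 1          # degree of the polynomial
    if n < 1:
        return False
    target = (1 << n) - 1              # required multiplicative order of x
    state = 1                          # represents x^0
    for step in range(1, target + 1):
        state <<= 1                    # multiply by x
        if (state >> n) & 1:
            state ^= poly              # reduce modulo poly
        if state == 1:
            return step == target      # order of x must be exactly 2^n - 1
    return False                       # x never returned to 1

def random_primitive_gf2(degree, rng):
    """Rejection-sample a primitive polynomial of the given degree."""
    while True:
        middle = (rng.getrandbits(degree - 1) << 1) if degree > 1 else 0
        candidate = (1 << degree) | middle | 1   # leading and constant terms set
        if is_primitive_gf2(candidate):
            return candidate

sample = random_primitive_gf2(5, random.Random(0))
```

For example, x^4 + x + 1 (bitmask `0b10011`) passes the check, while x^4 + x^3 + x^2 + x + 1 (bitmask `0b11111`) is irreducible but fails because x has order 5 rather than 15. The loop runs up to 2^n - 1 steps, which is the exponential cost that motivates faster methods.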
# 気晴らしは公正に必要なのは Distraction is All You Need for Fairness ( http://arxiv.org/abs/2203.07593v3 ) ライセンス: Link先を確認	Mehdi Yazdani-Jahromi and AmirArsalan Rajabi and Ali Khodabandeh Yalabadi and Aida Tayebi and Ozlem Ozmen Garibay	(参考訳) トレーニングデータセットのバイアスは、同等または同等の処置を保証するために、分類タスクのさまざまなグループのために管理されなければならない。近年の人工知能モデルの成長と、自動意思決定におけるその役割拡大により、これらのモデルがバイアスを受けないことが不可欠である。これらのモデルには、訓練対象の関数や学習アルゴリズムに固有の、トレーニング対象のデータに存在するバイアスを含む、あるいは増幅する証拠が多数存在する。多くの研究者は、この問題に対して、統計的に独立なデータに変更、パリティを最大化しようとする特定の競争相手の能力を制限するための敵対的トレーニングなど、さまざまな方向に注意を向けている。これらの手法は情報損失をもたらし、正確さと公平さのバランスを適切にとらないか、あるいはトレーニングにおけるバイアスを確実に制限しない。そこで本研究では,分類結果に影響を及ぼすバイアスの制御に理論的に有効であることを証明し,ディープラーニングモデルを学習するための強力な戦略を提案する。この方法は、異なるデータタイプ(例えば、表、画像、グラフなど)で利用することができる。提案手法は,uci成人・遺産健康データセット (tabular), pokec-z, pokec-n, nbaデータセット (graph), celebaデータセット (vision) でテストすることにより有効性を示す。各データセットの公正度文献に提案する最先端手法を用いて、バイアスを最小限に抑え精度を維持する上で、提案手法よりも優れたモデルを示す。 Bias in training datasets must be managed for various groups in classification tasks to ensure parity or equal treatment. With the recent growth in artificial intelligence models and their expanding role in automated decision-making, ensuring that these models are not biased is vital. There is an abundance of evidence suggesting that these models could contain or even amplify the bias present in the data on which they are trained, inherent to their objective function and learning algorithms; Many researchers direct their attention to this issue in different directions, namely, changing data to be statistically independent, adversarial training for restricting the capabilities of a particular competitor who aims to maximize parity, etc. These methods result in information loss and do not provide a suitable balance between accuracy and fairness or do not ensure limiting the biases in training. 
To this end, we propose a powerful strategy for training deep learning models called the Distraction module, which can be theoretically proven effective in controlling bias from affecting the classification results. This method can be utilized with different data types (e.g., tabular, images, graphs). We demonstrate the potency of the proposed method by testing it on UCI Adult and Heritage Health datasets (tabular), POKEC-Z, POKEC-N and NBA datasets (graph), and CelebA dataset (vision). Comparing against state-of-the-art methods proposed in the fairness literature for each dataset, we show that our model outperforms these methods in minimizing bias while maintaining accuracy.	翻訳日:2023-11-08 02:09:01 公開日:2023-11-04
# 新規探索に基づく粒子群最適化 Particle Swarm Optimization based on Novelty Search ( http://arxiv.org/abs/2203.05674v2 ) ライセンス: Link先を確認	Mr. Rajesh Misra and Dr. Kumar S Ray	(参考訳) 本稿では,ノベルティ探索と組み合わせた粒子群最適化アルゴリズムを提案する。 Novelty Searchは、検索ドメインで検索する新しい場所を見つけ、次にParticle Swarm Optimizationはその領域を厳格に検索して、グローバルな最適解を求める。この方法は、客観的な自由であるノベルティサーチによって制御されるため、ローカルオプティマではブロックされない。より局所的な最適値と第二大域的最適値がより多く存在する関数に対して、本手法はうまく機能する。現在のアルゴリズムは、検索エリア全体を検索するまで停止しない。一連の実験により、複素最適化テスト関数に対する現在のアルゴリズムの堅牢性と有効性が証明された。 In this paper we propose a Particle Swarm Optimization algorithm combined with Novelty Search. Novelty Search finds novel places to search in the search domain, and Particle Swarm Optimization then rigorously searches each such area for the global optimum. This method is never blocked in local optima because it is controlled by Novelty Search, which is objective-free. For functions with many local optima, where the second global optimum is far from the true optimum, the present method works successfully. The present algorithm never stops until it has searched the entire search area. A series of experimental trials demonstrates the robustness and effectiveness of the present algorithm on complex optimization test functions.	翻訳日:2023-11-08 02:08:34 公開日:2023-11-04
# ニューラルタンジェントカーネルを用いたグラフ畳み込みネットワークの新しい展望 New Insights into Graph Convolutional Networks using Neural Tangent Kernels ( http://arxiv.org/abs/2110.04060v2 ) ライセンス: Link先を確認	Mahalakshmi Sabanayagam, Pascal Esser, Debarghya Ghoshdastidar	(参考訳) Graph Convolutional Networks (GCN)は、ネットワーク構造化データを学ぶための強力なツールとして登場した。実験的に成功したが、GCNは厳密な説明を持たない特定の振る舞いを示す。例えば、GCNのパフォーマンスはネットワーク深さの増加とともに著しく低下する。本稿では,グラフに関する半教師付き学習に注目し,その観察をNutral Tangent Kernels (NTK) のレンズを通して説明する。我々は(スキップ接続なしで)無限に広いgcnに対応するntkを導出する。その後、得られたNTKを用いて、適切な正規化を行うと、ネットワーク深さがGCNの性能を劇的に低下させるとは限らないことを確認する。さらに,超パラメータ自由決定性カーネルであるため,超パラメータチューニングによる性能変動に悩まされないGCNに対する効率的な「代理モデル」としてNTKを提案する。このアイデアの有効性は、サロゲートNTKを用いたGCNに対する異なるスキップ接続の比較によって示される。 Graph Convolutional Networks (GCNs) have emerged as powerful tools for learning on network structured data. Although empirically successful, GCNs exhibit certain behaviour that has no rigorous explanation -- for instance, the performance of GCNs significantly degrades with increasing network depth, whereas it improves marginally with depth using skip connections. This paper focuses on semi-supervised learning on graphs, and explains the above observations through the lens of Neural Tangent Kernels (NTKs). We derive NTKs corresponding to infinitely wide GCNs (with and without skip connections). Subsequently, we use the derived NTKs to identify that, with suitable normalisation, network depth does not always drastically reduce the performance of GCNs -- a fact that we also validate through extensive simulation. Furthermore, we propose NTK as an efficient `surrogate model' for GCNs that does not suffer from performance fluctuations due to hyper-parameter tuning since it is a hyper-parameter free deterministic kernel. The efficacy of this idea is demonstrated through a comparison of different skip connections for GCNs using the surrogate NTKs.	翻訳日:2023-11-08 02:05:11 公開日:2023-11-04
# 分布サンプルを用いた非iidデータからのグレイ学習 Gray Learning from Non-IID Data with Out-of-distribution Samples ( http://arxiv.org/abs/2206.09375v2 ) ライセンス: Link先を確認	Zhilin Zhao and Longbing Cao and Chang-Dong Wang	(参考訳) 専門家がアノテートしても、トレーニングデータの完全性は保証されていない。特に、in-distributionサンプルとout-of-distributionサンプルで構成される非IIDデータセットに対して。理想的なシナリオでは、サンプルの大部分は分散内であり、意味的に逸脱したサンプルは分散外と識別され、アノテーションプロセス中に除外される。しかし、専門家は誤ってこれらの分布外サンプルを分布内として分類し、本質的に信頼できないラベルを割り当てることがある。この信頼できないラベルとさまざまなデータ型の組み合わせは、堅牢なニューラルネットワークを学習するタスクを特に困難にしている。信頼性の低い基底トラスラベルを別にすれば、分布内および分布外の両方のサンプルは、必ず特定のクラスに属するものから除外できる。これは、サンプルが属していないクラスを示す信頼できる補完ラベルを利用する可能性を開く。この知見に導かれて,本研究では,基礎的真理と相補的ラベルの両面を活用した新しいアプローチである「Gray Learning (GL)」を導入する。重要なことに、GLは予測信頼度に基づいてこれらの2つのラベルの損失重みを適応的に調整する。統計学習理論のアプローチを基礎として一般化誤差の境界を導出し,非IID設定においてもGLが厳密な制約を達成できることを実証する。実験結果から,本手法はロバストな統計に基づく代替手法よりも優れていることがわかった。 The integrity of training data, even when annotated by experts, is far from guaranteed, especially for non-IID datasets comprising both in- and out-of-distribution samples. In an ideal scenario, the majority of samples would be in-distribution, while samples that deviate semantically would be identified as out-of-distribution and excluded during the annotation process. However, experts may erroneously classify these out-of-distribution samples as in-distribution, assigning them labels that are inherently unreliable. This mixture of unreliable labels and varied data types makes the task of learning robust neural networks notably challenging. We observe that both in- and out-of-distribution samples can almost invariably be ruled out from belonging to certain classes, aside from those corresponding to unreliable ground-truth labels. This opens the possibility of utilizing reliable complementary labels that indicate the classes to which a sample does not belong. Guided by this insight, we introduce a novel approach, termed Gray Learning (GL), which leverages both ground-truth and complementary labels. 
Crucially, GL adaptively adjusts the loss weights for these two label types based on prediction confidence levels. By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings. Extensive experimental evaluations reveal that our method significantly outperforms alternative approaches grounded in robust statistics.	翻訳日:2023-11-08 01:55:58 公開日:2023-11-04
# ソースコード要約のための抽出・要約フレームワーク An Extractive-and-Abstractive Framework for Source Code Summarization ( http://arxiv.org/abs/2206.07245v2 ) ライセンス: Link先を確認	Weisong Sun and Chunrong Fang and Yuchen Chen and Quanjun Zhang and Guanhong Tao and Tingxu Han and Yifei Ge and Yudu You and Bin Luo	(参考訳) (資料) コード要約は、自然言語の形式で与えられたコードスニペットの要約/記事を自動的に生成することを目的としている。このような要約は、開発者がソースコードを理解し維持するのを手助けする上で重要な役割を果たす。既存のコード要約技術は抽出メソッドと抽象メソッドに分類できる。抽出方法は、検索技術を用いてコードスニペットから重要文とキーワードのサブセットを抽出し、重要文とキーワードの事実的詳細を保持する要約を生成する。しかし、そのようなサブセットは識別子やエンティティの命名を見逃す可能性があり、その結果、生成された要約の自然性は通常貧弱である。この抽象的手法は、ニューラルネットワーク翻訳ドメインからエンコーダ・デコーダモデルを利用した人書き的な要約を生成することができる。生成された要約は、しばしば重要な事実の詳細を見逃す。実物的詳細を保存した人文的要約を生成するために,新しい抽出・要約フレームワークを提案する。フレームワークの抽出モジュールは、コードスニペットを取り込んで、重要な事実の詳細を含む重要なステートメントを予測する、抽出コード要約のタスクを実行する。フレームワークの抽象モジュールは、コードスニペット全体と重要な文を並行して取り込んで、簡潔で人書きのような自然言語要約を生成する抽象的なコード要約のタスクを実行する。 6つのプログラミング言語を含む3つのデータセットに対して広範な実験を行うことで、EACSと呼ばれる手法の有効性を評価する。実験の結果, EACSはBLEU, METEOR, ROUGE-Lの3つの指標において, 最先端技術よりも優れていた。 (Source) Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. The extractive methods extract a subset of important statements and keywords from the code snippet using retrieval techniques, and generate a summary that preserves factual details in important statements and keywords. However, such a subset may miss identifier or entity naming, and consequently, the naturalness of generated summary is usually poor. The abstractive methods can generate human-written-like summaries leveraging encoder-decoder models from the neural machine translation domain. The generated summaries however often miss important factual details. 
To generate human-written-like summaries with preserved factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs a task of extractive code summarization, which takes in the code snippet and predicts important statements containing key factual details. The abstractive module in the framework performs a task of abstractive code summarization, which takes in the entire code snippet and important statements in parallel and generates a succinct and human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques in terms of all three widely used metrics, including BLEU, METEOR, and ROUGE-L.	翻訳日:2023-11-08 01:55:33 公開日:2023-11-04
# D2D対応ヘテロジニアスネットワークにおける分散機械学習:アーキテクチャ、パフォーマンス、オープンチャレンジ Distributed Machine Learning in D2D-Enabled Heterogeneous Networks: Architectures, Performance, and Open Challenges ( http://arxiv.org/abs/2206.01906v2 ) ライセンス: Link先を確認	Zhipeng Cheng, Xuwei Fan, Minghui Liwang, Ning Chen, Xiaoyu Xia, Xianbin Wang	(参考訳) データプライバシに関する懸念は、マシンラーニング(ML)アーキテクチャを集中型から分散型に移行させ、プライバシを保存する2つの主要なメカニズムとして、フェデレーション付き学習(FL)とスプリットラーニング(SL)を生み出した。しかしながら、デバイス間(D2D)対応のヘテロジニアスネットワークにおけるFLやSLの実装は、アーキテクチャのスケーラビリティやトレーニングの遅延の長期化など、大きな課題となっている。これらの課題に対処するため、本稿では、ハイブリッドスプリットFL(HSFL)とハイブリッドフェデレーションSL(HFSL)という、2つの革新的なハイブリッド分散MLアーキテクチャを紹介する。このようなアーキテクチャは、D2D対応ヘテロジニアス無線ネットワークにおけるFLとSLの長所を組み合わせたものである。 HSFLとHFSLの性能と利点を包括的に分析するとともに,今後の探索に向けたオープンな課題も強調する。我々は,非独立および非独立に分散した3つのデータセットを用いて予備シミュレーションを行い,アーキテクチャの実現可能性を示す。シミュレーションの結果,従来のFLおよびSLと比較して通信/計算コストとトレーニング遅延が著しく低減した。 The ever-growing concerns regarding data privacy have led to a paradigm shift in machine learning (ML) architectures from centralized to distributed approaches, giving rise to federated learning (FL) and split learning (SL) as the two predominant privacy-preserving ML mechanisms. However, implementing FL or SL in device-to-device (D2D)-enabled heterogeneous networks with diverse clients presents substantial challenges, including architecture scalability and prolonged training delays. To address these challenges, this article introduces two innovative hybrid distributed ML architectures, namely, hybrid split FL (HSFL) and hybrid federated SL (HFSL). Such architectures combine the strengths of both FL and SL in D2D-enabled heterogeneous wireless networks. We provide a comprehensive analysis of the performance and advantages of HSFL and HFSL, while also highlighting open challenges for future exploration. We support our proposals with preliminary simulations using three datasets in non-independent and non-identically distributed settings, demonstrating the feasibility of our architectures. 
Our simulations reveal notable reductions in communication/computation costs and training delays as compared to conventional FL and SL.	翻訳日:2023-11-08 01:54:40 公開日:2023-11-04
# トピック: 注意力を用いたソースコードからの学習リポジトリ埋め込み Topical: Learning Repository Embeddings from Source Code using Attention ( http://arxiv.org/abs/2208.09495v4 ) ライセンス: Link先を確認	Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran	(参考訳) 本稿では,リポジトリレベルの埋め込みのための新しいディープニューラルネットワークである topical を提案する。自然言語ドキュメンテーションやナイーブアグリゲーション技術に依存した既存の手法は、トピックルが注意の仕組みを活用していることより優れている。このメカニズムはソースコード、フル依存グラフ、スクリプトレベルのテキストデータからリポジトリレベルの表現を生成する。公開アクセス可能なgithubリポジトリでトレーニングされた topical は,リポジトリの自動タグ付けなどのタスクにおいて,複数のベースラインを越えたものだ。 Topicalはスケーラビリティと効率性を実証し、リポジトリレベルの表現計算に価値ある貢献をする。さらなる研究のために、関連するツール、コード、トレーニングデータセットがhttps://github.com/jpmorganchase/topicalで提供されている。 This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical's utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism's efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. For further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.	翻訳日:2023-11-08 01:41:53 公開日:2023-11-04
# ポスト量子非可視性の新しいアプローチ A New Approach to Post-Quantum Non-Malleability ( http://arxiv.org/abs/2207.05861v3 ) ライセンス: Link先を確認	Xiao Liang, Omkant Pandey, Takashi Yamakawa	(参考訳) 我々は、最初の$\mathit{constant}$-$\mathit{round}$ が、$\mathit{post}$-$\mathit{quantum}$ $\mathit{one}$-$\mathit{way}$ $\mathit{functions}$という最小限の仮定の下で、ポスト量子化後の非可算コミットメントの構成を提供する。コミットメントに関して、非適合性の標準概念を達成する。以前の構成では同じ仮定で$\Omega(\log^*\lambda)$ラウンドが必要だった。我々は,ポスト量子環境において使用しやすい非可算コミットメントのための新しい手法により,結果を得る。この手法はまた、古典的設定において、一定周期の非可算なコミットメントに対するセキュリティのほぼ初歩的な証明を与える。既存の研究と組み合わせると、我々の結果は古典関数と量子関数の両方に対して最初の定ラウンドの量子セキュアなマルチパーティ計算($\mathit{in}$ $\mathit{the}$ $\mathit{plain}$ $\mathit{model}$, $\mathit{polynomial}$ hardness of quantum full-homomorphic encryption and quantum learning with error)が得られる。 We provide the first $\mathit{constant}$-$\mathit{round}$ construction of post-quantum non-malleable commitments under the minimal assumption that $\mathit{post}$-$\mathit{quantum}$ $\mathit{one}$-$\mathit{way}$ $\mathit{functions}$ exist. We achieve the standard notion of non-malleability with respect to commitments. Prior constructions required $\Omega(\log^*\lambda)$ rounds under the same assumption. We achieve our results through a new technique for constant-round non-malleable commitments which is easier to use in the post-quantum setting. The technique also yields an almost elementary proof of security for constant-round non-malleable commitments in the classical setting, which may be of independent interest. When combined with existing work, our results yield the first constant-round quantum-secure multiparty computation for both classical and quantum functionalities $\mathit{in}$ $\mathit{the}$ $\mathit{plain}$ $\mathit{model}$, under the $\mathit{polynomial}$ hardness of quantum fully-homomorphic encryption and quantum learning with errors.	翻訳日:2023-11-08 01:39:54 公開日:2023-11-04
# 線形マルコフ決定過程に対する最短最適強化学習 Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes ( http://arxiv.org/abs/2212.06132v3 ) ライセンス: Link先を確認	Jiafan He and Heyang Zhao and Dongruo Zhou and Quanquan Gu	(参考訳) 線形関数近似による強化学習(rl)について検討した。任意の特徴写像の線形関数として遷移確率をパラメータ化できるエピソドック時間不均質線形マルコフ決定過程(線形mdp)に対して、ほぼミニマックスの最適後悔である$\tilde o(d\sqrt{h^3k})$ を達成する最初の計算効率の高いアルゴリズムを提案し、ここで$d$ は特徴写像の次元、$h$ は計画の地平線、$k$ はエピソード数である。本アルゴリズムは,(1)最適値関数の分散を直接推定し,(2)エピソード数に対して単調に減少して推定精度が向上し,(3)推定値関数クラスの複雑性を制御するために,値関数推定器の更新にレアスイッチングポリシを用いる新しい分散推定器に依存する,注意深く設計された重み付き線形回帰スキームに基づいている。本研究は,線形mdpを用いた最適rlに対する完全な回答を提供するとともに,開発したアルゴリズムと理論的ツールが独立した興味を持つかもしれない。 We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.	翻訳日:2023-11-08 01:31:40 公開日:2023-11-04
# コントラル検出のための深部セグメンテーションモデルの性能評価 Performance evaluation of deep segmentation models for Contrails detection ( http://arxiv.org/abs/2211.14851v4 ) ライセンス: Link先を確認	Akshat Bhandari and Sriya Rallabandi and Sanchit Singhal and Aditya Kasliwal and Pratinav Seth	(参考訳) コントラル(Contrail)は、冷たく湿った空気を飛ぶ際に航空機のエンジンの排気によって生じる線状の氷雲である。放射される長波の約33%を地球に吸収または誘導することで温室効果を発生させる。それらは航空活動による気候変動の半分以上を占める。コントラルの回避と飛行経路の調整は、その影響を減らすための安価で効果的な方法である可能性がある。違反回避戦略の開発と評価には,正確で自動化された信頼性の高い検出アルゴリズムが必要である。コントラル検出の進歩は、いくつかの要因により、主に品質ラベル付きデータの欠如により、著しく制限されている。近年,大型のLandsat-8コントラルデータセットが提案されている。各コントラルには、ランドサット8衛星画像の様々な場面で様々な入力が慎重にラベル付けされている。本研究では,様々な損失関数とエンコーダのバックボーンを組み合わせたセグメンテーションモデルをベンチマークする。この研究は、低軌道衛星画像の反則を検出するために最先端のセグメンテーション技術を適用した最初のものである。私たちの作品は、反則セグメンテーションのオープンベンチマークとしても使用でき、公開されています。 Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when aircraft fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited due to several factors, primarily due to a lack of quality-labeled data. Recently, a large human-labeled Landsat-8 contrails dataset was proposed. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. 
Our work can also be used as an open benchmark for contrail segmentation and is publicly available.	翻訳日:2023-11-08 01:30:38 公開日:2023-11-04
# 連合学習とメタ学習:アプローチ、応用、方向性 Federated Learning and Meta Learning: Approaches, Applications, and Directions ( http://arxiv.org/abs/2210.13111v2 ) ライセンス: Link先を確認	Xiaonan Liu and Yansha Deng and Arumugam Nallanathan and Mehdi Bennis	(参考訳) ここ数年、リソース管理、干渉管理、自律性、無線ネットワークにおける意思決定に対処するため、機械学習(ML)の分野で大きな進歩を遂げてきた。従来のMLアプローチは、トレーニングのために中央サーバでデータを収集する集中型メソッドに依存している。しかし、このアプローチはデバイスのデータのプライバシを維持するという点で課題となる。この問題に対処するため、フェデレーション学習(fl)は、データプライバシを損なうことなく、エッジデバイスが協調的にmlモデルをトレーニングできる効果的なソリューションとして浮上した。 FLでは、ローカルデータセットは共有されず、すべてのデバイスを含む特定のタスクのグローバルモデル学習に重点を置いている。しかし、FLは、異なるデータ分布を持つデバイスにモデルを適応することに関して制限がある。このような場合、メタラーニングは、少数のデータサンプルを用いて異なるデータ分布に学習モデルの適応を可能にするため、考慮される。本稿では,fl,meta learning,federated meta learning (fedmeta)の包括的レビューを紹介する。他のチュートリアルと異なり、私たちの目標はFL、メタラーニング、FedMetaの方法論をどのように設計、最適化、進化させ、無線ネットワーク上で応用するかを探ることです。また、これらの学習アルゴリズム間の関係を分析し、実世界の応用におけるそれらの利点と欠点について検討する。 Over the past few years, significant advancements have been made in the field of machine learning (ML) to address resource management, interference management, autonomy, and decision-making in wireless networks. Traditional ML approaches rely on centralized methods, where data is collected at a central server for training. However, this approach poses a challenge in terms of preserving the data privacy of devices. To address this issue, federated learning (FL) has emerged as an effective solution that allows edge devices to collaboratively train ML models without compromising data privacy. In FL, local datasets are not shared, and the focus is on learning a global model for a specific task involving all devices. However, FL has limitations when it comes to adapting the model to devices with different data distributions. In such cases, meta learning is considered, as it enables the adaptation of learning models to different data distributions using only a few data samples. In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta). 
Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks. We also analyze the relationships among these learning algorithms and examine their advantages and disadvantages in real-world applications.	翻訳日:2023-11-08 01:27:36 公開日:2023-11-04
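The tutorial abstract above describes FL as learning one global model from devices that never share their local data. A common way to do this is FedAvg-style aggregation, where the server averages client parameters weighted by local dataset size. The sketch below assumes that scheme for illustration; the abstract itself does not name a specific aggregation rule, and the function name and numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hypothetical clients: only parameters and sample counts reach the server.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]
global_weights = federated_average(client_weights, client_sizes)
```

With these inputs the global model is `[2.5, 3.5]`, pulled toward the client with more data. A single averaged model like this is what can fit poorly on devices whose data distribution differs from the majority, which is the limitation that motivates the meta learning variants the tutorial covers.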
# 大規模言語モデルにおける言語と思考の解離 Dissociating language and thought in large language models ( http://arxiv.org/abs/2301.06627v2 ) ライセンス: Link先を確認	Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko	(参考訳) 大規模言語モデル(LLM)は、現在まで人間の言語を習得する上で最も近いモデルとなっているが、その言語的および認知的能力に関する意見は相変わらず分かれている。本稿では,言語規則とパターンの理解-および機能的言語能力-世界における言語の理解と活用-を,形式的言語能力の区別を用いて評価する。我々はこの区別を人間の神経科学に置き、形式的および機能的な能力は異なる神経機構に依存していることを示す。 LLMの形式的能力は驚くほど優れているが、機能的能力のタスクのパフォーマンスは不明瞭であり、しばしば外部モジュールとの特別な微調整や結合を必要とする。要するに、LLMは言語の優れたモデルであるが、人間の思考の不完全なモデルである。 Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence--knowledge of linguistic rules and patterns--and functional linguistic competence--understanding and using language in the world. We ground this distinction in human neuroscience, showing that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. In short, LLMs are good models of language but incomplete models of human thought.	翻訳日:2023-11-08 01:18:19 公開日:2023-11-04
# 完全逆数検出のための(ほぼ)局所的成長速度推定 Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection ( http://arxiv.org/abs/2212.06776v3 ) ライセンス: Link先を確認	Peter Lorenz, Margret Keuper and Janis Keuper	(参考訳) 畳み込みニューラルネットワーク(CNN)は、多くの知覚的タスクにおける最先端のソリューションを定義する。しかし、現在のCNNアプローチは、人間の目に準知覚できない状態でシステムを騙すために特別に作られた入力の敵の摂動に対して脆弱なままである。近年、モデル硬化や明示的な防御機構の追加など、CNNをこのような攻撃から守るための様々なアプローチが提案されている。これにより、ネットワークに小さな「検出器」が含まれ、真データと逆摂動を含むデータとを区別する二分分類タスクで訓練される。本研究では,ネットワークの局所固有次元(LID)と敵攻撃の関係について,最近の知見を生かした,シンプルで軽量な検出器を提案する。 LID測度の再解釈といくつかの単純な適応に基づいて、敵検出の最先端をかなりのマージンで超越し、複数のネットワークやデータセットのF1スコアでほぼ完璧な結果を得る。出典: https://github.com/adverML/multiLID Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID	翻訳日:2023-11-08 01:14:57 公開日:2023-11-04
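The detector in the abstract above builds on local intrinsic dimensionality (LID). For orientation, the sketch below implements the widely used maximum-likelihood LID estimate from the distances to a point's k nearest neighbours. This is the standard baseline estimator, assumed here for illustration; the paper's re-interpreted multiLID variant differs from it, and the function name is hypothetical.

```python
import math

def lid_mle(neighbor_distances):
    """Maximum-likelihood LID estimate from k nearest-neighbour distances.

    Assumes all distances are positive and not all equal, since equal
    distances make the mean log-ratio zero and the estimate undefined.
    """
    r = sorted(neighbor_distances)
    r_max = r[-1]
    mean_log_ratio = sum(math.log(d / r_max) for d in r) / len(r)
    return -1.0 / mean_log_ratio

# Distances that double at each step give an estimate of 1 / ln(2).
estimate = lid_mle([1.0, 2.0, 4.0])
```

In adversarial detection work of this kind, such estimates are typically computed on layer activations for each input and then fed to a small binary classifier that separates clean from perturbed samples.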
# 分極正規化とワンパス学習による合意サブネットワークの学習 Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training ( http://arxiv.org/abs/2302.10798v4 ) ライセンス: Link先を確認	Xiaoying Zhi, Varun Babbar, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran	(参考訳) 最近の大規模で複雑なニューラルネットワークモデルの動向を考えると、グリーンAIの主題はディープラーニングコミュニティ内で注目を集めている。推論時のトレーニングの計算負荷を削減する既存のソリューションは、通常ネットワークパラメータの刈り込みを伴う。プルーニングスキームは、反復的なトレーニングと静的プルーニングの微調整、動的プルーニンググラフの反復計算によって余分なオーバーヘッドを生み出す。そこで本研究では, 省エネコストを最小にしつつ, 下流タスクの完全パラメータ化ネットワークと同等の性能を維持する軽量サブネットワークを学習するための新しいパラメータプルーニング手法を提案する。提案手法はグリーン指向であり,動的プルーニング法により最適な静的サブネットワークを発見するためには,ワンオフトレーニングのみを必要とする。プルーニング方式は、二分ゲーティングモジュールと、ユーザが定義した間隔でサブネットワークを探索する新しい損失関数から構成される。提案手法は,訓練段階と推論段階の両方でエネルギーを節約し,演算オーバーヘッドの増大を回避し,同時に刈り取り訓練を可能にする。 CIFAR-10 と CIFAR-100 では,分類精度が1% 未満の深層ネットワークにおける接続の50%を除去できる可能性が示唆された。本手法は他のプルーニング法と比較して,計算コストの等価な削減のための精度の低下を示す。 The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of training at inference time usually involve pruning the network parameters. Pruning schemes often create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks. Our proposed pruning scheme is green-oriented, as it only requires a one-off training to discover the optimal static sub-networks by dynamic pruning methods. The pruning scheme consists of a binary gating module and a novel loss function to uncover sub-networks with user-defined sparsity. 
Our method enables pruning and training simultaneously, which saves energy in both the training and inference phases and avoids extra computational overhead from gating modules at inference time. Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy. Compared to other related pruning methods, our method demonstrates a lower drop in accuracy for equivalent reductions in computational cost.	翻訳日:2023-11-08 01:05:42 公開日:2023-11-04
# AfriSenti: アフリカの言語に対するTwitterの感情分析ベンチマーク AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages ( http://arxiv.org/abs/2302.08956v5 ) ライセンス: Link先を確認	Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur	(参考訳) アフリカには6以上の言語族から2000以上の言語があり、全大陸で最高の言語多様性がある。これには、それぞれ少なくとも100万人の話者を持つ75の言語が含まれる。しかし、アフリカ語に関するNLP研究はほとんど行われていない。このような研究を可能にする上で重要なのは、高品質な注釈付きデータセットの提供だ。本稿では,4つの言語族から,14のアフリカ語(アムハラ語,アルジェリア・アラビア語,ハウサ語,イグボ語,キニャルワンダ語,モロッコ・アラビア語,モザンビーク・ポルトガル語,ナイジェリア・ピジン語,オロモ語,スワヒリ語,ティグリニャ語,トウィ語,シツォンガ語,ヨルバ語)で合計110,000以上のツイートを含む感情分析ベンチマークであるAfriSentiを紹介する。ツイートはネイティブスピーカーによって注釈付けされ、AfriSenti-SemEval共有タスクで使用された(AfriSenti Shared Taskには200人以上の参加者がいた)。各データセットのキュレーションにおいて,データ収集の方法論,アノテーションプロセス,対処した課題について述べる。さらに,異なるデータセット上で実施したベースライン実験を報告し,その有用性について考察する。 Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. 
The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (The AfriSenti Shared Task had over 200 participants. See website at https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness.	翻訳日:2023-11-08 01:05:20 公開日:2023-11-04
# 病理組織学的全スライディング画像におけるメラノーマ皮膚癌の検出と局在 Detection and Localization of Melanoma Skin Cancer in Histopathological Whole Slide Images ( http://arxiv.org/abs/2302.03014v4 ) ライセンス: Link先を確認	Neel Kanwal, Roger Amundsen, Helga Hardardottir, Luca Tomasetti, Erling Sandoy Undersrud, Emiel A.M. Janssen, Kjersti Engan	(参考訳) 早期に診断および治療を行ったメラノーマは生存率を高めることができる。皮膚がんの発生件数の増加予測と皮膚病理医の不足は、計算病理学(CPATH)システムの必要性を強調している。深層学習(DL)モデルを持つCPATHシステムは、基礎となる形態学的および細胞的特徴を利用してメラノーマの存在を識別する可能性がある。本論文は,WSI(Whole Slide Images)におけるメラノーマの検出と,正常皮膚と良性/悪性のメラノサイト病変の鑑別を目的としたDL法を提案する。本手法は, 病変を高精度に検出し, 病理医の関心領域を特定するためにWSI上に局在化する。興味深いことに,本手法では,まず1つのCNNネットワークを用いて局所化マップを作成し,それを用いてスライドレベルの予測を行い,メラノーマ患者を判定する。ベストモデルでは、未知データに対して0.992のF1スコアと0.99の感度でパッチ単位の分類結果が得られる。ソースコードはhttps://github.com/RogerAmundsen/Melanoma-Diagnosis-and-Localization-from-Whole-Slide-Images-using-Convolutional-Neural-Networks。 Melanoma diagnosed and treated in its early stages can increase the survival rate. A projected increase in skin cancer incidents and a dearth of dermatopathologists have emphasized the need for computational pathology (CPATH) systems. CPATH systems with deep learning (DL) models have the potential to identify the presence of melanoma by exploiting underlying morphological and cellular features. This paper proposes a DL method to detect melanoma and distinguish between normal skin and benign/malignant melanocytic lesions in Whole Slide Images (WSI). Our method detects lesions with high accuracy and localizes them on a WSI to identify potential regions of interest for pathologists. Interestingly, our DL method relies on using a single CNN network to create localization maps first and use them to perform slide-level predictions to determine patients who have melanoma. Our best model provides favorable patch-wise classification results with a 0.992 F1 score and 0.99 sensitivity on unseen data. The source code is https://github.com/RogerAmundsen/Melanoma-Diagnosis-and-Localization-from-Whole-Slide-Images-using-Convolutional-Neural-Networks.	
翻訳日:2023-11-08 01:03:20 公開日:2023-11-04
# GQE-Net:ポイントクラウドカラー属性のためのグラフベースの品質向上ネットワーク GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute ( http://arxiv.org/abs/2303.13764v2 ) ライセンス: Link先を確認	Jinrui Xing, Hui Yuan, Raouf Hamzaoui, Hao Liu, and Junhui Hou	(参考訳) 近年、点雲は3次元(3次元)の視覚オブジェクトやシーンを表現するために人気が高まっている。点雲を効率的に保存・送信するために圧縮法が開発されているが、品質が劣化することが多い。点雲の色歪みを低減するため,幾何学情報を補助入力とし,グラフ畳み込みブロックを用いて局所特徴を効率的に抽出するグラフベース品質向上ネットワーク(GQE-Net)を提案する。具体的には,マルチヘッドグラフアテンション機構を備えた並列シリアルグラフアテンションモジュールを用いて重要な点や特徴に着目し,それらを融合させる。さらに,点間の正規性と幾何学的距離を考慮に入れた特徴改善モジュールを設計する。 GPUメモリ容量の制限の中で機能するために、歪んだポイントクラウドはオーバーラップ可能な3Dパッチに分割され、品質向上のためにGQE-Netに送られる。異なる色成分間のデータ分布の違いを考慮するため、3つの色成分について3つのモデルを訓練する。実験結果から,本手法は最先端性能を実現することが示された。例えば、幾何ベースのポイントクラウド圧縮 (g-pcc) 標準である 0.43 db, 0.25 db, 0.36 db bjontegaard delta (bd)-peak-signal-to-noise ratio (psnr) の最近のテストモデル上でgqe-netを実装する場合、それぞれ、y、cb、crコンポーネントの高密度ポイントクラウド上で、14.0%、9.3%、14.5%のbdレート節約を達成できる。このメソッドのソースコードはhttps://github.com/xjr998/gqe-netで入手できる。 In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often result in a degradation of quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently. Specifically, we use a parallel-serial graph attention module with a multi-head graph attention mechanism to focus on important points or features and help them fuse together. Additionally, we design a feature refinement module that takes into account the normals and geometry distance between points. To work within the limitations of GPU memory capacity, the distorted point cloud is divided into overlap-allowed 3D patches, which are sent to GQE-Net for quality enhancement. 
To account for differences in data distribution among different color components, three models are trained for the three color components. Experimental results show that our method achieves state-of-the-art performance. For example, when implementing GQE-Net on a recent test model of the geometry-based point cloud compression (G-PCC) standard, 0.43 dB, 0.25 dB, and 0.36 dB Bjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR), corresponding to 14.0%, 9.3%, and 14.5% BD-rate savings can be achieved on dense point clouds for the Y, Cb, and Cr components, respectively. The source code of our method is available at https://github.com/xjr998/GQE-Net.	翻訳日:2023-11-07 23:21:01 公開日:2023-11-04
# 命令型ニューラル表現を用いたタスク指向型ヒューマンオブジェクトインタラクション生成 Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations ( http://arxiv.org/abs/2303.13129v2 ) ライセンス: Link先を確認	Quanzhou Li, Jingbo Wang, Chen Change Loy, Bo Dai	(参考訳) デジタルヒューマンモーション合成は、映画、AR/VR、ビデオゲームに応用される活発な研究分野である。自然で現実的な人間の動きを生成する方法が提案されたが、ほとんどは人間のモデリングに焦点を合わせ、物体の動きを無視した。シミュレーションにおけるタスク指向の人間-物体相互作用運動の生成は困難である。物体の使用の異なる意図のために、人間は様々な動きを行うため、人間はまず物体に接近し、そこに留まる代わりに人間と連続して動くように要求する。また、下流アプリケーションに展開するためには、合成された動きは、様々な目的のために予測された動きをパーソナライズするオプションを提供するために、長めの柔軟性が望まれる。この目的のために,タスクタイプ,オブジェクト,および開始状態のみを与えられた特定のタスクを実行するために,完全なヒューマン・オブジェクトインタラクション動作を生成する暗黙の神経表現によるタスク指向のヒューマン・オブジェクトインタラクション生成を提案する。 TOHOは3ステップで人物体の動きを生成する。 1) タスクの種類と対象情報を与えられたタスクを実行する際のキーフレームのポーズを最初に見積もる。 2) キーフレームを満たし,連続的な動作を生成する。 3) 最後に,コンパクトな閉形式物体運動推定を適用し,物体運動を生成する。本手法では,時間座標のみによってパラメータ化される連続運動を生成し,任意のフレームへのシーケンスのアップサンプリングやダウンサンプリングを可能にし,時間座標ベクトルの設計による動き速度の調整を行う。本手法の有効性を質的および定量的に実証する。この研究は、一般の人間とシーンの相互作用シミュレーションに向けてさらに一歩前進する。 Digital human motion synthesis is a vibrant research field with applications in movies, AR/VR, and video games. Whereas methods were proposed to generate natural and realistic human motions, most only focus on modeling humans and largely ignore object movements. Generating task-oriented human-object interaction motions in simulation is challenging. For different intents of using the objects, humans conduct various motions, which requires the human first to approach the objects and then make them move consistently with the human instead of staying still. Also, to deploy in downstream applications, the synthesized motions are desired to be flexible in length, providing options to personalize the predicted motions for various purposes. To this end, we propose TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations, which generates full human-object interaction motions to conduct specific tasks, given only the task type, the object, and a starting human status. 
TOHO generates human-object motions in three steps: 1) it first estimates the keyframe poses of conducting a task given the task type and object information; 2) then, it infills the keyframes and generates continuous motions; 3) finally, it applies a compact closed-form object motion estimation to generate the object motion. Our method generates continuous motions that are parameterized only by the temporal coordinate, which allows for upsampling or downsampling of the sequence to arbitrary frames and adjusting the motion speeds by designing the temporal coordinate vector. We demonstrate the effectiveness of our method, both qualitatively and quantitatively. This work takes a step further toward general human-scene interaction simulation.	翻訳日:2023-11-07 23:20:23 公開日:2023-11-04
# 触媒による熱過程の階層崩壊 A hierarchy of thermal processes collapses under catalysis ( http://arxiv.org/abs/2303.13020v2 ) ライセンス: Link先を確認	Jeongrak Son, Nelly H.Y. Ng	(参考訳) 熱操作は、熱力学的制約下での許容状態遷移の一般的な記述である。しかし、これらのプロセスをすべて包含するシンプルな方法の探求は未完成のままである。この課題は、容易に利用できると仮定された熱浴の触媒利用によって解決する。基本熱操作とマルコフ熱操作の2つの簡易操作を選択した。彼らは実験的な実現性で知られているが、生来のマルコヴィアン性のために熱活動の完全な範囲を捉えられなかった。しかし, 環境温度のギブス状態触媒によって操作が強化されると, この制限を克服できることを示す。以上の結果から, 熱的操作における自由状態は, より単純な操作に必要な非マルコビアン性を与える触媒として機能することが示唆された。さらに, 触媒が適用できる場合, 異なる熱過程(熱操作, 初等熱操作, マルコフ熱操作)が収束することを示す。特に,エネルギー固有ベイシスにおける初期状態のコヒーレンスに関するシナリオは,その特徴付けが難しいことで悪名高い。 Thermal operations are a generic description for allowed state transitions under thermodynamic restrictions. However, the quest for simpler methods to encompass all these processes remains unfulfilled. We resolve this challenge through the catalytic use of thermal baths, which are assumed to be easily accessible. We select two sets of simplified operations: elementary thermal operations and Markovian thermal operations. They are known for their experimental feasibility, but fail to capture the full extent of thermal operations due to their innate Markovianity. We nevertheless demonstrate that this limitation can be overcome when the operations are enhanced by ambient-temperature Gibbs state catalysts. In essence, our result indicates that free states within thermal operations can act as catalysts that provide the necessary non-Markovianity for simpler operations. Furthermore, we prove that when any catalyst can be employed, different thermal processes (thermal operations, elementary thermal operations, and Markovian thermal operations) converge. Notably, our results extend to scenarios involving initial states with coherence in the energy eigenbasis, a notoriously difficult process to characterise.	翻訳日:2023-11-07 23:19:58 公開日:2023-11-04
# ネットワークシナリオにおける局所モデルの数値支援決定 Numerically assisted determination of local models in network scenarios ( http://arxiv.org/abs/2303.09954v3 ) ライセンス: Link先を確認	José Mário da Silva and Fernando Parisio	(参考訳) ネットワークシナリオにおける隠れ変数の濃度が一般性を失うことなく有限であると仮定できるという事実を生かして、与えられた統計的振る舞いを再現する明示的な局所モデルを見つけるための数値ツールを開発した。次に,ネットワーク局所境界が知られている統計的行動の家族を用いて,二元的シナリオを用いて数値計算を行った。さらに,入力のない三角形ネットワークにおいて,均一なランダムノイズを混合した3つの顕著な分布の臨界可視性について検討した。グリーンベルガー・ホルン・ザイリンガー(GHZ)およびW分布(第4次多項式の根である)の臨界可視性についての予想と、エレガント関節計測分布の臨界可視性の低い境界推定を提供する。開発されたコードとドキュメントはgithub.com/mariofilho281/localmodelsで公開されている Taking advantage of the fact that the cardinalities of hidden variables in network scenarios can be assumed to be finite without loss of generality, a numerical tool for finding explicit local models that reproduce a given statistical behaviour was developed. The numerical procedure was then validated using families of statistical behaviours for which the network-local boundary is known, in the bilocal scenario. Furthermore, the critical visibility for 3 notable distributions mixed with a uniform random noise is investigated in the triangle network without inputs. We provide conjectures for the critical visibilities of the Greenberger-Horne-Zeilinger (GHZ) and W distributions (which are roots of 4th degree polynomials), as well as a lower bound estimate of the critical visibility of the Elegant Joint Measurement distribution. The developed codes and documentation are publicly available at github.com/mariofilho281/localmodels	翻訳日:2023-11-07 23:18:49 公開日:2023-11-04
# 多値拡散:画像生成のための無限次元スコアベース拡散モデル Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation ( http://arxiv.org/abs/2303.04772v3 ) ライセンス: Link先を確認	Paul Hagemann, Sophie Mildenberger, Lars Ruthotto, Gabriele Steidl, Nicole Tianjiao Yang	(参考訳) スコアベース拡散モデル(SBDM)は画像生成のための最先端のアプローチとして最近登場した。既存のSBDMは通常有限次元の設定で定式化され、画像は有限サイズのテンソルと見なされる。本稿では,SBDMを無限次元設定,すなわち矩形領域でサポートされている関数としてトレーニングデータをモデル化する。より高解像度で画像を生成することの探求に加えて、私たちの一番の動機は、複数の解像度レベルで一貫した識別を可能にするために、よく考えられた無限次元の学習問題を作ることです。そこで我々は,様々な解像度レベルで一般化し,学習過程の効率化を図る拡散モデルを得る。無限次元設定におけるsbdmアプローチの2つの欠点を克服する方法を示す。まず, 潜在分布が無限次元設定においてトレースクラス作用素の概念を用いて well-defined であることを保証するために, フォワードプロセスを修正した。有限近似に対する逆過程を導出する。第2に,オペレータネットワークでスコア関数を近似することは,多レベルトレーニングに有用であることを示す。離散化の収束とマルチレベルトレーニングの近似を導出した後、無限次元SBDM手法を実装し、MNISTとFashion-MNISTで最初の有望な結果を示す。 Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently on multiple resolution levels. We thereby intend to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process to ensure that the latent distribution is well-defined in the infinite-dimensional setting using the notion of trace class operators. We derive the reverse processes for finite approximations. 
Second, we illustrate that approximating the score function with an operator network is beneficial for multilevel training. After deriving the convergence of the discretization and the approximation of multilevel training, we implement an infinite-dimensional SBDM approach and show the first promising results on MNIST and Fashion-MNIST, underlining our developed theory.	翻訳日:2023-11-07 23:17:02 公開日:2023-11-04
# 2層ReLU畳み込みニューラルネットワークの良性オーバーフィッティング Benign Overfitting for Two-layer ReLU Convolutional Neural Networks ( http://arxiv.org/abs/2303.04145v2 ) ライセンス: Link先を確認	Yiwen Kou and Zixiang Chen and Yuanzhou Chen and Quanquan Gu	(参考訳) 優れた表現力を持つ現代のディープラーニングモデルは、トレーニングデータに過度に適合するが、それでも十分に一般化できる。この現象は良性オーバーフィッティング(benign overfitting)と呼ばれる。近年、ニューラルネットワークの良性過剰適合を理論的に理解しようとする研究がいくつかある。しかしながら、これらの研究は、スムーズな活性化機能を持つニューラルネットワークや、ニューラルタンジェントカーネル体制に限られている。 ReLUニューラルネットワークにおいて良性オーバーフィッティングがいつ、どのように起こるかは未解決のままである。本研究では,ラベルフリップ雑音を伴う2層ReLU畳み込みニューラルネットワークを学習するアルゴリズム依存型リスク境界を確立することにより,この問題に対処する。緩やかな条件下では、勾配降下によってトレーニングされたニューラルネットワークは、ほぼゼロに近いトレーニング損失とベイズ最適試験リスクを達成できることを示す。また,テストリスクの観点から,データ分布の異なる条件下での良性と有害なオーバーフィッティングの急激な移行も明らかにした。合成データでの実験は我々の理論を裏付けている。 Modern deep learning models with great expressive power can be trained to overfit the training data but still generalize well. This phenomenon is referred to as benign overfitting. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, these works are either limited to neural networks with smooth activation functions or to the neural tangent kernel regime. How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk. Our result also reveals a sharp transition between benign and harmful overfitting under different conditions on data distribution in terms of test risk. Experiments on synthetic data back up our theory.	翻訳日:2023-11-07 23:16:38 公開日:2023-11-04
# 高レベルなロボット説明のための報酬分解の詳細な検討 A Closer Look at Reward Decomposition for High-Level Robotic Explanations ( http://arxiv.org/abs/2304.12958v2 ) ライセンス: Link先を確認	Wenhao Lu, Xufeng Zhao, Sven Magg, Martin Gromniak, Mengdi Li, Stefan Wermter	(参考訳) 強化学習(RL)によって学習された知的エージェントの振る舞いを人間に説明することは、理解しにくい固有受容状態、変動する中間目標、そしてその結果生じる予測不可能性のために、困難であるが重要である。さらに、RLエージェントの1段階の説明は、各遷移におけるエージェントの将来の振る舞いを説明できないため曖昧になり、ロボットアクションを説明する複雑さが増す。タスク固有のプリミティブにマップする抽象的なアクションを活用することで、動作レベルの説明を避けることができる。ロボットシステムの透明性と説明可能性をさらに向上するために,報酬分解(RD)と抽象的な行動空間を組み合わせ,タスク内の物体特性に基づく曖昧さのない高レベルな説明を可能にする,説明可能なQ-Map学習フレームワークを提案する。本研究では,人間の理解が容易なRD説明の出力成果から視覚的・テキスト的説明を提示する,2つのロボットシナリオの定量的・定性的な分析を通じて,フレームワークの有効性を実証する。さらに,これらのアーティファクトを大規模言語モデル(LLM)に統合し,推論と対話的なクエリを行う汎用性を示す。 Explaining the behaviour of intelligent agents learned by reinforcement learning (RL) to humans is challenging yet crucial due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for RL agents can be ambiguous as they fail to account for the agent's future behaviour at each transition, adding to the complexity of explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations on the movement level. To further improve the transparency and explainability of robotic systems, we propose an explainable Q-Map learning framework that combines reward decomposition (RD) with abstracted action spaces, allowing for non-ambiguous and high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analysis of two robotic scenarios, showcasing visual and textual explanations, from output artefacts of RD explanations, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artefacts with large language models (LLMs) for reasoning and interactive querying.	翻訳日:2023-11-07 23:09:01 公開日:2023-11-04
# ChatGPT対応労働市場の将来--中国における予備的研究 The Future of ChatGPT-enabled Labor Market: A Preliminary Study in China ( http://arxiv.org/abs/2304.09823v4 ) ライセンス: Link先を確認	Lan Chen, Xi Chen, Shiyu Wu, Yaqi Yang, Meng Chang, Hengshu Zhu	(参考訳) 驚くべき大きな言語モデルとして、chatgptは様々な現実世界のタスクで並行して成功し、日々の生活や仕事においてますます重要な役割を演じています。しかし、倫理的な問題、特にChatGPTのような人工知能(AGI)が人間の仕事を置き換えるかどうかについても、大きな懸念が持ち上がっている。そこで,本稿では,人間-AIコンファレンスではなく,人間-AI共生の観点から,ChatGPTを活用した労働市場の将来に関する予備的なデータ駆動研究を紹介する。具体的には、中国最大のオンラインリクルートプラットフォームであるboss zhipinで、大規模求人データの詳細な分析をまず実施する。その結果、現在の労働市場の職業の約28%はChatGPT関連のスキルを必要とすることがわかった。さらに,大規模職業中心知識グラフに基づいて,労働市場における職業スキル関係を予測するための意味情報強化協調フィルタリングアルゴリズムを開発した。その結果,今後45%の職業がchatgpt関連のスキルを必要とすることがわかった。特に、技術、製品、オペレーションに関連する産業は、ChatGPT関連のスキルに対して高い熟練度を要求され、一方、製造、サービス、教育、健康科学関連産業は、ChatGPT関連スキルに対してより低い熟練度を要求される。 As a phenomenal large language model, ChatGPT has achieved unparalleled success in various real-world tasks and increasingly plays an important role in our daily lives and work. However, extensive concerns are also raised about the potential ethical issues, especially about whether ChatGPT-like artificial general intelligence (AGI) will replace human jobs. To this end, in this paper, we introduce a preliminary data-driven study on the future of ChatGPT-enabled labor market from the view of Human-AI Symbiosis instead of Human-AI Confrontation. To be specific, we first conduct an in-depth analysis of large-scale job posting data in BOSS Zhipin, the largest online recruitment platform in China. The results indicate that about 28% of occupations in the current labor market require ChatGPT-related skills. Furthermore, based on a large-scale occupation-centered knowledge graph, we develop a semantic information enhanced collaborative filtering algorithm to predict the future occupation-skill relations in the labor market. As a result, we find that additional 45% occupations in the future will require ChatGPT-related skills. 
In particular, industries related to technology, products, and operations are expected to have higher proficiency requirements for ChatGPT-related skills, while the manufacturing, services, education, and health science related industries will have lower requirements for ChatGPT-related skills.	翻訳日:2023-11-07 23:07:43 公開日:2023-11-04
# 視覚言語事前学習のためのベースラインの改善 Improved baselines for vision-language pre-training ( http://arxiv.org/abs/2305.08675v2 ) ライセンス: Link先を確認	Enrico Fini and Pietro Astolfi and Adriana Romero-Soriano and Jakob Verbeek and Michal Drozdzal	(参考訳) コントラスト学習はマルチモーダル表現を学習するための効率的なフレームワークとして登場した。この領域の独創的な研究であるクリップは、コントラスト損失を使ってペア画像テキストデータをトレーニングすることで素晴らしい結果を得た。最近の研究は、自己教師型学習にインスパイアされた非コントラスト的損失によるCLIPの改善を主張している。しかし、モデルのトレーニングに使用されるデータ拡張や正規化といった他の実装の詳細から、これらの追加的な損失の貢献を外すのは難しい場合があります。そこで本稿では,コントラスト学習と近年の自己教師型学習の進歩を組み合わせることで得られるいくつかの基本点を,まず提案し,実装し,評価する。特に,視覚的自己指導学習において得られた損失関数を用いて画像とテキストのモダリティを整列させる。これらのベースラインはCLIPの基本実装よりも優れています。しかし、より強いトレーニングレシピを採用すると、その利点は消える。実際、簡単なCLIPベースラインも大幅に改善され、他のサブフィールドで人気がある有名なトレーニング技術を使用することで、下流のゼロショットタスクを25%改善できることがわかった。また,先行研究による改善のほとんどを補うために,画像やテキストの増補を適用するだけで十分であることがわかった。 clipのトレーニングレシピが改善されたことで,4つの標準データセットで最先端のパフォーマンスが得られ,従来作業(最大データセットでは最大+4%まで)を一貫して上回っています。コードはhttps://github.com/facebookresearch/clip-rocketで入手できる。 Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired from self-supervised learning. However, it is sometimes hard to disentangle the contribution of these additional losses from other implementation details, e.g., data augmentation or regularization techniques, used to train the model. To shed light on this matter, in this paper, we first propose, implement and evaluate several baselines obtained by combining contrastive learning with recent advances in self-supervised learning. In particular, we use the loss functions that were proven successful for visual self-supervised learning to align image and text modalities. We find that these baselines outperform a basic implementation of CLIP. However, when a stronger training recipe is employed, the advantage disappears. 
Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields. Moreover, we discover that it is enough to apply image and text augmentations to make up for most of the improvement attained by prior works. With our improved training recipe for CLIP, we obtain state-of-the-art performance on four standard datasets, and consistently outperform prior work (up to +4% on the largest dataset), while being substantially simpler. The code is available at https://github.com/facebookresearch/clip-rocket	翻訳日:2023-11-07 22:56:10 公開日:2023-11-04
# 校正説明:不確実性情報と対策 Calibrated Explanations: with Uncertainty Information and Counterfactuals ( http://arxiv.org/abs/2305.02305v3 ) ライセンス: Link先を確認	Helena Lofstrom, Tuwe Lofstrom, Ulf Johansson, Cecilia Sonstrod	(参考訳) aiモデルの局所的な説明は、機能の重要性など個々の予測に対する洞察を提供するが、不安定性などの問題に苦しめられている。 MLモデルのキャリブレーションが不十分なためにしばしば歪んだ特徴量の信頼性の欠如は、これらの課題をさらに深めている。さらに、特徴の重要さの重要な側面は、説明可能なAI(XAI)にほとんど適応していない。本稿では,これらの課題に真っ向から対処するために,キャリブレート説明(CE)と呼ばれる特徴重要度説明手法を提案する。 Venn-Abersの基礎の上に構築されたCEは、基礎となるモデルを校正するだけでなく、機能重みを正確に定義した信頼性の高い機能重要な説明を提供する。 CEは出力の不確実性に対処することで、従来のソリューションを超える。これは特徴量とモデルの確率推定の両方に対して不確実な定量化を提供することによって達成される。さらに、CEはモデルに依存しず、容易に理解可能な条件付きルールと、組み込まれた不確実性定量化による反実的説明を生成する能力を備えている。 25のベンチマークデータセットによる評価の結果は、CEの有効性を裏付けるもので、高速で信頼性があり、安定しており、堅牢なソリューションである。 While local explanations for AI models can offer insights into individual predictions, such as feature importance, they are plagued by issues like instability. The unreliability of feature weights, often skewed due to poorly calibrated ML models, deepens these challenges. Moreover, the critical aspect of feature importance uncertainty remains mostly unaddressed in Explainable AI (XAI). The novel feature importance explanation method presented in this paper, called Calibrated Explanations (CE), is designed to tackle these issues head-on. Built on the foundation of Venn-Abers, CE not only calibrates the underlying model but also delivers reliable feature importance explanations with an exact definition of the feature weights. CE goes beyond conventional solutions by addressing output uncertainty. It accomplishes this by providing uncertainty quantification for both feature weights and the model's probability estimates. Additionally, CE is model-agnostic, featuring easily comprehensible conditional rules and the ability to generate counterfactual explanations with embedded uncertainty quantification. Results from an evaluation with 25 benchmark datasets underscore the efficacy of CE, making it stand as a fast, reliable, stable, and robust solution.	
翻訳日:2023-11-07 22:51:48 公開日:2023-11-04
# リピッツネスとスムーズネスのないオンラインポートフォリオ選択のためのデータ依存境界 Data-Dependent Bounds for Online Portfolio Selection Without Lipschitzness and Smoothness ( http://arxiv.org/abs/2305.13946v2 ) ライセンス: Link先を確認	Chung-En Tsai and Ying-Ting Lin and Yen-Huan Li	(参考訳) この研究は、オンラインポートフォリオ選択における最初の小さな損失と段階的な後悔の限界を導入し、非リプシッツ、非スムース損失によるオンライン凸最適化のためのデータ依存境界の最初の例を示している。提案するアルゴリズムは、最悪の場合におけるサブ線形後悔率を示し、データが「容易」である場合に対数後悔を達成する。後悔境界は、対数損失の新たなスムーズな特徴付け、正規化リーダ(FTRL)と必ずしも障壁ではない自己調和正則化器による局所ノルムに基づく解析、および対数バリアによる楽観的FTRLの暗黙的変種を用いて導出される。 This work introduces the first small-loss and gradual-variation regret bounds for online portfolio selection, marking the first instances of data-dependent bounds for online convex optimization with non-Lipschitz, non-smooth losses. The algorithms we propose exhibit sublinear regret rates in the worst cases and achieve logarithmic regrets when the data is "easy," with per-iteration time almost linear in the number of investment alternatives. The regret bounds are derived using novel smoothness characterizations of the logarithmic loss, a local norm-based analysis of following the regularized leader (FTRL) with self-concordant regularizers, which are not necessarily barriers, and an implicit variant of optimistic FTRL with the log-barrier.	翻訳日:2023-11-07 22:41:52 公開日:2023-11-04
# RKHMとペロン・フロベニウス演算子によるカーネルによる深層学習 Deep Learning with Kernels through RKHM and the Perron-Frobenius Operator ( http://arxiv.org/abs/2305.13588v2 ) ライセンス: Link先を確認	Yuka Hashimoto, Masahiro Ikeda, Hachem Kadri	(参考訳) 再生カーネル Hilbert $C^*$-module (RKHM) は、$C^*$-algebra を用いて再生カーネル Hilbert 空間 (RKHS) の一般化であり、ペロン・フロベニウス作用素は函数の構成に関連する線型作用素である。これら2つの概念を組み合わせることで、カーネルメソッドのディープラーニングフレームワークであるDeep RKHMを提案する。この設定で束縛された新しいラデマッハ一般化を導出し、ペロン・フロベニウス作用素による良性過剰の理論的解釈を提供する。 $C^*$-algebraにより、出力次元上の境界の依存性は、既存の境界よりも緩やかである。 $C^*$-algebraはカーネルによるディープラーニングに適したツールであり、演算子の製品構造を活用でき、畳み込みニューラルネットワークとの明確な接続を提供することができる。我々の理論的解析は、深いカーネルメソッドを設計、分析できる新しいレンズを提供する。 Reproducing kernel Hilbert $C^*$-module (RKHM) is a generalization of reproducing kernel Hilbert space (RKHS) by means of $C^*$-algebra, and the Perron-Frobenius operator is a linear operator related to the composition of functions. Combining these two concepts, we present deep RKHM, a deep learning framework for kernel methods. We derive a new Rademacher generalization bound in this setting and provide a theoretical interpretation of benign overfitting by means of Perron-Frobenius operators. By virtue of $C^*$-algebra, the dependency of the bound on output dimension is milder than existing bounds. We show that $C^*$-algebra is a suitable tool for deep learning with kernels, enabling us to take advantage of the product structure of operators and to provide a clear connection with convolutional neural networks. Our theoretical analysis provides a new lens through which one can design and analyze deep kernel methods.	翻訳日:2023-11-07 22:41:12 公開日:2023-11-04
# NashFormer: 局所的なNash平衡を利用した意味的多元性軌道予測 NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction ( http://arxiv.org/abs/2305.17600v2 ) ライセンス: Link先を確認	Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman	(参考訳) 道路エージェント間の相互作用は、特に複数のエージェントを含む場合において、軌道予測において重要な課題となる。既存の多様性を考慮した予測器はマルチエージェント予測のインタラクティブな性質を考慮しないため、これらの重要な相互作用の結果を見逃す可能性がある。本稿では,マルチモーダル予測のカバレッジ向上のために,ゲーム理論の逆強化学習を活用する軌道予測フレームワークであるNashFormerを提案する。トレーニング時間ゲーム理論解析を補助的損失として用いて,エージェントの行動の分類を仮定することなく,カバレッジと精度を向上させる。 Waymo Open Motion Datasetのインタラクティブな分割について,対話性の高いシナリオを含む4つのサブセットを含む,私たちのアプローチを実証する。実験の結果,予測器はベースラインモデルよりも33%多くの潜在的な相互作用をカバーしつつ,正確な予測を行うことがわかった。 Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering 33% more potential interactions versus a baseline model.	翻訳日:2023-11-07 22:31:46 公開日:2023-11-04
# コンテキスト圧縮に言語モデルを適応させる Adapting Language Models to Compress Contexts ( http://arxiv.org/abs/2305.14788v2 ) ライセンス: Link先を確認	Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen	(参考訳) Transformerベースの言語モデル(LM)は強力で広く適用可能なツールであるが、その有用性は、有限コンテキストウィンドウと長いテキスト文書を処理するための高価な計算コストによって制限されている。我々は,事前学習済みLMをAutoCompressorに適応させることを提案する。これらの言語モデルは、長いコンテキストをコンパクトなサマリーベクトルに圧縮し、ソフトプロンプトとしてモデルにアクセスすることができる。要約ベクトルは教師なしの目的で訓練され、長い文書はセグメントで処理され、以前の全てのセグメントからの要約ベクトルは言語モデリングに使用される。最大30,720個のトークンのシーケンスでOPTとLlama-2モデルを微調整し、AutoCompressorが長いコンテキストを使ってパープレキシティを向上できることを示す。タスク実演を圧縮することで,テキスト内学習におけるAutoCompressorsの評価を行い,要約ベクトルが平文実演の代用となり,推論コストを削減しつつ精度を高めることを見出した。最後に,検索強化言語モデルおよびパッセージ再ランキングタスクに要約ベクトルを適用することで,大規模コーパスに対する要約ベクトルの事前計算の利点について検討する。全体として、AutoCompressorはLMのコンテキストウィンドウを拡張し、長いコンテキストでの推論をスピードアップするためのシンプルで安価なソリューションとして現れる。 Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt pre-trained LMs into AutoCompressors. These language models are capable of compressing long contexts into compact summary vectors, which are then accessible to the model as soft prompts. Summary vectors are trained with an unsupervised objective, whereby long documents are processed in segments, and summary vectors from all previous segments are used in language modeling. We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity. We evaluate AutoCompressors on in-context learning by compressing task demonstrations and find that summary vectors are good substitutes for plain-text demonstrations, increasing accuracy while reducing inference costs. Finally, we explore the benefits of pre-computing summary vectors for large corpora by applying summary vectors to retrieval-augmented language modeling and a passage re-ranking task. 
Overall, AutoCompressors emerge as a simple and inexpensive solution to extend the context window of LMs while speeding up inference over long contexts.	翻訳日:2023-11-07 22:27:10 公開日:2023-11-04
# Language-Model-as-an-Examinerを用いたベンチマーク基礎モデル Benchmarking Foundation Models with Language-Model-as-an-Examiner ( http://arxiv.org/abs/2306.04181v2 ) ライセンス: Link先を確認	Yushi Bai, Jiahao Ying, Yixin Cao, Xin Lv, Yuze He, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Yijia Xiao, Haozhe Lyu, Jiayin Zhang, Juanzi Li, Lei Hou	(参考訳) 人間に似た方法で言語を理解し、生成するモデルの能力の包括的なテストとして、オープンエンドの質問応答における基礎モデルのパフォーマンスを評価するために、多くのベンチマークが確立されている。これらの研究の多くは、新しいデータセットの提案に重点を置いているが、以前のベンチマークパイプラインには2つの大きな問題がある。本稿では,lmが知識に基づいて質問を定式化し,その応答を参照のない方法で評価する,新たなベンチマークフレームワークであるlanguage-model-as-an-examinerを提案する。我々のフレームワークは、様々なlmsを検査者として採用することができ、質問はより多様なトリガートピックによって常に更新できるため、無力な拡張性を可能にする。より包括的かつ公平な評価を行うため,(1)広範囲のドメインに質問を発生させるようLM検査官に指示し,さらに詳細な評価を行うためにフォローアップ質問を提起する3つの戦略を考案した。 2)評価では,評価基準と評価基準を組み合わせ,人間のアノテーションと密接に一致して信頼性の高い結果が得られる。 (3) 単検定における偏りに対処する分散化ピア検定法も提案する。我々のデータとベンチマーク結果は以下の通りである。 Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets, however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. Our framework allows for effortless extensibility as various LMs can be adopted as the examiner, and the questions can be constantly updated given more diverse trigger topics. For a more comprehensive and equitable evaluation, we devise three strategies: (1) We instruct the LM examiner to generate questions across a multitude of domains to probe for a broad acquisition, and raise follow-up questions to engage in a more in-depth assessment. 
(2) Upon evaluation, the examiner combines both scoring and ranking measurements, providing a reliable result as it aligns closely with human annotations. (3) We additionally propose a decentralized Peer-examination method to address the biases in a single examiner. Our data and benchmarking results are available at: http://lmexam.xlore.cn.	翻訳日:2023-11-07 22:19:25 公開日:2023-11-04
# トルコ語テキスト可読性のためのハイブリッド言語機能の検討 Exploring Hybrid Linguistic Features for Turkish Text Readability ( http://arxiv.org/abs/2306.03774v3 ) ライセンス: Link先を確認	Ahmet Yavuz Uluslu and Gerold Schneider	(参考訳) 本稿では,トルコ語テキストの自動可読性評価に関する最初の包括的研究を行う。我々は,最先端のニューラルネットワークモデルと,語彙的,形態素的,構文的,談話的レベルでの言語的特徴を組み合わせることで,高度な可読性ツールを開発した。従来の可読性公式の有効性を,現代の自動手法と比較して評価し,トルコ語の可読性を決定する重要な言語的特徴を特定する。 This paper presents the first comprehensive study on automatic readability assessment of Turkish texts. We combine state-of-the-art neural network models with linguistic features at lexical, morphosyntactic, syntactic and discourse levels to develop an advanced readability tool. We evaluate the effectiveness of traditional readability formulas compared to modern automated methods and identify key linguistic features that determine the readability of Turkish texts.	翻訳日:2023-11-07 22:18:02 公開日:2023-11-04
# 制限付き選択バイアスによる統計的推測 Statistical Inference Under Constrained Selection Bias ( http://arxiv.org/abs/2306.03302v3 ) ライセンス: Link先を確認	Santiago Cortes-Gomez, Mateo Dulce, Carlos Patino, Bryan Wilder	(参考訳) 大規模なデータセットは、意思決定を知らせるためにますます使われています。この取り組みは、現実世界の証拠にポリシーを基礎付けることを目的としているが、選択バイアスやその他の分布シフトが観察データに支障をきたすため、課題が発生する。堅牢な推論を提供する以前の試みでは、ユーザが指定した分布シフトの量(例えば、観測された分布と対象分布の最大KLばらつき)に応じて保証が与えられていた。しかしながら、意思決定者は、可能なシフトの種類を制限するターゲット分布に関する追加の知識を持つことが多い。このような情報を活用するために,対象分布下で期待が知られている関数の形で,ユーザが特定した制約に従う選択バイアスの存在下で統計的推測を可能にする枠組みを提案する。出力は、目標分布に対する推定値に対する高確率境界である。そこで,本手法は,広い範囲の推定値を部分的に識別するために,ドメイン知識を活用する。これらの境界を推定する手法の計算・統計特性を解析し,本手法が実世界のユースケースと同様に,様々なシミュレーションおよび半合成タスクにおいて情報的境界を生成できることを示す。 Large-scale datasets are increasingly being used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arisen as selection bias and other forms of distribution shifts often plague observational data. Previous attempts to provide robust inference have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of possible shifts. To leverage such information, we propose a framework that enables statistical inference in the presence of selection bias which obeys user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value of an estimand for the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks, as well as in a real-world use case.	
翻訳日:2023-11-07 22:17:36 公開日:2023-11-04
# 言語間の感情弧の評価: 感情分析におけるグローバル分割の橋渡し Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis ( http://arxiv.org/abs/2306.02213v3 ) ライセンス: Link先を確認	Daniela Teodorescu and Saif M. Mohammad	(参考訳) 感情は、個人(または人口)が時間とともにどのように感じるかを捉えます。産業や研究で広く使われているが、自動的に生成された弧を評価する作業はほとんどない。これは真の(金)感情の弧を確立するのが難しいためである。私たちの研究は、初めて、系統的かつ定量的に自動生成された感情弧を評価しました。また、機械学習(ML)モデルとLexicon-Only(LexO)手法の2つの感情弧を生成する一般的な方法を比較する。 9言語で18の多様なデータセットで実験を行うことで、インスタンスレベルの感情分類が著しく貧弱であるにもかかわらず、LexO法は数百のインスタンスから情報を集約する際に感情弧を生成するのに非常に正確であることを示す。また,6つのアフリカ諸言語とアラビア語,スペイン語による実験を通じて,英語感情辞書の自動翻訳により,低リソース言語における高品質な感情アークを生成することができることを示した。これは世界中の言語における感情の研究の道を開くもので、これは商業、公共政策、健康研究に欠かせない。コードとリソース:https://github.com/dteodore/EmotionArcs Emotion arcs capture how an individual (or a population) feels over time. They are widely used in industry and research; however, there is little work on evaluating the automatically generated arcs. This is because of the difficulty of establishing the true (gold) emotion arc. Our work, for the first time, systematically and quantitatively evaluates automatically generated emotion arcs. We also compare two common ways of generating emotion arcs: Machine-Learning (ML) models and Lexicon-Only (LexO) methods. By running experiments on 18 diverse datasets in 9 languages, we show that despite being markedly poor at instance level emotion classification, LexO methods are highly accurate at generating emotion arcs when aggregating information from hundreds of instances. We also show, through experiments on six indigenous African languages, as well as Arabic, and Spanish, that automatic translations of English emotion lexicons can be used to generate high-quality emotion arcs in less-resource languages. This opens up avenues for work on emotions in languages from around the world; which is crucial for commerce, public policy, and health research in service of speakers often left behind. 
Code and resources: https://github.com/dteodore/EmotionArcs	翻訳日:2023-11-07 22:16:59 公開日:2023-11-04
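The Lexicon-Only (LexO) idea in the abstract above, scoring each instance from a word-emotion lexicon and then aggregating over many instances, can be illustrated with a minimal sketch. The toy lexicon, the whitespace tokenizer, the rolling-mean aggregation, and the function names `instance_score` and `emotion_arc` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
from statistics import mean

# Hypothetical toy valence lexicon (word -> score in [-1, 1]); not from the paper.
TOY_LEXICON = {"happy": 1.0, "great": 0.8, "sad": -1.0, "awful": -0.9}


def instance_score(text, lexicon):
    """Mean lexicon score of the words in one instance; 0.0 if no word matches."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return mean(hits) if hits else 0.0


def emotion_arc(texts, lexicon, window):
    """Rolling mean of instance scores over `window` consecutive instances.

    `texts` is assumed to be ordered in time already.
    """
    scores = [instance_score(t, lexicon) for t in texts]
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]


texts = ["great happy day", "so sad", "awful news", "happy again"]
arc = emotion_arc(texts, TOY_LEXICON, window=2)
print(arc)  # one aggregated value per window position
```

Individual instance scores are noisy because a short text may contain only one lexicon word or none. Averaging over a window is what makes the arc usable, which is in line with the abstract's observation that accuracy comes from aggregating hundreds of instances.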
# GAD-NR 近傍再構成によるグラフ異常検出 GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction ( http://arxiv.org/abs/2306.01951v5 ) ライセンス: Link先を確認	Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets and Pan Li	(参考訳) Graph Anomaly Detection (GAD) は、グラフ内の異常ノードを識別し、ネットワークセキュリティ、不正検出、ソーシャルメディアスパム検出、その他さまざまな分野の応用を見つけるために用いられるテクニックである。 GADの一般的な方法は、グラフデータをノード表現にエンコードし、これらの表現に基づいてグラフの再構成品質を評価することによって異常を識別するグラフオートエンコーダ(GAE)である。しかし、既存のGAEモデルは直接リンク再構成に最適化されており、グラフに接続されたノードは潜在空間にクラスタ化される。その結果、クラスター型構造異常を検出するのに優れるが、クラスタに適合しないより複雑な構造異常に悩まされる。この制限に対処するため,グラフ異常検出のための近傍再構成を組み込んだGAEの新しい変種であるGAD-NRを提案する。 GAD-NRは、ノード表現に基づいて、ローカル構造、自己属性、および隣接属性を含むノードの近傍全体を再構築することを目的としている。異常ノードと正常ノード間の近傍再構成損失を比較することで、GAD-NRは任意の異常を効果的に検出できる。 6つの実世界のデータセットで実施された大規模な実験は、GAD-NRの有効性を検証し、最先端の競合相手よりも顕著な改善(AUCでは最大30%)を示す。 GAD-NRのソースコードが公開されている。比較分析の結果,既存の手法は3種類の異常から1種類または2種類の異常を検出する場合にのみ有効であることが判明した。対照的に、GAD-NRはデータセット全体の3種類の異常を検知し、その包括的な異常検出能力を示す。 Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. 
GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.	翻訳日:2023-11-07 22:16:40 公開日:2023-11-04
# Pix2Repair:画像から形状を復元する Pix2Repair: Implicit Shape Restoration from Images ( http://arxiv.org/abs/2305.18273v2 ) ライセンス: Link先を確認	Xinchao Song, Nikolas Lamb, Sean Banerjee, Natasha Kholgade Banerjee	(参考訳) Pix2Repairは、画像から復元形状を生成し、破折した物体を修復する自動形状修復手法である。以前の修理アプローチでは、入力として破砕した物体の高分解能の防水3dメッシュが必要だった。入力3Dメッシュは高価な3Dスキャナーを使用して取得し、スキャンされたメッシュは手作業によるクリーンアップ、アクセシビリティとスケーラビリティの制限を必要とする。 Pix2Repairは、壊れた物体の画像を入力として、自動的に3Dプリント可能な復元形状を生成する。本稿では, 破壊対象を表す潜在符号を, 完全な形状と破壊面に分解する新しい形状関数を提案する。本稿では, 幾何破折と破折バッドデータセットからの人工骨折の復元, QPデータセットからの文化的遺産, Fantastic Breaksデータセットからの実際の骨折の復元について述べる。視線中心の復元を予測することで軸対称物体の復元における課題を克服する。本手法は, シャムハ距離, アースムーバー距離, ノーマル一貫性, およびパーセンテージ復元の観点で形状修復に適応した形状補完アプローチよりも優れる。 We present Pix2Repair, an automated shape repair approach that generates restoration shapes from images to repair fractured objects. Prior repair approaches require a high-resolution watertight 3D mesh of the fractured object as input. Input 3D meshes must be obtained using expensive 3D scanners, and scanned meshes require manual cleanup, limiting accessibility and scalability. Pix2Repair takes an image of the fractured object as input and automatically generates a 3D printable restoration shape. We contribute a novel shape function that deconstructs a latent code representing the fractured object into a complete shape and a break surface. We show restorations for synthetic fractures from the Geometric Breaks and Breaking Bad datasets, and cultural heritage objects from the QP dataset, and for real fractures from the Fantastic Breaks dataset. We overcome challenges in restoring axially symmetric objects by predicting view-centered restorations. Our approach outperforms shape completion approaches adapted for shape repair in terms of chamfer distance, earth mover's distance, normal consistency, and percent restorations generated.	翻訳日:2023-11-07 22:14:31 公開日:2023-11-04
# 効率的なシーケンスモデリングのためのスパースモジュラーアクティベーション Sparse Modular Activation for Efficient Sequence Modeling ( http://arxiv.org/abs/2306.11197v4 ) ライセンス: Link先を確認	Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai	(参考訳) 線形状態空間モデル(SSM)と自己アテンション機構を組み合わせた最近のハイブリッドモデルは、様々なシーケンスモデリングタスクにおいて印象的な結果を示した。しかし、現在のアプローチでは、アテンションモジュールを静的かつ均一に入力シーケンスのすべての要素に適用することで、準最適な品質と効率のトレードオフにつながる。この制限に対処するために,Sparse Modular Activation(SMA)という,ニューラルネットワークがシーケンス要素に対してサブモジュールをスパースかつ動的に,微分可能な形で活性化できるようにする汎用機構を導入する。各要素が非アクティブなサブモジュールをスキップできるようにすることで、SMAはトレーニングと推論の両方の段階でニューラルネットワークの計算とメモリ消費を減らす。シーケンスモデリングにおけるSMAの有効性を検証するため,SMAを用いた新しいニューラルアーキテクチャSeqBoatを設計し,SSMから学んだ状態表現に基づいてGAU(Gated Attention Unit)をスパースに活性化する。 GAUが活性化された入力にのみ局所的な注意を行うよう制約することで、SeqBoatは理論上無限の注意範囲を持つ線形推論複雑性を達成でき、チャンキングベースモデルよりもはるかに優れた品質と効率のトレードオフを提供できる。長いシーケンスモデリング、音声分類、言語モデリングを含む幅広いタスクの実験により、SeqBoatは線形複雑性を持つハイブリッドモデルの中で新たな最先端の結果をもたらし、学習されたスパースアクティベーションパターンを通じて各タスクに必要な注意の量を明らかにする。私たちのコードは https://github.com/renll/SeqBoat で公開されています。 Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.	翻訳日:2023-11-07 22:06:58 公開日:2023-11-04
# テキスト・画像拡散モデルにおけるベイズ文脈更新のためのエネルギーに基づく交差注意 Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2306.09869v3 ) ライセンス: Link先を確認	Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye	(参考訳) 画像生成タスクにおけるテキスト・画像拡散モデルの顕著な性能にもかかわらず、近年の研究では、生成した画像がテキストプロンプトの意図した意味的内容をキャプチャできないことがあるという問題(しばしば意味的不整合と呼ばれる現象)が提起されている。本稿では,文脈ベクトルの事後分布をモデル化することで,適応的文脈制御のための新しいエネルギーベースモデル(EBM)フレームワークを提案する。具体的には、まず潜在画像表現とテキスト埋め込みのEBMをデノイジングオートエンコーダの各クロスアテンション層に定式化する。次に, コンテキストベクトルの対数事後分布の勾配を求め, これを更新して後続のクロスアテンション層に転送することにより, エネルギー関数のネスト階層を暗黙的に最小化する。我々の潜在EBMsは、異なる文脈からのクロスアテンション出力の線形結合としてゼロショット合成生成を可能にする。広範にわたる実験により,本手法は,マルチコンセプト生成,テキスト誘導画像のインペイント,リアルおよび合成画像編集など,様々な画像生成タスクの処理に有効であることが実証された。コード: https://github.com/EnergyAttention/Energy-Based-CrossAttention Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes cannot capture the intended semantic contents of the text prompts, which phenomenon is often called semantic misalignment. To address this, here we present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. Then, we obtain the gradient of the log posterior of context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Using extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing. Code: https://github.com/EnergyAttention/Energy-Based-CrossAttention.	翻訳日:2023-11-07 22:05:59 公開日:2023-11-04
# VillanDiffusion:拡散モデルのための統一バックドア攻撃フレームワーク VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models ( http://arxiv.org/abs/2306.06874v3 ) ライセンス: Link先を確認	Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho	(参考訳) 拡散モデル(DMs)は、反復的ノイズ付加と雑音除去から可逆的破壊過程を学ぶ最先端の生成モデルである。これらは、テキストから画像への条件生成など、多くの生成AIアプリケーションのバックボーンである。しかし、最近の研究では、基本的な無条件DM(DDPMやDDIMなど)は、モデル入力における悪意ある埋め込みパターンによって引き起こされる出力操作攻撃であるバックドアインジェクションに弱いことが示されている。本稿では,DMsのバックドア解析の現在の範囲を拡大するための統一バックドアアタックフレームワーク(VillanDiffusion)を提案する。本フレームワークは, 主流の非条件および条件付きDM(デノイジングベースおよびスコアベース)と, 総合評価のための各種トレーニングフリーサンプラーを対象とする。実験により,本フレームワークは様々なDM構成のバックドア解析を容易にするとともに,DMsに対するキャプションに基づくバックドア攻撃に対する新たな洞察を提供する。私たちのコードはGitHubで入手できる: https://github.com/IBM/villandiffusion Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs. Our code is available on GitHub: https://github.com/IBM/villandiffusion	翻訳日:2023-11-07 22:04:27 公開日:2023-11-04
# MANER: クラッタ環境における物体のマルチエージェントニューラルアレンジメント計画 MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments ( http://arxiv.org/abs/2306.06543v2 ) ライセンス: Link先を確認	Vivek Gupta, Praphpreet Dhir, Jeegn Dani, Ahmed H. Qureshi	(参考訳) オブジェクトの並べ替えはロボット工学における根本的な問題であり、倉庫の管理から家庭のキッチンの清掃、整理まで様々な応用が考えられる。既存の研究は主に単一エージェントのソリューションに焦点を当てているが、現実のシナリオでは複数のロボットが並べ替え作業を行う必要がある。本稿では,複雑な環境におけるタスクシーケンシングと経路計画の課題に対処する,マルチエージェントオブジェクト再構成計画のための総合的な学習ベースフレームワークを提案する。提案手法は,オブジェクトを反復的に選択し,その転置領域を判定し,目標配置を達成するためのキネマティック実現性とタスク到達性を備えたロボットとペアリングする。シミュレーションおよび実世界の多様な環境における実験により,提案フレームワークの有効性とロバスト性を実証した。さらに, トラバース時間と成功率に関して, ベースラインアプローチと比較して, 性能が向上したことを示す。 Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens. While existing research has primarily focused on single-agent solutions, real-world scenarios often require multiple robots to work together on rearrangement tasks. This paper proposes a comprehensive learning-based framework for multi-agent object rearrangement planning, addressing the challenges of task sequencing and path planning in complex environments. The proposed method iteratively selects objects, determines their relocation regions, and pairs them with available robots under kinematic feasibility and task reachability for execution to achieve the target arrangement. Our experiments on a diverse range of simulated and real-world environments demonstrate the effectiveness and robustness of the proposed framework. Furthermore, results indicate improved performance in terms of traversal time and success rate compared to baseline approaches.	翻訳日:2023-11-07 22:03:37 公開日:2023-11-04
# 表面統計の超越:潜時拡散モデルにおけるシーン表現 Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model ( http://arxiv.org/abs/2306.05720v2 ) ライセンス: Link先を確認	Yida Chen, Fernanda Viégas, Martin Wattenberg	(参考訳) 潜在拡散モデル(LDMs)は、現実的な画像を生成する素晴らしい能力を示すが、これらのモデルの内部構造は謎のままである。明示的な奥行き情報のない画像のみで訓練しても、通常は3Dシーンのコヒーレントな画像を出力する。本研究では, LDMは単純なシーン幾何学の内部表現を作成し, 利用するのか, という基本的な解釈可能性の問いを調べる。線形プローブを用いて,LDMの内部活性化が3次元深度データの線形表現と顕著な物体/背景の区別を符号化している証拠を示す。これらの表現は、人間がノイズの多い画像を容易に理解できるようになるよりかなり前の、デノイジング過程の驚くほど早い段階で現れる。介入実験では、これらの表現が画像合成において因果的役割を果たすことが示され、LDMの出力の単純な高レベルな編集に使える可能性がある。プロジェクトページ: https://yc015.github.io/scene-representation-diffusion-model/ Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output. Project page: https://yc015.github.io/scene-representation-diffusion-model/	翻訳日:2023-11-07 22:02:49 公開日:2023-11-04
# RDumb: 継続的なテスト時間適応の進捗に疑問を呈するシンプルなアプローチ RDumb: A simple approach that questions our progress in continual test-time adaptation ( http://arxiv.org/abs/2306.05401v2 ) ライセンス: Link先を確認	Ori Press, Steffen Schneider, Matthias Kümmerer, Matthias Bethge	(参考訳) テスト時間適応(TTA)は、トレーニング済みのモデルをデプロイ時に変化するデータ分布に合わせて更新できる。初期の研究は、個々の固定分布シフトに対してこれらのアルゴリズムを検証したが、近年の研究では、長期にわたる連続的な適応法が提案されている。そこで本研究では,TTA手法の漸近的性能を評価するために,Continually Changing Corruptions(CCC)ベンチマークを提案する。最終的に、1つの最先端のメソッド以外はすべて崩壊し、非適応モデルよりもパフォーマンスが悪くなることがわかった。これには、性能崩壊に対して頑健であるよう特別に提案されたモデルも含まれる。さらに,モデルが予め訓練された状態に定期的にリセットされるシンプルなベースライン "RDumb" を導入する。 RDumbは、検討したすべてのベンチマークで、これまで提案されていた最先端手法より良く、あるいは同等に動作する。以上の結果から, 従来のTTAアプローチは, 崩壊を避けるための適応の正則化に有効ではなく, 単純なリセット戦略を上回ることもできないことが示された。 Test-Time Adaptation (TTA) allows to update pre-trained models to changing data distributions at deployment time. While early work tested these algorithms for individual fixed distribution shifts, recent work proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continually Changing Corruptions (CCC) benchmark to measure asymptotic performance of TTA techniques. We find that eventually all but one state-of-the-art methods collapse and perform worse than a non-adapting model, including models specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pretrained state. RDumb performs better or on par with the previously proposed state-of-the-art in all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.	翻訳日:2023-11-07 22:02:16 公開日:2023-11-04
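The RDumb baseline described above, periodically resetting the adapted model to its pretrained state, can be sketched in a few lines. This is only an illustration of the reset schedule: the class name `PeriodicResetAdapter`, the `reset_every` interval, and the toy `toy_adapt` update rule are hypothetical, and the abstract does not specify which adaptation method or reset interval the authors use.

```python
import copy


class PeriodicResetAdapter:
    """Wraps any test-time adaptation step and resets to the pretrained weights on a fixed schedule."""

    def __init__(self, pretrained_params, adapt_step, reset_every):
        self._pretrained = copy.deepcopy(pretrained_params)  # frozen reference copy
        self.params = copy.deepcopy(pretrained_params)       # weights that get adapted
        self._adapt_step = adapt_step                        # (params, batch) -> new params
        self._reset_every = reset_every                      # hypothetical reset interval
        self._steps = 0

    def step(self, batch):
        # Discard everything learned so far once every `reset_every` steps.
        if self._steps > 0 and self._steps % self._reset_every == 0:
            self.params = copy.deepcopy(self._pretrained)
        self.params = self._adapt_step(self.params, batch)
        self._steps += 1
        return self.params


def toy_adapt(params, batch):
    """Toy stand-in for a TTA update: move the single weight halfway to the batch mean."""
    target = sum(batch) / len(batch)
    return {"w": params["w"] + 0.5 * (target - params["w"])}


adapter = PeriodicResetAdapter({"w": 0.0}, toy_adapt, reset_every=3)
history = [adapter.step([1.0])["w"] for _ in range(6)]
print(history)  # [0.5, 0.75, 0.875, 0.5, 0.75, 0.875]
```

The printed history shows the weight drifting for three steps and then restarting from the pretrained value. That bounded drift is the property that lets a resetting scheme avoid the long-run collapse the abstract reports for continually adapting methods.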
# インタラクションワーピングによるワンショット模倣学習 One-shot Imitation Learning via Interaction Warping ( http://arxiv.org/abs/2306.12392v2 ) ライセンス: Link先を確認	Ondrej Biza, Skye Thompson, Kishore Reddy Pagidi, Abhinav Kumar, Elise van der Pol, Robin Walters, Thomas Kipf, Jan-Willem van de Meent, Lawson L.S. Wong, Robert Platt	(参考訳) デモの少ないロボットポリシーの模倣学習は、オープンエンドアプリケーションにおいて不可欠である。本稿では,1つのデモンストレーションからSE(3)ロボット操作ポリシーを学習するためのインタラクションウォーピングを提案する。オブジェクトインスタンス間のポイントクラウドをアライメントするテクニックであるshape warpingを用いて、環境内の各オブジェクトの3dメッシュを推定する。次に、操作動作をオブジェクト上のキーポイントとして表現し、オブジェクトの形状を歪めることができる。 3つのシミュレーションおよび実世界のオブジェクト再配置タスクで1ショットの模倣学習を成功させる。また,本手法が野生の物体メッシュやロボットの把持を予測できることを示す。 Imitation learning of robot policies from few demonstrations is crucial in open-ended applications. We propose a new method, Interaction Warping, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape warping, a technique for aligning point clouds across object instances. Then, we represent manipulation actions as keypoints on objects, which can be warped with the shape of the object. We show successful one-shot imitation learning on three simulated and real-world object re-arrangement tasks. We also demonstrate the ability of our method to predict object meshes and robot grasps in the wild.	翻訳日:2023-11-07 21:50:06 公開日:2023-11-04
# RoMe:メッシュ表現による大規模道路表面再構築に向けて RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation ( http://arxiv.org/abs/2306.11368v2 ) ライセンス: Link先を確認	Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng and Cong Yang	(参考訳) 自動運転アプリケーションでは、正確で効率的な路面再構築が最重要である。本稿では,大規模道路面の堅牢な再構築を目的とした新しいフレームワークであるRoMeを紹介する。ユニークなメッシュ表現を利用することで、再構成された道路表面が正確で、セマンティクスとシームレスに一致していることを保証する。計算効率の課題に対処するため,我々は,RoMeがサブエリアに着目し,その後にマージすることで,広大な環境を再構築できる経路点サンプリング戦略を提案する。さらに,外部パラメータのキャリブレーションにおける不正確性に対する堅牢性を高めるために,外部パラメータ最適化モジュールを組み込んだ。パブリックデータセットとワイルドデータの両方に対する広範な評価は、速度、正確性、堅牢性という点で、RoMeの優位性を示している。たとえば、何千もの画像から600×600平方メートルの道路表面を復元するのに2GPU時間しかかからない。特に、RoMeの機能は単なる再構築を超えて、自動運転アプリケーションにおける自動ラベリングタスクに重要な価値を提供する。関連するすべてのデータとコードは https://github.com/DRosemei/RoMe で入手できる。 In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600×600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for auto-labeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.	翻訳日:2023-11-07 21:49:27 公開日:2023-11-04
# トルコ語母語識別 Turkish Native Language Identification ( http://arxiv.org/abs/2307.14850v4 ) ライセンス: Link先を確認	Ahmet Yavuz Uluslu and Gerold Schneider	(参考訳) 本稿では,トルコ語に対するNative Language Identification (NLI)の最初の応用について述べる。 NLIは、著者の最初の言語を様々な言語で分析することで予測する。ほとんどのNLI研究は英語に重点を置いているが、トルコ語にまで範囲を広げている。我々は,最近構築されたトルコ語学習者コーパスを用いて,3つの構文的特徴(CFG生成規則,助詞n-gram,関数語)とL2テキストの組み合わせを用いて,これらの課題の有効性を実証した。 In this paper, we present the first application of Native Language Identification (NLI) for the Turkish language. NLI involves predicting the writer's first language by analysing their writing in different languages. While most NLI research has focused on English, our study extends its scope to Turkish. We used the recently constructed Turkish Learner Corpus and employed a combination of three syntactic features (CFG production rules, part-of-speech n-grams, and function words) with L2 texts to demonstrate their effectiveness in this task.	翻訳日:2023-11-07 21:39:40 公開日:2023-11-04
# AlpaGasus: 少ないデータでより良いAlpacaをトレーニングする AlpaGasus: Training A Better Alpaca with Fewer Data ( http://arxiv.org/abs/2307.08701v4 ) ライセンス: Link先を確認	Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin	(参考訳) 大規模言語モデル(LLMs)は教師付き命令/応答データに対する命令ファインチューニング(IFT)を通じて命令追従能力を強化する。しかし、広く使われているIFTデータセット(例えば、Alpacaの52kデータ)は驚くほど多くの低品質なインスタンスを含み、その不正確または無関係な応答はIFTを誤った方向に導き、有害である。本稿では,強力なLLM(例えばChatGPT)を用いて低品質データを自動的に識別しフィルタする,簡便で効果的なデータ選択戦略を提案する。この目的のために,52kのAlpacaデータからフィルタした9kの高品質データのみで微調整したAlpaGasusを導入する。 AlpaGasusは、複数のテストセットにおけるGPT-4による評価と、制御されたヒトの評価において、オリジナルのAlpacaよりも大幅に優れている。 13Bの変種は、テストタスクにおいて教師LLM(つまり52kデータを生成したText-Davinci-003)の90%を超えるパフォーマンスに匹敵する。また、5.7倍高速な訓練も提供し、7B型の訓練時間を80分(Alpaca用)から14分に短縮した。さらに,本手法の有効性を,多種多様なデータセット,ベースモデル,LLMフィルタで実証した。全体として、AlpaGasusは命令チューニングデータに一般的に適用可能な新しいデータ中心のIFTパラダイムを実証し、より高速なトレーニングとより良い命令追従モデルをもたらす。私たちのプロジェクトページ: https://lichang-chen.github.io/AlpaGasus/ Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches >90% performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/	翻訳日:2023-11-07 21:37:06 公開日:2023-11-04
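The data-selection step in the AlpaGasus abstract, keeping only instruction/response pairs that a strong LLM rates highly, reduces to a score-and-threshold filter. The sketch below is a rough illustration under stated assumptions: the `rating` field stands in for a score that an LLM grader would produce, and the threshold of 4.5 and the function name `filter_by_score` are hypothetical, since the abstract does not give them. The final lines only re-derive the speedup and data fraction from the numbers quoted in the abstract.

```python
def filter_by_score(examples, score_fn, threshold):
    """Keep examples whose quality score is at least `threshold`."""
    return [ex for ex in examples if score_fn(ex) >= threshold]


# Toy data: "rating" is a placeholder for a score that an LLM grader would assign.
examples = [
    {"instruction": "Add 2 and 3.", "response": "5", "rating": 5.0},
    {"instruction": "Name a primary color.", "response": "Tuesday", "rating": 1.5},
    {"instruction": "Translate 'cat' to French.", "response": "chat", "rating": 4.5},
]

# Hypothetical threshold; in practice score_fn would query the grader model.
kept = filter_by_score(examples, lambda ex: ex["rating"], threshold=4.5)
print(len(kept), "of", len(examples), "examples kept")

# Arithmetic from the abstract: 80 min -> 14 min for the 7B model, 9k of 52k examples.
speedup = 80 / 14
kept_fraction = 9_000 / 52_000
print(round(speedup, 1), round(kept_fraction, 3))  # 5.7 0.173
```

The reported 5.7x speedup is close to the inverse of the retained data fraction (52k / 9k ≈ 5.8), which is what one would expect if training time scales roughly linearly with the number of examples.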
# ニューラルダイナミクスの低テンソルランク学習 Low Tensor Rank Learning of Neural Dynamics ( http://arxiv.org/abs/2308.11567v2 ) ライセンス: Link先を確認	Arthur Pellegrino, N Alex Cayco-Gajic, Angus Chadwick	(参考訳) 学習は神経細胞の繰り返し連結された集団における協調シナプス変化に依存する。したがって、学習によるシナプス接続の集団的進化を理解することは、神経科学と機械学習の重要な課題である。特に、最近の研究では、タスク訓練されたrnnの重み行列は一般的に低ランクであるが、この低ランク構造が学習上でどのように展開するかは不明である。そこで本研究では,学習を通して重み行列によって形成される3-テンソルのランクについて検討する。運動学習タスク中に様々なランクのRNNを大規模ニューラル記録に合わせることで、推定重みは低テンソルランクであり、したがって学習過程全体を通して一定の低次元部分空間で進化することがわかった。次に、同じ課題を解決するために訓練されたRNN上での低テンソルランク学習の観察を検証する。最後に,低次元課題を解くために訓練されたRNNにおいて,低テンソルランクの重みが自然に現れることを示す勾配勾配勾配学習ダイナミクスの行列とテンソルランクの数学的結果を示す。本研究は,生物と人工ニューラルネットワークの双方における学習による集団接続の進化に関する知見を提供し,大規模ニューラル記録からの学習誘起動的変化のリバースエンジニアリングを可能にする。 Learning relies on coordinated synaptic changes in recurrently connected populations of neurons. Therefore, understanding the collective evolution of synaptic connectivity over learning is a key challenge in neuroscience and machine learning. In particular, recent work has shown that the weight matrices of task-trained RNNs are typically low rank, but how this low rank structure unfolds over learning is unknown. To address this, we investigate the rank of the 3-tensor formed by the weight matrices throughout learning. By fitting RNNs of varying rank to large-scale neural recordings during a motor learning task, we find that the inferred weights are low-tensor-rank and therefore evolve over a fixed low-dimensional subspace throughout the entire course of learning. We next validate the observation of low-tensor-rank learning on an RNN trained to solve the same task. Finally, we present a set of mathematical results bounding the matrix and tensor ranks of gradient descent learning dynamics which show that low-tensor-rank weights emerge naturally in RNNs trained to solve low-dimensional tasks. 
Taken together, our findings provide insight on the evolution of population connectivity over learning in both biological and artificial neural networks, and enable reverse engineering of learning-induced changes in recurrent dynamics from large-scale neural recordings.	翻訳日:2023-11-07 21:29:42 公開日:2023-11-04
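The link between low tensor rank and a fixed low-dimensional subspace can be illustrated with a small sketch. It assumes that "tensor rank" is meant in the CP sense, so that the weight matrix at learning step k is a combination of the same R rank-one matrices; the helper names `cp_tensor` and `matrix_rank` and all toy values are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def cp_tensor(a, b, c):
    """W[k][i][j] = sum_r a[r][i] * b[r][j] * c[r][k] (a rank-R CP form)."""
    R, N, K = len(a), len(a[0]), len(c[0])
    return [[[sum(a[r][i] * b[r][j] * c[r][k] for r in range(R))
              for j in range(N)] for i in range(N)] for k in range(K)]


def matrix_rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Toy example (assumed values): 3 neurons, 4 learning steps, tensor rank R = 2.
a = [[1, 0, 2], [0, 1, 1]]        # output-side factors
b = [[1, 1, 0], [2, 0, 1]]        # input-side factors
c = [[1, 2, 3, 4], [1, 0, 1, 0]]  # how each component evolves over learning
W = cp_tensor(a, b, c)

# Unfold along the learning mode: one flattened weight matrix per row.
unfolded = [[x for row in Wk for x in row] for Wk in W]
learning_mode_rank = matrix_rank(unfolded)
slice_ranks = [matrix_rank(Wk) for Wk in W]
```

Here every flattened weight matrix lies in the span of the same two rank-one matrices, so the unfolding along the learning mode has rank at most R, and each individual weight matrix also has matrix rank at most R.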
# SBSM-Pro:タンパク質のバイオシーケンスマシンをサポート SBSM-Pro: Support Bio-sequence Machine for Proteins ( http://arxiv.org/abs/2308.10275v2 ) ライセンス: Link先を確認	Yizheng Wang, Yixiao Zhai, Yijie Ding, Quan Zou	(参考訳) タンパク質は生物学的システムにおいて重要な役割を果たす。タンパク質の分類に機械学習アルゴリズムを使用することで、生物実験を補助し、ガイドすることもできる。本稿では,生物配列の分類を目的としたモデルであるSBSM-Pro(Support Bio-Sequence Machine for Proteins)を紹介する。このモデルは生の配列から始まり、その物理化学的性質に基づいてアミノ酸をグループ化する。配列アライメントを組み、タンパク質間の類似性を計測し、新しいマルチカーネル学習(MKL)アプローチを使用して様々な種類の情報を統合し、サポートベクターマシンを用いて分類予測を行う。以上の結果から,本モデルではタンパク質機能の同定と翻訳後修飾の観点から10個のデータセットをまたいだ可読性を示す。本研究は、タンパク質の分類における最先端の研究を実証するだけでなく、生物配列の分類に適したプラットフォームの開発における有益な取り組みとして、この領域の新しい方向を舗装する。 SBSM-Proはhttp://lab.malab.cn/soft/SBSM-Pro/からアクセスできる。 Proteins play a pivotal role in biological systems. The use of machine learning algorithms for protein classification can assist and even guide biological experiments, offering crucial insights for biotechnological applications. We introduce the Support Bio-Sequence Machine for Proteins (SBSM-Pro), a model purpose-built for the classification of biological sequences. This model starts with raw sequences and groups amino acids based on their physicochemical properties. It incorporates sequence alignment to measure the similarities between proteins and uses a novel multiple kernel learning (MKL) approach to integrate various types of information, utilizing support vector machines for classification prediction. The results indicate that our model demonstrates commendable performance across ten datasets in terms of the identification of protein function and posttranslational modification. This research not only exemplifies state-of-the-art work in protein classification but also paves avenues for new directions in this domain, representing a beneficial endeavor in the development of platforms tailored for the classification of biological sequences. SBSM-Pro is available for access at http://lab.malab.cn/soft/SBSM-Pro/.	翻訳日:2023-11-07 21:29:19 公開日:2023-11-04
# 教師に適応する: 模範のない連続学習のための知識蒸留の改善 Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning ( http://arxiv.org/abs/2308.09544v3 ) ライセンス: Link先を確認	Filip Szatkowski, Mateusz Pyla, Marcin Przewięźlikowski, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński	(参考訳) 本研究では, 知識蒸留(KD)を正規化戦略とし, 忘れることの防止を目的とした, 模範的自由クラスインクリメンタルラーニング(CIL)について検討する。 KDベースの手法はCILでうまく使われているが、以前のタスクからトレーニングデータの例にアクセスできることなくモデルを規則化するのに苦労することが多い。分析の結果,この問題は教師ネットワークにおける配布外データを扱う場合の表現変化に起因していることがわかった。これにより、KD損失成分に大きなエラーが発生し、CILモデルのパフォーマンスが低下する。近年の試験時間適応法に触発されて,インクリメンタルトレーニング中に教師と主要モデルを同時に更新する手法であるTeacher Adaptation (TA)を紹介した。提案手法は KD ベースの CIL アプローチとシームレスに統合し,その性能を複数の例のない CIL ベンチマークで一貫した向上を可能にする。このメソッドのソースコードはhttps://github.com/fszatkowski/cl-teacher-adaptationで入手できる。 In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL models. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main models during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks. The source code for our method is available at https://github.com/fszatkowski/cl-teacher-adaptation.	翻訳日:2023-11-07 21:28:15 公開日:2023-11-04
# ヒッグス真空がゼロの可視宇宙の双対として実現される隠れたセクタダークマター Hidden Sector Dark Matter Realized as a Twin of the Visible Universe With Zero Higgs Vacuum Expectation ( http://arxiv.org/abs/2308.08107v2 ) ライセンス: Link先を確認	Stephen L. Adler	(参考訳) 宇宙は2つの同一の粒子集合とゲージ相互作用を含み、ヒッグスポテンシャルによって異なる重力によってのみ結合する。基礎となる対称性のため、非結合時の2つのセクタは非零相と零ヒッグス真空期待相の境界にあるヒッグスポテンシャルを持つと仮定する。 2つのセクター間の結合を断ち切ることで、あるセクターにおけるヒッグスポテンシャルを非ゼロヒッグス期待領域に(可視セクターを)押し込み、もう一方セクターにおけるヒッグスポテンシャルをゼロヒッグス期待領域に(暗セクターを)押し込むことができる。ダークセクターで最小の質量のバリオンは、自ら相互作用するダークマター粒子の候補となる。 We propose that the universe contains two identical sets of particles and gauge interactions, coupling only through gravitation, which differ by their Higgs potentials. We postulate that because of underlying symmetries, the two sectors when uncoupled have Higgs potentials that lie at the boundary between phases with nonzero and zero Higgs vacuum expectation. Turning on the coupling between the two sectors can break the degeneracy, pushing the Higgs potential in one sector into the domain of nonzero Higgs expectation (giving the visible sector), and pushing the Higgs potential in the other sector into the domain of zero Higgs expectation (giving the dark sector). The least massive baryon in the dark sector will then be a candidate self-interacting dark matter particle.	翻訳日:2023-11-07 21:27:58 公開日:2023-11-04
# 広帯域指向性可視性 Broadband directional invisibility ( http://arxiv.org/abs/2308.03689v2 ) ライセンス: Link先を確認	Farhang Loran and Ali Mostafazadeh	(参考訳) 空間的クラマース-クロニッヒ関係を満たす光学媒体における一方向可視性の発見とそのブロードバンド実現は、非エルミートフォトニクスの重要なランドマークである。この効果の高次元一般化を正確に評価し,2次元および3次元のスカラー波と3次元の電磁波の散乱におけるその実現のための十分な条件を求める。より具体的には、正の実数 $\alpha$ と単位ベクトルの連続体 $\Omega$ が与えられたとき、入射波数 $k$ が$\alpha$(すなわち $k\in(0,\alpha]$) を超えないときに完全(非近似)な可視性を示す相互作用ポテンシャル(または電磁散乱の場合の散乱媒質の誘電率と透過性テンソル)と、入射波ベクトルの方向が$\Omega$ を超えるような相互作用条件を提供する。このアプローチの特徴は、有限周波数領域における完全方向の可視性を示す電位および線形誘電体媒体の構築を可能にすることである。 The discovery of unidirectional invisibility and its broadband realization in optical media satisfying spatial Kramers-Kronig relations are important landmarks of non-Hermitian photonics. We offer a precise characterization of a higher-dimensional generalization of this effect and find sufficient conditions for its realization in the scattering of scalar waves in two and three dimensions and electromagnetic waves in three dimensions. More specifically, given a positive real number $\alpha$ and a continuum of unit vectors $\Omega$, we provide explicit conditions on the interaction potential (or the permittivity and permeability tensors of the scattering medium in the case of electromagnetic scattering) under which it displays perfect (non-approximate) invisibility whenever the incident wavenumber $k$ does not exceed $\alpha$ (i.e., $k\in(0,\alpha]$) and the direction of the incident wave vector ranges over $\Omega$. A distinctive feature of our approach is that it allows for the construction of potentials and linear dielectric media that display perfect directional invisibility in a finite frequency domain.	翻訳日:2023-11-07 21:25:38 公開日:2023-11-04
# 高強度薄膜によるナノ構造中の歪色中心の形成 Deterministic Creation of Strained Color Centers in Nanostructures via High-Stress Thin Films ( http://arxiv.org/abs/2309.07935v2 ) ライセンス: Link先を確認	Daniel R. Assumpcao, Chang Jin, Madison Sutula, Sophie W. Ding, Phong Pham, Can M. Knaut, Mihir K. Bhaskar, Abishrant Panday, Aaron M. Day, Dylan Renaud, Mikhail D. Lukin, Evelyn Hu, Bartholomeus Machielse, Marko Loncar	(参考訳) カラーセンターは、スピン光子量子情報技術を実現するための主要な量子ビット候補として登場した。しかし、プラットフォームの主な制限の1つは、個々の色中心の特性がしばしばひずみに依存することである。ダイヤモンドのシリコン空白中心は通常、長いコヒーレンス特性を達成するためにミリケルビン温度を必要とするが、歪んだシリコン空白中心はフォノンによるデコヒーレンスなしで1K以上の温度で動作することが示されている。本研究は,高強度窒化ケイ素薄膜をダイヤモンドナノ構造と組み合わせて,静的に歪んだシリコン空洞色中心(平均基底状態は608GHz)を,ひずみ強度$\sim 4 \times 10^{-4}$で再現する。モデルに基づいて, このひずみは, スピン特性の劣化を伴わずに, 試料中のシリコン空孔中心を1.5Kの高温で動作させるのに十分である。この方法は、高温動作量子メモリを製造するためのスケーラブルなアプローチを提供する。シリコン空調センター以外にも、この手法は他のプラットフォームにも容易に拡張できるほど一般的である。 Color centers have emerged as a leading qubit candidate for realizing hybrid spin-photon quantum information technology. One major limitation of the platform, however, is that the characteristics of individual color centers are often strain dependent. As an illustrative case, the silicon-vacancy center in diamond typically requires millikelvin temperatures in order to achieve long coherence properties, but strained silicon-vacancy centers have been shown to operate at temperatures beyond 1K without phonon-mediated decoherence. In this work we combine high-stress silicon nitride thin films with diamond nanostructures in order to reproducibly create statically strained silicon-vacancy color centers (mean ground state splitting of 608 GHz) with strain magnitudes of $\sim 4 \times 10^{-4}$. Based on modeling, this strain should be sufficient to allow for operation of a majority of the silicon-vacancy centers within the measured sample at elevated temperatures (1.5K) without any degradation of their spin properties. This method offers a scalable approach to fabricate high-temperature operation quantum memories. 
Beyond silicon-vacancy centers, this method is sufficiently general that it can be easily extended to other platforms as well.	翻訳日:2023-11-07 21:04:31 公開日:2023-11-04
# 分散型動的チームの信頼を満たすためのステップ Steps Towards Satisficing Distributed Dynamic Team Trust ( http://arxiv.org/abs/2309.05378v2 ) ライセンス: Link先を確認	Edmund R. Hunt, Chris Baber, Mehdi Sobhani, Sanja Milivojevic, Sagir Yusuf, Mirco Musolesi, Patrick Waterson, Sally Maynard	(参考訳) 動的でマルチエージェントなチームに対する信頼の定義と測定は、さまざまな状況、特に防衛とセキュリティの領域において重要です。チームメンバは、合意された目標と、共有された価値に従って作業することが信頼されるべきです。本稿では,人間とロボットの両方が「信頼」を解釈可能かつ使用可能な方法で定義できるように,目標と価値の定義について考察する。チームの活動の結果は、"目標"、"個人的/チーム的価値"、"法的な原則"という観点で考えることができます。我々は、アライメントが「個人/チーム価値」のレベルで可能か、または「ゴール」と「法的原則」のレベルでのみ可能であるかを疑問視する。我々は、人間またはロボットチームメンバーによって解釈可能な人間ロボットチームの信頼を定義するための一連のメトリクスを議論し、シミュレーションミッションの過程で「満足できる信頼」の概念を実証できる実験を考えます。 Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordance with shared values. In this paper, our concern is with the definition of goals and values such that it is possible to define 'trust' in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles'. We question whether alignment is possible at the level of 'individual/team values', or only at the 'goal' and 'legal principles' levels. We argue for a set of metrics to define trust in human-robot teams that are interpretable by human or robot team members, and consider an experiment that could demonstrate the notion of 'satisficing trust' over the course of a simulated mission.	翻訳日:2023-11-07 21:02:31 公開日:2023-11-04
# 神経後主成分による不確かさの定量化 Uncertainty Quantification via Neural Posterior Principal Components ( http://arxiv.org/abs/2309.15533v2 ) ライセンス: Link先を確認	Elias Nehme, Omer Yair, Tomer Michaeli	(参考訳) 不確かさの定量化は、自動運転や生物イメージングのような安全クリティカルな領域への画像復元モデルの導入に不可欠である。これまで不確かさを可視化する手法は主にピクセル単位の見積もりに焦点を当ててきた。しかし、ピクセルごとの熱マップは、ピクセル間の強い相関を捉えないため、一般的にはほとんど実用的ではない。より自然な不確実性の尺度は、後方分布の主成分(pcs)に沿った分散に対応する。理論的には、入力画像の条件生成モデルから生成されたサンプルにPCAを適用することにより、PCを計算できる。しかし、これはテスト時に非常に多くのサンプルを生成する必要があり、現在の最先端(拡散)モデルでは痛ましいほど遅い。本研究では,ニューラルネットワークの1回のフォワードパスにおいて,任意の入力画像に対する後続分布のpcsを予測する手法を提案する。提案手法は,平均二乗誤差(MSE)を最小限に抑えるために訓練された事前学習モデルや,予測画像と後部PCの両方を出力するスクラッチからトレーニングすることができる。本稿では,画像のデノナイズ,塗布,超解像,生体画像間翻訳など,画像の逆問題について紹介する。提案手法は, インスタンス適応型不確実性方向を確実に伝達し, 後方サンプリング器に匹敵する不確実性定量化を実現する。コードと例はhttps://eliasnehme.github.io/NPPC/で公開されている。 Uncertainty quantification is crucial for the deployment of image restoration models in safety-critical domains, like autonomous driving and biological imaging. To date, methods for uncertainty visualization have mainly focused on per-pixel estimates. Yet, a heatmap of per-pixel variances is typically of little practical use, as it does not capture the strong correlations between pixels. A more natural measure of uncertainty corresponds to the variances along the principal components (PCs) of the posterior distribution. Theoretically, the PCs can be computed by applying PCA on samples generated from a conditional generative model for the input image. However, this requires generating a very large number of samples at test time, which is painfully slow with the current state-of-the-art (diffusion) models. In this work, we present a method for predicting the PCs of the posterior distribution for any input image, in a single forward pass of a neural network. Our method can either wrap around a pre-trained model that was trained to minimize the mean square error (MSE), or can be trained from scratch to output both a predicted image and the posterior PCs. 
We showcase our method on multiple inverse problems in imaging, including denoising, inpainting, super-resolution, and biological image-to-image translation. Our method reliably conveys instance-adaptive uncertainty directions, achieving uncertainty quantification comparable with posterior samplers while being orders of magnitude faster. Code and examples are available at https://eliasnehme.github.io/NPPC/	翻訳日:2023-11-07 20:50:55 公開日:2023-11-04
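For orientation, the sample-based baseline mentioned in the abstract (PCA over posterior samples) can be sketched in plain Python. This is only an illustration of that baseline under stated assumptions, not of NPPC itself, which predicts the PCs in a single forward pass: the posterior samples are assumed to be available already as flattened lists of floats, and `top_posterior_pc`, the power-iteration settings and the toy data are hypothetical.

```python
import math
import random


def top_posterior_pc(samples, iters=200, seed=0):
    """Return (direction, variance) of the leading principal component.

    `samples` is a list of equally long lists of floats, each one a
    flattened posterior sample for the same input image.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]

    # Random unit vector as the starting point for power iteration.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    variance = 0.0
    for _ in range(iters):
        # Apply the sample covariance C = X^T X / (n - 1) without forming it.
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / (n - 1)
             for j in range(d)]
        variance = math.sqrt(sum(x * x for x in w))
        if variance == 0.0:
            break
        v = [x / variance for x in w]
    return v, variance


# Toy "posterior samples" (assumed): two pixels that vary together.
toy_samples = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
direction, variance = top_posterior_pc(toy_samples)
```

In the toy data the two pixels are perfectly correlated, so a per-pixel variance map would report 2.5 for each pixel separately, while the leading PC exposes a single joint direction carrying all of the variance. The cost of drawing many samples per test image is what the abstract identifies as the bottleneck of this baseline.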
# DeepACO: 組合せ最適化のためのニューラルネットワークAntシステム DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization ( http://arxiv.org/abs/2309.14032v2 ) ライセンス: Link先を確認	Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, Yong Li	(参考訳) Ant Colony Optimization (ACO) は、様々な組合せ最適化問題(COP)に適用されたメタヒューリスティックアルゴリズムである。伝統的に、特定の問題に対してACOをカスタマイズするには、知識駆動ヒューリスティックスの専門家設計が必要である。本稿では,深層強化学習を用いてヒューリスティック設計を自動化する汎用フレームワークDeepACOを提案する。 DeepACOは、既存のACOアルゴリズムのヒューリスティックな対策を強化し、将来のACOアプリケーションにおける厳しい手動設計を不要にする。ニューラル強化されたメタヒューリスティックとして、DeepACOは1つのニューラルアーキテクチャと1セットのハイパーパラメータを使用して、8つのCOPでACOの能力を上回っている。 Neural Combinatorial Optimization法として、DeepACOは標準ルーティング問題における問題固有の手法と同等以上の性能を発揮する。私たちのコードは https://github.com/henry-yeh/DeepACO で公開されています。 Ant Colony Optimization (ACO) is a meta-heuristic algorithm that has been successfully applied to various Combinatorial Optimization Problems (COPs). Traditionally, customizing ACO for a specific problem requires the expert design of knowledge-driven heuristics. In this paper, we propose DeepACO, a generic framework that leverages deep reinforcement learning to automate heuristic designs. DeepACO serves to strengthen the heuristic measures of existing ACO algorithms and dispense with laborious manual design in future ACO applications. As a neural-enhanced meta-heuristic, DeepACO consistently outperforms its ACO counterparts on eight COPs using a single neural architecture and a single set of hyperparameters. As a Neural Combinatorial Optimization method, DeepACO performs better than or on par with problem-specific methods on canonical routing problems. Our code is publicly available at https://github.com/henry-yeh/DeepACO.	翻訳日:2023-11-07 20:49:37 公開日:2023-11-04
# 確率測度空間における勾配流によるサンプリング Sampling via Gradient Flows in the Space of Probability Measures ( http://arxiv.org/abs/2310.03597v2 ) ライセンス: Link先を確認	Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart	(参考訳) 未知の正規化定数で目標確率分布をサンプリングすることは、計算科学と工学における根本的な課題である。近年の研究では,確率測度空間における勾配流を考慮したアルゴリズムが,アルゴリズム開発の新たな道を開くことが示されている。本稿では,これらの勾配流の設計成分を精査することにより,このサンプリング手法に3つの貢献を行う。サンプリングのための勾配流のインスタンス化には、フローを決定するためのエネルギー関数と計量、およびアルゴリズムを導出するフローの数値近似が必要である。第一の貢献は、エネルギー汎関数としてのクルバック・リーブラーの発散が、対象分布の正規化定数に依存しない勾配流の独特の性質(すべてのf-分岐)を持つことを示すことである。第二の貢献は、不変性の観点から計量の選択を研究することである。フィッシャー・ラオ計量は微分同相不変量である唯一の選択(スケーリングまで)として知られている。計算可能な代替として,メトリクスと勾配流れに対する緩和されたアフィン不変性を導入する。特に、様々なアフィン不変量wasersteinおよびstein勾配流を構成する。アフィン不変勾配流は、理論上および粒子法を用いて高異方性分布をサンプリングする場合、非アフィン不変流よりも好ましく振る舞うことが示されている。第3の貢献は、勾配流のガウス近似に基づく効率的なアルゴリズムの研究と開発であり、これは粒子法に代わるものである。種々のガウス近似勾配流の接続を確立し,パラメトリック変分推論から生じる勾配法との関係を議論し,その収束特性を理論的および数値的に検討する。 Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. 
As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.	翻訳日:2023-11-07 20:40:27 公開日:2023-11-04
# 暗号通貨の解読:暗号通貨による消費者の知識と嗜好 Deciphering the Crypto-shopper: Knowledge and Preferences of Consumers Using Cryptocurrencies for Purchases ( http://arxiv.org/abs/2310.02911v3 ) ライセンス: Link先を確認	Massimiliano Silenzi and Umut Can Cabuk	(参考訳) 急速に成長する暗号通貨部門は、ビジネスと消費者の両方に挑戦と機会を与えている。本研究では、暗号通貨で買い物をする人の知識、専門知識、購買習慣を調査した。 516名の被験者を対象に調査を行ったところ,知識レベルは初心者から専門家まで様々であった。興味深いことに、回答者の30%近くが、限られた知識にもかかわらず高い購入頻度を示した。回帰分析の結果、ドメイン知識が果たす役割は、購入頻度に影響を与える要因の11.6%に過ぎなかった。 K平均クラスタ分析により、回答者はさらに3つの異なるグループに分類された。これらの結果は、幅広い知識を暗号通貨の利用の増加に結びつける従来の考え方に異議を唱え、他の要因を示唆している。さまざまな暗号通貨購入者層を理解することは、ビジネスにとって重要な要素であり、適切な戦略とユーザーフレンドリーな体験の必要性を強調している。この研究は、現在の暗号商取引行動に関する洞察を提供し、暗号商業界における幅広い影響と潜在的な変化を探求する将来の研究について論じる。 The fast-growing cryptocurrency sector presents both challenges and opportunities for businesses and consumers alike. This study investigates the knowledge, expertise, and buying habits of people who shop using cryptocurrencies. Our survey of 516 participants shows that knowledge levels vary from beginners to experts. Interestingly, a segment of respondents, nearly 30%, showed high purchase frequency despite their limited knowledge. Regression analyses indicated that while domain knowledge plays a role, it only accounts for 11.6% of the factors affecting purchasing frequency. A K-means cluster analysis further segmented the respondents into three distinct groups, each having unique knowledge levels and purchasing tendencies. These results challenge the conventional idea linking extensive knowledge to increased cryptocurrency usage, suggesting other factors at play. Understanding this varying crypto-shopper demographic is pivotal for businesses, emphasizing the need for tailored strategies and user-friendly experiences. This study offers insights into current crypto-shopping behaviors and discusses future research exploring the broader impacts and potential shifts in the crypto-consumer landscape.	翻訳日:2023-11-07 20:39:29 公開日:2023-11-04
# ジョイントトランスを用いたデ・ノボ薬物設計 De Novo Drug Design with Joint Transformers ( http://arxiv.org/abs/2310.02066v2 ) ライセンス: Link先を確認	Adam Izdebski and Ewelina Weglarz-Tomczak and Ewa Szczurek and Jakub M. Tomczak	(参考訳) de novo drug designでは、トレーニングデータ以外の新しい分子を同時生成し、そのターゲット特性を予測する必要があるため、生成モデルでは難しい作業となる。そこで本研究では,共同生成モデルにおけるトランスフォーマーデコーダ,トランスフォーマーエンコーダ,および予測器を組み合わせたジョイントトランスフォーマを提案する。ペナル化されたログライクな目的を持つモデルのトレーニングにより,分子生成における最先端性能が向上し,新たにサンプリングした分子の予測誤差は,微調整デコーダのみのトランスに比べて42%減少した。最後に, 統合トランスフォーマを用いた確率的ブラックボックス最適化アルゴリズムを提案し, トレーニングデータと比較し, ド・ノボの薬剤設計における他のスマイルベース最適化法を上回って, 目標特性を改善した新規分子を生成する。 De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose Joint Transformer that combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We show that training the model with a penalized log-likelihood objective results in state-of-the-art performance in molecule generation, while decreasing the prediction error on newly sampled molecules, as compared to a fine-tuned decoder-only Transformer, by 42%. Finally, we propose a probabilistic black-box optimization algorithm that employs Joint Transformer to generate novel molecules with improved target properties, as compared to the training data, outperforming other SMILES-based optimization methods in de novo drug design.	翻訳日:2023-11-07 20:39:14 公開日:2023-11-04
# LanguageBind: 言語に基づくセマンティックアライメントによるN-モダリティへのビデオ言語事前学習 LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment ( http://arxiv.org/abs/2310.01852v5 ) ライセンス: Link先を確認	Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, HongFa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, and Li Yuan	(参考訳) ビデオ言語(VL)プレトレーニングは、複数の下流タスクにおいて著しく改善されている。しかしながら、現在のVL事前学習フレームワークは、視覚や言語を超えた複数のモーダル(Nモダリティ、N>=3)にまで拡張するのは難しい。そこで我々は言語bindを提案し,言語モダリティは十分に探索され,豊富な意味論を含んでいるため,言語を異なるモダリティのバインドとして捉える。具体的には、VL事前学習によって得られた言語エンコーダを凍結し、コントラスト学習を伴う他のモダリティのためのエンコーダを訓練する。その結果、すべてのモダリティは共有機能空間にマッピングされ、マルチモーダルなセマンティックアライメントを実装する。 LanguageBindは、VLモダリティをNモダリティに拡張できることを保証する一方で、言語を中心としたデータペアをアライメントする高品質なデータセットも必要です。そこで我々は,VIDAL-10Mをビデオ,赤外線,深度,オーディオおよびそれに対応する言語として提案し,VIDAL-10Mと命名した。我々のVIDAL-10Mでは、すべてのビデオは長いビデオから切り離されたセグメントではなく、完全な意味を持った短いビデオプラットフォームから作成されています。 vidal-10mをプリトレーニングした後、ゼロショットビデオテキスト検索タスクのパラメータの15%しか持たないmsr-vttデータセットで、imagebindを5.8%r@1に上回った。さらに、LanguageBindはゼロショットビデオ、オーディオ、奥行き、赤外線理解タスクを大幅に改善しました。例えば、LanguageBindがInterVideoを1.9%、MSVDが8.8%、DiDeMoが6.3%、ActivityNetが4.4%上回った。 LLVIPとNYU-Dデータセットでは、LanguageBindがImageBindを23.8%、11.1%で上回っている。コードアドレスはhttps://github.com/PKU-YuanGroup/LanguageBind。 The video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking the language as the bind across different modalities because the language modality is well-explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for other modalities with contrastive learning. As a result, all modalities are mapped to a shared feature space, implementing multi-modal semantic alignment. 
While LanguageBind ensures that we can extend VL modalities to N modalities, we also need a high-quality dataset with alignment data pairs centered on language. We thus propose VIDAL-10M, a dataset of Video, Infrared, Depth, and Audio data with their corresponding Language. In VIDAL-10M, all videos come from short video platforms and have complete semantics, rather than being truncated segments of long videos, and the video, depth, infrared, and audio modalities are all aligned to their textual descriptions. After pretraining on VIDAL-10M, we outperform ImageBind by 5.8% R@1 on the MSR-VTT dataset with only 15% of the parameters in the zero-shot video-text retrieval task. Beyond this, LanguageBind greatly improves results on zero-shot video, audio, depth, and infrared understanding tasks. For instance, LanguageBind surpasses InterVideo by 1.9% on MSR-VTT, 8.8% on MSVD, 6.3% on DiDeMo, and 4.4% on ActivityNet. On the LLVIP and NYU-D datasets, LanguageBind outperforms ImageBind by 23.8% and 11.1% in top-1 accuracy, respectively. Code address: https://github.com/PKU-YuanGroup/LanguageBind.	翻訳日:2023-11-07 20:38:25 公開日:2023-11-04
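The idea of training a new modality encoder against a frozen language encoder with contrastive learning can be sketched as a one-directional InfoNCE loss. This is a minimal illustration under assumptions and not the authors' implementation: the function name `info_nce`, the temperature of 0.07, the single modality-to-language direction, and the toy embeddings are all hypothetical, and the language embeddings are simply passed in as constants to stand for the frozen encoder's output.

```python
import math


def info_nce(modality_emb, language_emb, temperature=0.07):
    """Mean cross-entropy of matching sample i of a modality to text i.

    `language_emb` plays the role of the frozen language encoder output;
    only the encoder that produced `modality_emb` would be trained.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a = [normalize(v) for v in modality_emb]
    b = [normalize(v) for v in language_emb]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of sample i to every text in the batch.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # negative log-softmax of the true pair
    return loss / len(a)


# Toy batch (assumed values): two texts and two depth/audio/infrared samples.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
misaligned = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = info_nce(aligned, text)
loss_misaligned = info_nce(misaligned, text)
```

Because every modality is pulled toward the same fixed text embeddings, embeddings of different modalities end up comparable with each other through the shared language space, which is the binding role the abstract assigns to language.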
# 拡散モデルの訓練に関するデバイアス Debias the Training of Diffusion Models ( http://arxiv.org/abs/2310.08442v2 ) ライセンス: Link先を確認	Hu Yu, Li Shen, Jie Huang, Man Zhou, Hongsheng Li, Feng Zhao	(参考訳) 拡散モデルでは、単純な denoising score matching loss によって変分下界を最適化することで、魅力的な生成品質を示す。本稿では,拡散モデルにおける一定損失重み戦略の利用が,トレーニング段階での偏り推定につながるという理論的根拠を与える。ガウス雑音を一定重み付けで予測するために単純にデノナイジングネットワークを最適化することは、原画像の正確な推定を妨げる可能性がある。この問題に対処するため,理論的に偏りのない原理に基づくエレガントで効果的な重み付け戦略を提案する。さらに, 本研究は, その存在, 影響, 理由の観点から, 定常的な重み付け損失から生じる本質バイアス問題を明らかにするため, 包括的かつ体系的な調査を行う。これらの分析は、拡散モデルの内部動作の理解とデミステレーションを促進することが期待されている。実験結果から,提案手法は複雑な手法に頼らずにサンプル品質を著しく向上させ,トレーニングやサンプリング処理においてベースライン法と比較して精度が向上することを示した。 Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes.	翻訳日:2023-11-07 20:26:33 公開日:2023-11-04
# 抽象的故障症状マッチングによるJust-in-Time Flakyテスト検出 Just-in-Time Flaky Test Detection via Abstracted Failure Symptom Matching ( http://arxiv.org/abs/2310.06298v2 ) ライセンス: Link先を確認	Gabin An, Juyeon Yoon, Thomas Bach, Jingun Hong, Shin Yoo	(参考訳) 我々は,大規模な産業用ソフトウェアシステムであるSAP HANAの継続的インテグレーション(CI)パイプラインにおいて,エラーメッセージやスタックトレースなどの障害症状を使用して,不安定なテスト障害を特定する経験を報告する。障害症状は類似した障害を特定するために一般的に用いられるが、これまでは不安定なテスト障害を検出するために用いられていなかった。我々の仮説は、脆弱な障害は非脆弱な障害と異なる症状を示すだろうということです。その結果,失敗症状を過去の失敗症状と一致させることで,テストを再実行することなく,繰り返し発生する不安定な障害を識別できる。これにより、テストの再実行の必要性が大幅に低減され、最終的にはテスト結果のデリバリが高速になる。異なる実行インスタンスにまたがるフレキ障害の対応を容易にするため、障害重複とログ解析の分野における以前の研究から着想を得た、フレキ障害の既知のパターンと一致する前に、より新しいテスト障害症状を抽象化する。 SAP HANAのCIデータから収集した実際の故障症状を6カ月間に検出し,症状に基づくフレキネス検出法について検討した。本手法は, 故障症状を用いて再発障害を同定し, 96%以上の精度を達成し, 従来の再実行戦略と比較して約58%の機械時間を節約できる可能性を示した。偽陽性の分析と開発者からのフィードバックは、この症状ベースのアプローチの効果的なデプロイと不安定なテストのデバッグの両方において、説明的かつ情報的障害症状を持つことの重要性を強調している。 We report our experience of using failure symptoms, such as error messages or stack traces, to identify flaky test failures in a Continuous Integration (CI) pipeline for a large industrial software system, SAP HANA. Although failure symptoms are commonly used to identify similar failures, they have not previously been employed to detect flaky test failures. Our hypothesis is that flaky failures will exhibit symptoms distinct from those of non-flaky failures. Consequently, we can identify recurring flaky failures, without rerunning the tests, by matching the failure symptoms to those of historical flaky runs. This can significantly reduce the need for test reruns, ultimately resulting in faster delivery of test results to developers. To facilitate the process of matching flaky failures across different execution instances, we abstract newer test failure symptoms before matching them to the known patterns of flaky failures, inspired by previous research in the fields of failure deduplication and log analysis. 
We evaluate our symptom-based flakiness detection method using actual failure symptoms gathered from CI data of SAP HANA during a six-month period. Our method shows the potential of using failure symptoms to identify recurring flaky failures, achieving a precision of at least 96%, while saving approximately 58% of the machine time compared to the traditional rerun strategy. Analysis of the false positives and the feedback from developers underscore the importance of having descriptive and informative failure symptoms for both the effective deployment of this symptom-based approach and the debugging of flaky tests.	翻訳日:2023-11-07 20:25:25 公開日:2023-11-04
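The matching step described above can be sketched as follows. This is a hedged illustration only: the abstraction rules (masking hexadecimal addresses, file paths and numbers), the function names, and the sample messages are assumptions made for this example and are not the rules used on SAP HANA.

```python
import re

# Hypothetical abstraction rules: each one replaces a run-specific detail
# with a placeholder. Order matters: hex addresses are masked before numbers.
_RULES = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"(?:/[\w.\-]+)+"), "<PATH>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def abstract_symptom(message):
    """Mask run-specific details so equivalent failures compare equal."""
    for pattern, placeholder in _RULES:
        message = pattern.sub(placeholder, message)
    return " ".join(message.split())  # also normalize whitespace


def is_known_flaky(message, flaky_patterns):
    """True if the abstracted symptom matches a stored flaky symptom."""
    return abstract_symptom(message) in flaky_patterns


# Symptoms of failures previously confirmed as flaky (assumed sample data).
history = ["Timeout after 30 s waiting for port 30015 on /tmp/run_17/hana.sock"]
flaky_patterns = {abstract_symptom(m) for m in history}

new_failure = "Timeout after 45 s waiting for port 30115 on /tmp/run_942/hana.sock"
other_failure = "AssertionError: expected 3 rows, got 4"
```

With this setup `is_known_flaky(new_failure, flaky_patterns)` is true without rerunning the test, while `other_failure` is not matched and would still go through the normal rerun or triage path. Exact set membership is the simplest possible matcher; a fuzzier comparison would trade precision for recall.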
# マルチリガンドドッキングと結合サイト設計のための高調波自己条件流れマッチング Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design ( http://arxiv.org/abs/2310.05764v2 ) ライセンス: Link先を確認	Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola	(参考訳) タンパク質の機能には酵素触媒を含む小さな分子の結合が必要である。そのため、小さな分子に対する結合ポケットの設計には、薬物合成からエネルギー貯蔵まで、いくつかの影響のある応用がある。この目的に向けて,我々はまず,自己条件付きフローマッチングの目的に基づいて3次元タンパク質-リガンド結合構造を改良したHarmonicFlowを開発した。 FlowSiteはこのフローモデルを拡張して、タンパク質ポケットの離散的な残基型と分子の結合3D構造を共同生成する。本研究では,HarmonicFlowがポケットレベルのドッキングにおいて,ドッキングの簡易性,汎用性,平均サンプル品質を向上することを示す。この構造モデリングにより、フローサイトはバインドサイトをベースラインアプローチよりも実質的に優れている設計をする。 A significant amount of protein function requires binding small molecules, including enzymatic catalysis. As such, designing binding pockets for small molecules has several impactful applications ranging from drug synthesis to energy storage. Towards this goal, we first develop HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on our self-conditioned flow matching objective. FlowSite extends this flow model to jointly generate a protein pocket's discrete residue types and the molecule's binding 3D structure. We show that HarmonicFlow improves upon state-of-the-art generative processes for docking in simplicity, generality, and average sample quality in pocket-level docking. Enabled by this structure modeling, FlowSite designs binding sites substantially better than baseline approaches.	翻訳日:2023-11-07 20:24:56 公開日:2023-11-04
# 検索型大規模言語モデルによる財務感性分析の強化 Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models ( http://arxiv.org/abs/2310.04027v2 ) ライセンス: Link先を確認	Boyu Zhang, Hongyang Yang, Tianyu Zhou, Ali Babar, Xiao-Yang Liu	(参考訳) 金融センチメント分析は、バリュエーションと投資決定に不可欠である。しかし、従来のNLPモデルは、パラメータサイズとトレーニングデータセットの範囲によって制限されており、この分野での一般化能力と有効性を損なう。近年,広範コーパスで事前学習したLarge Language Models (LLMs) は,優れたゼロショット能力のため,様々なNLPタスクにおいて優れた性能を示した。 LLMの事前学習目標と感情ラベルの予測との相違は、彼らの予測性能を損なう可能性がある。さらに、十分な文脈を欠いた財務ニュースの簡潔な性質は、LLMの感情分析の信頼性を著しく低下させる可能性がある。これらの課題に対処するため,金融感情分析のためのLLMフレームワークを提案する。このフレームワークは、LLMが感情ラベルの予測子として振る舞うことを保証する命令調整LDMモジュールと、信頼できる外部ソースから追加コンテキストを取得する検索拡張モジュールを含む。従来のモデルとChatGPTやLLaMAなどのLLMを比較し,精度とF1得点で15%～48%の性能向上を実現した。 Financial sentiment analysis is critical for valuation and investment decision-making. Traditional NLP models, however, are limited by their parameter size and the scope of their training datasets, which hampers their generalization capabilities and effectiveness in this field. Recently, Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks due to their commendable zero-shot abilities. Yet, directly applying LLMs to financial sentiment analysis presents challenges: The discrepancy between the pre-training objective of LLMs and predicting the sentiment label can compromise their predictive performance. Furthermore, the succinct nature of financial news, often devoid of sufficient context, can significantly diminish the reliability of LLMs' sentiment analysis. To address these challenges, we introduce a retrieval-augmented LLMs framework for financial sentiment analysis. This framework includes an instruction-tuned LLMs module, which ensures LLMs behave as predictors of sentiment labels, and a retrieval-augmentation module which retrieves additional context from reliable external sources. 
Benchmarked against traditional models and LLMs like ChatGPT and LLaMA, our approach achieves a 15% to 48% performance gain in accuracy and F1 score.	翻訳日:2023-11-07 20:23:17 公開日:2023-11-04
# RTDK-BO:Reinforced Transformer Deep kernelを用いた高次元ベイズ最適化 RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels ( http://arxiv.org/abs/2310.03912v4 ) ライセンス: Link先を確認	Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen and Soumyendu Sarkar	(参考訳) ガウス過程(GP)サロゲートによって導かれるベイズ最適化(BO)は、効率的な高次元ブラックボックス最適化のための貴重な技術であることが示されており、この最適化は産業設計や科学計算のような多くの応用に内在する重要な問題である。近年の研究では、単一関数の最適化とfew-shotの多目的最適化の両方において最適化性能を向上させるために強化学習(RL)が導入されている。しかし、few-shot手法でさえ、密接に関連する目的間で共有される類似性を活用できていない。本稿では,近年のDeep Kernel Learning(DKL)とアテンションベースのTransformerモデルの発展を組み合わせて,メタラーニングによりGPサロゲートのモデリング能力を向上させる。本稿では,DKLに注意機構を組み込み,BOプロセス中に収集した文脈情報にサロゲートを適応させることで,メタラーニングBOサロゲートを改善する新しい手法を提案する。このTransformer Deep Kernelを,連続Soft Actor-Critic強化学習で訓練した学習済み獲得関数と組み合わせることで,探索を支援する。このReinforced Transformer Deep Kernel(RTDK-BO)アプローチは、連続的な高次元最適化問題において最先端の結果をもたらす。 Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function optimization and few-shot multi-objective optimization. However, even few-shot techniques fail to exploit similarities shared between closely related objectives. In this paper, we combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling powers of GP surrogates with meta-learning. We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL, empowering the surrogates to adapt to contextual information gathered during the BO process. We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. 
This Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results in continuous high-dimensional optimization problems.	翻訳日:2023-11-07 20:22:38 公開日:2023-11-04
# PyDCM:持続可能性のための強化学習を備えたカスタムデータセンターモデル PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability ( http://arxiv.org/abs/2310.03906v5 ) ライセンス: Link先を確認	Avisek Naug, Antonio Guillen, Ricardo Luna Gutiérrez, Vineet Gundecha, Dejan Markovikj, Lekhapriya Dheeraj Kashyap, Lorenz Krause, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Soumyendu Sarkar	(参考訳) 持続可能性や二酸化炭素排出量削減の国際的重点化が進む中、政府や企業はデータセンターの設計と運用に対するアプローチを再考するよう迫られている。高エネルギー消費と指数関数的に大きな計算ワークロードを考えると、データセンターは特に冷却やITエネルギー利用といった分野において、電力消費を最適化する主要な候補である。この追求における重要な課題は、エンドツーエンドのパイプラインを提供する構成可能でスケーラブルな熱データセンターモデルがないことである。データセンターは、幾何学的な構成と熱散逸が熱モデリングを困難にする複数のITコンポーネントで構成されている。本稿では,Pythonで実装されたカスタマイズ可能なデータセンターモデルであるPyDCMを提案する。ユーザはカスタムのサーバ仕様とITキャビネットの幾何学的配置を用いて、IT機器の独自構成を作成できる。ベクトル化された熱計算を用いることで、PyDCMは現在のEnergy Plusモデリング実装よりも桁違いに(30倍)高速になり、CPU数に対して劣線形にスケールする。また、PyDCMは、Gymnasiumラッパーを介して深層強化学習を使用してデータセンターの冷却を最適化することを可能にし、様々なデータセンター設計プロトタイプをテストするユーザフレンドリーなプラットフォームを提供する。 The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A significant challenge in this pursuit is the lack of a configurable and scalable thermal data center model that offers an end-to-end pipeline. Data centers consist of multiple IT components whose geometric configuration and heat dissipation make thermal modeling difficult. This paper presents PyDCM, a customizable Data Center Model implemented in Python, that allows users to create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current Energy Plus modeling implementations and scales sublinearly with the number of CPUs. 
Also, PyDCM enables the use of Deep Reinforcement Learning via the Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes.	翻訳日:2023-11-07 20:22:16 公開日:2023-11-04
# 多重長方形ビリヤードにおける半古典状態励起 The semiclassical states excitations in the multi-rectangular billiards ( http://arxiv.org/abs/2310.13166v2 ) ライセンス: Link先を確認	Stefan Giller	(参考訳) $L$字型ビリヤードおよびそれに類するビリヤード、すなわち各角度が$\pi/2$または$3\pi/2$に等しいビリヤードの量子化の問題を、フーリエ級数展開法を道具として考察する。それぞれの波動関数と量子化条件を書き下し,このような多重長方形ビリヤード(MRB)におけるスーパースカー効果に着目して議論する。特別なPOCモードの集合がMRBにおけるスーパースカー現象を引き起こすことがわかった。そこではビリヤード全体が、平行な辺が互いに有理比の関係にあるMRBである近似コピーに存在する半古典的モードに最も近いモードへと励起される。 The problem of quantizing $L$-shaped billiards and similar ones, i.e. those in which each angle equals $\pi/2$ or $3\pi/2$, is considered using the Fourier series expansion method as a tool. The respective wave functions and quantization conditions are written down and discussed, with a focus on superscar effects in such multi-rectangular billiards (MRB). It is found that a special set of POC modes effect the superscar phenomena in MRB, in which the billiards are excited as a whole to the modes closest to the semiclassical ones existing in their approximated copies, these being MRB whose parallel sides remain in rational relations to one another.	翻訳日:2023-11-07 20:14:36 公開日:2023-11-04
# vibe: twitter分類のためのトピック駆動時間適応 VIBE: Topic-Driven Temporal Adaptation for Twitter Classification ( http://arxiv.org/abs/2310.10191v3 ) ライセンス: Link先を確認	Yuji Zhang, Jing Li, Wenjie Li	(参考訳) 言語機能は現実世界のソーシャルメディアで進化しており、ダイナミックスにおけるテキスト分類のパフォーマンスが低下している。この課題に対処するために、過去のデータに基づいてトレーニングされたモデルが将来テストされる時間適応について研究する。以前のほとんどの作業は、事前トレーニングや知識更新の継続に重点を置いており、騒がしいソーシャルメディアデータでのパフォーマンスを損なう可能性がある。この問題に取り組むために,潜在トピック進化のモデル化を通じて特徴変化を反映し,新しいモデルであるvibe: variational information bottleneck for evolutionsを提案する。具体的には、まず2つのInformation Bottleneck(IB)レギュレータを使用し、過去と将来のトピックを区別する。次に,タイムスタンプとクラスラベル予測を用いたマルチタスクトレーニングによる適応機能として機能する。適応学習では、VIBEは、後進的に生成されたオンラインストリームから取得した未ラベルデータをトレーニングデータ時間に利用する。 twitterによる3つの分類タスクの実験では、データのわずか3%のモデルが、これまでの最先端のトレーニング方法を大きく上回っていることが分かりました。 Language features are evolving in real-world social media, resulting in the deteriorating performance of text classification in dynamics. To address this challenge, we study temporal adaptation, where models trained on past data are tested in the future. Most prior work focused on continued pretraining or knowledge updating, which may compromise their performance on noisy social media data. To tackle this issue, we reflect feature change via modeling latent topic evolution and propose a novel model, VIBE: Variational Information Bottleneck for Evolutions. Concretely, we first employ two Information Bottleneck (IB) regularizers to distinguish past and future topics. Then, the distinguished topics work as adaptive features via multi-task training with timestamp and class label prediction. In adaptive learning, VIBE utilizes retrieved unlabeled data from online streams created posterior to training data time. Substantial Twitter experiments on three classification tasks show that our model, with only 3% of data, significantly outperforms previous state-of-the-art continued-pretraining methods.	翻訳日:2023-11-07 20:12:00 公開日:2023-11-04
# 言語とメンタルヘルス:言語的バイオソーシャルマーカーとしてのテキストからの感情動態の測定 Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers ( http://arxiv.org/abs/2310.17369v2 ) ライセンス: Link先を確認	Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif M. Mohammad	(参考訳) 精神病理学の研究は、集団レベルでは、時間とともに変化する感情のパターン(感情のダイナミクス)が精神状態の指標であることを示してきた。感情変化のパターンは、伝統的に感情の自己報告を通じて決定されてきたが、正確性、バイアス、データ収集の容易さに既知の問題がある。日常の発話から感情のダイナミクスを決定する最近のアプローチは、これらの懸念の多くに対処しているが、これらの発話感情ダイナミクス(UED)の指標が精神の健康診断と相関しているかどうかはまだ分かっていない。ここでは、ツイートの感情動態とメンタルヘルス障害との関係について初めて検討する。調査対象のUED指標はそれぞれ,ユーザが自己開示した診断によって異なることがわかった。例えば、ADHD、MDD、PTSDのユーザと比較して、コントロールグループでは平均感情価(valence)が有意に高かった(すなわち、よりポジティブなテキスト)。感情価の変動性は、ADHD, うつ病, 双極性障害, MDD, PTSD, OCDと比べてコントロールグループで有意に低かったが, PPDとの比較ではそうではなかった。感情価の上昇率と回復率もコントロールと有意に異なることが示された。この研究は、感情力学に関連する言語的手がかりが、精神疾患のバイオソーシャルマーカーとして重要な役割を担い、精神疾患の理解、診断、管理に役立ちうることを示す重要な初期の証拠を提供する。 Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time -- emotion dynamics -- are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dynamics from one's everyday utterances address many of these concerns, but it is not yet known whether these measures of utterance emotion dynamics (UED) correlate with mental health diagnoses. Here, for the first time, we study the relationship between tweet emotion dynamics and mental health disorders. We find that each of the UED metrics studied varied by the user's self-disclosed diagnosis. For example: average valence was significantly higher (i.e., more positive text) in the control group compared to users with ADHD, MDD, and PTSD. Valence variability was significantly lower in the control group compared to ADHD, depression, bipolar disorder, MDD, PTSD, and OCD but not PPD. 
Rise and recovery rates of valence also exhibited significant differences from the control. This work provides important early evidence for how linguistic cues pertaining to emotion dynamics can play a crucial role as biosocial markers for mental illnesses and aid in the understanding, diagnosis, and management of mental health disorders.	翻訳日:2023-11-07 20:02:33 公開日:2023-11-04
# Data Provenance Initiative: AIにおけるデータセットライセンスと属性の大規模監査 The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI ( http://arxiv.org/abs/2310.16787v3 ) ライセンス: Link先を確認	Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker	(参考訳) 膨大な、多様な、一貫性のないデータセットで言語モデルをトレーニングするレースは、実践者に対する法的および倫理的リスクに対する懸念を高めている。データの透明性と理解を脅かすこれらのプラクティスを是正するために、法律と機械学習の専門家の間で、1800以上のテキストデータセットを体系的に監査し追跡するための、複数の学際的な取り組みを招集する。私たちは、ソース、クリエーター、一連のライセンス条件、プロパティ、以降の使用から、これらのデータセットの系統をトレースするためのツールと標準を開発します。私たちのランドスケープ分析は、より低いリソース言語、より創造的なタスク、よりリッチなトピックの多様性、より新しい、より合成的なトレーニングデータといった重要なカテゴリを独占するクローズドデータセットによる、商業的にオープンなデータセットとクローズドデータセットの組成と焦点の急激な分割を強調しています。このことは、異なるライセンス条件下で利用できるデータの種類がより深く分断され、著作権と公正使用に関する司法的法的解釈への含意が高まったことを示している。また、広く使われているデータセットホスティングサイトでは、ライセンスが70%以上、エラー率が50%以上である、ライセンスの頻繁な誤分類も観察する。これは、多くの最近のブレークスルーを駆動する最も人気のあるデータセットの誤帰と情報利用の危機を示している。データセットの透明性と責任ある使用に関する継続的な改善への貢献として、私たちは、最もポピュラーなオープンソースの微調整データコレクションであるwww.dataprovenance.orgのために、データプロヴァンスをトレースしてフィルタできるインタラクティブuiであるdata provenance explorerを使って、監査全体をリリースします。 The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tools and standards to trace the lineage of these datasets, from their source, creators, series of license conditions, properties, and subsequent use. 
Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 70%+ and error rates of 50%+. This points to a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.	翻訳日:2023-11-07 20:01:55 公開日:2023-11-04
# GlotLID:低リソース言語のための言語識別 GlotLID: Language Identification for Low-Resource Languages ( http://arxiv.org/abs/2310.16248v2 ) ライセンス: Link先を確認	Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze	(参考訳) 最近のいくつかの論文は、約300の高リソース言語と中リソース言語のための優れた言語識別(LID)ソリューションを公開している。しかし、(i)幅広い低リソース言語をカバーし、(ii)厳格に評価されて信頼性があり、(iii)効率的で使いやすいLIDは存在しない。本稿では、広いカバー範囲,信頼性,効率性という要件を満たすLIDモデルであるGlotLID-Mを公開する。 1665の言語を識別し、以前の研究に比べてカバー範囲が大幅に増加した。実験では,F1と偽陽性率(FPR)のバランスをとる場合,GlotLID-Mは4つのベースライン(CLD3,FT176,OpenLID,NLLB)を上回った。低リソースLIDに固有の課題、すなわちコーパスメタデータの誤り、高リソース言語からの漏洩、密接に関連する言語の分離の困難、マクロ言語と変種の扱い、および一般的なノイズデータを分析する。 GlotLID-Mをデータセット生成パイプラインに統合することで,低リソース言語や文化に対するNLP技術の品質向上とアクセシビリティ向上が期待できる。 GlotLID-Mモデル、コード、およびデータソースのリストは https://github.com/cisnlp/GlotLID で利用可能である。 Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable and (iii) efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguage vs varieties and in general noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.	翻訳日:2023-11-07 20:01:24 公開日:2023-11-04
# ノイズ量子チャネルとしてのLandau-Streaterチャネル The Landau-Streater Channel as a Noisy Quantum Channel ( http://arxiv.org/abs/2310.15353v3 ) ライセンス: Link先を確認	Shayan Roofeh, Vahid Karimipour	(参考訳) 3次元では、Landau-StreaterチャネルはWerner-Holevoチャネルにほかならない。このようなチャネルは連続パラメータを持たず、環境ノイズをモデル化することはできない。我々は、これと恒等チャネルとの凸結合を考え、クトリット上の1パラメータ雑音モデルとして適したものにする。さらに、元のWerner-Holevoチャネルは完全なユニタリ群 $SU(3)$ の下で共変性を示すが、拡張された族は群 $SO(3)$ の下でのみ共変性を保持する。この対称性の低減により、元のチャネルの様々な特性への影響を調べることができる。特に, チャネルのスペクトル, 可分性, 相補チャネル, 厳密または近似的な分解可能性, および各種の容量への影響について検討する。具体的には, 量子容量に対する下界と上界の確立とともに, ワンショット古典容量とエンタングルメント支援容量の解析式を導出する。 In three dimensions, the Landau-Streater channel is nothing but the Werner-Holevo channel. Such a channel has no continuous parameter and hence cannot model an environmental noise. We consider its convex combination with the identity channel, making it suitable as a one-parameter noise model on qutrits. Moreover, whereas the original Werner-Holevo channel exhibits covariance under the complete unitary group $SU(3)$, the extended family maintains covariance only under the group $SO(3)$. This symmetry reduction allows us to investigate its impact on various properties of the original channel. In particular, we examine its influence on the channel's spectrum, divisibility, complementary channel, and exact or approximate degradability, as well as its various kinds of capacities. Specifically, we derive analytical expressions for the one-shot classical capacity and the entanglement-assisted capacity, accompanied by the establishment of lower and upper bounds for the quantum capacity.	翻訳日:2023-11-07 20:00:02 公開日:2023-11-04
# ヘビアン学習と自由エネルギー最小化による認知共通モデルの神経模倣的実現 A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization ( http://arxiv.org/abs/2310.15177v2 ) ライセンス: Link先を確認	Alexander Ororbia, Mary Alexandria Kelly	(参考訳) ここ数年で、意味的に豊かな文章を合成したり、複雑な画像を生成したりできる大規模なニューラル生成モデルが、「生成人工知能」(Generative AI)として知られるようになったものの代表例として登場した。統計的機械学習の領域に新たな機会と課題をもたらすだけでなく、生成AIの人気の高まりは認知科学にも興味深い問いを投げかけている。認知科学は、心と脳を支える過程の性質を明らかにし、そのような機能が生物学的(または人工的)基質においてどのように獲得され実現されうるかを理解しようとする学問である。この目標を念頭に置いて、有望な研究プログラムは、この分野の長年の伝統である認知アーキテクチャの構築を、神経模倣的な生成的構成要素という観点から根本的に捉え直すことにあると論じる。具体的には,変分自由エネルギー汎関数の最適化のために働くヘビアン適応の観点から認知の共通モデルを捉えたアーキテクチャである、COGnitive Neural GENerative systemについて論じる。 Over the last few years, large neural generative models, capable of synthesizing semantically rich passages of text or producing complex images, have emerged as a popular representation of what has come to be known as "generative artificial intelligence" (generative AI). Beyond opening the door to new opportunities as well as challenges for the domain of statistical machine learning, the rising popularity of generative AI brings with it interesting questions for Cognitive Science, which seeks to discover the nature of the processes that underpin minds and brains as well as to understand how such functionality might be acquired and instantiated in a biological (or artificial) substrate. With this goal in mind, we argue that a promising research program lies in the crafting of cognitive architectures, a long-standing tradition of the field, cast fundamentally in terms of neuro-mimetic generative building blocks. Concretely, we discuss the COGnitive Neural GENerative system, one such architecture that casts the Common Model of Cognition in terms of Hebbian adaptation operating in service of optimizing a variational free energy functional.	翻訳日:2023-11-07 19:59:46 公開日:2023-11-04
# キタエフの量子二重模型のエニオンセクターの分類 Classification of the anyon sectors of Kitaev's quantum double model ( http://arxiv.org/abs/2310.19661v2 ) ライセンス: Link先を確認	Alex Bols, Siddharth Vadnerkar	(参考訳) 無限三角格子上のキタエフの量子二重模型のエニオンセクターについて、非アーベルの場合を含む有限ゲージ群$G$に対する完全な分類を与える。予想されていた通り、モデルのエニオンセクターは、$G$の量子二重代数の既約表現に正確に対応する。私たちの証明は2つの主な部分からなる。第1部では、量子二重代数の各既約表現に対して純粋状態を構成し、これらの純粋状態のGNS表現が互いに素なエニオンセクターであることを示す。第2部では、任意のエニオンセクターが、第1部で構成されたエニオンセクターのいずれかとユニタリ同値であることを示す。証明の第1部は、問題の状態をストリングネット凝縮として記述することを決定的に用いる。純粋性は、これらの状態を、適切な局所的制約の集合を満たす唯一の状態として特徴づけることで示される。証明の核心は、ある局所ゲージ変換の群が局所ストリングネットの集合に自由かつ推移的に作用するという事実である。第2部では、任意のエニオンセクターが、これらの制約のうち有限個を除くすべてを満たす純粋状態を含むことを示す。次に、既知の手法を用いて、これらの制約のうち1つを除くすべてを満たす純粋状態をそのエニオンセクター内に構成できる。最後に、そのような状態は、第1部で構成されたエニオンセクターのいずれかにおけるベクトル状態でなければならないことを明示的に示す。 We give a complete classification of the anyon sectors of Kitaev's quantum double model on the infinite triangular lattice and for finite gauge group $G$, including the non-abelian case. As conjectured, the anyon sectors of the model correspond precisely to the irreducible representations of the quantum double algebra of $G$. Our proof consists of two main parts. In the first part, we construct for each irreducible representation of the quantum double algebra a pure state and show that the GNS representations of these pure states are pairwise disjoint anyon sectors. In the second part we show that any anyon sector is unitarily equivalent to one of the anyon sectors constructed in the first part. The first part of the proof crucially uses a description of the states in question as string-net condensates. Purity is shown by characterising these states as the unique states that satisfy appropriate sets of local constraints. At the core of the proof is the fact that certain groups of local gauge transformations act freely and transitively on collections of local string-nets. For the second part, we show that any anyon sector contains a pure state that satisfies all but a finite number of these constraints. 
Using known techniques we can then construct a pure state in the anyon sector that satisfies all but one of these constraints. Finally, we show explicitly that any such state must be a vector state in one of the anyon sectors constructed in the first part.	翻訳日:2023-11-07 19:50:12 公開日:2023-11-04
# 線形関数近似による強化学習のための遅延フィードバック下の事後サンプリング Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation ( http://arxiv.org/abs/2310.18919v2 ) ライセンス: Link先を確認	Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma	(参考訳) 強化学習(RL)の最近の研究は、関数近似を利用して、より優れたパフォーマンスのためにサンプル複雑性のハードルを緩和することで、大きな進歩を遂げている。成功にもかかわらず、既存の証明可能な効率的アルゴリズムは通常、行動を取った際に即時フィードバックが得られることに依存している。観測における遅延の影響を考慮しないと、リグレットの爆発によって現実世界のシステムの性能が著しく低下する可能性がある。本研究では, 線形関数近似を用いたRLにおける遅延フィードバックの課題に, 事後サンプリングを用いて取り組む。事後サンプリングは、幅広い状況において, 一般的な UCB アルゴリズムを経験的に上回ることが示されている。まず、Delayed-PSVIを導入する。これは楽観的な価値ベースのアルゴリズムで、事後サンプリングによる雑音摂動を通じて価値関数空間を効果的に探索する。 RLにおける遅延フィードバック下の事後サンプリングアルゴリズムの最初の解析を行い,我々のアルゴリズムが未知の確率的遅延の存在下で $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2E[\tau])$ の最悪ケースリグレットを達成することを示す。ここで$E[\tau]$は期待遅延である。計算効率をさらに向上し,高次元RL問題への適用可能性を高めるために,ランジュバン動力学による勾配ベースの近似サンプリングスキームを組み込んだDelayed-LPSVIを導入し,$\widetilde{O}(dHK)$の計算コストで同じオーダー最適なリグレット保証を維持する。アルゴリズムの統計的および計算的有効性を示すために経験的評価を行う。 Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. 
We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $E[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.	翻訳日:2023-11-07 19:47:44 公開日:2023-11-04
# TiV-NeRF:動的ニューラルラジアンスフィールドを用いた時間変化表現による追跡とマッピング TiV-NeRF: Tracking and Mapping via Time-Varying Representation with Dynamic Neural Radiance Fields ( http://arxiv.org/abs/2310.18917v2 ) ライセンス: Link先を確認	Chengyao Duan and Zhiliu Yang	(参考訳) Neural Radiance Fields(NeRF)をSLAMフレームワークに統合するこれまでの試みは、静的シーンの仮定に依存するか、動的オブジェクトを外れ値として扱うかのいずれかである。しかし、現実世界のシナリオのほとんどは動的である。本稿では,動的シーンの追跡と再構成を行うための時間変化表現を提案する。システムは追跡プロセスとマッピングプロセスという2つのプロセスを同時に維持する。追跡プロセスでは、入力画像全体を一様にサンプリングし、RGB画像の学習は自己教師ありで行う。マッピングプロセスでは,既知のマスクを活用して動的オブジェクトと静的背景を区別し,2種類の領域に異なるサンプリング戦略を適用する。両プロセスのパラメータ最適化は2段階で構成され、第1段階は時間と3次元の位置を関連付けて変形場を正準場に変換する。第2段階は時間と正準場の3D位置を関連付け、色と符号付き距離関数(SDF)を得る。また,重複率に基づく新しいキーフレーム選択戦略を提案する。提案手法を2つの公開合成データセットで評価し,現状の最先端の動的マッピング法よりも有効であることを検証した。 Previous attempts to integrate Neural Radiance Fields (NeRF) into a Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or treat dynamic objects as outliers. However, most real-world scenarios are dynamic. In this paper, we propose a time-varying representation to track and reconstruct dynamic scenes. Our system simultaneously maintains two processes, a tracking process and a mapping process. For the tracking process, the entire input images are uniformly sampled and training on the RGB images is self-supervised. For the mapping process, we leverage known masks to differentiate dynamic objects and static backgrounds, and we apply distinct sampling strategies to the two types of areas. The parameter optimization for both processes consists of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field, and the second associates time with 3D positions in the canonical field to obtain colors and the Signed Distance Function (SDF). In addition, we propose a novel keyframe selection strategy based on the overlapping rate. 
We evaluate our approach on two publicly available synthetic datasets and validate that our method is more effective compared to current state-of-the-art dynamic mapping methods.	翻訳日:2023-11-07 19:47:13 公開日:2023-11-04
# PHD: 歴史的文書のピクセルベース言語モデリング PHD: Pixel-Based Language Modeling of Historical Documents ( http://arxiv.org/abs/2310.18343v2 ) ライセンス: Link先を確認	Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein	(参考訳) 歴史文書のデジタル化は歴史家に前例のない研究機会を与えた。しかし、従来の歴史文書の分析手法では、画像からテキストへocrで変換するが、これは画像として扱うことの利点を見逃し、高いレベルのノイズをもたらすプロセスである。このギャップを埋めるために、トークン分布を予測する代わりに、マスクしたピクセルのパッチを再構築するよう訓練された画素ベース言語モデルの最近の進歩を利用する。実史スキャンが不足していることから,実史文書に類似した合成スキャンを生成する新しい手法を提案する。 1700-1900年代には,本モデルであるPHDを,合成スキャンと実際の歴史新聞の組み合わせで事前訓練した。実験により,PHDはマスク付き画像パッチの再構築に高い習熟度を示し,本モデルで注目すべき言語理解能力を示す。特に、我々のモデルを歴史的QAタスクに適用し、この領域での有用性を強調した。 The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks the potential benefits of treating them as images and introduces high levels of noise. To bridge this gap, we take advantage of recent advancements in pixel-based language models trained to reconstruct masked patches of pixels instead of predicting token distributions. Due to the scarcity of real historical scans, we propose a novel method for generating synthetic scans to resemble real historical documents. We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period. Through our experiments, we demonstrate that PHD exhibits high proficiency in reconstructing masked image patches and provide evidence of our model's noteworthy language understanding capabilities. Notably, we successfully apply our model to a historical QA task, highlighting its usefulness in this domain.	翻訳日:2023-11-07 19:46:37 公開日:2023-11-04
# オフライン強化学習における事前学習言語モデルの活用 Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning ( http://arxiv.org/abs/2310.20587v2 ) ライセンス: Link先を確認	Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu	(参考訳) オフライン強化学習(RL)は、事前に収集されたデータセットを使用して、ほぼ最適なポリシーを見つけることを目的としている。現実のシナリオでは、データ収集は高価でリスクが高いため、ドメイン内のデータが限られている場合、オフラインRLは特に困難になる。近年のLLM(Large Language Models)とそのfew-shot学習能力の進歩を踏まえ、本稿では、オフラインRLに事前学習言語モデル(LM)を効果的に活用するための、Decision Transformerに基づく一般的なフレームワークであるLanguage Models for Motion Control(LaMo)を紹介する。本フレームワークは4つの重要な要素を強調する。(1)逐次的に事前学習されたLMでDecision Transformerを初期化すること、(2)全重みの微調整とは対照的にLoRA微調整法を用い、LMの事前学習知識とドメイン内知識を効果的に組み合わせること、(3)埋め込みの生成に線形射影ではなく非線形MLP変換を用いること、(4)微調整時に補助的な言語予測損失を組み込み、LMを安定させ言語に関する本来の能力を保持すること。実験結果から、LaMoはスパース報酬タスクで最先端のパフォーマンスを達成し、密な報酬タスクでは価値ベースのオフラインRL手法とDecision Transformerとのギャップを埋めることが示された。特に本手法は,データサンプルが限られたシナリオにおいて優れた性能を示す。プロジェクトのwebサイトは https://lamo2023.github.io である。 Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. 
Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate LaMo achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is https://lamo2023.github.io.	翻訳日:2023-11-07 19:35:32 公開日:2023-11-04
# 多言語慣用文脈における連続生成 Generating Continuations in Multilingual Idiomatic Contexts ( http://arxiv.org/abs/2310.20195v2 ) ライセンス: Link先を確認	Rhitabrat Pokharel, Ameeta Agrawal	(参考訳) 慣用的あるいはリテラルな多語表現を処理する能力は、あらゆる言語を理解し、生成する上で重要な側面である。慣用的(あるいはリテラル)表現を含むナラティブの文脈的関連のある継続を生成するタスクは、非定形的テキストを含むニュアンス言語を理解する際に、生成言語モデル(lms)の能力をテストすることができる。 2つの異なる言語(英語とポルトガル語)のデータセットを使って、3つの異なるトレーニング設定(ゼロショット、少数ショット、微調整)で一連の実験を行いました。以上の結果から,本モデルでは慣用的文脈よりも連続生成がわずかに優れていることが示唆された。さらに、本研究で研究されたモデルは両言語で同等に機能し、このタスクの実行における生成モデルの堅牢性を示している。 The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. The task of generating contextually relevant continuations for narratives containing idiomatic (or literal) expressions can allow us to test the ability of generative language models (LMs) in understanding nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two distinct languages (English and Portuguese) under three different training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are only slightly better at generating continuations for literal contexts than idiomatic contexts, with exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.	翻訳日:2023-11-07 19:33:44 公開日:2023-11-04
# 有限レベル量子メモリシステムの相互接続によるデコヒーレンス時間制御 Decoherence time control by interconnection for finite-level quantum memory systems ( http://arxiv.org/abs/2311.02292v1 ) ライセンス: Link先を確認	Igor G. Vladimirov, Ian R. Petersen	(参考訳) 本稿では、有限レベル系のパウリ行列と同様、動的変数が代数的構造を持つ開量子系について述べる。システムの外部ボゾン場への結合のハミルトニアンと作用素は、系変数に線形に依存する。場は、ドリフトベクトルと分散行列がアフィンおよび系変数の線形関数である準線型ハドソン・パルタハラシー量子確率微分方程式に従って系の力学を駆動する量子ウィナー過程によって表現される。この設定は、システム変数が時間的に一定である特別な場合として、ゼロハミルトニアン孤立系ダイナミクスを含み、量子メモリとして応用することができる。より現実的なシステム-フィールド結合の場合、システムの変数が初期値から平均2乗ずれが相対的に重み行列と忠実度パラメータによって指定されたときにメモリデコヒーレンス時間を定義する。系のエネルギーパラメータに対するデコヒーレンス時間最大化を考慮し、ゼロハミルトニアンが準最適解を提供する条件を得る。この最適化問題は、そのようなシステムの直接エネルギー結合相互接続についても論じる。 This paper is concerned with open quantum systems whose dynamic variables have an algebraic structure, similar to that of the Pauli matrices for finite-level systems. The Hamiltonian and the operators of coupling of the system to the external bosonic fields depend linearly on the system variables. The fields are represented by quantum Wiener processes which drive the system dynamics according to a quasilinear Hudson-Parthasarathy quantum stochastic differential equation whose drift vector and dispersion matrix are affine and linear functions of the system variables. This setting includes the zero-Hamiltonian isolated system dynamics as a particular case, where the system variables are constant in time, which makes them potentially applicable as a quantum memory. In a more realistic case of nonvanishing system-field coupling, we define a memory decoherence time when a mean-square deviation of the system variables from their initial values becomes relatively significant as specified by a weighting matrix and a fidelity parameter. We consider the decoherence time maximization over the energy parameters of the system and obtain a condition under which the zero Hamiltonian provides a suboptimal solution. This optimization problem is also discussed for a direct energy coupling interconnection of such systems.	翻訳日:2023-11-07 18:36:45 公開日:2023-11-04
# A Survey of the Various Methodologies Towards making Artificial Intelligence More Explainable ( http://arxiv.org/abs/2311.02291v1 ) License: see linked page	Sopam Dasgupta	Machines are being increasingly used in decision-making processes, resulting in the realization that decisions need explanations. Unfortunately, an increasing number of these deployed models are of a 'black-box' nature where the reasoning behind the decisions is unknown. Hence, there is a need for clarity behind the reasoning of these decisions. As humans, we would want these decisions to be presented to us in an explainable manner. However, explanations alone are insufficient. They do not necessarily tell us how to achieve an outcome but merely tell us what achieves the given outcome. For this reason, my research focuses on explainability/interpretability and how it extends to counterfactual thinking.	Translated: 2023-11-07 18:36:24 Published: 2023-11-04
# Predicting Ground Reaction Force from Inertial Sensors ( http://arxiv.org/abs/2311.02287v1 ) License: see linked page	Bowen Song, Marco Paolieri, Harper E. Stewart, Leana Golubchik, Jill L. McNitt-Gray, Vishal Misra, Devavrat Shah	The study of ground reaction forces (GRF) is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries. Our aim in this paper is to determine if data collected with inertial measurement units (IMUs), which can be worn by athletes during outdoor runs, can be used to predict GRF with sufficient accuracy to allow the analysis of its derived biomechanical variables (e.g., contact time and loading rate). In this paper, we consider lightweight approaches in contrast to state-of-the-art prediction using LSTM neural networks. Specifically, we compare the use of LSTMs to k-Nearest Neighbors (KNN) regression and propose a novel solution, SVD Embedding Regression (SER), using linear regression between singular value decomposition embeddings of IMU data (input) and GRF data (output). We evaluate the accuracy of these techniques when using training data collected from different athletes, from the same athlete, or both, and we explore the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Our results illustrate that simple machine learning methods such as SER and KNN can be similarly accurate or more accurate than LSTM neural networks, with much faster training times and hyperparameter optimization; in particular, SER and KNN are more accurate when personal training data are available, and KNN comes with the benefit of providing provenance of prediction. Notably, the use of personal data reduces prediction errors of all methods for most biomechanical variables.	Translated: 2023-11-07 18:36:13 Published: 2023-11-04
# Objectives Are All You Need: Solving Deceptive Problems Without Explicit Diversity Maintenance ( http://arxiv.org/abs/2311.02283v1 ) License: see linked page	Ryan Boldi, Li Ding, Lee Spector	Navigating deceptive domains has often been a challenge in machine learning due to search algorithms getting stuck at sub-optimal local optima. Many algorithms have been proposed to navigate these domains by explicitly maintaining diversity or equivalently promoting exploration, such as Novelty Search or other so-called Quality Diversity algorithms. In this paper, we present an approach with promise to solve deceptive domains without explicit diversity maintenance by optimizing a potentially large set of defined objectives. These objectives can be extracted directly from the environment by sub-aggregating the raw performance of individuals in a variety of ways. We use lexicase selection to optimize for these objectives as it has been shown to implicitly maintain population diversity. We compare this technique with a varying number of objectives to a commonly used quality diversity algorithm, MAP-Elites, on a set of discrete optimization as well as reinforcement learning domains with varying degrees of deception. We find that decomposing objectives into many objectives and optimizing them outperforms MAP-Elites on the deceptive domains that we explore. Furthermore, we find that this technique results in competitive performance on the diversity-focused metrics of QD-Score and Coverage, without explicitly optimizing for these metrics. Our ablation study shows that this technique is robust to different sub-aggregation techniques. However, when it comes to non-deceptive, or "illumination" domains, quality diversity techniques generally outperform our objective-based framework with respect to exploration (but not exploitation), hinting at potential directions for future work.	Translated: 2023-11-07 18:35:44 Published: 2023-11-04
# Contrastive Multi-Modal Representation Learning for Spark Plug Fault Diagnosis ( http://arxiv.org/abs/2311.02282v1 ) License: see linked page	Ardavan Modarres, Vahid Mohammad-Zadeh Eivaghi, Mahdi Aliyari Shoorehdeli, Ashkan Moosavian	Because a single sensory measurement cannot provide enough information for condition monitoring of some complex engineered industrial mechanisms, and in order to overcome the misleading noise of a single sensor, multiple sensors are installed to improve the condition monitoring of some industrial equipment. Therefore, an efficient data fusion strategy is demanded. In this research, we present a Denoising Multi-Modal Autoencoder with a unique training strategy based on the contrastive learning paradigm, both being utilized for the first time in the machine health monitoring realm. The presented approach, which leverages the merits of both supervised and unsupervised learning, not only achieves excellent performance in fusing multiple modalities (or views) of data into an enriched common representation but also takes data fusion to the next level wherein one of the views can be omitted during inference time with very slight performance reduction, or even without any reduction at all. The presented methodology enables multi-modal fault diagnosis systems to perform more robustly in case of sensor failure occurrence, and one can also intentionally omit one of the sensors (the more expensive one) in order to build a more cost-effective condition monitoring system without sacrificing performance for practical purposes. The effectiveness of the presented methodology is examined on a real-world private multi-modal dataset gathered under non-laboratory conditions from a complex engineered mechanism, an inline four-stroke spark-ignition engine, aiming for spark plug fault diagnosis. This dataset, which contains the accelerometer and acoustic signals as two modalities, has a very slight amount of fault, and achieving good performance on such a dataset promises that the presented method can perform well on other equipment as well.	Translated: 2023-11-07 18:35:14 Published: 2023-11-04
# Machine learning's own Industrial Revolution ( http://arxiv.org/abs/2311.02278v1 ) License: see linked page	Yuan Luo, Song Han, Jingjing Liu	Machine learning is expected to enable the next Industrial Revolution. However, lacking standardized and automated assembly networks, ML faces significant challenges to meet ever-growing enterprise demands and empower broad industries. In this Perspective, we argue that ML needs to first complete its own Industrial Revolution, elaborate on how to best achieve its goals, and discuss new opportunities to enable rapid translation from ML's innovation frontier to mass production and utilization.	Translated: 2023-11-07 18:34:42 Published: 2023-11-04
# An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations ( http://arxiv.org/abs/2311.02328v1 ) License: see linked page	Valentin Duruisseaux and Amit Chakraborty	In numerous contexts, high-resolution solutions to partial differential equations are required to faithfully capture essential dynamics which occur at small spatiotemporal scales, but these solutions can be very difficult and slow to obtain using traditional methods due to limited computational resources. A recent direction to circumvent these computational limitations is to use machine learning techniques for super-resolution, to reconstruct high-resolution numerical solutions from low-resolution simulations which can be obtained more efficiently. The proposed approach, the Super Resolution Operator Network (SROpNet), frames super-resolution as an operator learning problem and draws inspiration from existing architectures to learn continuous representations of solutions to parametric differential equations from low-resolution approximations, which can then be evaluated at any desired location. In addition, no restrictions are imposed on the locations of (the fixed number of) spatiotemporal sensors at which the low-resolution approximations are provided, thereby enabling the consideration of a broader spectrum of problems arising in practice, for which many existing super-resolution approaches are not well-suited.	Translated: 2023-11-07 18:24:29 Published: 2023-11-04
# FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation ( http://arxiv.org/abs/2311.02326v1 ) License: see linked page	Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Niloofar Yousefi, Aida Tayebi, Sina Abdidizaji, Ozlem Ozmen Garibay	Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our information-rich representations for both proteins and drugs offer a detailed perspective on their interaction. Inspired by the Perceiver IO framework, our model features a learnable latent array, initially interacting with protein binding site embeddings using cross-attention and later refined through self-attention and used as a query to the drug fragments in the drug's cross-attention transformer block. This learnable query array serves as a mediator and enables seamless information translation, preserving critical nuances in drug-protein interactions. Our computational results on three benchmarking datasets demonstrate the superior predictive power of our model over several state-of-the-art models. We also show the interpretability of our model in terms of the critical components of both target proteins and drug molecules within drug-target pairs.	Translated: 2023-11-07 18:24:05 Published: 2023-11-04
# Identifying Context-Dependent Translations for Evaluation Set Production ( http://arxiv.org/abs/2311.02321v1 ) License: see linked page	Rachel Wicks, Matt Post	A major impediment to the transition to context-aware machine translation is the absence of good evaluation metrics and test sets. Sentences that require context to be translated correctly are rare in test sets, reducing the utility of standard corpus-level metrics such as COMET or BLEU. On the other hand, datasets that annotate such sentences are also rare, small in scale, and available for only a few languages. To address this, we modernize, generalize, and extend previous annotation pipelines to produce CTXPRO, a tool that identifies subsets of parallel documents containing sentences that require context to correctly translate five phenomena: gender, formality, and animacy for pronouns, verb phrase ellipsis, and ambiguous noun inflections. The input to the pipeline is a set of hand-crafted, per-language, linguistically-informed rules that select contextual sentence pairs using coreference, part-of-speech, and morphological features provided by state-of-the-art tools. We apply this pipeline to seven language pairs (EN into and out-of DE, ES, FR, IT, PL, PT, and RU) and two datasets (OpenSubtitles and WMT test sets), and validate its performance using both overlap with previous work and its ability to discriminate a contextual MT system from a sentence-based one. We release the CTXPRO pipeline and data as open source.	Translated: 2023-11-07 18:23:44 Published: 2023-11-04
# Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells ( http://arxiv.org/abs/2311.02316v1 ) License: see linked page	Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete	To solve the spatial problems of mapping, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the mammalian lineage learnt this peculiar grid representation? Mathematical analysis suggests that this multi-periodic representation has excellent properties as an algebraic code with high capacity and intrinsic error-correction, but to date, there is no satisfactory synthesis of core principles that lead to multi-modular grid cells in deep recurrent neural networks. In this work, we begin by identifying key insights from four families of approaches to answering the grid cell question: coding theory, dynamical systems, function optimization and supervised deep learning. We then leverage our insights to propose a new approach that combines the strengths of all four approaches. Our approach is a self-supervised learning (SSL) framework - including data, data augmentations, loss functions and a network architecture - motivated from a normative perspective, without access to supervised position information or engineering of particular readout representations as needed in previous approaches. We show that multiple grid cell modules can emerge in networks trained on our SSL framework and that the networks and emergent representations generalize well outside their training distribution. This work contains insights for neuroscientists interested in the origins of grid cells as well as machine learning researchers interested in novel SSL frameworks.	Translated: 2023-11-07 18:23:18 Published: 2023-11-04
# Counting Manatee Aggregations using Deep Neural Networks and Anisotropic Gaussian Kernel ( http://arxiv.org/abs/2311.02315v1 ) License: see linked page	Zhiqiang Wang, Yiran Pang, Cihan Ulus, Xingquan Zhu	Manatees are aquatic mammals with voracious appetites. They rely on sea grass as the main food source, and often spend up to eight hours a day grazing. They move slowly and frequently stay in groups (i.e., aggregations) in shallow water to search for food, making them vulnerable to environmental change and other risks. Accurately counting manatee aggregations within a region is not only biologically meaningful in observing their habits, but also crucial for designing safety rules for human boaters, divers, etc., as well as scheduling nursing, intervention, and other plans. In this paper, we propose a deep learning based crowd counting approach to automatically count the number of manatees within a region, using low quality images as input. Because manatees have a unique shape and often stay in shallow water in groups, water surface reflection, occlusion, camouflage, etc. make it difficult to accurately count them. To address these challenges, we propose to use an Anisotropic Gaussian Kernel (AGK), with tunable rotation and variances, to ensure that density functions can maximally capture the shapes of individual manatees in different aggregations. After that, we apply the AGK kernel to different types of deep neural networks primarily designed for crowd counting, including VGG, SANet, Congested Scene Recognition network (CSRNet), MARUNet, etc., to learn manatee densities and calculate the number of manatees in the scene. Using generic low quality images extracted from surveillance videos, our experimental results and comparison show that AGK kernel based manatee counting achieves minimum Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The proposed method works particularly well for counting manatee aggregations in environments with complex backgrounds.	Translated: 2023-11-07 18:22:48 Published: 2023-11-04
# Thermal Face Image Classification using Deep Learning Techniques ( http://arxiv.org/abs/2311.02314v1 ) License: see linked page	Prosenjit Chatterjee and ANK Zaman	Thermal images have various applications in security, medical and industrial domains. This paper proposes a practical deep-learning approach for thermal image classification. Accurate and efficient classification of thermal images poses a significant challenge across various fields due to the complex image content and the scarcity of annotated datasets. This work uses convolutional neural network (CNN) architectures, specifically ResNet-50 and VGGNet-19, to extract features from thermal images. This work also applies a Kalman filter to the thermal input images for image denoising. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and efficiency.	Translated: 2023-11-07 18:22:15 Published: 2023-11-04
# LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes ( http://arxiv.org/abs/2311.02313v1 ) License: see linked page	Jianyuan Zhang and Zhiliu Yang	Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation. This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from LiDAR measurements alone. We first leverage an octree-based hierarchical structure to store implicit features; these implicit features are then decoded to semantic information and signed distance values through shallow Multilayer Perceptrons (MLPs). We adopt off-the-shelf algorithms to predict the semantic labels and instance IDs of the point cloud. Then we jointly optimize the implicit features and MLP parameters with a self-supervision paradigm for point cloud geometry and a pseudo-supervision paradigm for semantic and panoptic labels. Subsequently, the Marching Cubes algorithm is exploited to subdivide and visualize the scenes in the inference stage. For scenarios with memory constraints, a map stitching strategy is also developed to merge sub-maps into a complete map. As far as we know, our method is the first work to reconstruct semantic implicit scenes from LiDAR-only input. Experiments on three real-world datasets, SemanticKITTI, SemanticPOSS and nuScenes, demonstrate the effectiveness and efficiency of our framework compared to current state-of-the-art 3D mapping methods.	Translated: 2023-11-07 18:22:05 Published: 2023-11-04
# Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles ( http://arxiv.org/abs/2311.02310v1 ) License: see linked page	Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen	Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap between their performance and that of the few-shot setting. In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus. Additionally, we explore potential approaches to enhance zero-shot baselines without the need for parallel demonstration examples, providing valuable insights into how these methods contribute to improving translation metrics.	Translated: 2023-11-07 18:21:38 Published: 2023-11-04
# Heteroskedastic Tensor Clustering ( http://arxiv.org/abs/2311.02306v1 ) License: see linked page	Yuchen Zhou, Yuxin Chen	Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the existence of clustering structures along each mode and has found broad applications in areas like multi-tissue gene expression analysis and multilayer network analysis. However, currently available computationally feasible methods for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise or suffer from suboptimal statistical performance, which restrains their utility in applications that have to deal with heteroskedastic data and/or low signal-to-noise-ratio (SNR). To overcome these challenges, we propose a two-stage method, named High-order HeteroClustering (HHC), which starts by performing tensor subspace estimation via a novel spectral algorithm called Thresholded Deflated-HeteroPCA, followed by approximate $k$-means to obtain cluster nodes. Encouragingly, our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit (ignoring logarithmic factors); here, the SNR refers to the ratio of the pairwise disparity between nodes to the noise level, and the computational limit indicates the lowest SNR that enables exact clustering with polynomial runtime. Comprehensive simulation and real-data experiments suggest that our algorithm outperforms existing algorithms across various settings, delivering more reliable clustering performance.	Translated: 2023-11-07 18:21:25 Published: 2023-11-04
# OSM vs HD Maps: Map Representations for Trajectory Prediction ( http://arxiv.org/abs/2311.02305v1 ) License: see linked page	Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen	While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-term motion forecasting. The contributions of this work are threefold: firstly, we extend the application of OSM to long-horizon forecasting, doubling the forecasting horizon compared to previous studies. Secondly, through an expanded receptive field and the integration of intersection priors, our OSM-based approach exhibits competitive performance, narrowing the gap with HD Map-based models. Lastly, we conduct an exhaustive context-aware analysis, providing deeper insights into motion forecasting across diverse scenarios as well as conducting class-aware comparisons. This research not only advances long-term motion forecasting with coarse map representations but also offers a potentially scalable solution within the domain of autonomous driving.	Translated: 2023-11-07 18:20:49 Published: 2023-11-04
# MFTCoder: マルチタスクファインチューニングによるコードLLMの強化 MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning ( http://arxiv.org/abs/2311.02303v1 ) ライセンス: Link先を確認	Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li	(参考訳) コードllmは特別な研究分野として登場し、事前訓練されたモデルの微調整によるモデルのコーディング能力の向上に特化している。従来の微調整アプローチは、通常、特定の下流タスクやシナリオに合わせたもので、各タスクごとに微調整を分離し、広範なトレーニングリソースを必要とし、デプロイメントとメンテナンスの観点から課題を提起することを意味していた。さらに、これらのアプローチは、異なるコード関連タスク間の固有の相互接続性を活用できなかった。これらの制約を克服するために,複数タスクの同時かつ並列な微調整を可能にするマルチタスクファインチューニングフレームワーク MFTcoder を提案する。各種損失関数を組み込むことにより,データ不均衡,難易度の変化,収束速度の不整合といったマルチタスク学習における共通課題を効果的に解決する。大規模な実験により、我々のマルチタスクファインチューニングアプローチは、単一タスクにおける個々のファインチューニングと混合タスクにおけるファインチューニングの両方に優れることが示された。さらに、MPTコーダは、効率的なデータトークン化モードやPEFTファインチューニングを含む効率的なトレーニング機能を提供しており、従来のファインチューニング手法に比べて、大幅に速度が向上している。 MFTcoder は CodeLLama や Qwen など,主要なオープンソース LLM とシームレスに統合されている。 MFTcoderの微調整モデルであるCodeLLama Foundationを活用して、HumaneEvalベンチマークで74.4\%の素晴らしいパス@1スコアを達成し、GPT-4パフォーマンス(67.%、ゼロショット)を上回りました。 MFTCoder は \url{https://github.com/codefuse-ai/MFTCOder} でオープンソース化されている Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing model's coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. 
Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, CodeFuse-CodeLLama-34B, achieves an impressive pass@1 score of 74.4% on the HumanEval benchmark, surpassing GPT-4 performance (67%, zero-shot). MFTCoder is open-sourced at https://github.com/codefuse-ai/MFTCOder	Translated: 2023-11-07 18:20:27 Published: 2023-11-04
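The pass@1 score quoted in the MFTCoder entry above is commonly computed with the unbiased pass@k estimator that is standard for HumanEval-style benchmarks. The following is a minimal sketch of that estimator, assuming n generated samples per problem of which c pass the unit tests; the function names and the toy counts are illustrative and are not taken from the MFTCoder paper or repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Toy data (made up): three problems with 10 samples each.
toy = [(10, 10), (10, 5), (10, 0)]
print(mean_pass_at_k(toy, 1))
```

For k = 1 the estimator reduces to the fraction of passing samples, c / n, averaged over problems.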
# Proactively incremental-learning QAOA ( http://arxiv.org/abs/2311.02302v1 ) License: see link	Lingxiao Li, Jing Li, Yanqi Song, Sujuan Qin, Qiaoyan Wen, Fei Gao	Solving optimization problems with high performance is the goal of existing work on the Quantum Approximate Optimization Algorithm (QAOA). With this intention, we propose an advanced QAOA based on incremental learning, where the training trajectory is proactively segmented into incremental phases. Taking the MaxCut problem as our example, we randomly select a small subgraph from the whole graph and train the quantum circuit to get optimized parameters for the MaxCut of the subgraph in the first phase. Then, in each subsequent incremental phase, a portion of the remaining nodes and edges is added to the current subgraph, and the circuit is retrained to get new optimized parameters. This operation is repeated until the MaxCut problem on the whole graph is solved. The key point is that the optimized parameters of the previous phase are reused in the initial parameters of the current phase. Numerous simulation experiments show that our method has superior performance in Approximation Ratio (AR) and training time compared with prevalent QAOA approaches. Specifically, the AR is higher than that of standard QAOA by 13.17% on weighted random graphs.	Translated: 2023-11-07 18:19:54 Published: 2023-11-04
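The phase-by-phase warm start described in the QAOA entry above can be sketched with a tiny state-vector simulation. This is only an illustration under several assumptions that are not from the paper: the graph is unweighted, subgraphs grow in node order rather than by random selection, a simple derivative-free coordinate search stands in for the authors' optimizer, and the previous phase's parameters are used as the entire initial point. All function names are hypothetical.

```python
import cmath
import math


def cut_value(edges, z):
    """Number of edges cut by bitstring z (bit i of z is the side of node i)."""
    return sum(1 for u, v in edges if ((z >> u) & 1) != ((z >> v) & 1))


def expected_cut(n, edges, params):
    """Expected cut of the QAOA state; params is a list of (gamma, beta) pairs."""
    dim = 1 << n
    costs = [cut_value(edges, z) for z in range(dim)]
    state = [complex(1 / math.sqrt(dim))] * dim  # uniform superposition
    for gamma, beta in params:
        # Cost layer: phase exp(-i * gamma * C(z)) on each basis state.
        state = [a * cmath.exp(-1j * gamma * c) for a, c in zip(state, costs)]
        # Mixer layer: exp(-i * beta * X) applied to every qubit.
        cb, sb = math.cos(beta), -1j * math.sin(beta)
        for q in range(n):
            bit = 1 << q
            for z in range(dim):
                if not z & bit:
                    a0, a1 = state[z], state[z | bit]
                    state[z] = cb * a0 + sb * a1
                    state[z | bit] = sb * a0 + cb * a1
    return sum(abs(a) ** 2 * c for a, c in zip(state, costs))


def local_search(n, edges, params, step=0.4, rounds=25):
    """Coordinate search (a stand-in optimizer) that never accepts a worse point."""
    params = [list(p) for p in params]
    best = expected_cut(n, edges, params)
    for _ in range(rounds):
        improved = False
        for layer in params:
            for i in range(2):
                for delta in (step, -step):
                    layer[i] += delta
                    value = expected_cut(n, edges, params)
                    if value > best + 1e-12:
                        best, improved = value, True
                    else:
                        layer[i] -= delta  # undo the move
        if not improved:
            step /= 2
    return [tuple(p) for p in params], best


def incremental_qaoa(n, edges, phases, init=((0.1, 0.1),)):
    """Train on growing subgraphs, warm-starting each phase from the last one."""
    params, history = list(init), []
    for size in phases:  # nodes 0..size-1 are active in this phase
        sub = [(u, v) for u, v in edges if u < size and v < size]
        params, value = local_search(size, sub, params)
        history.append((size, value))
    return params, history


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
params, history = incremental_qaoa(5, ring, phases=[3, 4, 5])
print(params, history)
```

The approximation ratio mentioned in the abstract would be the final expected cut divided by the true maximum cut, which is 4 for this five-node ring.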
# Geometrizing the Partial Entanglement Entropy: from PEE Threads to Bit Threads ( http://arxiv.org/abs/2311.02301v1 ) License: see link	Jiong Lin, Yizhou Lu, Qiang Wen	We give a scheme to geometrize the partial entanglement entropy (PEE) for holographic CFT in the context of AdS/CFT. More explicitly, given a point $\textbf{x}$, we geometrize the two-point PEEs between $\textbf{x}$ and any other point in terms of the bulk geodesics connecting these two points. We refer to these geodesics as the PEE threads, which can be naturally regarded as the integral curves of a divergenceless vector field $V_{\textbf{x}}^{\mu}$, which we call the PEE thread flow. The norm of $V_{\textbf{x}}^{\mu}$, which characterizes the density of the PEE threads, can be determined by some physical requirements of the PEE. We show that, for any static interval or spherical region $A$, a unique bit thread configuration can be generated from the PEE thread configuration determined by the state. Hence, the non-intrinsic bit threads are emergent from the intrinsic PEE threads. For static disconnected intervals, the vector fields describing a divergenceless flow are no longer suitable to reproduce the RT formula. We weight each PEE thread by the number of times it intersects any homologous surface. 
Instead, the RT formula is perfectly reformulated as the minimization of the sum of the PEE threads over all possible assignments of weights.	Translated: 2023-11-07 18:19:35 Published: 2023-11-04
# Successive Model-Agnostic Meta-Learning for Few-Shot Fault Time Series Prognosis ( http://arxiv.org/abs/2311.02300v1 ) License: see link	Hai Su, Jiajun Hu, Songsen Yu	Meta-learning is a promising technique for solving few-shot fault prediction problems, which have attracted the attention of many researchers in recent years. Existing meta-learning methods for time series prediction, which predominantly rely on random and similarity matching-based task partitioning, face three major limitations: (1) feature exploitation inefficiency; (2) suboptimal task data allocation; and (3) limited robustness with small samples. To overcome these limitations, we introduce a novel 'pseudo meta-task' partitioning scheme that treats a continuous time period of a time series as a meta-task, composed of multiple successive short time periods. Employing continuous time series as pseudo meta-tasks allows our method to extract more comprehensive features and relationships from the data, resulting in more accurate predictions. Moreover, we introduce a differential algorithm to enhance the robustness of our method across different datasets. Through extensive experiments on several fault and time series prediction datasets, we demonstrate that our approach substantially enhances prediction performance and generalization capability under both few-shot and general conditions.	Translated: 2023-11-07 18:19:12 Published: 2023-11-04
# LLMs grasp morality in concept ( http://arxiv.org/abs/2311.02294v1 ) License: see link	Mark Pock, Andre Ye, Jared Moore	Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken for granted the problem of how LLMs might 'mean' anything at all. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.	Translated: 2023-11-07 18:18:50 Published: 2023-11-04
# Contrastive Deep Nonnegative Matrix Factorization for Community Detection ( http://arxiv.org/abs/2311.02357v1 ) License: see link	Yuecheng Li, Jialong Chen, Chuan Chen, Lei Yang, Zibin Zheng	Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection. Therefore, we propose a new community detection algorithm, named Contrastive Deep Nonnegative Matrix Factorization (CDNMF). First, we deepen NMF to strengthen its capacity for information extraction. Subsequently, inspired by contrastive learning, our algorithm creatively constructs network topology and node attributes as two contrasting views. Furthermore, we utilize a debiased negative sampling layer and learn node similarity at the community level, thereby enhancing the suitability of our model for community detection. 
We conduct experiments on three public real-world graph datasets, and the proposed model achieves better results than state-of-the-art methods. Code is available at https://github.com/6lyc/CDNMF.git.	Translated: 2023-11-07 18:11:16 Published: 2023-11-04
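As background for the CDNMF entry above, the single-layer NMF baseline that the abstract builds on can be sketched as follows. This is not the authors' deep or contrastive model: it is plain NMF with Lee-Seung multiplicative updates applied to an adjacency matrix, with each node assigned to the community with the largest membership weight. The toy graph, seed, and function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frob_error(a, u, v):
    """Squared Frobenius norm of A - UV."""
    uv = matmul(u, v)
    return sum((a[i][j] - uv[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))


def nmf(a, k, iters=200, seed=0, eps=1e-9):
    """Factor A (n x m) into nonnegative U (n x k) and V (k x m)."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # V <- V * (U^T A) / (U^T U V)
        ut = transpose(u)
        num, den = matmul(ut, a), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # U <- U * (A V^T) / (U V V^T)
        vt = transpose(v)
        num, den = matmul(a, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return u, v


def communities(u):
    """Assign each node to the column of U with the largest membership weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Toy graph: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for s, t in edges:
    adj[s][t] = adj[t][s] = 1.0

u, v = nmf(adj, k=2)
print(communities(u), frob_error(adj, u, v))
```

The first problem listed in the abstract is visible here: a single factorization maps the network directly into membership space, with no intermediate layers that could capture hierarchy.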
# MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation ( http://arxiv.org/abs/2311.02356v1 ) License: see link	Junfeng Liu, Min Zhou, Shuai Ma, Lujia Pan	Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search or retrieval tasks. However, exact GED computation is known to be NP-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution, which inevitably suffers from scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, which cannot recover the edit path and leads to inaccurate GED approximation (i.e., the predicted GED is smaller than the exact one). To this end, we present MATA*, a data-driven hybrid approach for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models the problem from the perspective of learning to match nodes instead of directly regressing GED. 
Specifically, aware of the structure-dominant operations (i.e., node and edge insertion/deletion) property in GED computation, a structure-enhanced GNN is first designed to jointly learn local and high-order structural information for the node embeddings used in node matching. Second, top-k candidate nodes are produced via a differentiable top-k operation to enable the training for node matching, which adheres to another property of GED, i.e., multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* searches only in the promising directions, reaching the solution efficiently. Finally, extensive experiments show the superiority of MATA*, as it significantly outperforms the combinatorial search-based, learning-based and hybrid methods and scales well to large graphs.	Translated: 2023-11-07 18:10:57 Published: 2023-11-04
# TreeSwap: Data Augmentation for Machine Translation via Dependency Subtree Swapping ( http://arxiv.org/abs/2311.02355v1 ) License: see link	Attila Nagy, Dorina Lakatos, Botond Barta, Judit Ács	Data augmentation methods for neural machine translation are particularly useful when a limited amount of training data is available, which is often the case when dealing with low-resource languages. We introduce a novel augmentation method, which generates new sentences by swapping objects and subjects across bisentences. This is performed simultaneously based on the dependency parse trees of the source and target sentences. We name this method TreeSwap. Our results show that TreeSwap achieves consistent improvements over baseline models in 4 language pairs in both directions on resource-constrained datasets. We also explore domain-specific corpora, but find that our method does not make significant improvements on law, medical and IT data. We report the scores of similar augmentation methods and find that TreeSwap performs comparably. We also analyze the generated sentences qualitatively and find that the augmentation produces a correct translation in most cases. Our code is available on GitHub.	Translated: 2023-11-07 18:10:26 Published: 2023-11-04
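The subject swap described in the TreeSwap entry above can be illustrated with a small sketch. It assumes that a dependency parser has already located the subject subtree in each sentence and that each subtree is a contiguous token span; the `ParsedPair` class, the helper names, and the English-German toy sentences are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ParsedPair:
    src: list        # source-language tokens
    tgt: list        # target-language tokens
    src_subj: tuple  # [start, end) span of the subject subtree in src
    tgt_subj: tuple  # [start, end) span of the subject subtree in tgt


def subtree(tokens, span):
    """Tokens covered by a contiguous subtree span."""
    return tokens[span[0]:span[1]]


def splice(tokens, span, replacement):
    """Replace the tokens in span with the replacement tokens."""
    return tokens[:span[0]] + replacement + tokens[span[1]:]


def swap_subjects(a, b):
    """Swap subject subtrees between two bisentences on both language sides."""
    new_a = (splice(a.src, a.src_subj, subtree(b.src, b.src_subj)),
             splice(a.tgt, a.tgt_subj, subtree(b.tgt, b.tgt_subj)))
    new_b = (splice(b.src, b.src_subj, subtree(a.src, a.src_subj)),
             splice(b.tgt, b.tgt_subj, subtree(a.tgt, a.tgt_subj)))
    return new_a, new_b


a = ParsedPair("The dog barks".split(), "Der Hund bellt".split(), (0, 2), (0, 2))
b = ParsedPair("My sister reads a book".split(),
               "Meine Schwester liest ein Buch".split(), (0, 2), (0, 2))
new_a, new_b = swap_subjects(a, b)
print(" ".join(new_a[0]), "|", " ".join(new_a[1]))
print(" ".join(new_b[0]), "|", " ".join(new_b[1]))
```

Swapping on both sides at once is what keeps each augmented pair a valid translation; a real implementation would also need to handle agreement and non-contiguous subtrees, which this sketch ignores.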
# Sample Complexity of Opinion Formation on Networks ( http://arxiv.org/abs/2311.02349v1 ) License: see link	Haolin Liu, Rajmohan Rajaraman, Ravi Sundaram, Anil Vullikanti, Omer Wasim, Haifeng Xu	Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, ensuring community-wide understanding that aligns with the actual facts? This concern mirrors numerous real-world situations. In this paper, we initiate the study of sample complexity in opinion formation to solve this problem. Our model is built on the recognized opinion formation game, where we regard each agent's opinion as a data-derived model parameter, not just a real number as in prior studies. Such an extension offers a wider understanding of opinion formation and ties closely with federated learning. Through this formulation, we characterize the sample complexity bounds for any network and also show asymptotically tight bounds for specific network structures. Intriguingly, we discover that optimal strategies often allocate samples inversely to the degree, hinting at vital policy implications. Our findings are empirically validated on both synthesized and real-world networks.	Translated: 2023-11-07 18:10:11 Published: 2023-11-04
# Perturbation-based Active Learning for Question Answering ( http://arxiv.org/abs/2311.02345v1 ) License: see link	Fan Luo, Mihai Surdeanu	A question answering (QA) model can be built with lower annotation cost by using an active learning (AL) training strategy, which selects the most informative unlabeled training data to update the model effectively. Acquisition functions for AL, such as uncertainty-based or diversity-based sampling, determine how informative each training example is. In this work, we propose a perturbation-based active learning acquisition strategy and demonstrate that it is more effective than existing commonly used strategies.	Translated: 2023-11-07 18:09:54 Published: 2023-11-04
# You Only Forward Once: Prediction and Rationalization in A Single Forward Pass ( http://arxiv.org/abs/2311.02344v1 ) License: see link	Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang	Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called the rationale, should be sufficient to predict the golden label. However, this assumption deviates from the original definition and is too strict to perform well. Furthermore, these two-phase models suffer from the interlocking problem and spurious correlations. To solve these problems, we propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale in which rationales aim to support model predictions rather than make predictions. 
In our framework, a pre-trained language model like BERT is deployed to simultaneously perform prediction and rationalization with less impact from interlocking or spurious correlations. Directly choosing the important tokens in an unsupervised manner is intractable. Instead of directly choosing the important tokens, YOFO gradually removes unimportant tokens during forward propagation. Through experiments on the BeerAdvocate and Hotel Review datasets, we demonstrate that our model is able to extract rationales and make predictions more accurately than RNP-based models. We observe an improvement of up to 18.4% in token-level F1 compared to previous state-of-the-art methods. We also conducted analyses and experiments to explore the extracted rationales and token decay strategies. The results show that YOFO can extract precise and important rationales while removing unimportant tokens in the middle part of the model.	Translated: 2023-11-07 18:09:46 Published: 2023-11-04
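The idea of gradually removing unimportant tokens during the forward pass, as described in the YOFO entry above, can be sketched as a staged pruning loop. The abstract does not say how importance is computed or what decay schedule is used, so the scoring function, the keep ratios, and all names below are purely hypothetical placeholders; in the actual model the scores would come from the language model itself.

```python
import math


def token_decay(tokens, score_fn, keep_ratios):
    """Return indices of tokens that survive every pruning stage, in order."""
    kept = list(range(len(tokens)))
    for layer, ratio in enumerate(keep_ratios):
        # Score only the tokens that are still alive at this stage.
        scores = score_fn(layer, [tokens[i] for i in kept])
        k = max(1, math.ceil(ratio * len(kept)))
        ranked = sorted(range(len(kept)), key=lambda j: scores[j], reverse=True)
        # Keep the k highest-scoring tokens, preserving original order.
        kept = [kept[j] for j in sorted(ranked[:k])]
    return kept


# Toy importance scores standing in for model-derived ones (assumption).
LEXICON = {"great": 1.0, "hoppy": 0.8, "aroma": 0.6}


def toy_score(layer, toks):
    return [LEXICON.get(t, 0.0) for t in toks]


tokens = "the beer has a great hoppy aroma".split()
survivors = token_decay(tokens, toy_score, keep_ratios=[0.75, 0.5])
print([tokens[i] for i in survivors])
```

Whatever survives the last stage plays the role of the rationale, which is why no separate generator pass is needed in a single-phase design.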
# Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting ( http://arxiv.org/abs/2311.02343v1 ) License: see link	Hao Ai, Lu Sheng	Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often needs lengthy text prompts and may even require further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precisely controlled generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is a blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. 
We release all the code for the module and pipeline at https://github.com/aihao2000/stable-diffusion-reference-only, together with a trained controllable character line-art coloring model that achieves state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.	Translated: 2023-11-07 18:09:19 Published: 2023-11-04
# Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector ( http://arxiv.org/abs/2311.02342v1 ) License: see link	Xuanyi Liu, Zhongqi Yue, Xian-Sheng Hua	Open World Object Detection (OWOD) combines open-set object detection with incremental learning capabilities to handle the challenge of the open and dynamic visual world. Existing works assume that a foreground predictor trained on the seen categories can be directly transferred to identify the unseen categories' locations by selecting the top-k most confident foreground predictions. However, the assumption is hardly valid in practice. This is because the predictor is inevitably biased toward the known categories, and fails under the shift in the appearance of the unseen categories. In this work, we aim to build an unbiased foreground predictor by reformulating the task under Unsupervised Domain Adaptation, where the current biased predictor helps form the domains: the seen object locations and confident background locations as the source domain, and the remaining ambiguous ones as the target domain. Then, we adopt the simple and effective self-training method to learn a predictor based on the domain-invariant foreground features, hence achieving unbiased prediction that is robust to the shift in appearance between the seen and unseen categories. 
The pipeline of our approach can adapt to various detection frameworks and UDA methods, as empirically validated on the OWOD evaluation, where we achieve state-of-the-art performance.	Translated: 2023-11-07 18:08:52 Published: 2023-11-04
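The domain construction described in the OWOD entry above (seen objects and confident background as the source domain, everything ambiguous as the target domain) can be sketched as a simple partition of region proposals. The dictionary keys and both thresholds are assumptions chosen for illustration; the abstract does not give the actual criteria.

```python
def split_domains(proposals, seen_iou=0.5, bg_score=0.05):
    """Partition proposals into (source, target) domains.

    Each proposal is a dict with two hypothetical fields:
      iou_with_seen_gt: best IoU against annotated boxes of seen categories
      fg_score: foreground confidence from the current (biased) predictor
    """
    source, target = [], []
    for p in proposals:
        is_seen_object = p["iou_with_seen_gt"] >= seen_iou
        is_confident_bg = not is_seen_object and p["fg_score"] <= bg_score
        if is_seen_object or is_confident_bg:
            source.append(p)  # label is trusted: seen object or clear background
        else:
            target.append(p)  # ambiguous: may contain an unseen object
    return source, target


proposals = [
    {"id": 1, "iou_with_seen_gt": 0.8, "fg_score": 0.90},  # seen object
    {"id": 2, "iou_with_seen_gt": 0.0, "fg_score": 0.01},  # confident background
    {"id": 3, "iou_with_seen_gt": 0.1, "fg_score": 0.60},  # ambiguous
    {"id": 4, "iou_with_seen_gt": 0.0, "fg_score": 0.30},  # ambiguous
]
source, target = split_domains(proposals)
print([p["id"] for p in source], [p["id"] for p in target])
```

Self-training would then use the source domain as labeled data and assign pseudo-labels on the target domain, which is the part this sketch leaves out.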
# Enhancing English Writing Proficiency in China's Polytechnic Students: An In-Depth Literature Review on the Application of the Input Hypothesis ( http://arxiv.org/abs/2311.02341v1 ) License: see link	Wei Zhou	Having good English writing skills is extremely important for students in polytechnic institutions. However, many students in technical schools have difficulty reaching high levels of skill. The Input Hypothesis, created by Stephen Krashen, suggests that people learn languages well when they receive input that is a little harder than what they already know but still understandable. This paper aims to study how the Input Hypothesis can help polytechnic students improve their English writing skills. The study will include real-life observations and experiments from previous research. We will look at data from polytechnic students who are receiving special writing instruction to see if the Input Hypothesis actually helps improve their writing skills. The paper can better inform polytechnic students, faculty members, support staff, and even members of the larger community about the attributions, the processes, and the possible outcomes of second language development for polytechnic students. Keywords: English writing skills, Polytechnic students, Input hypothesis, Comprehensible input	Translated: 2023-11-07 18:08:27 Published: 2023-11-04
# MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching ( http://arxiv.org/abs/2311.02340v1 ) License: see link	Miaojie Feng, Junda Cheng, Hao Jia, Longliang Liu, Gangwei Xu, Xin Yang	Stereo matching is a fundamental task in scene comprehension. In recent years, methods based on iterative optimization have shown promise in stereo matching. However, the current iteration framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during the iteration process limits the final convergence effects. To address these issues, we present a novel iterative optimization architecture called MC-Stereo. This architecture mitigates the multi-peak distribution problem in matching through the multi-peak lookup strategy, and integrates the coarse-to-fine concept into the iterative framework via the cascade search range. Furthermore, given that feature representation learning is crucial for successful learning-based stereo matching, we introduce a pre-trained network to serve as the feature extractor, enhancing the front end of the stereo matching pipeline. Based on these improvements, MC-Stereo ranks first among all publicly available methods on the KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art performance on ETH3D. 
The code will be open-sourced after the publication of this paper.	Translated: 2023-11-07 18:08:09 Published: 2023-11-04
# Potato Leaf Disease Classification using Deep Learning: A Convolutional Neural Network Approach ( http://arxiv.org/abs/2311.02338v1 ) License: see link	Utkarsh Yashwant Tambe, A. Shobanadevi, A. Shanthini, Hsiu-Chun Hsu	In this study, a Convolutional Neural Network (CNN) is used to classify potato leaf diseases using Deep Learning. The suggested approach entails preprocessing the leaf image data, training a CNN model on that data, and assessing the model's success on a test set. The experimental findings show that the CNN model, with an overall accuracy of 99.1%, is highly accurate in distinguishing three classes of potato leaves: Early Blight, Late Blight, and Healthy. The suggested method may offer a trustworthy and effective remedy for identifying potato diseases, which is essential for maintaining food security and minimizing financial losses in agriculture. The model can accurately recognize the various disease types even when severe infections are present. This work highlights the potential of deep learning methods for categorizing potato diseases, which can help with effective and automated disease management in potato farming.	Translated: 2023-11-07 18:07:49 Published: 2023-11-04
# STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots ( http://arxiv.org/abs/2311.02337v1 ) License: see link	Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox	Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects not learned in their training sets, which requires the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as rearrangement of objects on a table. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. 
We also propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module that facilitates efficient inter-frame communication. The experiments we conduct show that our approach significantly outperforms recent methods. For additional results and videos, please visit \href{https://sites.google.com/view/stow-corl23}{website}. Code and dataset will be released.	翻訳日:2023-11-07 18:07:32 公開日:2023-11-04
# Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision ( http://arxiv.org/abs/2311.02333v1 ) License: see link	Aditya Malusare and Harish Kothandaraman and Dipesh Tamboli and Nadia A. Lanman and Vaneet Aggarwal	This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, which analyzes DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pre-train the foundation model on reference genome sequences and apply it to the following downstream tasks: (1) identification of enhancers, promoters and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement over the existing state-of-the-art results.	Translated: 2023-11-07 18:07:12 Published: 2023-11-04
# Multimodal Machine Learning for Clinically-Assistive Imaging-Based Biomedical Applications ( http://arxiv.org/abs/2311.02332v1 ) License: see link	Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, and Arvind Rao	Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models and, more recently, generative models. Recent years have seen a rise in widely available deep learning architectures that support multimodal data integration, particularly with images. The incorporation of multiple modalities into these models is a thriving research topic that presents its own unique challenges. In this work, we discuss five challenges to multimodal AI as it pertains to ML (representation, fusion, alignment, translation, and co-learning) and survey recent approaches to addressing these challenges in the context of medical image-based clinical decision support models. We conclude with a discussion of the future of the field, suggesting directions that should be elucidated further for successful clinical models and their translation to the clinical setting.	Translated: 2023-11-07 18:06:41 Published: 2023-11-04
# Complex Organ Mask Guided Radiology Report Generation ( http://arxiv.org/abs/2311.02329v1 ) License: see link	Gu Tiancheng, Liu Dongnan, Li Zhiyuan, Cai Weidong	The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting. However, in a real-world scenario, radiologists frequently face the challenge of producing extensive reports derived from numerous medical images, so medical report generation from a multi-image perspective is needed. In this paper, we propose the Complex Organ Mask Guided (COMG) report generation model, which incorporates masks from multiple organs (e.g., bones, lungs, heart, and mediastinum) to provide more detailed information and guide the model's attention to these crucial body regions. Specifically, we leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase during report generation. Additionally, a cosine similarity loss is introduced as the target function to ensure the convergence of cross-modal consistency and facilitate model optimization. Experimental results on two public datasets show that COMG achieves an 11.4% and 9.7% improvement in BLEU@4 score over the SOTA model KiUT on IU-Xray and MIMIC, respectively.	Translated: 2023-11-07 18:06:14 Published: 2023-11-04
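The COMG abstract above names a cosine similarity loss for cross-modal consistency but does not give its exact form. As a minimal sketch, assuming the common definition `1 - cos(u, v)` between an image-side and a text-side feature vector (the function names and the plain-list representation are illustrative assumptions, not taken from the paper):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(u, v):
    """Assumed loss form: 0 when the vectors align, 2 when they are opposite."""
    return 1.0 - cosine_similarity(u, v)
```

Under this assumed form the loss depends only on the direction of the two vectors, not on their magnitude, which is one reason it is often used to align representations from different modalities.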
# CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models ( http://arxiv.org/abs/2311.02398v1 ) License: see link	Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye	Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that uses knowledge from the source domain to improve recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches require re-engineering and re-training the network structure to incorporate transferable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm, named CDR-Adapter, that addresses data sparsity and cold-start issues in CDR by decoupling the original recommendation model from the mapping function, without requiring re-engineering of the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, which demonstrated the effectiveness of our approach over several state-of-the-art CDR approaches.	Translated: 2023-11-07 17:58:28 Published: 2023-11-04
# NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications ( http://arxiv.org/abs/2311.02394v1 ) License: see link	Robert Tjarko Lange, Yujin Tang, Yingtao Tian	Recently, the Deep Learning community has become interested in evolutionary optimization (EO) as a means to address hard optimization problems, e.g. meta-learning through long inner loop unrolls or optimizing non-differentiable operators. One core reason for this trend has been the recent innovation in hardware acceleration and compatible software, which makes distributed population evaluations much easier than before. Unlike for gradient descent-based methods, though, there is a lack of hyperparameter understanding and best practices for EO, arguably because far less 'graduate student descent' and benchmarking has been performed for EO methods. Additionally, classical benchmarks from the evolutionary community provide few practical insights for Deep Learning applications. This poses challenges for newcomers to hardware-accelerated EO and hinders significant adoption. Hence, we establish a new benchmark of EO methods (NeuroEvoBench) tailored toward Deep Learning applications and exhaustively evaluate traditional and meta-learned EO. We investigate core scientific questions including resource allocation, fitness shaping, normalization, regularization, and scalability of EO. The benchmark is open-sourced at https://github.com/neuroevobench/neuroevobench under the Apache-2.0 license.	Translated: 2023-11-07 17:58:05 Published: 2023-11-04
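Fitness shaping, one of the questions the NeuroEvoBench abstract above lists, can be illustrated with a small standalone example. The sketch below is not the benchmark's API; it is a generic evolution strategy with centered-rank shaping, written with the standard library only. The function names, hyperparameters (`popsize`, `sigma`, `lr`) and the toy objective are all assumptions chosen for illustration.

```python
import random


def centered_ranks(fitness):
    """Fitness shaping: replace raw fitness values by ranks scaled to [-0.5, 0.5]."""
    n = len(fitness)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: fitness[i])
    shaped = [0.0] * n
    for rank, idx in enumerate(order):
        shaped[idx] = rank / (n - 1) - 0.5
    return shaped


def evolve(objective, dim, popsize=32, sigma=0.1, lr=0.02, generations=200, seed=0):
    """Minimize `objective` with a simple Gaussian evolution strategy."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(generations):
        # Sample a population of perturbations around the current mean.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(popsize)]
        # Higher fitness is better, so negate the objective being minimized.
        fitness = [
            -objective([m + sigma * e for m, e in zip(mean, eps)]) for eps in noise
        ]
        shaped = centered_ranks(fitness)
        # Move the mean along the rank-weighted average perturbation.
        for d in range(dim):
            grad = sum(s * eps[d] for s, eps in zip(shaped, noise)) / (popsize * sigma)
            mean[d] += lr * grad
    return mean
```

Because only ranks enter the update, rescaling the objective by any positive constant leaves the search trajectory unchanged, which is the usual motivation for this kind of shaping. A call such as `evolve(lambda x: sum((xi - 1.0) ** 2 for xi in x), dim=3)` should move the mean from the origin toward the minimizer at `(1, 1, 1)`.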
# Continual Learning of Unsupervised Monocular Depth from Videos ( http://arxiv.org/abs/2311.02393v1 ) License: see link	Hemang Chawla, Arnav Varma, Elahe Arani, and Bahram Zonooz	Spatial scene understanding, including monocular depth estimation, is an important problem in various applications, such as robotics and autonomous driving. While improvements in unsupervised monocular depth estimation have potentially allowed models to be trained on diverse crowdsourced videos, this remains underexplored because most methods use the standard training protocol, in which models are trained from scratch on all data after new data is collected. Instead, continual training of models on sequentially collected data would significantly reduce computational and memory costs. Nevertheless, naive continual training leads to catastrophic forgetting, where model performance deteriorates on older domains as the model learns on newer domains, highlighting the trade-off between model stability and plasticity. While several techniques have been proposed to address this issue in image classification, the high-dimensional and spatiotemporally correlated outputs of depth estimation make it a distinct challenge. To the best of our knowledge, no framework or method currently focuses on the problem of continual learning in depth estimation. Thus, we introduce a framework that captures the challenges of continual unsupervised depth estimation (CUDE), and define the metrics necessary to evaluate model performance. We propose a rehearsal-based dual-memory method, MonoDepthCL, which uses spatiotemporal consistency for continual learning in depth estimation, even when the camera intrinsics are unknown.	Translated: 2023-11-07 17:57:45 Published: 2023-11-04
# Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification ( http://arxiv.org/abs/2311.02392v1 ) License: see link	Hao Zheng, Runqi Wang, Jianzhuang Liu, Asako Kanezaki	Conventional few-shot classification aims at learning a model on a large labeled base dataset and rapidly adapting it to a target dataset drawn from the same distribution as the base dataset. In practice, however, the base and target datasets of few-shot classification usually come from different domains, which is the problem of cross-domain few-shot classification. We tackle this problem by making a small proportion of unlabeled images in the target domain accessible in the training stage. In this setup, even though the base data are sufficient and labeled, the large domain shift still makes transferring knowledge from the base dataset difficult. We design a cross-level knowledge distillation method, which strengthens the model's ability to extract more discriminative features in the target dataset by guiding the network's shallow layers to learn higher-level information. Furthermore, to alleviate overfitting in the evaluation stage, we propose a feature denoising operation that reduces feature redundancy. Our approach surpasses the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks on average in the BSCD-FSL benchmark. The implementation code will be available at https://github.com/jarucezh/cldfd.	Translated: 2023-11-07 17:57:22 Published: 2023-11-04
# The quantum superposition principle: a reconsideration ( http://arxiv.org/abs/2311.02391v1 ) License: see link	Ivan Georgiev Koprinkov	The quantum superposition principle is reconsidered based on the adiabatic theorem of quantum mechanics, nonadiabatic dressed states, and experimental evidence. The physical mechanism and physical properties of the quantum superposition are revealed.	Translated: 2023-11-07 17:56:57 Published: 2023-11-04
# AI-based Self-healing Solutions Applied to Cellular Networks: An Overview ( http://arxiv.org/abs/2311.02390v1 ) License: see link	Jaleh Farmani, Amirreza Khalil Zadeh	In this article, we provide an overview of machine learning (ML) methods, both classical and deep variants, that are used to implement self-healing for cell outages in cellular networks. Self-healing is a promising approach to network management that aims to detect and compensate for cell outages autonomously. This technology aims to decrease the expenses associated with the installation and maintenance of existing 4G and 5G networks, as well as emerging 6G networks, by simplifying operational tasks through its ability to heal itself. We provide an overview of the basic concepts and taxonomy for self-organizing networks (SON), self-healing, and ML techniques in network management. Moreover, we review the state of the art in the literature on cell outages, with a particular emphasis on ML-based approaches.	Translated: 2023-11-07 17:56:52 Published: 2023-11-04
# Ultra-Long Sequence Distributed Transformer ( http://arxiv.org/abs/2311.02382v1 ) License: see link	Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley	Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and to minimize communication overhead. We compared the performance of the LSS Transformer and the state-of-the-art Nvidia sequence parallelism on a Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation compared to state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.	Translated: 2023-11-07 17:56:38 Published: 2023-11-04
# Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models ( http://arxiv.org/abs/2311.02379v1 ) License: see link	Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter	Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.	Translated: 2023-11-07 17:56:18 Published: 2023-11-04
# MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network ( http://arxiv.org/abs/2311.02378v1 ) License: see link	Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Hongle Liu, Xiang Long	Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or that deviate from the normal data distribution but lie close to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MTS-DVGAN to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module that imposes contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Compared with state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and achieves consistent performance improvement.	Translated: 2023-11-07 17:55:58 Published: 2023-11-04
# Riemannian stochastic optimization methods avoid strict saddle points ( http://arxiv.org/abs/2311.02374v1 ) License: see link	Ya-Ping Hsieh and Mohammad Reza Karimi and Andreas Krause and Panayotis Mertikopoulos	Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.	Translated: 2023-11-07 17:55:31 Published: 2023-11-04
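To make "retraction-based Riemannian stochastic gradient method" in the abstract above concrete, here is a minimal sketch for the online principal component analysis example it mentions. The setup is an assumption chosen for illustration, not the paper's general setting: the manifold is the unit sphere, the retraction is renormalization, the step size decays as `lr / sqrt(t + 1)`, and the toy data set is hypothetical.

```python
import math
import random


def project_to_tangent(x, g):
    """Project a Euclidean gradient g onto the tangent space of the sphere at x."""
    inner = sum(xi * gi for xi, gi in zip(x, g))
    return [gi - inner * xi for xi, gi in zip(x, g)]


def retract(x, v):
    """Retraction on the unit sphere: take the step in ambient space, then renormalize."""
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(sum(yi * yi for yi in y))
    return [yi / norm for yi in y]


def riemannian_sgd_pca(samples, dim, steps=2000, lr=0.05, seed=0):
    """Estimate the leading principal direction by minimizing -E[(a . x)^2] on the sphere."""
    rng = random.Random(seed)
    # Random starting point on the sphere.
    x = retract([0.0] * dim, [rng.gauss(0.0, 1.0) for _ in range(dim)])
    for t in range(steps):
        a = rng.choice(samples)  # stochastic oracle: one data point per step
        proj = sum(ai * xi for ai, xi in zip(a, x))
        egrad = [-2.0 * proj * ai for ai in a]  # Euclidean gradient of -(a . x)^2
        rgrad = project_to_tangent(x, egrad)  # Riemannian gradient
        step = lr / math.sqrt(t + 1)
        x = retract(x, [-step * gi for gi in rgrad])
    return x
```

For data whose variance is largest along the first axis, for example `[[2.0, 0.0], [-2.0, 0.0], [0.0, 0.5], [0.0, -0.5]]`, the iterate should end up close to `(1, 0)` or `(-1, 0)`. The direction of least variance is a critical point of the same objective, but an unstable one, which is the kind of point the abstract says such methods avoid with probability 1.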
# From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models ( http://arxiv.org/abs/2311.02373v1 ) License: see link	Zhuoshi Pan, Yuguang Yao, Gaowen Liu, Bingquan Shen, H. Vicky Zhao, Ramana Rao Kompella, Sijia Liu	While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies imposed stricter requirements than conventional methods like 'BadNets' in image classification, because they require modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering with the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). Specifically, we find that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images (misaligned with the intended text conditions), thereby yielding incorrect predictions when DMs are used as classifiers. Meanwhile, backdoored DMs exhibit an increased ratio of backdoor triggers among the generated images, a phenomenon we refer to as 'trigger amplification'. We show that this latter insight can be used to enhance the detection of backdoor-poisoned training data. Even under a low backdoor poisoning ratio, studying the backdoor effects of DMs is also valuable for designing anti-backdoor image classifiers. Last but not least, we establish a meaningful linkage between backdoor attacks and the phenomenon of data replication by exploring DMs' inherent data memorization tendencies. The code for our work is available at https://github.com/OPTML-Group/BiBadDiff.	Translated: 2023-11-07 17:55:05 Published: 2023-11-04
# TACNET: Temporal Audio Source Counting Network ( http://arxiv.org/abs/2311.02369v1 ) License: see link	Amirreza Ahmadnejad, Ahmad Mahmmodian Darviishani, Mohmmad Mehrdad Asadi, Sajjad Saffariyeh, Pedram Yousef, Emad Fatemizadeh	In this paper, we introduce the Temporal Audio Source Counting Network (TaCNet), an innovative architecture that addresses limitations in audio source counting tasks. TaCNet operates directly on raw audio inputs, eliminating complex preprocessing steps and simplifying the workflow. Notably, it excels in real-time speaker counting, even with truncated input windows. Our extensive evaluation, conducted using the LibriCount dataset, underscores TaCNet's strong performance, positioning it as a state-of-the-art solution for audio source counting tasks. With an average accuracy of 74.18% over 11 classes, TaCNet demonstrates its effectiveness across diverse scenarios, including applications involving Chinese and Persian languages. This cross-lingual adaptability highlights its versatility and potential impact.	Translated: 2023-11-07 17:54:39 Published: 2023-11-04
# Quantum Communications ( http://arxiv.org/abs/2311.02367v1 ) License: see link	Michal Hajdušek and Rodney Van Meter	The second quantum revolution has been picking up momentum over the last decade. Quantum technologies are starting to attract more attention from governments, private companies, investors, and the public. The ability to control individual quantum systems for the purpose of information processing and communication is no longer a theoretical dream, but is steadily becoming routine in laboratories and startups around the world. With this comes the need to educate the future generation of quantum engineers. This textbook is a companion to our video lectures on Overview of Quantum Communications from the Q-Leap Education project known as Quantum Academy of Science and Technology. It is a gentle introduction to quantum networks, and is suitable for use as a textbook for undergraduate students of diverse backgrounds. No prior knowledge of quantum physics or quantum information is assumed. Exercises are included in each chapter.	Translated: 2023-11-07 17:54:26 Published: 2023-11-04
# Optimal Discrimination Between Two Pure States and Dolinar-Type Coherent-State Detection ( http://arxiv.org/abs/2311.02366v1 ) License: see link	Itamar Katz, Alex Samorodnitsky and Yuval Kochman	We consider the problem of discrimination between two pure quantum states. It is well known that the optimal measurement under both the error-probability and log-loss criteria is a projection, while under an "erasure-distortion" criterion it is a three-outcome positive operator-valued measure (POVM). These results were derived separately. We present a unified approach which finds the optimal measurement under any distortion measure that satisfies a convexity relation with respect to the Bhattacharyya distance. Namely, whenever the measure is relatively convex (resp. concave), the measurement is the projection (resp. three-outcome POVM) above. The three above-mentioned results are obtained as special cases of this simple derivation. As for further measures for which our result applies, we prove that Rényi entropies of order $1$ and above (resp. $1/2$ and below) are relatively convex (resp. concave). A special setting of great practical interest is the discrimination between two coherent-light waveforms. In a remarkable work by Dolinar it was shown that a simple detector consisting of a photon counter and a feedback-controlled local oscillator obtains the quantum-optimal error probability. Later it was shown that the same detector (with the same local signal) is also optimal in the log-loss sense. By applying a similar convexity approach, we obtain in a unified manner the optimal signal for a variety of criteria.	Translated: 2023-11-07 17:54:15 Published: 2023-11-04
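As a hedged illustration of the error-probability criterion in the entry above: for two pure states, the minimum error probability has a well-known closed form (the Helstrom bound) that depends only on the squared overlap of the states and on the priors. The sketch below is not taken from the paper. The function names `helstrom_error` and `coherent_overlap_sq` are hypothetical, and the overlap formula assumes single-mode coherent states with complex amplitudes.

```python
import math


def helstrom_error(overlap_sq, prior0=0.5):
    """Minimum error probability for discriminating two pure states.

    overlap_sq is |<psi0|psi1>|^2 and prior0 is the prior probability
    of the first state (the second state has prior 1 - prior0).
    """
    prior1 = 1.0 - prior0
    # Clamp at zero to guard against tiny negative values from rounding.
    inside = max(0.0, 1.0 - 4.0 * prior0 * prior1 * overlap_sq)
    return 0.5 * (1.0 - math.sqrt(inside))


def coherent_overlap_sq(alpha, beta):
    """Squared overlap of two coherent states with amplitudes alpha and beta."""
    return math.exp(-abs(alpha - beta) ** 2)


if __name__ == "__main__":
    # Binary phase-shift pair with amplitudes +1 and -1 (illustrative values).
    print(helstrom_error(coherent_overlap_sq(1.0, -1.0)))
```

Orthogonal states (zero overlap) give zero error and identical states give an error of one half with equal priors. For a pair of coherent states, this is the error probability that the abstract says the Dolinar detector attains.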
# Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model ( http://arxiv.org/abs/2311.02358v1 ) License: see link	Chun-Chuen Hui, Wan-Chi Siu, Ngai-Fong Law	Large scale image super-resolution is a challenging computer vision task, since vast information is missing in a highly degraded image, for example in scale x16 super-resolution. Diffusion models have been used successfully in recent years in extreme super-resolution applications, in which Gaussian noise is used as a means to form a latent photo-realistic space, and acts as a link between the space of latent vectors and the latent photo-realistic space. There are quite a few sophisticated mathematical derivations on mapping the statistics of Gaussian noises that make diffusion models successful. In this paper we propose a simple approach which gets away from using Gaussian noise but adopts some basic structures of diffusion models for efficient image super-resolution. Essentially, we propose a DNN to perform domain transfer between neighbor domains, which can learn the differences in statistical properties to facilitate gradual interpolation with results of reasonable quality. Further quality improvement is achieved by conditioning the domain transfer with reference to the input LR image. Experimental results show that our method outperforms not only state-of-the-art large scale super resolution models, but also the current diffusion models for image super-resolution. The approach can readily be extended to other image-to-image tasks, such as image enlightening, inpainting, denoising, etc.	Translated: 2023-11-07 17:53:52 Published: 2023-11-04
# Quantum transport on networks for supervised classification ( http://arxiv.org/abs/2311.02442v1 ) License: see link	Shmuel Lorber, Oded Zimron, Inbal Lorena Zak, Anat Milo and Yonatan Dubi	Classification, the computational process of categorizing an input into pre-existing classes, is now a cornerstone in modern computation in the era of machine learning. Here we propose a new type of quantum classifier, based on quantum transport of particles in a trained quantum network. The classifier is based on sending a quantum particle into a network and measuring the particle's exit point, which serves as a "class" and can be determined by changing the network parameters. Using this scheme, we demonstrate three examples of classification. In the first, wave functions are classified according to their overlap with predetermined (random) groups. In the second, we classify wave functions according to their level of localization. Both examples use small training sets and achieve over 90% precision and recall. The third classification scheme is a "real-world problem", concerning classification of catalytic aromatic-aldehyde substrates according to their reactivity. Using experimental data, the quantum classifier reaches an average 86% classification accuracy. We show that the quantum classifier outperforms its classical counterpart for these examples, thus demonstrating quantum advantage, especially in the regime of "small data". These results pave the way for a novel classification scheme, which can be implemented as an algorithm, and potentially realized experimentally on quantum hardware such as photonic networks.	Translated: 2023-11-07 17:46:12 Published: 2023-11-04
# Can ChatGPT support software verification? ( http://arxiv.org/abs/2311.02433v1 ) License: see link	Christian Janßen, Cedric Richter, Heike Wehrheim	Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification. In this paper, we take some first steps towards answering this question. More specifically, we investigate whether ChatGPT can generate loop invariants. Loop invariant generation is a core task in software verification, and the generation of valid and useful invariants would likely help formal verifiers. To provide some first evidence on this hypothesis, we ask ChatGPT to annotate 106 C programs with loop invariants. We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker. Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before. Based on our initial insights, we propose ways of combining ChatGPT (or large language models in general) and software verifiers, and discuss current limitations and open issues.	Translated: 2023-11-07 17:45:53 Published: 2023-11-04
# P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification ( http://arxiv.org/abs/2311.02432v1 ) License: see link	Abid Ali and Ashish Marisetty and Francois Bremond	Age estimation is a challenging task that has numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low resolution, and lighting conditions. To address these challenges, we propose AgeFormer, which utilizes spatio-temporal information on the dynamics of the entire body and outperforms face-based methods for age classification. Our novel two-stream architecture uses TimeSformer and EfficientNet as backbones, to effectively capture both facial and body dynamics information for efficient and accurate age estimation in videos. Furthermore, to fill the gap in predicting age in real-world situations from videos, we construct a video dataset called Pexels Age (P-Age) for age classification. The proposed method achieves superior results compared to existing face-based age estimation methods and is evaluated in situations where the face is highly occluded, blurred, or masked. The method is also cross-tested on a variety of challenging video datasets such as Charades, Smarthome, and Thumos-14.	Translated: 2023-11-07 17:45:35 Published: 2023-11-04
# Task Arithmetic with LoRA for Continual Learning ( http://arxiv.org/abs/2311.02428v1 ) License: see link	Rajas Chitale, Ankit Vaidya, Aditya Kane, Archana Ghotkar	Continual learning refers to the problem where the training data is available in sequential chunks, termed "tasks". The majority of progress in continual learning has been stunted by the problem of catastrophic forgetting, which is caused by sequential training of the model on streams of data. Moreover, it becomes computationally expensive to sequentially train large models multiple times. To mitigate both of these problems at once, we propose a novel method to continually train transformer-based vision models using low-rank adaptation and task arithmetic. Our method completely bypasses the problem of catastrophic forgetting, as well as reducing the computational requirement for training models on each task. When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning. We present rigorous ablations to support the prowess of our method.	Translated: 2023-11-07 17:45:15 Published: 2023-11-04
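To make the combination of low-rank adaptation and task arithmetic in the entry above concrete, here is a minimal sketch under stated assumptions. It treats each task's LoRA update as a low-rank product B·A and adds the scaled sum of these updates to a frozen base weight matrix. The abstract does not give the paper's actual merging rule, so the uniform `scale` factor and the helper names `matmul` and `merge_lora_tasks` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def merge_lora_tasks(base, lora_factors, scale=1.0):
    """Return base + scale * sum(B_t @ A_t) over all tasks.

    base is the frozen weight matrix; lora_factors is a list of (B, A)
    pairs, one per task, where B has shape (d_out, r) and A has shape
    (r, d_in). The base matrix itself is never modified.
    """
    merged = [row[:] for row in base]
    for b, a in lora_factors:
        delta = matmul(b, a)  # this task's low-rank update ("task vector")
        for i, delta_row in enumerate(delta):
            for j, value in enumerate(delta_row):
                merged[i][j] += scale * value
    return merged


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    task1 = ([[1.0], [0.0]], [[0.0, 2.0]])  # rank-1 update for task 1
    task2 = ([[0.0], [1.0]], [[3.0, 0.0]])  # rank-1 update for task 2
    print(merge_lora_tasks(base, [task1, task2], scale=0.5))
```

Because each task only contributes its own low-rank factors and the base weights stay frozen, adding a new task does not overwrite what earlier tasks stored, which is one way to read the abstract's claim about avoiding catastrophic forgetting.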
# Online Long-run Constrained Optimization ( http://arxiv.org/abs/2311.02426v1 ) License: see link	Shijie Pan and Wenjie Huang	In this paper, a novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, into the offline oracle, and a global minimax point is searched as the solution. Based on two particular definitions of expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to tackle a long-term (risk) constrained river pollutant source identification problem, demonstrating the validity of the theoretical results and exhibiting superior performance compared to existing methods.	Translated: 2023-11-07 17:45:02 Published: 2023-11-04
# A quantum battery with quadratic driving ( http://arxiv.org/abs/2311.02424v1 ) License: see link	C. A. Downing and M. S. Ukhtary	Quantum batteries are energy storage devices built using quantum mechanical objects, which are developed with the aim of outperforming their classical counterparts. Proposing optimal designs of quantum batteries which are able to exploit quantum advantages requires balancing the competing demands for fast charging, durable storage and effective work extraction. Here we study theoretically a bipartite quantum battery model, composed of a driven charger connected to an energy holder, within two paradigmatic cases of a driven-dissipative open quantum system: linear driving and quadratic driving. The linear battery is governed by a single exceptional point which splits the response of the battery into two regimes, one of which induces a good amount of useful work. Quadratic driving leads to a squeezed quantum battery, which generates plentiful useful work near critical points associated with dissipative phase transitions. Our theoretical results may be realized with parametric cavities or nonlinear circuits, potentially leading to the manifestation of a quantum battery exhibiting squeezing.	Translated: 2023-11-07 17:44:47 Published: 2023-11-04
# Payoff-based learning with matrix multiplicative weights in quantum games ( http://arxiv.org/abs/2311.02423v1 ) License: see link	Kyriakos Lotidis and Panayotis Mertikopoulos and Nicholas Bambos and Jose Blanchet	In this paper, we study the problem of learning in quantum games (and other classes of semidefinite games) with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty in attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition.	Translated: 2023-11-07 17:44:30 Published: 2023-11-04
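For orientation, the full-information MMW update that the entry above takes as its starting point is commonly written as below. This is the textbook form, not a formula quoted from the paper. The symbols are notation assumed here: $\eta$ is a step size, $V_s$ is the player's payoff gradient (a Hermitian matrix) at round $s$, and $X_t$ is the player's state (a density matrix) at round $t$.

```latex
\[
  X_t \;=\; \frac{\exp\!\Bigl(\eta \sum_{s=1}^{t-1} V_s\Bigr)}
                 {\operatorname{tr}\exp\!\Bigl(\eta \sum_{s=1}^{t-1} V_s\Bigr)}
\]
```

By construction $X_t$ is positive semidefinite with unit trace. The 3MW variants described in the abstract would presumably replace $V_s$ with an estimate built from scalar payoff observations; the abstract does not specify the estimator, so it is not reproduced here.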
# Deterministic generation of hybrid entangled states using quantum walks ( http://arxiv.org/abs/2311.02419v1 ) License: see link	Jaskaran Singh, Vikash Mittal	Recently, hybrid entanglement (HE), which involves entangling a qubit with a coherent state, has demonstrated superior performance in various quantum information processing tasks, particularly in quantum key distribution [arXiv:2305.18906 (2023)]. Despite its theoretical advantages, the practical generation of these states in the laboratory has been a challenge. In this context, we introduce a deterministic and efficient approach for generating HE states using quantum walks. Our method achieves a remarkable fidelity of 99.90% with just 20 time steps in a one-dimensional split-step quantum walk. This represents a significant improvement over prior approaches that yielded HE states only probabilistically, often with fidelities as low as 80%. Our scheme not only provides a robust solution to the generation of HE states but also highlights a unique advantage of quantum walks, thereby contributing to the advancement of this burgeoning field. Moreover, our scheme is experimentally feasible with the current technology.	Translated: 2023-11-07 17:43:55 Published: 2023-11-04
# P2O-Calib: Camera-LiDAR Calibration Using Point-Pair Spatial Occlusion Relationship ( http://arxiv.org/abs/2311.02413v1 ) License: see link	Su Wang, Shini Zhang, Xuchong Qiu	Accurate and robust sensor calibration is considered an important building block for follow-up research in the autonomous driving and robotics domain. Current work on extrinsic calibration between 3D LiDARs and monocular cameras mainly focuses on target-based and target-less methods. The target-based methods are often utilized offline because of restrictions, such as additional target design and target placement limits. The current target-less methods suffer from feature indeterminacy and feature mismatching in various environments. To alleviate these limitations, we propose a novel target-less calibration approach which is based on 2D-3D edge point extraction using the occlusion relationship in 3D space. Based on the extracted 2D-3D point pairs, we further propose an occlusion-guided point-matching method that improves the calibration accuracy and reduces computation costs. To validate the effectiveness of our approach, we evaluate the method performance qualitatively and quantitatively on real images from the KITTI dataset. The results demonstrate that our method outperforms the existing target-less methods and achieves low error and high robustness that can contribute to practical applications relying on high-quality Camera-LiDAR calibration.	Translated: 2023-11-07 17:43:38 Published: 2023-11-04
# Citance-Contextualized Summarization of Scientific Papers ( http://arxiv.org/abs/2311.02408v1 ) License: see link	Shahbaz Syed, Ahmad Dawar Hakimi, Khalid Al-Khatib, Martin Potthast	Current approaches to automatic summarization of scientific papers generate informative summaries in the form of abstracts. However, abstracts are not intended to show the relationship between a paper and the references cited in it. We propose a new contextualized summarization approach that can generate an informative summary conditioned on a given sentence containing the citation of a reference (a so-called "citance"). This summary outlines the content of the cited paper relevant to the citation location. Thus, our approach extracts and models the citances of a paper, retrieves relevant passages from cited papers, and generates abstractive summaries tailored to each citance. We evaluate our approach using Webis-Context-SciSumm-2023, a new dataset containing 540K computer science papers and 4.6M citances therein.	Translated: 2023-11-07 17:43:21 Published: 2023-11-04
# The equivalence of dynamic and strategic stability under regularized learning in games ( http://arxiv.org/abs/2311.02407v1 ) License: see link	Victor Boone and Panayotis Mertikopoulos	In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited and, in many cases, non-existent. This issue is exacerbated further by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In lieu of this, we take a more general approach and instead seek to characterize the setwise rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator, a property known as closedness under better replies (club). In so doing, we obtain a far-reaching equivalence between strategic and dynamic stability: a product of pure strategies is closed under better replies if and only if its span is stable and attracting under regularized learning. In addition, we estimate the rate of convergence to such sets, and we show that methods based on entropic regularization (like the exponential weights algorithm) converge at a geometric rate, while projection-based methods converge within a finite number of iterations, even with bandit, payoff-based feedback.	Translated: 2023-11-07 17:43:05 Published: 2023-11-04
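The exponential weights algorithm named in the entry above can be sketched as follows, as a minimal illustration rather than the paper's exact scheme. It assumes full-information feedback (the payoff of every pure strategy is observed each round) and a constant step size; the function name `exponential_weights` and the parameter `step` are hypothetical.

```python
import math


def exponential_weights(payoff_rounds, step=0.1):
    """Return the mixed strategy after playing through payoff_rounds.

    payoff_rounds is a list of payoff vectors, one per round, with one
    entry per pure strategy. Cumulative payoffs are scaled by step and
    mapped to probabilities with a softmax (the logit choice map that
    entropic regularization induces).
    """
    n = len(payoff_rounds[0])
    scores = [0.0] * n
    for payoffs in payoff_rounds:
        for i, payoff in enumerate(payoffs):
            scores[i] += step * payoff
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # The first strategy always pays more, so its probability approaches 1.
    print(exponential_weights([[1.0, 0.0]] * 50, step=0.5))
```

When one strategy is consistently better, the probability of every other strategy shrinks by a constant factor per round, which is consistent with the geometric rate the abstract attributes to entropic methods.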
# Hybrid quantum image classification and federated learning for hepatic steatosis diagnosis ( http://arxiv.org/abs/2311.02402v1 ) License: see link	Luca Lusnig, Asel Sagingalieva, Mikhail Surmach, Tatjana Protasevich, Ovidiu Michiu, Joseph McLoughlin, Christopher Mansell, Graziano de' Petris, Deborah Bonazza, Fabrizio Zanconati, Alexey Melnikov, and Fabio Cavalli	With the maturity achieved by deep learning techniques, intelligent systems that can assist physicians in the daily interpretation of clinical images can play a very important role. In addition, quantum techniques applied to deep learning can enhance this performance, and federated learning techniques can realize privacy-friendly collaborative learning among different participants, solving privacy issues due to the use of sensitive data and reducing the amount of data to be collected by each individual participant. We present in this study a hybrid quantum neural network that can be used to quantify non-alcoholic liver steatosis and could be useful in the diagnostic process to determine a liver's suitability for transplantation; at the same time, we propose a federated learning approach based on a classical deep learning solution to solve the same problem, but using a reduced data set in each part. The liver steatosis image classification accuracy of the hybrid quantum neural network, a hybrid quantum ResNet model consisting of 5 qubits and more than 100 variational gates, reaches 97%, which is 1.8% higher than its classical counterpart, ResNet. Crucially, even with a reduced dataset, our hybrid approach consistently outperformed its classical counterpart, indicating superior generalization and less potential for overfitting in medical applications. In addition, a federated approach with multiple clients, up to 32, despite the lower accuracy, which still remains higher than 90%, would allow each participant to use a very small dataset, i.e., down to one-thirtieth. Our work, based on real-world clinical data, can be regarded as a scalable and collaborative starting point and could thus fulfill the need for an effective and reliable computer-assisted system that facilitates the daily diagnostic work of the clinical pathologist.	Translated: 2023-11-07 17:42:32 Published: 2023-11-04
# BarcodeBERT: Transformers for Biodiversity Analysis ( http://arxiv.org/abs/2311.02401v1 ) License: see link	Pablo Millan Arias and Niousha Sadjadi and Monireh Safari and ZeMing Gong and Austin T. Wang and Scott C. Lowe and Joakim Bruslund Haurum and Iuliia Zarubiieva and Dirk Steinke and Lila Kari and Angel X. Chang and Graham W. Taylor	Understanding biodiversity is a global challenge, in which DNA barcodes (short snippets of DNA that cluster by species) play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification demands a paradigm shift towards self-supervised pretraining. We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a 1.5 M invertebrate DNA barcode reference library. This work highlights how dataset specifics and coverage impact model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus level. Indeed, without the fine-tuning step, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks. The code repository is available at https://github.com/Kari-Genomics-Lab/BarcodeBERT	Translated: 2023-11-07 17:42:02 Published: 2023-11-04
# プレートから生産へ:現代の消費者駆動食品システムにおける人工知能 From Plate to Production: Artificial Intelligence in Modern Consumer-Driven Food Systems ( http://arxiv.org/abs/2311.02400v1 ) ライセンス: Link先を確認	Weiqing Min, Pengfei Zhou, Leyi Xu, Tao Liu, Tianhao Li, Mingyu Huang, Ying Jin, Yifan Yi, Min Wen, Shuqiang Jiang, Ramesh Jain	(参考訳) 世界の食料システムは、需要が増大する中で持続的で栄養豊かな食事を供給するという緊急の課題に直面している。 AI(Artificial Intelligence)の出現は、個人の選択革命をもたらし、AIによる個人による決定が、食卓から農場、そして皿へと、食品システムを変革する。この文脈で、aiアルゴリズムは個人の食事選択を洗練し、その後農業生産を形作り、消費から栽培まで最適なフィードバックループを促進する。最初は、食品サプライチェーンにまたがるAIツールやテクニックを調べ、その後、AIサブフィールドが機械学習、コンピュータビジョン、音声認識をどのように通過するかを評価する。 AIFSフレームワークの注目点として、デジタル化、ビッグデータ分析、バイオテクノロジー、そしてあらゆるコンポーネントの現代の食品システムで広く使用されているIoTなど、AIとAIの融合を強調しています。このパラダイムは、伝統的な「ファーム・トゥ・フォーク」の物語を循環型「消費者主導型ファーム・トゥ・フォーク」モデルにシフトさせ、持続的で栄養豊かな食事の実現に役立てる。本稿では、食品分野におけるaiの約束と本質的な課題について考察する。厳格なAIガバナンス、均一なデータアーキテクチャ、学際的なパートナーシップを推進することによって、消費者中心の戦略と相乗化するAIは、持続可能な軌道に向けて食品システムを操る可能性を秘めている、と私たちは主張する。我々は、食品システムの多様な側面における最先端技術に関する包括的な調査を行い、その後、ギャップを特定し、創発的なai方法論の公平で効果的な展開を提唱する。 Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices, subsequently shaping agricultural outputs, and promoting an optimized feedback loop from consumption to cultivation. Initially, we delve into AI tools and techniques spanning the food supply chain, and subsequently assess how AI subfields$\unicode{x2013}$encompassing machine learning, computer vision, and speech recognition$\unicode{x2013}$are harnessed within the AI-enabled Food System (AIFS) framework, which increasingly leverages Internet of Things, multimodal sensors and real-time data exchange. 
We spotlight the AIFS framework, emphasizing its fusion of AI with technologies such as digitalization, big data analytics, biotechnology, and IoT, which are used extensively in every component of modern food systems. This paradigm shifts the conventional "farm to fork" narrative to a cyclical "consumer-driven farm to fork" model to better achieve sustainable, nutritious diets. This paper explores AI's promise and the intrinsic challenges it poses within the food domain. By championing stringent AI governance, uniform data architectures, and cross-disciplinary partnerships, we argue that AI, when synergized with consumer-centric strategies, holds the potential to steer food systems toward a sustainable trajectory. We furnish a comprehensive survey of the state of the art across diverse facets of food systems, subsequently pinpointing gaps and advocating for the judicious and efficacious deployment of emergent AI methodologies.	翻訳日:2023-11-07 17:41:40 公開日:2023-11-04
# 高速かつ高精度な分散GNNのためのエントロピーアウェアトレーニング Entropy Aware Training for Fast and Accurate Distributed GNN ( http://arxiv.org/abs/2311.02399v1 ) ライセンス: Link先を確認	Dhruv Deshmukh (1), Gagan Raj Gupta (1), Manisha Chawla (1), Vishwesh Jatala (1), Anirban Haldar (1) ((1) Department of CSE, IIT Bhilai, India)	(参考訳) 数十億規模のグラフ上でグラフニューラルネットワーク(GNN)をスケールするために、いくつかの分散フレームワークが開発された。いくつかのベンチマークでは、これらのフレームワークが生成するグラフ分割が異種データ分散とクラス不均衡を持ち、コンバージェンスに影響し、集中型実装よりもパフォーマンスが低下することを観察した。これらの課題に積極的に対処し、トレーニング時間を短縮し、精度を向上するテクニックを開発します。我々は,全エントロピーを最小化して,マイクロ平均F1スコア(精度)を改善するためにエッジ重み分割法を開発した。さらに、各計算ホストのモデルをローカルデータ分布に適応させる非同期パーソナライズフェーズを追加します。我々は,収束をかなりスピードアップするクラスバランスサンプラーを設計した。アルゴリズムをDistDGLフレームワーク上に実装し、既存のトレーニング手法よりもはるかに優れたスケーリングを実現することを観察した。トレーニング時間では2～3倍のスピードアップを達成し,標準ベースラインと比較して5つのグラフベンチマークでマイクロF1スコアの平均4%の改善を実現した。 Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs. On several benchmarks, we observe that the graph partitions generated by these frameworks have heterogeneous data distributions and class imbalance, affecting convergence, and resulting in lower performance than centralized implementations. We holistically address these challenges and develop techniques that reduce training time and improve accuracy. We develop an Edge-Weighted partitioning technique to improve the micro average F1 score (accuracy) by minimizing the total entropy. Furthermore, we add an asynchronous personalization phase that adapts each compute-host's model to its local data distribution. We design a class-balanced sampler that considerably speeds up convergence. We implemented our algorithms on the DistDGL framework and observed that our training techniques scale much better than the existing training approach. We achieved a (2-3x) speedup in training time and 4% improvement on average in micro-F1 scores on 5 large graph benchmarks compared to the standard baselines.	翻訳日:2023-11-07 17:41:10 公開日:2023-11-04
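As a rough illustration of the entropy objective mentioned in the abstract above, the following sketch scores a graph partitioning by the Shannon entropy of the class labels inside each partition. The abstract does not define "total entropy", so summing the per-partition label entropies is an assumption made here, and the names `label_entropy` and `total_entropy` are hypothetical. This is not the authors' Edge-Weighted partitioning algorithm, only a way to see what such a score measures.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels held by one partition."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_entropy(partitions):
    """Sum of per-partition label entropies (an assumed aggregate, see lead-in)."""
    return sum(label_entropy(p) for p in partitions if p)


# Two toy partitionings of the same eight labelled nodes.
mixed = [["a", "b", "a", "b"], ["a", "b", "a", "b"]]  # each partition holds both classes
pure = [["a", "a", "a", "a"], ["b", "b", "b", "b"]]   # each partition holds one class

print(total_entropy(mixed), total_entropy(pure))
```

Under this assumed definition, a partitioning whose parts each contain a single class scores zero, while parts with an even class mix score one bit each.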
# 固体ネオン上の環状表面状態に基づく単一電子量子ビット Single-electron qubits based on ring-shaped surface states on solid neon ( http://arxiv.org/abs/2311.02501v1 ) ライセンス: Link先を確認	Toshiaki Kanai, Dafei Jin, and Wei Guo	(参考訳) 最近の実験では、固体ネオン表面に結合した単一電子からなる電荷量子ビットは、非常に長いコヒーレンス時間を示し、量子コンピューティングのプラットフォームとして期待できる。しかし、いくつかの観測は、電子の結合機構と量子状態と応用された電気トラップポテンシャルとの直接相関に疑問を投げかけた。本研究では,電子とネオン表面地形(バンプや谷など)との相互作用を調べるための理論的枠組みを提案する。電子によって誘導される表面電荷を評価することにより、ネオン表面への強い垂直結合を示す。電子の2次元曲面上の横運動に対するシュロディンガー方程式は、広範な地形変化のために解かれる。その結果、表面バンプは電子に自然に結合し、実験的な観測と整合する一意なリング状量子状態を形成することが明らかとなった。また,電子の励起エネルギーは磁場を用いてスムーズに調整でき,量子ビット操作が容易になることを示す。本研究は、e-neon量子ビット特性の理解を深め、量子コンピューティングアーキテクチャの進化のための設計と最適化の指針となる基礎となる。 Recent experiments demonstrate that a charge qubit consisting of a single electron bound to a solid neon surface exhibits an exceptionally long coherence time, making it a promising platform for quantum computing. However, some observations cast doubt on the direct correlation between the electron's binding mechanism and quantum states with the applied electric trapping potential. In this study, we introduce a theoretical framework to examine the electron's interactions with neon surface topography, such as bumps and valleys. By evaluating the surface charges induced by the electron, we demonstrate its strong perpendicular binding to the neon surface. The Schrödinger equation for the electron's lateral motion on the curved 2D surface is then solved for extensive topographical variations. Our results reveal that surface bumps can naturally bind an electron, forming unique ring-shaped quantum states that align with experimental observations. We also show that the electron's excitation energy can be smoothly tuned using a magnetic field to facilitate qubit operation. This study offers a leap in our understanding of e-neon qubit properties, laying the groundwork to guide its design and optimization for advancing quantum computing architectures.	翻訳日:2023-11-07 17:33:11 公開日:2023-11-04
# 大規模固定点反復に対するアンダーソン加速度の収束率の改善 Improved Convergence Rates of Anderson Acceleration for a Large Class of Fixed-Point Iterations ( http://arxiv.org/abs/2311.02490v1 ) ライセンス: Link先を確認	Casey Garner and Gilad Lerman and Teng Zhang	(参考訳) 本稿では、固定点法${x}^{(k+1)}=q({x}^{(k)})$に対するアンダーソン加速度(AA)を研究する。これは作用素 $q$ が線型で対称であるとき、AA が固定点反復よりも根線型収束係数を改善するという最初の証明である。 $q$ が非線形であるにもかかわらず、解に対称ヤコビアンを持つとき、少し修正されたAAアルゴリズムは、固定点反復よりも類似のルート線形収束係数が向上することが証明される。シミュレーションは我々の観察を検証する。さらに、異なるデータモデルを用いた実験により、AAはタイラーのM推定の標準的な固定点法よりもはるかに優れていることが示された。 This paper studies Anderson acceleration (AA) for fixed-point methods ${x}^{(k+1)}=q({x}^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric, AA improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric Jacobian at the solution, a slightly modified AA algorithm is proved to have an analogous root-linear convergence factor improvement over fixed-point iterations. Simulations verify our observations. Furthermore, experiments with different data models demonstrate AA is significantly superior to the standard fixed-point methods for Tyler's M-estimation.	翻訳日:2023-11-07 17:32:52 published:2023-11-04
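To make the fixed-point setting ${x}^{(k+1)}=q({x}^{(k)})$ in the abstract above concrete, here is a minimal sketch of textbook Anderson acceleration with memory depth 1, applied to a small linear, symmetric map. The function name `anderson_m1`, the toy matrix `A`, and the vector `b` are hypothetical choices for illustration. The "slightly modified" algorithm the paper analyses for nonlinear $q$ is not reproduced here.

```python
def anderson_m1(q, x0, tol=1e-10, max_iter=200):
    """Anderson acceleration (memory depth 1) for the fixed-point problem x = q(x).

    Returns the approximate fixed point and the iteration index at which the
    residual norm ||q(x) - x|| first dropped below tol.
    """
    x_prev = list(x0)
    g_prev = q(x_prev)
    f_prev = [g - x for g, x in zip(g_prev, x_prev)]  # residual at x_prev
    x = g_prev  # the first step is a plain fixed-point step
    for k in range(1, max_iter + 1):
        g = q(x)
        f = [gi - xi for gi, xi in zip(g, x)]
        if sum(fi * fi for fi in f) ** 0.5 < tol:
            return x, k
        # Least-squares mixing coefficient: minimise ||f - gamma * (f - f_prev)||.
        df = [a - b_ for a, b_ in zip(f, f_prev)]
        denom = sum(d * d for d in df)
        gamma = sum(fi * di for fi, di in zip(f, df)) / denom if denom > 0 else 0.0
        x_next = [gi - gamma * (gi - gpi) for gi, gpi in zip(g, g_prev)]
        x_prev, g_prev, f_prev, x = x, g, f, x_next
    return x, max_iter


# Toy linear, symmetric operator q(x) = A x + b with eigenvalues 0.7 and 0.3.
A = [[0.5, 0.2], [0.2, 0.5]]
b = [1.0, 2.0]


def q(x):
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


x_star, iters = anderson_m1(q, [0.0, 0.0])
print(x_star, iters)
```

For this toy map the exact fixed point solves $(I - A)x = b$, which gives $x = (0.9/0.21,\ 1.2/0.21)$, so the result can be checked by hand.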
# コードレビューのスピードは実践者にとって重要か? Does Code Review Speed Matter for Practitioners? ( http://arxiv.org/abs/2311.02489v1 ) ライセンス: Link先を確認	Gunnar Kudrjavets (University of Groningen) and Ayushi Rastogi (University of Groningen)	(参考訳) コードベロシティの増大は、さまざまなソフトウェアプロジェクトの共通の目標である。コードレビュープロセスの効率は、コードが最終製品にマージされ、顧客に到達するまでの速度に大きく影響します。我々は、コードベロシティに関連する信念とプラクティスを研究するための調査を実施した。業界参加者39名から75名,オープンソースコミュニティから36名を対象に調査を行った。私たちの重要な発見は (a)業界とオープンソースコミュニティは同様の信念を持っている。 b) 迅速な反応時間は最も重要であり、他のエンジニアのツールインフラストラクチャや振る舞いに適用します。 c) time-to-mergeは、改善するために必要なコードレビューの基準です。 (d) エンジニアは、キャリアの成長に対するコードベロシティの増加の利点について異なる意見を持っている。 (e)コミット・then-reviewモデルの制御されたアプリケーションによってコード速度が向上する。私たちの研究は、基盤となる組織エコシステムに関係なく、コードベロシティへの投資と改善を継続する必要性をサポートします。 Increasing code velocity is a common goal for a variety of software projects. The efficiency of the code review process significantly impacts how fast the code gets merged into the final product and reaches the customers. We conducted a survey to study the code velocity-related beliefs and practices in place. We analyzed 75 completed surveys from 39 participants from the industry and 36 from the open-source community. Our critical findings are (a) the industry and open-source community hold a similar set of beliefs, (b) quick reaction time is of utmost importance and applies to the tooling infrastructure and the behavior of other engineers, (c) time-to-merge is the essential code review metric to improve, (d) engineers have differing opinions about the benefits of increased code velocity for their career growth, and (e) the controlled application of the commit-then-review model can increase code velocity. Our study supports the continued need to invest in and improve code velocity regardless of the underlying organizational ecosystem.	翻訳日:2023-11-07 17:32:40 公開日:2023-11-04
# スパースカテーテルパスを用いた左心房の神経再建 Neural Network Reconstruction of the Left Atrium using Sparse Catheter Paths ( http://arxiv.org/abs/2311.02488v1 ) ライセンス: Link先を確認	Alon Baram, Moshe Safran, Tomer Noy, Naveh Geri and Hayit Greenspan	(参考訳) 近年,カテーテルを用いた肺静脈分離用高周波アブレーションが心房細動治療の第一線となっている。これは、肺静脈のオスティアを含む左心房下心筋表面の比較的正確な地図を必要とし、表面の濃密なサンプリングと10分以上を要する。この研究の焦点は、手順の早期に左心房の可視化を提供することで、手順の複雑さを緩和し、表面のサンプリングが困難であるカテーテルの使用など、さらなるワークフローを可能にすることである。簡単なカテーテル操作から得られた部分的データから左心房の形状を再構築する新しい正規化項を持つ高密度エンコーダデコーダネットワークを提案する。ネットワークをトレーニングするために,3次元アトリア形状の大規模なデータセットを取得し,対応するカテーテル軌道を生成する。トレーニング後,提案するネットワークは,与えられた軌道に基づいて,十分なアトリリウム形状を近似できることを示す。 3次元心房再建のためのネットワークソリューションをいくつか比較した。提案手法は3分間の時間間隔で部分的取得を用いて現実的な可視化を実現する。合成およびヒトの臨床例が示される。 Catheter based radiofrequency ablation for pulmonary vein isolation has become the first line of treatment for atrial fibrillation in recent years. This requires a rather accurate map of the left atrial sub-endocardial surface including the ostia of the pulmonary veins, which requires dense sampling of the surface and takes more than 10 minutes. The focus of this work is to provide left atrial visualization early in the procedure to ease procedure complexity and enable further workflows, such as using catheters that have difficulty sampling the surface. We propose a dense encoder-decoder network with a novel regularization term to reconstruct the shape of the left atrium from partial data which is derived from simple catheter maneuvers. To train the network, we acquire a large dataset of 3D atria shapes and generate corresponding catheter trajectories. Once trained, we show that the suggested network can sufficiently approximate the atrium shape based on a given trajectory. We compare several network solutions for the 3D atrium reconstruction. We demonstrate that the solution proposed produces realistic visualization using partial acquisition within a 3-minute time interval. Synthetic and human clinical cases are shown.	翻訳日:2023-11-07 17:32:27 公開日:2023-11-04
# 時空間データに対する深層学習の不確かさの定量化--課題と機会 Uncertainty Quantification of Deep Learning for Spatiotemporal Data: Challenges and Opportunities ( http://arxiv.org/abs/2311.02485v1 ) ライセンス: Link先を確認	Wenchong He and Zhe Jiang	(参考訳) gps、リモートセンシング、計算シミュレーションの進歩により、大量の地理空間データと時空間データが高速に収集されている。このような時空間的なビッグデータ資産は、ディープラーニング技術の最近の進歩とともに、社会を変えるユニークな機会を提供する。しかし、深層学習が予期せぬ予測を不確実な自信で下し、高い意思決定アプリケーション(災害管理、医療診断、自律運転など)に重大な結果をもたらすことが広く認識されている。不確実性定量化(UQ)は、ディープラーニングモデルの信頼性を推定することを目的としている。本稿では,時空間データに対する深層学習のuqについて,その特異な課題や既存手法などについて概説する。特に不確実性源の重要性に注目する。時空間データの今後の研究方向を明らかにする。 With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed. Such emerging spatiotemporal big data assets, together with the recent progress of deep learning technologies, provide unique opportunities to transform society. However, it is widely recognized that deep learning sometimes makes unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications (e.g., disaster management, medical diagnosis, autonomous driving). Uncertainty quantification (UQ) aims to estimate a deep learning model's confidence. This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods. We particularly focus on the importance of uncertainty sources. We identify several future research directions for spatiotemporal data.	翻訳日:2023-11-07 17:32:10 公開日:2023-11-04
# 一般化されたゼロショットオーディオツーインテント分類 Generalized zero-shot audio-to-intent classification ( http://arxiv.org/abs/2311.02482v1 ) ライセンス: Link先を確認	Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki	(参考訳) 音声のみのデータを用いた音声言語理解システムの人気は高まっているが、未認識の意図を扱う能力は限られている。本研究では,インテント毎に数文のサンプル文しか持たない汎用的ゼロショット音声対インテント分類フレームワークを提案する。そこで我々はまず,自己教師付き事前学習モデルを用いて教師付きオーディオ・インテリジェント分類器を訓練する。次に、ニューラルオーディオシンセサイザーを利用して、サンプルテキスト発話のためのオーディオ埋め込みを作成し、コサイン類似性を用いて、見えない意図に対する一般化ゼロショット分類を行う。また,音声表現に語彙情報を組み込んでゼロショット性能を向上させるマルチモーダルトレーニング戦略を提案する。マルチモーダルトレーニングアプローチでは,音声のみの学習に比べて,未認識の意図に対するゼロショットインテント分類の精度が,SLURPでは2.75%,内部目標指向ダイアログデータセットでは18.2%向上している。 Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model. We then leverage a neural audio synthesizer to create audio embeddings for sample text utterances and perform generalized zero-shot classification on unseen intents using cosine similarity. We also propose a multimodal training strategy that incorporates lexical information into the audio representation to improve zero-shot performance. Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.	翻訳日:2023-11-07 17:31:55 公開日:2023-11-04
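The cosine-similarity step described in the abstract above can be sketched as a nearest-prototype classifier. The abstract does not say how the embeddings of an intent's sample utterances are combined, so averaging them into a single prototype is an assumption made here. The three-dimensional vectors and the intent names are made-up toy values, not outputs of any real audio synthesizer or encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(audio_embedding, intent_examples):
    """Return the intent whose prototype is most cosine-similar to the query."""
    prototypes = {name: mean_vector(vs) for name, vs in intent_examples.items()}
    return max(prototypes, key=lambda name: cosine(audio_embedding, prototypes[name]))


# Hypothetical embeddings of a few sample utterances per unseen intent.
intent_examples = {
    "play_music": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "set_alarm": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}

predicted = classify([0.7, 0.3, 0.0], intent_examples)
print(predicted)
```

Because only the prototypes depend on the intent inventory, a new intent can be added in this sketch by supplying a few example embeddings, without retraining anything.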
# 医用画像の循環翻訳のための厳密な境界付きディープネットワーク A Strictly Bounded Deep Network for Unpaired Cyclic Translation of Medical Images ( http://arxiv.org/abs/2311.02480v1 ) ライセンス: Link先を確認	Swati Rai, Jignesh S. Bhatt, and Sarat Kumar Patra	(参考訳) 医用画像翻訳は不適切な問題である。本稿では, 既存の一方向一方向翻訳ネットワークとは異なり, 不対化医療画像について検討し, 安定な双方向翻訳を実現する厳密な有界ネットワークを提供する。適応辞書学習に組み込んだパッチレベル連結巡回条件生成逆数ネットワーク(pCCGAN)を提案する。 47層のサイクリック接続された2つのCGANで構成されており、両方のジェネレータ(各32層の層)は、同じ臓器の入力とターゲットのモダリティ画像(地上の真理ではない)から異なる未対のパッチを連結して条件付けされている。鍵となる考え方は、近隣の文脈の特徴情報を利用して翻訳空間を束縛し、一般化を促進することである。ジェネレータはさらに、コンテキストパッチから学習した適応辞書を備えて、劣化の可能性を低減している。識別器は、ミニマックス関数を用いて翻訳画像を検証する15層ディープネットワークである。複合損失関数は, 対向的, 非対向的, 前向きの周期的, 同一性的損失で定式化され, 提案した学習機械の分散をさらに小さくする。定性的,定量的,アブレーション分析の結果,実際のCTおよびMRIでは良好な結果が得られた。 Medical image translation is an ill-posed problem. Unlike existing paired unbounded unidirectional translation networks, in this paper, we consider unpaired medical images and provide a strictly bounded network that yields a stable bidirectional translation. We propose a patch-level concatenated cyclic conditional generative adversarial network (pCCGAN) embedded with adaptive dictionary learning. It consists of two cyclically connected CGANs of 47 layers each; where both generators (each of 32 layers) are conditioned with concatenation of alternate unpaired patches from input and target modality images (not ground truth) of the same organ. The key idea is to exploit cross-neighborhood contextual feature information that bounds the translation space and boosts generalization. The generators are further equipped with adaptive dictionaries learned from the contextual patches to reduce possible degradation. Discriminators are 15-layer deep networks that employ minimax function to validate the translated imagery. A combined loss function is formulated with adversarial, non-adversarial, forward-backward cyclic, and identity losses that further minimize the variance of the proposed learning machine. 
Qualitative, quantitative, and ablation analysis show superior results on real CT and MRI.	翻訳日:2023-11-07 17:31:38 公開日:2023-11-04
# コンピュータサイエンスの教授・学生の学術的・個人的背景に基づく成功予測 Forecasting Success of Computer Science Professors and Students Based on Their Academic and Personal Backgrounds ( http://arxiv.org/abs/2311.02476v1 ) ライセンス: Link先を確認	Ghazal Kalhor and Behnam Bahrak	(参考訳) 大学院を修了した後、多くのコンピュータサイエンス(cs)の学生が北米における競争的な大学院プログラムに応募する。彼らの長期的な目標は、大手5社のうちの1社に採用されるか、あるいは教授になることだ。したがって、受け入れ基準の役割に気付くことで、目標に向かって最良の道を選ぶのに役立つかもしれない。本稿では,北米の高名な大学に入学し,将来教授として学界に復帰する可能性について,学生の過去の大学の影響を分析した。以上の結果から,先行大学ランキングが目標達成の重要な要因であることが示された。次に、上位25のコンピュータサイエンスプログラムを受講した学部の学生に偏見があることを示す。最後に,これらの大学における教授の成功を予測するために,機械学習モデルを用いる。我々はこの予測課題に対して7.85のRMSEを達成した。 After completing their undergraduate studies, many computer science (CS) students apply for competitive graduate programs in North America. Their long-term goal is often to be hired by one of the big five tech companies or to become a faculty member. Therefore, being aware of the role of admission criteria may help them choose the best path towards their goals. In this paper, we analyze the influence of students' previous universities on their chances of being accepted to prestigious North American universities and returning to academia as professors in the future. Our findings demonstrate that the ranking of their prior universities is a significant factor in achieving their goals. We then illustrate that there is a bias in the undergraduate institutions of students admitted to the top 25 computer science programs. Finally, we employ machine learning models to forecast the success of professors at these universities. We achieved an RMSE of 7.85 for this prediction task.	翻訳日:2023-11-07 17:31:16 公開日:2023-11-04
# ロボットスキルの精度保存外挿のための制約付き方程式学習ネットワーク Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills ( http://arxiv.org/abs/2311.02475v1 ) ライセンス: Link先を確認	Hector Perez-Villeda, Justus Piater, and Matteo Saveriano	(参考訳) デモンストレーションによるプログラミングでは、ロボットは人間のデモから新しいスキルを学ぶ。学習後、ロボットはスキルを再現するだけでなく、新たなトレーニングデータを集めることなく、シフトしたドメインに一般化できるべきである。類似領域への適応は文献で研究されているが、オープンな問題は、データ分布の外にある異なる条件に学習スキルをどのように適応するか、そしてもっと重要なことは、望ましい適応の精度を保つかである。本稿では,制約付き回帰の観点からの演題によるプログラミングにおける軌道適応問題に対処する,制約付き方程式学習ネットワークと呼ばれる新しい教師付き学習フレームワークを提案する。制約付き回帰に対する従来のアプローチでは、例えばガウスでは、方程式学習ネットワークを利用して分析式を学習し、基底関数として使用する。これらの基礎関数は、トレーニングデータからの逸脱を最小限に抑えることを目的として、新しい初期点や最終点のような望ましい適応を表す制約を課す。ロボット軌道の適応には3つの課題がある。 1) 新しい適応のための軌道の歪みを最小限にすること 2) 適応の正確性を維持すること,及び 3)基礎関数の構造に関する直観の欠如に対処すること。本研究では,環境変化による適応を必要とするロボット作業のシミュレーションと実実験の両方において,本手法の有効性を検証し,既存の2つの手法との比較を行った。実験の結果,制約付き等式学習者ネットワークは,ロボットスキルの一般化と適応性の向上により,芸術的アプローチの状態を上回っていることがわかった。 In Programming by Demonstration, the robot learns novel skills from human demonstrations. After learning, the robot should be able not only to reproduce the skill, but also to generalize it to shifted domains without collecting new training data. Adaptation to similar domains has been investigated in the literature; however, an open problem is how to adapt learned skills to different conditions that are outside of the data distribution, and, more important, how to preserve the precision of the desired adaptations. This paper presents a novel supervised learning framework called Constrained Equation Learner Networks that addresses the trajectory adaptation problem in Programming by Demonstrations from a constrained regression perspective. While conventional approaches for constrained regression use one kind of basis function, e.g., Gaussian, we exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions. 
These basis functions are learned from demonstrations with the objective of minimizing deviations from the training data while imposing constraints that represent the desired adaptations, such as new initial or final points or keeping the trajectory within given bounds. Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions. We validate our approach both in simulation and in real experiments in a set of robotic tasks that require adaptation due to changes in the environment, and we compare the results with two existing approaches. Our experiments show that Constrained Equation Learner Networks outperform state-of-the-art approaches by increasing the generalization and adaptability of robotic skills.	翻訳日:2023-11-07 17:31:02 公開日:2023-11-04
# クラスタネットワーク干渉による個別政策評価と学習 Individualized Policy Evaluation and Learning under Clustered Network Interference ( http://arxiv.org/abs/2311.02467v1 ) ライセンス: Link先を確認	Yi Zhang, Kosuke Imai	(参考訳) 現在、政策評価と学習に関する文献が多数存在するが、先行研究の多くは、ある単位の処理課題が別の単位の結果に影響を及ぼさないと仮定している。あいにく、干渉を無視して政策評価が偏り、学習方針が無効になることがある。例えば、多くの友人を持つ影響力のある個人を治療すると、ポジティブな流出効果が生じ、個別化された治療規則(ITR)の全体的な性能が向上する。本稿では,集団ネットワーク(あるいは部分的)干渉下での最適ITRの評価と学習の問題について考察する。このモデルでは、itrの実証的性能を評価するために使用できる推定器を提案する。この推定器は標準逆確率重み推定器よりも実質的に効率的であり, 流出効果についての仮定を課さない。学習ITRに対する有限サンプル残差を導出し、効率的な評価推定器の使用により学習ポリシーの性能が向上することを示す。最後に,提案手法の利点を説明するためにシミュレーションと経験的研究を行う。 While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and yield ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network (or partial) interference where clusters of units are sampled from a population and units may influence one another within each cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.	翻訳日:2023-11-07 17:30:35 公開日:2023-11-04
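For readers unfamiliar with the baseline named in the abstract above, the following sketch shows one common textbook form of an inverse probability weighting (IPW) estimate of a policy's value when interference is confined to clusters: a cluster contributes only if every unit's observed treatment matches the policy, and it is weighted by the inverse probability of that whole treatment vector. Known, unit-independent propensities are an assumption, as are the names `ipw_policy_value`, `policy`, and `propensity`. This is not the more efficient estimator the paper proposes.

```python
def ipw_policy_value(clusters, policy, propensity):
    """Cluster-level IPW estimate of the value of an individualized treatment rule.

    clusters:   list of clusters; each cluster is a list of (x, a, y) tuples with
                covariate x, observed binary treatment a, and outcome y.
    policy:     function x -> 0 or 1, the treatment rule being evaluated.
    propensity: function x -> P(a = 1 | x), assumed known and independent
                across units within a cluster.
    """
    total = 0.0
    for units in clusters:
        weight = 1.0
        for x, a, _ in units:
            if a != policy(x):
                weight = 0.0  # the cluster's treatment vector disagrees with the policy
                break
            p1 = propensity(x)
            weight /= p1 if a == 1 else (1.0 - p1)
        mean_outcome = sum(y for _, _, y in units) / len(units)
        total += weight * mean_outcome
    return total / len(clusters)


# Toy data: two clusters of two units each, treatment assigned by a fair coin.
clusters = [
    [(1.0, 1, 2.0), (-1.0, 0, 1.0)],  # matches the policy below
    [(1.0, 0, 0.0), (-1.0, 0, 1.0)],  # first unit deviates from the policy
]
policy = lambda x: 1 if x > 0 else 0
propensity = lambda x: 0.5

value = ipw_policy_value(clusters, policy, propensity)
print(value)
```

Even with two units per cluster the matching cluster receives weight 4, and the weight grows multiplicatively with cluster size, which gives some intuition for why this baseline can be highly variable.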
# 多状態脳ネットワーク発見 Multi-State Brain Network Discovery ( http://arxiv.org/abs/2311.02466v1 ) ライセンス: Link先を確認	Hang Yin and Yao Su and Xinyue Liu and Thomas Hartvigsen and Yanhua Li and Xiangnan Kong	(参考訳) 脳ネットワーク発見は、人間の脳のfMRIスキャンなどの神経画像データから得られる時空間信号からノードとエッジを見つけることを目的としている。既存の方法は、観測された信号が単一の脳活動状態によってのみ生成されると仮定して、代表的または平均的な脳ネットワークを導出する傾向がある。しかし、ヒトの脳は通常複数の活動状態を含み、協調して脳の活動を決定する。脳の領域とその接続は通常、単一の状態のネットワークだけでは捉えにくい複雑なパターンを示す。最近の研究では、脳の活動状態に応じて脳のパーセレーションと接続が変化している。このような脳ネットワークをマルチステートと呼び、この混合物は人間の行動を理解するのに役立ちます。したがって、単一状態ネットワークと比較して、複数状態ネットワークは認知脳ネットワークの重要な情報を失うことを防げる。そこで我々は,CGL(コヒーレントなグラフィカルラッソ)とGMM(ガウス混合モデル)を組み合わせることで,多状態脳ネットワークのモデル化に成功したMNGL(Multi-state Network Graphical Lasso)という新しいモデルを提案する。合成および実世界のADHD 200 fMRIデータセットを用いて、MNGLがより説明的で現実的な結果を発見することによって、最近の最先端の代替品より優れていることを示す。 Brain network discovery aims to find nodes and edges from the spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains. Existing methods tend to derive representative or average brain networks, assuming observed signals are generated by only a single brain activity state. However, the human brain usually involves multiple activity states, which jointly determine the brain activities. The brain regions and their connectivity usually exhibit intricate patterns that are difficult to capture with only a single-state network. Recent studies find that brain parcellation and connectivity change according to the brain activity state. We refer to such brain networks as multi-state, and this mixture can help us understand human behavior. Thus, compared to a single-state network, a multi-state network can prevent us from losing crucial information of cognitive brain network. To achieve this, we propose a new model called MNGL (Multi-state Network Graphical Lasso), which successfully models multi-state brain networks by combining CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model). 
Using both synthetic and real world ADHD 200 fMRI datasets, we demonstrate that MNGL outperforms recent state-of-the-art alternatives by discovering more explanatory and realistic results.	翻訳日:2023-11-07 17:30:15 公開日:2023-11-04
# AGIのレベル:AGIへの道のりをめざして Levels of AGI: Operationalizing Progress on the Path to AGI ( http://arxiv.org/abs/2311.02462v1 ) ライセンス: Link先を確認	Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg	(参考訳) 本稿では,人工知能(AGI)モデルとその前駆体の性能と動作を分類する枠組みを提案する。このフレームワークは、AGIパフォーマンス、一般性、自律性のレベルを導入します。モデルの比較,リスク評価,AGIへの道程の進捗測定を行う共通言語を提供することで,この枠組みが自律運転のレベルに類似した形で有用になることを願っている。フレームワークを開発するために、既存のAGIの定義を分析し、AGIにとって有用なオントロジーが満たすべき6つの原則を抽出する。これらの原則には、メカニズムよりも能力にフォーカスすること、汎用性とパフォーマンスを別々に評価すること、エンドポイントではなくagiに向かう段階を定義することが含まれる。これらの原則を念頭に置いて,奥行き(性能)と能力の広さ(一般性)に基づく「アギのレベル」を提案し,このオントロジーに現在のシステムがどのように適合するかを考察する。これらのレベルに対してAGIモデルの振る舞いと能力を定量化する将来のベンチマークの課題について論じる。最後に、これらのAGIのレベルが自律性やリスクといったデプロイメント上の考慮事項とどのように相互作用するかについて議論し、高機能なAIシステムの責任と安全なデプロイメントにおいて、ヒューマン・AIインタラクションパラダイムを慎重に選択することの重要性を強調します。 We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose 'Levels of AGI' based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. 
Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.	翻訳日:2023-11-07 17:29:51 公開日:2023-11-04
# SPHEAR: 完全統計的3次元モデリングのための球面頭部登録 SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling ( http://arxiv.org/abs/2311.02461v1 ) ライセンス: Link先を確認	Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir and Cristian Sminchisescu	(参考訳) 本研究では,球面埋め込みに基づく新しい3次元登録法により,高精度で微分可能な3次元人頭モデルである SPHEAR を提案する。従来の非リジッド登録法からパラダイムを移行し,様々な表面前処理の下で動作し,再構築の忠実性を高め,必要な介入を最小化する。さらに、SPHEAR は complete モデルであり、多様な合成頭の形や表情をサンプリングするだけでなく、視線方向、高解像度のカラーテクスチャ、表面の正常な地図、細部で表現されたヘアカットをストランドとしてサンプリングすることができる。 SPHEARは、自動現実的な視覚データ生成、セマンティックアノテーション、一般的な再構築タスクに使用できる。最先端のアプローチと比較して,我々のコンポーネントは高速かつメモリ効率が高く,設計選択の妥当性と登録,再構築,生成の精度を実験でサポートしています。 We present SPHEAR, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a complete model that allows not only to sample diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and hair cuts represented in detail, as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of registration, reconstruction and generation techniques.	翻訳日:2023-11-07 17:29:26 公開日:2023-11-04
# ヒューリスティック画像処理による企業組織図からのネットワーク構造抽出 Extracting Network Structures from Corporate Organization Charts Using Heuristic Image Processing ( http://arxiv.org/abs/2311.02460v1 ) ライセンス: Link先を確認	Hiroki Sayama and Junichi Yamanoi	(参考訳) 企業の組織構造は、企業運営のダイナミクスとパフォーマンスに影響を及ぼす可能性がある。しかし、このテーマは、簡単に利用できる組織ネットワークデータセットが不足しているため、未調査のままである。このギャップを克服するため,我々は組織図から組織ネットワークデータを抽出・再構成する新しいヒューリスティック画像処理手法を開発した。本手法は,企業組織図のPDFファイルを解析し,テキストラベル,ボックス,接続線,その他のオブジェクトをヒューリスティックに実装した複数ステップの画像処理により検出する。検出されたコンポーネントは、視覚化、バリデーション、さらにネットワーク分析のために、PythonのNetworkX Graphオブジェクトにまとめられる。 2008年から2011年までdiamond, inc.が発行した「組織図/システム図手帳」に示す日本全上場企業の組織図に本手法を適用した。 10,008の組織図PDFファイルのうち,4,606の組織ネットワークを再構築することができた(データ取得成功率:46%)。再建された組織ネットワーク毎にいくつかのネットワーク診断を行い,企業行動とパフォーマンスの関連性を調べるために,さらなる統計分析に活用する。 The organizational structure of corporations may have implications for the dynamics and performance of corporate operations. However, this subject has remained unexplored because of the lack of readily available organization network datasets. To overcome this gap, we developed a new heuristic image-processing method to extract and reconstruct organization network data from published organization charts. Our method analyzes a PDF file of a corporate organization chart and detects text labels, boxes, connecting lines, and other objects through multiple steps of heuristically implemented image processing. The detected components are reorganized together into a Python NetworkX Graph object for visualization, validation and further network analysis. We applied the developed method to the organization charts of all the listed firms in Japan shown in the "Organization Chart/System Diagram Handbook" published by Diamond, Inc., from 2008 to 2011. Out of the 10,008 organization chart PDF files, our method was able to reconstruct 4,606 organization networks (data acquisition success rate: 46%).
For each reconstructed organization network, we measured several network diagnostics, which will be used for further statistical analysis to investigate their potential correlations with corporate behavior and performance.	翻訳日:2023-11-07 17:29:08 公開日:2023-11-04
# 注意に基づくマルチインスタンス混合モデル Attention-based Multi-instance Mixed Models ( http://arxiv.org/abs/2311.02455v1 ) ライセンス: Link先を確認	Jan P. Engelmann, Alessandro Palma, Jakub M. Tomczak, Fabian J Theis, Francesco Paolo Casale	(参考訳) 単細胞データから患者の特徴を予測することは、健康や疾患にかかわる細胞状態を明らかにすることができる。線形モデルと平均的な細胞型表現は、その効率性と頑健性のためにこのタスクに好まれるが、単細胞データに固有の豊富な細胞多様性を見落としている。このギャップに対処するために,汎用線形混合モデル (GLMM) と多重インスタンス学習 (MIL) を統合したフレームワークであるGMILを導入し,セル状態の不均一性をモデル化しながら線形モデルの利点を裏付ける。 GMILは、事前に定義されたセル埋め込みを活用することにより、計算効率を高め、シングルセル表現学習の最近の進歩と整合する。実験の結果,GMILは単一セルデータセットにおいて既存のMILモデルよりも優れており,新たな関連性を明らかにし,異なる領域にわたる生物学的機構を明らかにする。 Predicting patient features from single-cell data can unveil cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce GMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell-state heterogeneity. By leveraging predefined cell embeddings, GMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that GMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.	翻訳日:2023-11-07 17:28:48 公開日:2023-11-04
# QOCO:モバイルエッジコンピューティングのための深層強化学習に基づくQoE指向計算オフロードアルゴリズム QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing ( http://arxiv.org/abs/2311.02525v1 ) ライセンス: Link先を確認	Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar	(参考訳) モバイルエッジコンピューティング(MEC)の領域では、効率的な計算タスクのオフロードは、ユーザにとってシームレスな品質のエクスペリエンス(QoE)を保証する上で重要な役割を果たす。ユーザが応答性と信頼性の高いサービスを要求する、今日の相互接続の世界では、高いQoEを維持することが最重要である。この課題は、動的で不確実なモバイル環境の処理に寄与する最も重要な要因の1つである。本研究では,厳密なタスク処理期限とエネルギー制約がシステム性能に悪影響を及ぼすようなMECシステムの計算オフロードについて検討する。計算タスクオフロード問題をマルコフ決定プロセス(mdp)として定式化し,各ユーザの長期qoeを個別に最大化する。本稿では、モバイルデバイスが他のデバイスによる意思決定の知識を必要とせずに、そのオフロード決定を行うことを可能にする、深層強化学習(DRL)に基づく分散QoE指向計算オフロード(QOCO)アルゴリズムを提案する。数値解析により,QOCOの性能評価を行った。シミュレーションの結果、QOCOアルゴリズムはエッジノードの計算資源を効率的に活用することを確認した。その結果、14%のタスクを完了でき、タスクの遅延とエネルギー消費をそれぞれ9%と6%削減できる。これらを組み合わせると、既存のアルゴリズムと比較して、qoeの平均値が少なくとも37%向上する。 In the realm of mobile edge computing (MEC), efficient computation task offloading plays a pivotal role in ensuring a seamless quality of experience (QoE) for users. Maintaining a high QoE is paramount in today's interconnected world, where users demand responsive and reliable services. This challenge is one of the key factors in handling dynamic and uncertain mobile environments. In this study, we delve into computation offloading in MEC systems, where strict task processing deadlines and energy constraints can adversely affect the system performance. We formulate the computation task offloading problem as a Markov decision process (MDP) to maximize the long-term QoE of each user individually. We propose a decentralized QoE-oriented computation offloading (QOCO) algorithm based on deep reinforcement learning (DRL) that empowers mobile devices to make their offloading decisions without requiring knowledge of decisions made by other devices. Through numerical studies, we evaluate the performance of QOCO.
Simulation results validate that the QOCO algorithm efficiently exploits the computational resources of edge nodes. Consequently, it can complete 14% more tasks and reduce task delay and energy consumption by 9% and 6%, respectively. These together contribute to a significant improvement of at least 37% in average QoE compared to an existing algorithm.	翻訳日:2023-11-07 17:19:58 公開日:2023-11-04
# UniTSFace: 顔認識のための統一された閾値統合型サンプル対サンプル損失 UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition ( http://arxiv.org/abs/2311.02523v1 ) ライセンス: Link先を確認	Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen and Jinming Duan	(参考訳) サンプル対クラスベースの顔認識モデルは、大量の顔画像間のクロスサンプル関係を十分に調べることができず、一方でサンプル対サンプルベースのモデルは訓練に高度なペアリング処理を必要とする。さらに、どちらの方法も、正と負の対を分離する統一しきい値が期待される実世界の顔認証アプリケーションの要件を満たすものではない。本稿では,正の対と負の対を区別するための明確な統一された閾値を特徴とする,統一しきい値統合型のサンプル対サンプル損失(USS損失)を提案する。 USS損失に着想を得て、サンプル対サンプルベースのソフトマックス損失とBCE損失も導出し、それらの関係を議論する。 MFR, IJB-C, LFW, CFP-FP, AgeDB, MegaFaceなど,複数のベンチマークデータセットに対する大規模な評価は,提案されたUSS損失が極めて効率的で,サンプル対クラスベースの損失とシームレスに動作することを示した。組込み損失(USSとsample-to-class Softmax損失)は、以前のアプローチの落とし穴を克服し、訓練された顔モデルUniTSFaceは、CosFace、ArcFace、VPL、AnchorFace、UNPGといった最先端のメソッドよりも優れたパフォーマンスを示す。私たちのコードは利用可能です。 Sample-to-class-based face recognition models cannot fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive the sample-to-sample based softmax and BCE losses, and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The embedded loss (USS and sample-to-class Softmax loss) overcomes the pitfalls of previous approaches and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods, such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. 
Our code is available.	翻訳日:2023-11-07 17:19:34 公開日:2023-11-04
# Forward $\chi^2$ Divergence based Variational Importance Sampling Forward $\chi^2$ Divergence Based Variational Importance Sampling ( http://arxiv.org/abs/2311.02516v1 ) ライセンス: Link先を確認	Chengrui Li, Yule Wang, Weihan Li and Anqi Wu	(参考訳) 対数尤度の最大化は潜在変数モデルを学習する上で重要な側面であり、変分推論(VI)は一般的に採用されている手法である。しかし、複雑な事後分布を扱う場合、VIは高い対数尤度を達成する上で困難に直面する可能性がある。この制限に応えて,対数尤度を直接推定し,最大化する,新しい変分重点サンプリング(VIS)手法を導入する。 VISは、forward $\chi^2$ divergence を最小化して得られる最適な提案分布を活用し、対数尤度の推定を強化する。混合モデル、変分オートエンコーダ、部分観測可能な一般化線形モデルなど、様々な一般的な潜在変数モデルにVISを適用する。その結果,本手法は,対数尤度とモデルパラメータ推定の両面で,最先端のベースラインを一貫して上回ることを示した。 Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, achieved by minimizing the forward $\chi^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, both in terms of log-likelihood and model parameter estimation.	翻訳日:2023-11-07 17:19:06 公開日:2023-11-04
# 単層WSe2/ギャッププラズモンナノキャビティにおける高可変室温プレクシトン Highly tunable room-temperature plexcitons in monolayer WSe2 /gap-plasmon nanocavities ( http://arxiv.org/abs/2311.02513v1 ) ライセンス: Link先を確認	Thomas P. Darlington, Mahfujur Rahaman, Kevin W.C. Kwock, Emanuil Yanev, Xuehao Wu, Luke N. Holtzman, Madisen Holbrook, Gwangwoo Kim, Kyung Yeol Ma, Hyeon Suk Shin, Andrey Krayev, Matthew Strasbourg, Nicholas J. Borys, D. N. Basov, Katayun Barmak, James C. Hone, Abhay N. Pasupathy, Deep Jariwala, P. James Schuck	(参考訳) 量子フォトニック技術の進歩は、光学活性状態の自由度を正確に制御する能力に依存している。そこで我々は, ストレイン工学と電圧調整可能なプラズモンナノキャビティを組み合わせた一般手法により, 2次元半導体単層におけるリアルタイム, 室温調整可能な強プラズモン・エキシトン結合を実現する。エキシトンエネルギーとナノキャビティプラズモン共鳴は、プラズモニック・ナノプローブに圧力を印加することで、同期的に制御可能であり、ラビの分裂が100mevを超え、デチューニングと結合強度をオペロンドで制御できることを示した。相関力分光法、ナノフォトルミネッセンス(ナノPL)、ナノラマン測定を応用し、電磁シミュレーションで拡張し、異なる偏光子バンドと暗い偏光子状態を特定し、ナノギャップとひずみチューニングの関数としてそれらの進化をマッピングした。このシステムは、デチューンを劇的に変更することなく、様々な空洞パラメータの結合強度を操作できる。さらに,複数の押圧サイクルと複数のナノバブルを用いた繰り返し実験により,波長可変の強いカップリングが頑健であることが判明した。最後に, ナノギャップサイズは, 基板とプラズモニック先端間の印加直流電圧により直接変調可能であることを示し, 複合型ナノエレクトロメカニカルシステム(NEMS)としての概念の性質を強調した。本研究は,単層(1l)遷移金属ジカルコゲナイド (tmds) に局在するプレキシトン状態を正確に制御し, 調整する可能性を実証し, 量子情報処理から光化学へのオンチップ・ポラリトン系ナノフォトニクス応用への道を開く。 The advancement of quantum photonic technologies relies on the ability to precisely control the degrees of freedom of optically active states. Here, we realize real-time, room-temperature tunable strong plasmon-exciton coupling in 2D semiconductor monolayers enabled by a general approach that combines strain engineering plus force- and voltage-adjustable plasmonic nanocavities. We show that the exciton energy and nanocavity plasmon resonance can be controllably toggled in concert by applying pressure with a plasmonic nanoprobe, allowing in operando control of detuning and coupling strength, with observed Rabi splittings >100 meV. 
Leveraging correlated force spectroscopy, nano-photoluminescence (nano-PL) and nano-Raman measurements, augmented with electromagnetic simulations, we identify distinct polariton bands and dark polariton states, and map their evolution as a function of nanogap and strain tuning. Uniquely, the system allows for manipulation of coupling strength over a range of cavity parameters without dramatically altering the detuning. Further, we establish that the tunable strong coupling is robust under multiple pressing cycles and repeated experiments over multiple nanobubbles. Finally, we show that the nanogap size can be directly modulated via an applied DC voltage between the substrate and plasmonic tip, highlighting the inherent nature of the concept as a plexcitonic nano-electro-mechanical system (NEMS). Our work demonstrates the potential to precisely control and tailor plexciton states localized in monolayer (1L) transition metal dichalcogenides (TMDs), paving the way for on-chip polariton-based nanophotonic applications spanning quantum information processing to photochemistry.	翻訳日:2023-11-07 17:18:49 公開日:2023-11-04
# ニューラルオブジェクト形状コンプリートを用いた擬似グラスピング Anthropomorphic Grasping with Neural Object Shape Completion ( http://arxiv.org/abs/2311.02510v1 ) ライセンス: Link先を確認	Diego Hidalgo-Carvajal, Hanzhi Chen, Gemma C. Bettelani, Jaesug Jung, Melissa Zavaglia, Laura Busse, Abdeldjallil Naceri, Stefan Leutenegger, Sami Haddadin	(参考訳) 人間に合った環境におけるロボットの進歩的な普及は、デクスタリティが重要な役割を果たす無数のオブジェクト操作技術を生み出した。人間は物体を扱う際、異常なデクスター性を示すことが確立されている。このようなデキスタリティは、物体の性質(重量、大きさ、形状など)の堅牢な理解と、それらと相互作用する顕著な能力に由来すると考えられる。手の姿勢は、通常、特定の領域が、特に部分的に見える場合は、把握する必要がある物体に与える影響を示す。本研究では, 部分的観察から全形状を再構築し, 7自由度ロボットハンドで操作することで, 人間の物体理解を活用した。提案手法は, 部分的再構成のみでベースラインの把持成功率を30%近く向上させ, 3つの異なる対象カテゴリで150以上の把持を達成した。これは,現実のシナリオにおいて,様々な方向や位置から完成した物体形状に基づいて,把持姿勢を予測・実行するためのアプローチの一貫した能力を示す。我々の研究は、現実世界の再構成された物体の正確な把握と操作のスキルを必要とするロボットアプリケーションを強化する新たな可能性を開く。 The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, in which dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing their full geometry from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach has significantly improved the grasping success rates of baselines with only partial reconstruction by nearly 30% and achieved over 150 successful grasps with three different object categories. This demonstrates our approach's consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. 
Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.	翻訳日:2023-11-07 17:18:12 公開日:2023-11-04
# MAAIP:物理系文字に対する実演の模倣を前提とした多エージェント対人インタラクション MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters ( http://arxiv.org/abs/2311.02502v1 ) ライセンス: Link先を確認	Mohamed Younes, Ewa Kijak, Richard Kulpa, Simon Malinowski, Franck Multon	(参考訳) 物理に基づくキャラクターのリアルな相互作用と動きのシミュレーションは、インタラクティブなアプリケーションや、映画やビデオゲーム産業における自動セカンダリキャラクタアニメーションに非常に関心がある。近年の強化学習の成果は, シングルキャラクタシミュレーション, 特に模倣学習に基づく手法を用いた実験において, 顕著な結果が提案されている。しかし、複数の文字の相互作用と動きを模倣するには、その相互作用をモデル化する必要がある。本稿では,複数の物理系文字の相互作用と動作の両方を扱うために,一つの文字に対する動き模倣の考え方を一般化した,新しいマルチエージェント生成逆模倣学習手法を提案する。入力として2つの非構造化データセットが与えられる。 1) 特定のアプリケーションにリンクした一連の動作を行う単一のアクターの動作を含む単一アクターデータセット 2) 複数のアクター間のインタラクションのいくつかの例を含むインタラクションデータセット。これらのデータセットに基づいて,本システムは,本質的なスタイルを保ちながら,各キャラクターが各アクターに関連する対話的スキルを模倣できるように制御ポリシーを訓練する。このアプローチはボクシングとフルボディの格闘技の2つの異なるスタイルでテストされ、異なるスタイルを模倣する手法の能力を実証している。 Simulating realistic interaction and motions for physics-based characters is of great interest for interactive applications, and automatic secondary character animation in the movie and video game industries. Recent works in reinforcement learning have proposed impressive results for single character simulation, especially the ones that use imitation learning based techniques. However, imitating multiple characters interactions and motions requires to also model their interactions. In this paper, we propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach that generalizes the idea of motion imitation for one character to deal with both the interaction and the motions of the multiple physics-based characters. Two unstructured datasets are given as inputs: 1) a single-actor dataset containing motions of a single actor performing a set of motions linked to a specific application, and 2) an interaction dataset containing a few examples of interactions between multiple actors. 
Based on these datasets, our system trains control policies allowing each character to imitate the interactive skills associated with each actor, while preserving the intrinsic style. This approach has been tested on two different fighting styles, boxing and full-body martial art, to demonstrate the ability of the method to imitate different styles.	翻訳日:2023-11-07 17:17:53 公開日:2023-11-04
# ChatGPTは言語学の試験を解けるか? Can Chat GPT solve a Linguistics Exam? ( http://arxiv.org/abs/2311.02499v1 ) ライセンス: Link先を確認	Patricia Ronan, Gerold Schneider	(参考訳) 本研究は、言語モデルGPT4を用いたChatGPTのバージョンであるChatGPT4が、言語学入門の試験をうまく解けるかどうかを問うものである。ドイツの大学における言語学入門コースの過去の試験問題を、この検証に用いる。試験問題は、最小限の前処理のみでChatGPT4に入力された。その結果,複雑なタスクや入れ子になったタスクの解釈においても,言語モデルは非常に成功していることがわかった。簡略音声表記(broad phonetic transcription)のタスクでは驚くほど成功したが、形態素や句の分析ではあまりうまく機能しなかった。単純な場合では十分に機能するが、特に1対1の対応が欠如している稀なケースは、現状では結果がまちまちである。このモデルは、構文木の分析や生成のような視覚化をまだ扱うことができない。これらのタスクをテキストデータに変換するより広範な前処理を行えば、モデルはこれらのタスクもうまく解決できる。 The present study asks if ChatGPT4, the version of ChatGPT which uses the language model GPT4, can successfully solve introductory linguistic exams. Previous exam questions of an Introduction to Linguistics course at a German university are used to test this. The exam questions were fed into ChatGPT4 with only minimal preprocessing. The results show that the language model is very successful in the interpretation even of complex and nested tasks. It proved surprisingly successful in the task of broad phonetic transcription, but performed less well in the analysis of morphemes and phrases. In simple cases it performs sufficiently well, but rarer cases, particularly with missing one-to-one correspondence, are currently treated with mixed results. The model is not yet able to deal with visualisations, such as the analysis or generation of syntax trees. More extensive preprocessing, which translates these tasks into text data, allows the model to also solve these tasks successfully.	翻訳日:2023-11-07 17:17:31 公開日:2023-11-04
# ポラリトン超低温反応:共振器制御による分子の光会合 Polaritonic ultracold reactions: cavity controlled molecular photoassociation ( http://arxiv.org/abs/2311.02497v1 ) ライセンス: Link先を確認	Vasil Rokaj, Simeon I. Mistakidis, and H. R. Sadeghpour	(参考訳) ルビジウム二量体とテラヘルツ共振器との共鳴的な振動強結合を考慮し、超低温光化学の共振器ポラリトン制御のための原型モデルを提案する。振動励起と真空光子吸収の間の擬交差(avoided crossing)において、分子と光子の間に生じるポラリトン状態が、分子振動のフランク・コンドン(FC)因子を効率的に制御できることを示す。光と物質のエンタングルメントにより、FC因子は一方のポラリトン分枝から他方へ移され、FC因子が大幅に増強されたポラリトンが生じる。このポラリトン状態を光会合に利用すると、超低温分子の形成が促進される。この研究は、共振器の真空場によって光会合を制御する道筋を示唆し、ポラリトン超低温化学という新たなサブフィールドの基盤を築く。 We introduce a prototypical model for cavity polaritonic control of ultracold photochemistry by considering the resonant vibrational strong coupling of a rubidium dimer to a terahertz cavity. We demonstrate that at avoided crossings between a vibrational excitation and the vacuum photon absorption, the resulting polaritonic states between the molecule and photons can efficiently control the molecular vibrational Franck-Condon (FC) factors. Due to the entanglement between light and matter, FC factor is transferred from one polaritonic branch to the other, leading to a polariton with a substantially enhanced FC factor. Utilizing this polariton state for photoassociation results in the enhanced formation of ultracold molecules. This work suggests a path to controlling photoassociation with cavity vacuum fields, and lays the ground for the emerging subfield of polaritonic ultracold chemistry.	翻訳日:2023-11-07 17:17:14 公開日:2023-11-04
# LocoMuJoCo:ロコモーションのための総合的模倣学習ベンチマーク LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion ( http://arxiv.org/abs/2311.02496v1 ) ライセンス: Link先を確認	Firas Al-Hafez and Guoping Zhao and Jan Peters and Davide Tateo	(参考訳) Imitation Learning (IL)は、身体性を持つエージェントに俊敏な移動を可能にする大きな可能性を秘めている。しかし、既存のロコモーションベンチマークの多くは、主に単純化されたトイタスクに焦点を当てており、しばしば現実のシナリオの複雑さを捉えられず、研究を非現実的な領域へと向かわせている。そこで本研究では,ILアルゴリズムの厳密な評価と比較を容易にするための新しいベンチマークを提案する。このベンチマークは、四足歩行、二足歩行、筋骨格の人間モデルを含む多様な環境を包含しており、それぞれにノイズを含む実際のモーションキャプチャデータ、グラウンドトゥルースのエキスパートデータ、グラウンドトゥルースの準最適データなどの包括的なデータセットが付属し、幅広い難易度にわたる評価を可能にする。学習エージェントの堅牢性を高めるために、ダイナミクスランダム化のための簡単なインタフェースを提供し、異なる身体形態でエージェントを訓練するための広範囲な部分観測可能タスクを提供する。最後に、各タスクに手作りのメトリクスを提供し、評価を容易にし高速なベンチマークを可能にするために、最先端のベースラインアルゴリズムをベンチマークに同梱する。 Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents. However, many existing locomotion benchmarks primarily focus on simplified toy tasks, often failing to capture the complexity of real-world scenarios and steering research toward unrealistic domains. To advance research in IL for locomotion, we present a novel benchmark designed to facilitate rigorous evaluation and comparison of IL algorithms. This benchmark encompasses a diverse set of environments, including quadrupeds, bipeds, and musculoskeletal human models, each accompanied by comprehensive datasets, such as real noisy motion capture data, ground truth expert data, and ground truth sub-optimal data, enabling evaluation across a spectrum of difficulty levels. To increase the robustness of learned agents, we provide an easy interface for dynamics randomization and offer a wide range of partially observable tasks to train agents across different embodiments. Finally, we provide handcrafted metrics for each task and ship our benchmark with state-of-the-art baseline algorithms to ease evaluation and enable fast benchmarking.	翻訳日:2023-11-07 17:16:57 公開日:2023-11-04
# ベイズニューラルネットワークを用いた材料特性予測のための多変量回帰の不確かさ定量化 Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks ( http://arxiv.org/abs/2311.02495v1 ) ライセンス: Link先を確認	Longze Li, Jiang Chang, Aleksandar Vakanski, Min Xian	(参考訳) 材料科学におけるデータ駆動アプローチと機械学習に基づく手法の利用の増加に伴い、情報に基づく意思決定のために予測変数の信頼性の高い不確実性定量化(UQ)を行う重要性は、いくら強調してもしすぎることはない。材料特性予測におけるUQは、先進的な材料のマルチスケールおよびマルチフィジックスな性質、多数の要因間の複雑な相互作用、モデルトレーニングのための大規模キュレートデータセットの限られた入手可能性など、特有の課題を提起する。近年、ベイジアンニューラルネットワーク(BNN)がUQの有望なアプローチとして登場し、ニューラルネットワーク内の不確実性を捉えるための確率的フレームワークを提供している。そこで本研究では,材料モデリングにおける支配法則の知識を統合し,モデルを物理的に一貫した予測へと導く、物理情報を取り入れたBNNにおけるUQ手法を提案する。本手法の有効性を評価するために, 鋼合金のクリープ破断寿命を予測するケーススタディを提示する。クリープ試験から収集した3つのデータセットによる実験的検証は、従来のガウス過程回帰法と同等かそれ以上の性能で、正確な点推定と不確実性推定を生成するBNNの能力を示す。同様に、アクティブラーニングへの応用におけるBNNのUQに対する適合性を評価し、競争力のある性能を報告した。クリープ寿命予測に最も有望な枠組みは、ネットワークパラメータの事後分布のマルコフ連鎖モンテカルロ近似に基づくBNNであり、変分推論近似に基づくBNNや確率的出力を持つ関連するNNと比較して、より信頼性の高い結果が得られた。コードは https://github.com/avakanski/Creep-uncertainty-quantification で入手できる。 With the increased use of data-driven approaches and machine learning-based methods in material science, the importance of reliable uncertainty quantification (UQ) of the predicted variables for informed decision-making cannot be overstated. UQ in material property prediction poses unique challenges, including the multi-scale and multi-physics nature of advanced materials, intricate interactions between numerous factors, limited availability of large curated datasets for model training, etc. Recently, Bayesian Neural Networks (BNNs) have emerged as a promising approach for UQ, offering a probabilistic framework for capturing uncertainties within neural networks. In this work, we introduce an approach for UQ within physics-informed BNNs, which integrates knowledge from governing laws in material modeling to guide the models toward physically consistent predictions. To evaluate the effectiveness of this approach, we present case studies for predicting the creep rupture life of steel alloys. 
Experimental validation with three datasets of collected measurements from creep tests demonstrates the ability of BNNs to produce accurate point and uncertainty estimates that are competitive or exceed the performance of the conventional method of Gaussian Process Regression. Similarly, we evaluated the suitability of BNNs for UQ in an active learning application and reported competitive performance. The most promising framework for creep life prediction is BNNs based on Markov Chain Monte Carlo approximation of the posterior distribution of network parameters, as it provided more reliable results in comparison to BNNs based on variational inference approximation or related NNs with probabilistic outputs. The codes are available at: https://github.com/avakanski/Creep-uncertainty-quantification.	翻訳日:2023-11-07 17:16:36 公開日:2023-11-04
# 畳み込み長・短期記憶テンソル回帰ネットワークを用いたカリフォルニアにおける山火事後の植生回復予測 Forecasting Post-Wildfire Vegetation Recovery in California using a Convolutional Long Short-Term Memory Tensor Regression Network ( http://arxiv.org/abs/2311.02492v1 ) ライセンス: Link先を確認	Jiahe Liu, Xiaodi Wang	(参考訳) 山火事後の植物再生の研究は, 生態系回復戦略の立案に不可欠である。先行研究は主に、火災後の遷移に影響を与える重要な生態学的・生物地理学的要因を調査してきた。本研究は, 火災後の植物回復を予測し, 解析するための新しいアプローチを提案する。本研究では, 火災の封じ込め後の短期的な植物生育データに基づいて, 将来の正規化差分植生指数(NDVI)を予測する畳み込み長・短期記憶テンソル回帰(ConvLSTMTR)ネットワークを開発した。このモデルは、2013年から2020年にかけてカリフォルニア州で発生した、焼失面積がそれぞれ3000エーカーを超える104件の大規模な山火事で訓練され、テストされている。 ConvLSTMとテンソル回帰の統合により、予測されたNDVIを用いて全体的なロジスティック成長率kを計算することができる。全体として、我々のk値予測は印象的なパフォーマンスを示し、予測の50%は絶対誤差0.12以下、75%は誤差0.24以下である。最後に,Uniform Manifold Approximation and Projection (UMAP) と KNN クラスタリングを用いて回復傾向を同定し,回復率の異なる領域への洞察を提供する。本研究は, テンソル回帰とConvLSTMの併用を先駆けて行い, 類似の山火事のクラスタリングに UMAP を適用した。これは予測的生態モデリングを前進させ、将来の火災後の植生管理戦略に役立つ可能性がある。 The study of post-wildfire plant regrowth is essential for developing successful ecosystem recovery strategies. Prior research mainly examines key ecological and biogeographical factors influencing post-fire succession. This research proposes a novel approach for predicting and analyzing post-fire plant recovery. We develop a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network that predicts future Normalized Difference Vegetation Index (NDVI) based on short-term plant growth data after fire containment. The model is trained and tested on 104 major California wildfires occurring between 2013 and 2020, each with burn areas exceeding 3000 acres. The integration of ConvLSTM with tensor regression enables the calculation of an overall logistic growth rate k using predicted NDVI. Overall, our k-value predictions demonstrate impressive performance, with 50% of predictions exhibiting an absolute error of 0.12 or less, and 75% having an error of 0.24 or less. 
Finally, we employ Uniform Manifold Approximation and Projection (UMAP) and KNN clustering to identify recovery trends, offering insights into regions with varying rates of recovery. This study pioneers the combined use of tensor regression and ConvLSTM, and introduces the application of UMAP for clustering similar wildfires. This advances predictive ecological modeling and could inform future post-fire vegetation management strategies.	翻訳日:2023-11-07 17:16:03 公開日:2023-11-04
# CenterRadarNet: 4D FMCWレーダを用いた3次元物体検出・追跡フレームワーク CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar ( http://arxiv.org/abs/2311.01423v2 ) ライセンス: Link先を確認	Jen-Hao Cheng, Sheng-Yao Kuan, Hugo Latapie, Gaowen Liu, Jenq-Neng Hwang	(参考訳) ロバストな認識は、安全な自律運転と補助運転を確保する上で不可欠な要素である。耐候性センサーを提供する自動車レーダー(77 - 81 GHz)は、先進的なLiDARベースの自動運転システムに補完機能を提供する。無線周波数(RF)レーダーテンソルは3D位置情報以外に、時空間のセマンティクスが豊富である。従来の手法のほとんどは3D (Doppler-range-azimuth) RFレーダーテンソルを用いており、鳥の目視(BEV)における物体の位置、方向角、大きさを予測できる。しかし、3D空間におけるオブジェクトのサイズ、向き、アイデンティティを同時に推測する能力は欠如している。この制限を克服するために,3次元物体検出および再同定(re-ID)タスクのための4Dレーダデータからの高分解能表現学習を容易にするために,CenterRadarNetと呼ばれる効率的なジョイントアーキテクチャを提案する。シングルステージの3Dオブジェクト検出器として、CenterRadarNetはBEVオブジェクト分布の信頼性マップ、対応する3Dバウンディングボックス属性、各ピクセルの外観埋め込みを直接推論する。さらに,学習した外見埋め込みをre-IDに応用したオンライントラッカーを構築した。 CenterRadarNetは、K-Radar 3Dオブジェクト検出ベンチマークで最先端の結果を達成する。さらに、K-RadarデータセットV2にレーダーを用いた最初の3次元オブジェクト追跡結果を示す。さまざまな駆動シナリオにおいて、CenterRadarNetは一貫性があり、堅牢なパフォーマンスを示し、その広範な適用性を強調している。 Robust perception is a vital component for ensuring safe autonomous and assisted driving. Automotive radar (77 to 81 GHz), which offers weather-resilient sensing, provides a complementary capability to the vision- or LiDAR-based autonomous driving systems. Raw radio-frequency (RF) radar tensors contain rich spatiotemporal semantics besides 3D location information. The majority of previous methods take in 3D (Doppler-range-azimuth) RF radar tensors, allowing prediction of an object's location, heading angle, and size in bird's-eye-view (BEV). However, they lack the ability to at the same time infer objects' size, orientation, and identity in the 3D space. To overcome this limitation, we propose an efficient joint architecture called CenterRadarNet, designed to facilitate high-resolution representation learning from 4D (Doppler-range-azimuth-elevation) radar data for 3D object detection and re-identification (re-ID) tasks. 
As a single-stage 3D object detector, CenterRadarNet directly infers the BEV object distribution confidence maps, corresponding 3D bounding box attributes, and appearance embedding for each pixel. Moreover, we build an online tracker utilizing the learned appearance embedding for re-ID. CenterRadarNet achieves the state-of-the-art result on the K-Radar 3D object detection benchmark. In addition, we present the first 3D object-tracking result using radar on the K-Radar dataset V2. In diverse driving scenarios, CenterRadarNet shows consistent, robust performance, emphasizing its wide applicability.	翻訳日:2023-11-07 11:18:45 公開日:2023-11-04

# ユニバーサル・アトミック・コンポーザビリティを目指して:Ethereum上のマルチロールアップ環境のための形式モデル

Towards Universal Atomic Composability: A Formal Model for Multi-Rollup Environments on Ethereum ( http://arxiv.org/abs/2311.00422v2 )

ライセンス: Link先を確認

Dipankar Sarkar,

(参考訳) 分散台帳技術の急速に発展する領域において、スケーラビリティと相互運用性は、学術と産業の両方において最重要課題となっている。本稿では,Ethereum上の複数のロールアップにまたがるアトミックコンポーザビリティに対処する包括的形式モデルを提案する。提案モデルではバッファリング,依存性管理,並行性制御,ゼロ知識証明などの機構を取り入れた。さらに, その実用上の影響, 強み, 弱みを評価し, 不正操作や誤った動作に対する耐性を確保する。提案したモデルを共有シーケンサや他の既存ソリューションに適用することで、その汎用性と普遍性が際立つ。

In the rapidly evolving domain of distributed ledger technology, scalability and interoperability have become paramount challenges for both academic and industry sectors. In this paper, we introduce a comprehensive formal model to address atomic composability across multiple rollups on Ethereum. The proposed model incorporates mechanisms like buffering, dependency management, concurrency control, and the groundbreaking zero-knowledge proofs. Moreover, we evaluate its practical repercussions, strengths, and weaknesses, ensuring resilience against manipulative or erroneous actions. The application of the proposed model to shared sequencers and other existing solutions accentuates its versatility and universality.

翻訳日:2024-03-25 13:55:39 公開日:2023-11-04

# OverHear:ヘッドホンベースのマルチセンサー・キーストローク推論

OverHear: Headphone based Multi-sensor Keystroke Inference ( http://arxiv.org/abs/2311.02288v1 )

ライセンス: Link先を確認

Raveen Wijewickrama, Maryam Abbasihafshejani, Anindya Maiti, Murtuza Jadliwala,

(参考訳) ヘッドホンは伝統的にオーディオ再生に限られており、高解像度マイクや加速度計のようなセンサーを統合するように進化してきた。これらの進歩によってユーザエクスペリエンスが向上する一方で、キーストローク推論がこの作業における私たちの関心事として、盗聴の潜在的な脆弱性も導入されます。この脅威を検証するために,ヘッドホンの音響および加速度計データを活用するキーストローク推論フレームワークであるOverHearを開発した。加速度計データは、個々のキーストローク識別に十分な詳細ではないが、手の位置によるクラスタリングキーの押圧を支援する。同時に、音響データからMel Frequency Cepstral Coefficients (MFCC)を抽出し、異なるキーストロークの区別を支援する。これらの機能はキーストローク予測のための機械学習モデルにフィードされ、その結果は辞書ベースの単語予測手法によってさらに洗練される。実験では,異なる環境条件下で様々なキーボードのタイプを試験した。メカニカルキーボードでは,トップ5キー予測精度が約80%,膜キーボードでは約60%,すべてのキーボードでは上位100ワード予測精度が70%以上であった。その結果,現実シナリオの文脈におけるアプローチの有効性と限界が浮き彫りになった。

Headphones, traditionally limited to audio playback, have evolved to integrate sensors like high-definition microphones and accelerometers. While these advancements enhance user experience, they also introduce potential eavesdropping vulnerabilities, with keystroke inference being our concern in this work. To validate this threat, we developed OverHear, a keystroke inference framework that leverages both acoustic and accelerometer data from headphones. The accelerometer data, while not sufficiently detailed for individual keystroke identification, aids in clustering key presses by hand position. Concurrently, the acoustic data undergoes analysis to extract Mel Frequency Cepstral Coefficients (MFCC), aiding in distinguishing between different keystrokes. These features feed into machine learning models for keystroke prediction, with results further refined via dictionary-based word prediction methods. In our experimental setup, we tested various keyboard types under different environmental conditions. We were able to achieve top-5 key prediction accuracy of around 80% for mechanical keyboards and around 60% for membrane keyboards with top-100 word prediction accuracies over 70% for all keyboard types. The results highlight the effectiveness and limitations of our approach in the context of real-world scenarios.

翻訳日:2024-03-25 13:45:54 公開日:2023-11-04

# 有界かつ不偏な複合差分プライバシー

Bounded and Unbiased Composite Differential Privacy ( http://arxiv.org/abs/2311.02324v1 )

ライセンス: Link先を確認

Kai Zhang, Yanjun Zhang, Ruoxi Sun, Pei-Wei Tsai, Muneeb Ul Hassan, Xin Yuan, Minhui Xue, Jinjun Chen,

(参考訳) 差分プライバシー(DP)の目的は、任意の隣接する2つのデータベース間で区別できない出力分布を生成することにより、プライバシーを保護することである。しかし、従来の差分プライベートなメカニズムは、最大の摂動範囲を達成するために非有界な出力を生成する傾向があり、これは現実世界の応用と必ずしも一致しない。既存のソリューションは、後処理や切り捨ての手法を用いて出力結果を制限することでこの問題に対処しようとするが、バイアスの問題を持ち込むという代償を伴う。本稿では,複合確率密度関数を用いて,任意の数値入力データに対して有界かつ不偏な出力を生成する新しい差分プライベートメカニズムを提案する。この合成は活性化関数と基底関数から構成され、ユーザはDPの制約に従ってこれらの関数を柔軟に定義できる。また、繰り返し実験をすることなく最適なハイパーパラメータ設定を反復的に探索できる最適化アルゴリズムを開発し、追加のプライバシーオーバーヘッドを防止する。さらに、複合確率密度関数の分散を評価し、分散推定よりも計算が簡単な2つの代替指標を導入することにより、提案メカニズムの有用性を評価する。 3つのベンチマークデータセットに対する広範な評価は、従来のラプラスメカニズムとガウスメカニズムに対する一貫した有意な改善を示している。提案する有界かつ不偏な複合差分プライベートメカニズムは、より広範なDP手法群の基盤となり、将来のプライバシー保護研究を促進する。

The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between any two neighboring databases. However, traditional differentially private mechanisms tend to produce unbounded outputs in order to achieve maximum disturbance range, which is not always in line with real-world applications. Existing solutions attempt to address this issue by employing post-processing or truncation techniques to restrict the output results, but at the cost of introducing bias issues. In this paper, we propose a novel differentially private mechanism which uses a composite probability density function to generate bounded and unbiased outputs for any numerical input data. The composition consists of an activation function and a base function, providing users with the flexibility to define the functions according to the DP constraints. We also develop an optimization algorithm that enables the iterative search for the optimal hyper-parameter setting without the need for repeated experiments, which prevents additional privacy overhead. Furthermore, we evaluate the utility of the proposed mechanism by assessing the variance of the composite probability density function and introducing two alternative metrics that are simpler to compute than variance estimation. Our extensive evaluation on three benchmark datasets demonstrates consistent and significant improvement over the traditional Laplace and Gaussian mechanisms. The proposed bounded and unbiased composite differentially private mechanism will underpin the broader DP arsenal and foster future privacy-preserving studies.

翻訳日:2024-03-25 13:45:54 公開日:2023-11-04
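
The abstract above contrasts unbounded mechanisms with post-processed (truncated) ones that become biased. The sketch below illustrates only that motivating problem, using the standard Laplace mechanism followed by clamping. It is not the composite mechanism proposed in the paper, and the function names, the range [0, 1], and the parameter values are assumptions chosen for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample via inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0) at the extreme end of the interval.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Standard Laplace mechanism: unbiased, but the output is unbounded."""
    return value + laplace_noise(sensitivity / epsilon, rng)


def clamped_laplace_mechanism(value, sensitivity, epsilon, low, high, rng):
    """Laplace mechanism followed by clamping: bounded, but biased near the edges."""
    noisy = laplace_mechanism(value, sensitivity, epsilon, rng)
    return min(max(noisy, low), high)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
true_value, low, high = 0.0, 0.0, 1.0  # the true value sits on the lower bound
n = 20000

unbounded = [laplace_mechanism(true_value, 1.0, 1.0, rng) for _ in range(n)]
clamped = [
    clamped_laplace_mechanism(true_value, 1.0, 1.0, low, high, rng)
    for _ in range(n)
]

mean_unbounded = sum(unbounded) / n
mean_clamped = sum(clamped) / n
print(f"unbounded mean: {mean_unbounded:.3f}")
print(f"clamped mean:   {mean_clamped:.3f}")
```

With the true value on the lower bound, clamping pushes every negative draw up to 0 while leaving positive draws in place, so the clamped average drifts above the true value even though the unclamped average stays close to it. This is the bounded-versus-unbiased trade-off the paper sets out to avoid.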

# NODLINK: きめ細かいAPT攻撃検出と調査のためのオンラインシステム

NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation ( http://arxiv.org/abs/2311.02331v1 )

ライセンス: Link先を確認

Shaofei Li, Feng Dong, Xusheng Xiao, Haoyu Wang, Fei Shao, Jiedong Chen, Yao Guo, Xiangqun Chen, Ding Li,

(参考訳) 高度で持続的な脅威(APT)攻撃は現代企業を悩ませ、大きな経済的損失をもたらしてきた。これらの攻撃に対抗するため、研究者は、システムエンティティとその依存関係をモデル化するプロヴェナンス(来歴)グラフを用いて、APT攻撃の複雑でステルス的なシナリオを捉える手法を提案している。特に、攻撃の検出を加速し、経済的損失を減らすために、適時性と限られたリソースという制約の下でAPT攻撃を検知し、調査するオンラインのプロヴェナンスベース検知システムが強く求められている。残念ながら、既存のオンラインシステムは通常、計算の複雑さを減らすために検出の粒度を犠牲にし、10万以上のノードを持つプロヴェナンスグラフを生成するため、セキュリティ管理者が検出結果を解釈することが難しい。本稿では,検出粒度を犠牲にすることなく高い検出精度を維持する最初のオンライン検出システムであるNodLinkの設計と実装を行う。我々の知見は、オンラインのプロヴェナンスベース検出システムにおけるAPT攻撃検出プロセスは、Steiner Tree Problem (STP)としてモデル化でき、STPには理論的に有界な誤差で簡潔な攻撃関連プロヴェナンスグラフを復元する効率的なオンライン近似アルゴリズムが存在するということである。 APT攻撃検出にSTP近似アルゴリズムのフレームワークを利用するために、新しい設計のインメモリキャッシュ、効率的な攻撃スクリーニング方法、および同じ計算量を維持しつつAPT攻撃検出において従来のものより効率的な新しいSTP近似アルゴリズムを提案する。実運用環境でNodLinkの評価を行った。オープンワールド実験は、NodLinkが2つの最先端(SOTA)オンラインプロヴェナンス分析システムより優れており、同等以上のスループットを保ちながら、桁違いに高い検出精度と調査精度を達成していることを示している。

Advanced Persistent Threats (APT) attacks have plagued modern enterprises, causing significant financial losses. To counter these attacks, researchers propose techniques that capture the complex and stealthy scenarios of APT attacks by using provenance graphs to model system entities and their dependencies. Particularly, to accelerate attack detection and reduce financial losses, online provenance-based detection systems that detect and investigate APT attacks under the constraints of timeliness and limited resources are in dire need. Unfortunately, existing online systems usually sacrifice detection granularity to reduce computational complexity and produce provenance graphs with more than 100,000 nodes, posing challenges for security admins to interpret the detection results. In this paper, we design and implement NodLink, the first online detection system that maintains high detection accuracy without sacrificing detection granularity. Our insight is that the APT attack detection process in online provenance-based detection systems can be modeled as a Steiner Tree Problem (STP), which has efficient online approximation algorithms that recover concise attack-related provenance graphs with a theoretically bounded error. To utilize STP approximation algorithm frameworks for APT attack detection, we propose a novel design of in-memory cache, an efficient attack screening method, and a new STP approximation algorithm that is more efficient than the conventional one in APT attack detection while maintaining the same complexity. We evaluate NodLink in a production environment. The open-world experiment shows that NodLink outperforms two state-of-the-art (SOTA) online provenance analysis systems by achieving magnitudes higher detection and investigation accuracy while having the same or higher throughput.

翻訳日:2024-03-25 13:45:54 公開日:2023-11-04
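
The NodLink abstract above frames detection as a Steiner Tree Problem: connect a set of "terminal" nodes (for example, suspicious events) with a low-cost subgraph. As a rough illustration only, the sketch below uses the textbook shortest-path heuristic (Prim-style growth over terminal-to-terminal shortest paths). It is not NodLink's algorithm; the undirected toy graph, the node names, and the function names are assumptions of this sketch, and the final cycle-pruning pass of the classical heuristic is omitted.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances and predecessors from `source`."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def steiner_tree_edges(graph, terminals):
    """Approximate a Steiner tree by repeatedly attaching the cheapest
    not-yet-connected terminal via its shortest path.
    Raises ValueError if some terminal is unreachable."""
    terminals = list(terminals)
    paths = {t: dijkstra(graph, t) for t in terminals}
    connected = {terminals[0]}
    edges = set()
    while len(connected) < len(terminals):
        # Cheapest shortest path from a connected to an unconnected terminal.
        _, src, dst = min(
            (paths[s][0][t], s, t)
            for s in connected
            for t in terminals
            if t not in connected and t in paths[s][0]
        )
        # Walk the predecessor chain from dst back to src, collecting edges.
        prev = paths[src][1]
        node = dst
        while node != src:
            edges.add(frozenset((node, prev[node])))
            node = prev[node]
        connected.add(dst)
    return edges


# Hypothetical toy graph: a hub "c" links three terminals more cheaply
# than the direct a-b edge does.
toy_graph = {
    "a": {"c": 1.0, "b": 3.0},
    "b": {"c": 1.0, "a": 3.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
tree = steiner_tree_edges(toy_graph, ["a", "b", "d"])
```

In the toy graph the non-terminal hub `c` is pulled into the result, which is the point of the Steiner formulation: intermediate nodes are kept only when they help connect the nodes of interest.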

# P2P方式のソフトウェア: 中央ソフトウェアのないソフトウェアモデルで、いかなるソフトウェアも自由に参加または離脱できる

Software in P2P way: a software model without central software and enabling any software to join or leave freely ( http://arxiv.org/abs/2311.02351v1 )

ライセンス: Link先を確認

Hong Su

(参考訳) P2Pモデルは、ハードウェアであれソフトウェアであれ、同じピアのネットワークを含み、中央制御なしで自律的に動作し、高い可用性を確保しながら個々のピア障害を許容する。しかしながら、現在のP2P技術は主にハードウェアレベルのレジリエンスに焦点を当てており、しばしばP2Pネットワークと呼ばれる。本稿では,ソフトウェアレベルの高可用性向上を目的としたP2P(Peer-to-Peer)ソフトウェアモデルについて紹介する。一般的なハードウェア中心のP2P技術とは違い、このモデルは様々なソフトウェアコンポーネントの分散した性質、すなわち独立して機能する"ソフトウェアピア"をアクセントし、中央のソフトウェアに頼ることなくシームレスなネットワークの入退避を可能にする。このモデルの協調的なアプローチは、ネットワークトポロジを複数の自律的な処理パスで培養し、動的タスク割り当てによる連続的な操作を分散的に保証する。従来の冗長性手法の限界を超えることで、このP2Pモデルは、堅牢な可用性を実現するための適応的でスケーラブルなソリューションを提供する。検証結果は、高い可用性を確保しつつ、タスク処理を成功させる可能性を高める上で、モデルの有効性を裏付けるものである。

The P2P model encompasses a network of equal peers, whether in hardware or software, operating autonomously without central control, allowing individual peer failure while ensuring high availability. Nevertheless, current P2P technologies primarily focus on hardware-level resilience, often referred to as P2P networks, which do not safeguard against software failures. This paper introduces a pioneering Peer-to-Peer (P2P) software model aimed at enhancing software-level high availability. Diverging from prevalent hardware-centric P2P technologies, this model accentuates the decentralized nature of various software components, or "software peers," which function independently, enabling seamless network entry and exit without relying on central software. The model's collaborative approach cultivates a network topology with multiple autonomous processing paths, ensuring continuous operation through dynamic task allocation in a distributed manner. By surpassing the limitations of traditional redundancy methods, this P2P model provides an adaptive and scalable solution for achieving robust availability. Validation results underscore the model's effectiveness in enhancing the probabilities of successful task processing while ensuring high availability.

翻訳日:2024-03-25 13:45:54 公開日:2023-11-04

# Cryo-EMマイクログラフにおけるタンパク質同定のためのプロンプト学習によるセグメンテーションモデル(SAM)の適応

Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs ( http://arxiv.org/abs/2311.16140v1 )

ライセンス: Link先を確認

Fei He, Zhiyuan Yang, Mingyue Gao, Biplab Poudel, Newgin Sam Ebin Sam Dhas, Rajan Gyawali, Ashwin Dhakal, Jianlin Cheng, Dong Xu

(参考訳) cryo-electron microscope (cryo-em) は構造生物学において重要な役割を担っているが、3dタンパク質構造構築に不可欠なタンパク質粒子ピッキングの課題は手作業による非効率化である。 TopazやcrYOLOといった最近のAIツールはこの分野を前進させているが、低コントラスト、複雑な形状、異質なコンフォーメーションなど、Cryo-EMイメージの課題を完全には解決していない。本研究では,Creo-EMのための画像分割基礎モデルSegment Anything Model (SAM) を即時学習により適用することを検討した。この焦点は、事前訓練されたパラメータを変更することなく、少数のラベル付きデータでモデルパフォーマンスを最適化し、適応性と基礎的知識保持のバランスを図ることを目的としていた。頭部プロンプト,プレフィックスプロンプト,エンコーダプロンプトという3つのプロンプトベースの学習戦略による試行を通じて,パフォーマンスの向上と,微調整アプローチと比較して計算要件の低減を観察した。この研究は、Cryo-EMマイクログラフからSAMを同定する可能性を強調するだけでなく、バイオメディカル画像のセグメンテーションや物体検出において幅広い可能性を示唆している。

Cryo-electron microscopy (cryo-EM) remains pivotal in structural biology, yet the task of protein particle picking, integral for 3D protein structure construction, is laden with manual inefficiencies. While recent AI tools such as Topaz and crYOLO are advancing the field, they do not fully address the challenges of cryo-EM images, including low contrast, complex shapes, and heterogeneous conformations. This study explored prompt-based learning to adapt the state-of-the-art image segmentation foundation model Segment Anything Model (SAM) for cryo-EM. This focus was driven by the desire to optimize model performance with a small number of labeled data without altering pre-trained parameters, aiming for a balance between adaptability and foundational knowledge retention. Through trials with three prompt-based learning strategies, namely head prompt, prefix prompt, and encoder prompt, we observed enhanced performance and reduced computational requirements compared to the fine-tuning approach. This work not only highlights the potential of prompting SAM in protein identification from cryo-EM micrographs but also suggests its broader promise in biomedical image segmentation and object detection.

翻訳日:2023-12-03 13:17:00 公開日:2023-11-04

# 健常者からの閉塞型睡眠時無呼吸症患者の心拍数-エントロピー指標による血圧結合の定量化

Differentiating patients with obstructive sleep apnea from healthy controls based on heart rate - blood pressure coupling quantified by entropy-based indices ( http://arxiv.org/abs/2311.10752v1 )

ライセンス: Link先を確認

Paweł Pilarczyk, Grzegorz Graff, José M. Amigó, Katarzyna Tessmer, Krzysztof Narkiewicz, Beata Graff

(参考訳) 心拍数と拍動ごとの血圧記録における相互依存性を定量化するために、エントロピーに基づく系列ペアの分類法(ECPS)を提案する。本手法の目的は、各項目が被験者ごとに取得された2つの絡み合ったデータ系列からなるデータに対する分類器を構築することである。この方法は順序パターンに基づいており、エントロピー的な指標を用いる。閉塞性睡眠時無呼吸の患者と対照群を区別するための最適かつ単純なモデルを構築するために、機械学習を用いて分類問題に最も適した指標のサブセットを選択する。

We introduce an entropy-based classification method for pairs of sequences (ECPS) for quantifying mutual dependencies in heart rate and beat-to-beat blood pressure recordings. The purpose of the method is to build a classifier for data in which each item consists of the two intertwined data series taken for each subject. The method is based on ordinal patterns, and uses entropy-like indices. Machine learning is used to select a subset of indices most suitable for our classification problem in order to build an optimal yet simple model for distinguishing between patients suffering from obstructive sleep apnea and a control group.

翻訳日:2023-11-27 00:46:23 公開日:2023-11-04
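
The ECPS abstract above relies on ordinal patterns and entropy-like indices computed for a pair of series. The following is a minimal sketch of that general idea, assuming a plain Shannon entropy over ordinal patterns and over joint patterns of the two series. The function names, the window length, and the specific index are assumptions of this sketch and may differ from the indices used in the paper.

```python
from collections import Counter
from math import log2


def ordinal_patterns(series, order=3):
    """Ordinal pattern (argsort of values) of every window of length `order`."""
    patterns = []
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Indices of the window sorted by value; ties are broken by position.
        patterns.append(tuple(sorted(range(order), key=lambda k: (window[k], k))))
    return patterns


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def joint_pattern_entropy(xs, ys, order=3):
    """Entropy of the paired ordinal patterns of two equally long series."""
    return shannon_entropy(list(zip(ordinal_patterns(xs, order),
                                    ordinal_patterns(ys, order))))


# Hypothetical toy series standing in for heart rate and blood pressure.
heart_rate = [1, 3, 2, 4, 3, 5]
blood_pressure = [1, 3, 2, 4, 3, 5]
h_single = shannon_entropy(ordinal_patterns(heart_rate))
h_joint = joint_pattern_entropy(heart_rate, blood_pressure)
```

When the two series move in lockstep, the joint entropy equals the single-series entropy; weaker coupling pushes the joint value up, which is the kind of contrast a classifier can exploit.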

# 自己調整カーネル回帰を用いたモバイルインターネット品質評価

Mobile Internet Quality Estimation using Self-Tuning Kernel Regression ( http://arxiv.org/abs/2311.05641v1 )

ライセンス: Link先を確認

Hanyang Jiang, Henry Shaowu Yuchi, Elizabeth Belding, Ellen Zegura, Yao Xie

(参考訳) 空間データのモデリングと推定は、実生活においてユビキタスであり、しばしば天気予報、汚染検知、農業に現れる。空間データ分析は、しばしば巨大なデータセットを処理する。本研究では,ooklaの大規模インターネット品質オープンデータセットに注目した。米国内の国の規模で、モバイル(セルラー)インターネットの品質を推定することを検討する。特に、高度に不均衡なデータに基づいて推定を行うことを目標としています: サンプルの大部分は限られた領域に集中していますが、残りの部分で利用できるものはほとんどありません。本稿では,データ不均衡の悪影響を軽減するために自己調整型カーネルを用いた適応型カーネル回帰手法を提案する。 2つの異なる移動ネットワーク計測データセットの比較実験を通じて,提案手法がより正確な予測を生成することを実証し,他のアプリケーションに適用する可能性を示した。

Modeling and estimation for spatial data are ubiquitous in real life, frequently appearing in weather forecasting, pollution detection, and agriculture. Spatial data analysis often involves processing datasets of enormous scale. In this work, we focus on large-scale internet-quality open datasets from Ookla. We look into estimating mobile (cellular) internet quality at the scale of a state in the United States. In particular, we aim to conduct estimation based on highly imbalanced data: most of the samples are concentrated in limited areas, while very few are available in the rest, posing significant challenges to modeling efforts. We propose a new adaptive kernel regression approach that employs self-tuning kernels to alleviate the adverse effects of data imbalance in this problem. Through comparative experimentation on two distinct mobile network measurement datasets, we demonstrate that the proposed self-tuning kernel regression method produces more accurate predictions, with the potential to be applied in other applications.

翻訳日:2023-11-19 14:29:10 公開日:2023-11-04
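
The abstract above proposes kernel regression with self-tuning kernels for spatially imbalanced samples. One common way to make a kernel "self-tuning" is to give each training point its own bandwidth, set to the distance to its k-th nearest neighbour, so that sparse regions get wider kernels. The sketch below applies that idea to a Nadaraya-Watson estimator. This bandwidth rule, the Gaussian kernel, and all names are assumptions of the sketch, not the paper's exact method.

```python
from math import dist, exp


def local_bandwidths(points, k=3):
    """Per-point bandwidth: distance to the k-th nearest neighbour."""
    bandwidths = []
    for p in points:
        d = sorted(dist(p, q) for q in points)
        # d[0] is the zero distance from p to itself; clamp to avoid h = 0.
        bandwidths.append(max(d[min(k, len(d) - 1)], 1e-12))
    return bandwidths


def predict(query, points, values, bandwidths):
    """Nadaraya-Watson estimate with a Gaussian kernel per training point."""
    weights = [exp(-dist(query, p) ** 2 / (2 * h * h))
               for p, h in zip(points, bandwidths)]
    total = sum(weights)
    if total == 0.0:
        return sum(values) / len(values)  # fall back to the global mean
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical measurements: three clustered samples and one isolated sample.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
speeds = [2.0, 2.0, 2.0, 2.0]
bw = local_bandwidths(points, k=2)
estimate = predict((0.5, 0.5), points, speeds, bw)
```

The isolated sample receives a much larger bandwidth than the clustered ones, so it still contributes to estimates in its otherwise empty neighbourhood.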

# 電力流動解析のための量子ニューラルネットワーク

Quantum Neural Networks for Power Flow Analysis ( http://arxiv.org/abs/2311.06293v1 )

ライセンス: Link先を確認

Zeynab Kaseb, Matthias Moller, Giorgio Tosti Balducci, Peter Palensky, Pedro P. Vergara

(参考訳) 本稿では、量子ニューラルネットワークおよびハイブリッド量子古典ニューラルネットワークの電力潮流解析への応用可能性について検討する。IEEE 4-busおよび33-busテストシステムに基づく2つの小規模データセットを用いて実験を行った。また、量子、ハイブリッド量子古典、古典ニューラルネットワークの体系的な性能比較を行った。比較は、(i) 汎化能力、(ii) 堅牢性、(iii) 必要なトレーニングデータセットのサイズ、(iv) トレーニング誤差、(v) トレーニングの計算時間、(vi) トレーニング過程の安定性に基づく。その結果、開発したハイブリッド量子古典ニューラルネットワークは、量子ニューラルネットワークと古典ニューラルネットワークの両方より優れており、NISQ(noisy intermediate-scale quantum)時代の深層学習に基づく電力潮流解析を改善できることがわかった。

This paper explores the potential application of quantum and hybrid quantum-classical neural networks in power flow analysis. Experiments are conducted using two small-size datasets based on the IEEE 4-bus and 33-bus test systems. A systematic performance comparison is also conducted among quantum, hybrid quantum-classical, and classical neural networks. The comparison is based on (i) generalization ability, (ii) robustness, (iii) training dataset size needed, (iv) training error, (v) training computational time, and (vi) training process stability. The results show that the developed hybrid quantum-classical neural network outperforms both quantum and classical neural networks, and hence can improve deep learning-based power flow analysis in the noisy intermediate-scale quantum (NISQ) era.

翻訳日:2023-11-19 14:14:51 公開日:2023-11-04

# WD3: 深層強化学習における評価バイアスの活用

WD3: Taming the Estimation Bias in Deep Reinforcement Learning ( http://arxiv.org/abs/2006.12622v2 )

ライセンス: Link先を確認

Qiang He, Xinwen Hou

(参考訳) 関数近似によって引き起こされる過大評価現象は、deep Q-networkやDDPGのような値ベースの強化学習アルゴリズムでよく知られた問題であり、準最適なポリシーにつながる可能性がある。この問題を解決するため、TD3は2つの批評家(critic)の間で最小値を取る。本稿では、TD3アルゴリズムが穏やかな仮定の下で過小評価バイアスを導入することを示す。より正確な価値関数の推定を得るため、これら2つの相反する性質を統一し、一対の批評家を重み付けすることで推定バイアスを取り除き、性能をさらに向上できる新しいアルゴリズム Weighted Delayed Deep Deterministic Policy Gradient (WD3) を提案する。WD3の有効性を示すため、DDPG、TD3、WD3の価値関数の学習過程を比較した。その結果、提案アルゴリズムが価値関数の推定誤差を除去することを確認した。さらに、連続制御タスクでアルゴリズムを評価した。各テストタスクにおいて、WD3の性能は最先端のアルゴリズムを一貫して上回るか、少なくとも同等である。コードは https://sites.google.com/view/ictai20-wd3/ で利用可能である。

The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, which could lead to suboptimal policies. To address this issue, TD3 takes the minimum value between a pair of critics. In this paper, we show that the TD3 algorithm introduces underestimation bias under mild assumptions. To obtain a more precise estimation for value function, we unify these two opposites and propose a novel algorithm, Weighted Delayed Deep Deterministic Policy Gradient (WD3), which can eliminate the estimation bias and further improve the performance by weighting a pair of critics. To demonstrate the effectiveness of WD3, we compare the learning process of value function between DDPG, TD3, and WD3. The results verify that our algorithm does eliminate the estimation error of value functions. Furthermore, we evaluate our algorithm on the continuous control tasks. We observe that in each test task, the performance of WD3 consistently outperforms, or at the very least matches, that of the state-of-the-art algorithms. Our code is available at https://sites.google.com/view/ictai20-wd3/.

翻訳日:2023-11-09 20:51:29 公開日:2023-11-04
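
The WD3 abstract above describes weighting a pair of critics to sit between the overestimation of a single critic and the underestimation of TD3's minimum. A minimal sketch of one way to form such a target is shown below: a convex combination of the minimum and the average of the two critic estimates. The weight name `beta` and this exact blend are assumptions of the sketch; the paper should be consulted for the actual update rule.

```python
def weighted_critic_target(reward, q1_next, q2_next, done, gamma=0.99, beta=0.5):
    """TD target that blends min(Q1, Q2) with the average of Q1 and Q2.

    beta = 1.0 recovers a TD3-style clipped double-Q target;
    beta = 0.0 uses the plain average of the two critics.
    """
    min_q = min(q1_next, q2_next)
    avg_q = 0.5 * (q1_next + q2_next)
    blended = beta * min_q + (1.0 - beta) * avg_q
    # No bootstrapping from a terminal state.
    return reward + gamma * (0.0 if done else blended)


# Hypothetical critic outputs for one transition.
td3_like = weighted_critic_target(1.0, 2.0, 4.0, done=False, gamma=0.5, beta=1.0)
averaged = weighted_critic_target(1.0, 2.0, 4.0, done=False, gamma=0.5, beta=0.0)
weighted = weighted_critic_target(1.0, 2.0, 4.0, done=False, gamma=0.5, beta=0.5)
```

With these inputs the weighted target falls strictly between the TD3-like and the averaged targets, which is the intended trade-off between the two bias directions.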

# インテリジェント交通システムのための大規模道路サイドマルチビューマルチセンサ空間同期フレームワーク

A Practical Large-Scale Roadside Multi-View Multi-Sensor Spatial Synchronization Framework for Intelligent Transportation Systems ( http://arxiv.org/abs/2311.04231v1 )

ライセンス: Link先を確認

Yong Li, Zhiguo Zhao, Yunli Chen, Rui Tian

(参考訳) 道路側シナリオにおける空間同期は、異なる場所で複数のセンサーからのデータを統合するために不可欠である。カスケード空間変換(CST)を用いた現在の手法は、大規模な展開において累積誤差につながることが多い。手動カメラのキャリブレーションは不十分で、広範囲の手動作業が必要であり、既存の方法は制御されたシナリオや単視点シナリオに限定されている。これらの課題に対処するため,本研究では,大規模マルチビューマルチセンサシナリオのための並列空間変換(pst)ベースのフレームワークを提案する。 PSTはセンサ座標系変換を並列化し、累積誤差を低減する。深層学習を道路側単眼のグローバルローカライゼーションに応用し,手作業の削減を図る。さらに,同期精度を向上させるために,位置情報と最適化アルゴリズムを用いる。我々のフレームワークは実世界のシナリオでテストされ、CSTベースの手法よりも優れています。大規模道路側におけるマルチパースペクティブ・マルチセンサ空間同期を著しく向上させ、デプロイメントコストを低減させる。

Spatial synchronization in roadside scenarios is essential for integrating data from multiple sensors at different locations. Current methods using cascading spatial transformation (CST) often lead to cumulative errors in large-scale deployments. Manual camera calibration is insufficient and requires extensive manual work, and existing methods are limited to controlled or single-view scenarios. To address these challenges, our research introduces a parallel spatial transformation (PST)-based framework for large-scale, multi-view, multi-sensor scenarios. PST parallelizes sensor coordinate system transformation, reducing cumulative errors. We incorporate deep learning for precise roadside monocular global localization, reducing manual work. Additionally, we use geolocation cues and an optimization algorithm for improved synchronization accuracy. Our framework has been tested in real-world scenarios, outperforming CST-based methods. It significantly enhances large-scale roadside multi-perspective, multi-sensor spatial synchronization, reducing deployment costs.

翻訳日:2023-11-09 18:18:46 公開日:2023-11-04

# RMT: Retentive NetworkとVision Transformerの出会い

RMT: Retentive Networks Meet Vision Transformers ( http://arxiv.org/abs/2309.11523v4 )

ライセンス: Link先を確認

Qihang Fan, Huaibo Huang, Mingrui Chen, Hongmin Liu and Ran He

(参考訳) Retentive Network (RetNet) はNLPのドメインで最初に登場し、その顕著な性能のためにすぐに注目を集めた。その印象的な能力のかなりの部分は、貴重な事前知識を含む明示的な減衰機構に由来する。しかし、この明示的な減衰は一方向的で一次元であり、画像ベースタスクに必要な双方向2次元モデリングには適さない。そこで本研究では、視覚モデルに距離関連の事前知識を導入することを目的とした、双方向2次元の明示的減衰を提案する。さらに、言語モデルとは異なり、視覚バックボーンはトレーニングと推論で同じ並列形式を使用する。この並列形式が再帰形式またはチャンク単位の再帰形式に置き換えられると、モデルの並列性は著しく損なわれ、推論速度が非常に遅くなる。そのため、元のRetNetにある2つの追加の推論モードを捨て、並列形式のみを保持する。具体的には、双方向の2次元明示的減衰を自己アテンションに組み込み、Retentive Self-Attention (ReSA) を形成する。さらに、大域的モデリングの複雑さを軽減するため、画像の2軸に沿ってReSAを分解する。ReSAに基づいて、強力なビジョンバックボーンであるRMTを構築する。豊富な実験により、RMTは様々なコンピュータビジョンタスクにおいて卓越した性能を示した。例えば、RMTはわずか4.5G FLOPsでImageNet-1k上で84.1%のTop1-accを達成する。我々の知る限り、モデルが同程度のサイズで同じ戦略で訓練された場合、RMTはすべてのモデルの中で最も高いTop1-accを達成する。さらに、RMTは下流タスクにおいて既存のビジョンバックボーンを大きく上回る。コードは https://github.com/qhfan/RMT でリリースされる。

Retentive Network first emerged in the domain of NLP and immediately gained widespread attention due to its remarkable performance. A significant portion of its impressive capabilities stems from its explicit decay mechanism, which incorporates valuable prior knowledge. However, this explicit decay is unidirectional and one-dimensional, making it unsuitable for the bidirectional, two-dimensional modeling required in image-based tasks. To solve this, we propose a bidirectional, two-dimensional form of explicit decay specifically designed for vision models to introduce distance-related prior knowledge. Besides, unlike language models, the vision backbones use the same parallel form during training and inference. If this parallel form is replaced with recurrent or chunk-wise recurrent form, the parallelism of the model will be significantly disrupted, resulting in extremely slow inference speed. So we discard the two additional inference modes present in the original RetNet, retaining only the parallel form. Specifically, we incorporate bidirectional, two-dimensional explicit decay into the Self-Attention to form Retentive Self-Attention (ReSA). Furthermore, to reduce the complexity of global modeling, we decompose ReSA along the two axes of the image. Building upon ReSA, we construct RMT, a strong vision backbone. Abundant experiments have demonstrated that our RMT exhibits exceptional performance across various computer vision tasks. For example, RMT achieves 84.1% Top1-acc on ImageNet-1k using merely 4.5G FLOPs. To the best of our knowledge, among all models, RMT achieves the highest Top1-acc when models are of similar size and trained with the same strategy. Moreover, RMT significantly outperforms existing vision backbones in downstream tasks. Code will be released at https://github.com/qhfan/RMT.

翻訳日:2023-11-08 19:06:53 公開日:2023-11-04
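
The RMT abstract above contrasts a unidirectional, one-dimensional decay with a bidirectional, two-dimensional one. The sketch below builds both kinds of decay mask for small inputs. The causal form follows the usual RetNet-style mask; the use of Manhattan distance between patch coordinates for the 2D case, the value of `gamma`, and the function names are assumptions of this sketch.

```python
def causal_decay_mask(length, gamma=0.9):
    """1D unidirectional decay: gamma**(n - m) for n >= m, else 0."""
    return [[gamma ** (n - m) if n >= m else 0.0 for m in range(length)]
            for n in range(length)]


def bidirectional_2d_decay_mask(height, width, gamma=0.9):
    """2D bidirectional decay: gamma raised to the Manhattan distance
    between every pair of patch positions (row-major token order)."""
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [[gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
            for (r1, c1) in coords]


# Small examples: 3 tokens in a sequence, and a 2x2 grid of image patches.
mask_1d = causal_decay_mask(3, gamma=0.5)
mask_2d = bidirectional_2d_decay_mask(2, 2, gamma=0.5)
```

In an attention layer, a mask like `mask_2d` would be multiplied elementwise with the attention scores, so nearby patches are weighted more strongly than distant ones in every direction.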

# ARNIQA:画像品質評価のための歪みマニフォールド学習

ARNIQA: Learning Distortion Manifold for Image Quality Assessment ( http://arxiv.org/abs/2310.14918v2 )

ライセンス: Link先を確認

Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo

(参考訳) No-Reference Image Quality Assessment (NR-IQA) は、高品質な参照画像を必要としない、人間の知覚に合わせて画像品質を測定する手法を開発することを目的としている。本研究では、画像歪み多様体をモデル化し、本質的な表現を得るための自己教師型アプローチ ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) を提案する。まず,連続した歪みの順序列をランダムに合成する画像劣化モデルを提案する。このようにして、多種多様な劣化パターンで画像を合成分解することができる。第2に,異なる画像のパッチ表現間の類似性を最大化することで,異なるコンテンツに拘わらず等しく歪んだモデルを構築することを提案する。したがって、同じ方法で劣化した画像は歪み多様体内の隣接位置に対応する。最後に、画像表現を単純な線形レグレッサで品質スコアにマッピングし、エンコーダ重みを微調整することなく表示する。実験により,本手法は複数のデータセット上で最先端の性能を実現することを示す。さらに、ARNIQAは競合する手法と比較してデータ効率、一般化能力、堅牢性が改善されている。コードとモデルはhttps://github.com/miccunifi/arniqaで公開されている。

No-Reference Image Quality Assessment (NR-IQA) aims to develop methods to measure image quality in alignment with human perception without the need for a high-quality reference image. In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner. First, we introduce an image degradation model that randomly composes ordered sequences of consecutively applied distortions. In this way, we can synthetically degrade images with a large variety of degradation patterns. Second, we propose to train our model by maximizing the similarity between the representations of patches of different images distorted equally, despite varying content. Therefore, images degraded in the same manner correspond to neighboring positions within the distortion manifold. Finally, we map the image representations to the quality scores with a simple linear regressor, thus without fine-tuning the encoder weights. The experiments show that our approach achieves state-of-the-art performance on several datasets. In addition, ARNIQA demonstrates improved data efficiency, generalization capabilities, and robustness compared to competing methods. The code and the model are publicly available at https://github.com/miccunifi/ARNIQA.

翻訳日:2023-11-08 18:53:47 公開日:2023-11-04

# ZEETAD:ゼロショット終端動作検出のための事前学習型視覚言語モデルの適用

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection ( http://arxiv.org/abs/2311.00729v2 )

ライセンス: Link先を確認

Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le

(参考訳) 時間的行動検出(TAD)は、未トリミングビデオ内のアクションインスタンスのローカライズと分類を含む。最近のゼロショットTAD手法では,大規模コントラスト視覚言語(ViL)事前訓練モデルを活用することで,有望なオープンセット設定を示す。しかし、既存のゼロショットTAD法は、ローカライゼーションと分類の2つの相互依存タスク間の強い関係を適切に構築し、ビデオ理解にViLモデルを適用する方法に制限がある。本稿では,デュアルローカライズとゼロショットの提案分類という2つのモジュールを特徴とするゼータドを提案する。前者はtransformerベースのモジュールで、アクションイベントを検出し、後で認識するために重要な意味埋め込みを選択的に収集する。後者はCLIPベースのモジュールで、時間単位ごとにテキストとフレーム入力からセマンティック埋め込みを生成する。さらに,軽量アダプタで冷凍したCLIPエンコーダを最小限に更新することで,未確認クラスの識別能力を向上させる。 THUMOS14とActivityNet-1.3データセットの大規模な実験は、ゼロショットTADにおける我々のアプローチの優れた性能と、ViLモデルから目に見えないアクションカテゴリへの効果的な知識伝達を示す。

Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning in a closed-set setting on large training data, recent zero-shot TAD methods showcase the promising open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods are limited in how they construct the strong relationship between the two interdependent tasks of localization and classification and in how they adapt ViL models to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter, a CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.

翻訳日:2023-11-08 18:41:24 公開日:2023-11-04

# FPGA-QHAR:エッジ上での人間の行動認識のためのスループット最適化

FPGA-QHAR: Throughput-Optimized for Quantized Human Action Recognition on The Edge ( http://arxiv.org/abs/2311.03390v1 )

ライセンス: Link先を確認

Azzam Alhussain and Mingjie Lin

(参考訳) エッジチップ上でのリアルタイム監視とロボットシステムのためのHAR(Human Action Recognition)の効率的な高速化は、高い計算とメモリ要求を考えると、依然として困難な研究分野である。本稿では,8ビット量子化された2ストリームSimpleNet-PyTorch CNNアーキテクチャに基づく,エンドツーエンドHAR拡張型HW/SWアクセラレータ共設計を提案する。我々のネットワークアクセラレーターは、UCF101とUCF24データセットで訓練され、エッジSoC-FPGAで実装された。当社の開発では、部分ストリーミングデータフローアーキテクチャを使用して、ネットワーク設計とリソース利用トレードオフよりも高いスループットを実現しています。我々はまた、全ての畳み込み、バッチノルム、ReLU演算を単一均一層に融合させ、Lucas-Kanade運動流法を用いて高並列性加速器の設計とオンチップエンジンの最適化を実現したが、提案手法は、従来の研究より1.7x-1.9倍高いZCU104上の187MHzのリアルタイム推論スループットで、約81%の予測精度を達成した。最後に、設計されたフレームワークは、スループットとパフォーマンス測定のためにいくつかのハードウェアチップに対してベンチマークされ、エッジプラットフォームでのトレーニングと実装のためのgithubのオープンソースプロジェクトとして利用できる。

Accelerating Human Action Recognition (HAR) efficiently for real-time surveillance and robotic systems on edge chips remains a challenging research field, given its high computational and memory requirements. This paper proposed an integrated end-to-end HAR scalable HW/SW accelerator co-design based on an enhanced 8-bit quantized Two-Stream SimpleNet-PyTorch CNN architecture. Our network accelerator was trained on UCF101 and UCF24 datasets and implemented on edge SoC-FPGA. Our development uses partially streaming dataflow architecture to achieve higher throughput versus network design and resource utilization trade-off. We also fused all convolutional, batch-norm, and ReLU operations into a single homogeneous layer and utilized the Lucas-Kanade motion flow method to enable a high parallelism accelerator design and optimized on-chip engine computing. Furthermore, our proposed methodology achieved nearly 81% prediction accuracy with an approximately 24 FPS real-time inference throughput at 187MHz on ZCU104, which is 1.7x - 1.9x higher than the prior research. Lastly, the designed framework was benchmarked against several hardware chips for higher throughput and performance measurements and is now available as an open-source project on GitHub for training and implementation on edge platforms.

翻訳日:2023-11-08 18:27:43 公開日:2023-11-04
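
The FPGA-QHAR abstract above mentions fusing convolution, batch-norm, and ReLU into a single layer. A standard inference-time trick behind such fusion is to fold the batch-norm statistics into the preceding layer's weights and bias. The sketch below shows that folding for a single output channel, with a dot product standing in for the convolution. It is the generic textbook folding, not the paper's implementation, and it does not model 8-bit quantization; all names and values are illustrative.

```python
from math import sqrt


def fold_batch_norm(weights, bias, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    """Fold batch-norm parameters into one channel's weights and bias."""
    scale = bn_gamma / sqrt(bn_var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - bn_mean) * scale + bn_beta
    return folded_weights, folded_bias


def linear(weights, bias, inputs):
    """Dot product plus bias (stand-in for one convolution output)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def batch_norm(y, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    return bn_gamma * (y - bn_mean) / sqrt(bn_var + eps) + bn_beta


def relu(v):
    return max(0.0, v)


# Hypothetical parameters for one channel.
w, b = [0.5, -1.0, 2.0], 0.1
gamma_, beta_, mean_, var_ = 1.5, -0.2, 0.3, 4.0
x = [1.0, 2.0, 3.0]

unfused = relu(batch_norm(linear(w, b, x), gamma_, beta_, mean_, var_))
fw, fb = fold_batch_norm(w, b, gamma_, beta_, mean_, var_)
fused = relu(linear(fw, fb, x))
```

After folding, the fused layer needs only one multiply-accumulate pass followed by a clamp, which is what makes a single homogeneous hardware layer possible.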

# 異節音声表現の学習

Learning Disentangled Speech Representations ( http://arxiv.org/abs/2311.03389v1 )

ライセンス: Link先を確認

Yusuf Brima, Ulf Krumnack, Simone Pika and Gunther Heidemann

(参考訳) 多くのアプリケーション領域において重要でありながら、音声からのアンタングル表現学習は限定的である。主要な課題は、メソッドを評価するための既知の生成因子を持つ音声データセットの欠如である。本稿では, 音声表現の非接触化に関する研究を可能にする基礎的真理因子を用いた合成音声データセットSynSpeechを提案する。本研究は,教師付きディスタングルメント指標を用いて教師付き手法の評価を行う。このベンチマークデータセットとフレームワークは、最先端不連続音声表現学習法の厳密な評価のギャップに対処する。我々の発見は、この未探索領域を前進させ、より堅牢な音声表現を可能にする洞察を与える。

Disentangled representation learning from speech remains limited despite its importance in many application domains. A key challenge is the lack of speech datasets with known generative factors to evaluate methods. This paper proposes SynSpeech: a novel synthetic speech dataset with ground truth factors enabling research on disentangling speech representations. We plan to present a comprehensive study evaluating supervised techniques using established supervised disentanglement metrics. This benchmark dataset and framework address the gap in the rigorous evaluation of state-of-the-art disentangled speech representation learning methods. Our findings will provide insights to advance this underexplored area and enable more robust speech representations.

翻訳日:2023-11-08 18:27:19 公開日:2023-11-04

# 強化学習を用いたグループミッションにおけるエージェントのエラー源の自動同定

Using reinforcement learning to autonomously identify sources of error for agents in group missions ( http://arxiv.org/abs/2107.09232v4 )

ライセンス: Link先を確認

Keishu Utimula, Ken-taro Hayaschi, Trevor J. Bihl, Kenta Hongo, Ryo Maezono

(参考訳) エージェントが群れ(スウォーム)でミッションを実行する際、コマンドベースから見ると、一部のエージェントが突然の故障を示すことがよくある。一般に、コマンドベースと当該エージェント間の通信のみに頼って、故障がアクチュエータ(仮説 $h_a$)とセンサ(仮説 $h_s$)のどちらに起因するかを判断するのは困難である。しかし、エージェント同士を衝突させることで故障の原因を特定できる。すなわち、$h_a$ では対応する変位が検出されるが、$h_s$ では検出されないと期待される。本研究では、このように原因を特定するための行動計画 $\boldsymbol{g}$ を人工知能が自律的に生成できるかどうかを検討した。一般に、$\boldsymbol{g}$ に対する期待される応答は採用する仮説に依存するため(その差を $D(\boldsymbol{g})$ と表す)、$D(\boldsymbol{g})$ を用いて原因を特定する定式化が可能である。$D(\boldsymbol{g})$ を最大化する $\boldsymbol{g}^*$ はこのタスクに適した行動計画であるが、$D(\boldsymbol{g})$ は他のエージェントとの衝突のような稀な事象でのみ非ゼロとなり、ほとんどのスウォーム行動 $\boldsymbol{g}$ では $D(\boldsymbol{g})=0$ となるため、従来の勾配法でこの最適化を達成することは困難である。言い換えると、$\boldsymbol{g}$ の空間のほぼ全域で $D(\boldsymbol{g})$ の勾配はゼロであり、勾配法は適用できない。この問題を克服するため、Qテーブル強化学習を用いて行動計画を定式化した。驚くべきことに、強化学習によって生成された最適な行動計画は、故障したエージェントに他のエージェントを衝突させて問題を特定するという、人間的な解決策を示した。この簡単なプロトタイプを用いて、故障原因を特定する自律的行動の計画にQテーブル強化学習手法を適用できる可能性を実証した。

When agents swarm to execute a mission, some of them frequently exhibit sudden failure, as observed from the command base. It is generally difficult to determine whether a failure is caused by actuators (hypothesis $h_a$) or sensors (hypothesis $h_s$) by relying solely on the communication between the command base and the agent concerned. However, by instigating collisions between the agents, the cause of failure can be identified; in other words, we expect to detect corresponding displacements for $h_a$ but not for $h_s$. In this study, we considered whether artificial intelligence can autonomously generate an action plan $\boldsymbol{g}$ to pinpoint the cause as described above. Because the expected response to $\boldsymbol{g}$ generally depends upon the adopted hypothesis (let the difference be denoted by $D(\boldsymbol{g})$), a formulation that uses $D(\boldsymbol{g})$ to pinpoint the cause can be made. Although a $\boldsymbol{g}^*$ that maximizes $D(\boldsymbol{g})$ would be a suitable action plan for this task, such an optimization is difficult to achieve using the conventional gradient method, as $D(\boldsymbol{g})$ becomes nonzero only in rare events such as collisions with other agents, and most swarm actions $\boldsymbol{g}$ give $D(\boldsymbol{g})=0$. In other words, throughout almost the entire space of $\boldsymbol{g}$, $D(\boldsymbol{g})$ has zero gradient, and the gradient method is not applicable. To overcome this problem, we formulated an action plan using Q-table reinforcement learning. Surprisingly, the optimal action plan generated via reinforcement learning presented a human-like solution to pinpoint the problem by colliding other agents with the failed agent. Using this simple prototype, we demonstrated the potential of applying Q-table reinforcement learning methods to plan autonomous actions to pinpoint the causes of failure.

翻訳日:2023-11-08 02:16:21 公開日:2023-11-04
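
The abstract above argues that a reward which is zero almost everywhere defeats gradient methods, and that Q-table reinforcement learning can still find the informative action (a collision). The sketch below illustrates that point on a deliberately tiny toy problem: an agent on a line earns a reward only when it reaches the cell of a failed agent. The corridor environment, the reward, and all names are assumptions of this sketch and are far simpler than the paper's swarm setting.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
TARGET = 4  # hypothetical position of the failed agent


def step(position, action):
    """Move one cell; the only reward comes from reaching the target cell."""
    nxt = max(0, position - 1) if action == "left" else position + 1
    collided = nxt == TARGET
    return nxt, (1.0 if collided else 0.0), collided


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        position = 0
        for _ in range(50):  # cap on episode length
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(position, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Standard Q-learning update toward the bootstrapped target.
            q[(position, action)] += alpha * (
                reward + gamma * best_next - q[(position, action)]
            )
            position = nxt
            if done:
                break
    return q


def greedy_action(q, position):
    return max(ACTIONS, key=lambda a: q[(position, a)])


q_table = train()
policy = [greedy_action(q_table, s) for s in range(TARGET)]
```

Even though the reward is zero for nearly every step, the table propagates the rare collision reward backwards, and the greedy policy heads toward the failed agent from every starting cell.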

# 有限体上のランダム原始多項式生成のための量子加速アルゴリズム

Quantum-accelerated algorithms for generating random primitive polynomials over finite fields ( http://arxiv.org/abs/2203.12884v3 )

ライセンス: Link先を確認

Shan Huang, Hua-Lei Yin, Zeng-Bing Chen, Shengjun Wu

(参考訳) 有限体上の原始多項式は、古典的擬似ランダム数生成、符号化理論、ポスト量子暗号など、コンピュータ科学の様々な領域において重要である。それでも、有限体上のランダム原始多項式を生成するための効率的な古典的アルゴリズムの追求は今も続いている課題である。本稿では,この問題をハイブリッド量子古典アルゴリズムを用いて効率的に解く方法を示し,それらを実装するための特定の量子回路の設計について述べる。本研究は,多種多様な量子通信および計算応用におけるランダムプリミティブ多項式の高速かつリアルタイムな生成方法である。

Primitive polynomials over finite fields are crucial for various domains of computer science, including classical pseudo-random number generation, coding theory and post-quantum cryptography. Nevertheless, the pursuit of an efficient classical algorithm for generating random primitive polynomials over finite fields remains an ongoing challenge. In this paper, we show how to solve this problem efficiently through hybrid quantum-classical algorithms, and designs of the specific quantum circuits to implement them are also presented. Our research paves the way for the rapid and real-time generation of random primitive polynomials in diverse quantum communication and computation applications.

翻訳日:2023-11-08 02:09:10 公開日:2023-11-04
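
For context on what the abstract above is generating: a degree-n polynomial over GF(2) with nonzero constant term is primitive exactly when the multiplicative order of x modulo the polynomial is 2^n - 1. The sketch below checks this classically by brute force, which is only practical for small degrees and is unrelated to the paper's quantum algorithms. Polynomials are encoded as integer bit masks (bit i is the coefficient of x^i); this encoding and the function names are assumptions of the sketch.

```python
def order_of_x(poly):
    """Multiplicative order of x modulo `poly` over GF(2).

    Returns 0 if x is not invertible (zero constant term) or degree < 1.
    """
    degree = poly.bit_length() - 1
    if degree < 1 or poly & 1 == 0:
        return 0
    state = 1  # represents x**0
    for k in range(1, 2 ** degree):
        state <<= 1                 # multiply by x
        if (state >> degree) & 1:   # degree overflow: reduce modulo poly
            state ^= poly
        if state == 1:
            return k
    return 0


def is_primitive(poly):
    """True if x generates all 2**n - 1 nonzero residues modulo `poly`."""
    degree = poly.bit_length() - 1
    return degree >= 1 and order_of_x(poly) == 2 ** degree - 1


primitive_example = 0b10011      # x^4 + x + 1
non_primitive_example = 0b11111  # x^4 + x^3 + x^2 + x + 1 (irreducible, order 5)
```

The second example shows why the task is non-trivial: a polynomial can be irreducible and still fail to be primitive, so candidates must be tested rather than merely factored.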

# 気晴らしは公正に必要なのは

Distraction is All You Need for Fairness ( http://arxiv.org/abs/2203.07593v3 )

ライセンス: Link先を確認

Mehdi Yazdani-Jahromi and AmirArsalan Rajabi and Ali Khodabandeh Yalabadi and Aida Tayebi and Ozlem Ozmen Garibay

(参考訳) トレーニングデータセットのバイアスは、同等または同等の処置を保証するために、分類タスクのさまざまなグループのために管理されなければならない。近年の人工知能モデルの成長と、自動意思決定におけるその役割拡大により、これらのモデルがバイアスを受けないことが不可欠である。これらのモデルには、訓練対象の関数や学習アルゴリズムに固有の、トレーニング対象のデータに存在するバイアスを含む、あるいは増幅する証拠が多数存在する。多くの研究者は、この問題に対して、統計的に独立なデータに変更、パリティを最大化しようとする特定の競争相手の能力を制限するための敵対的トレーニングなど、さまざまな方向に注意を向けている。これらの手法は情報損失をもたらし、正確さと公平さのバランスを適切にとらないか、あるいはトレーニングにおけるバイアスを確実に制限しない。そこで本研究では,分類結果に影響を及ぼすバイアスの制御に理論的に有効であることを証明し,ディープラーニングモデルを学習するための強力な戦略を提案する。この方法は、異なるデータタイプ(例えば、表、画像、グラフなど)で利用することができる。提案手法は,uci成人・遺産健康データセット (tabular), pokec-z, pokec-n, nbaデータセット (graph), celebaデータセット (vision) でテストすることにより有効性を示す。各データセットの公正度文献に提案する最先端手法を用いて、バイアスを最小限に抑え精度を維持する上で、提案手法よりも優れたモデルを示す。

Bias in training datasets must be managed for various groups in classification tasks to ensure parity or equal treatment. With the recent growth in artificial intelligence models and their expanding role in automated decision-making, ensuring that these models are not biased is vital. There is an abundance of evidence suggesting that these models could contain or even amplify the bias present in the data on which they are trained, inherent to their objective function and learning algorithms. Many researchers direct their attention to this issue in different directions, namely, changing data to be statistically independent, adversarial training for restricting the capabilities of a particular competitor who aims to maximize parity, etc. These methods result in information loss and do not provide a suitable balance between accuracy and fairness or do not ensure limiting the biases in training. To this end, we propose a powerful strategy for training deep learning models called the Distraction module, which can be theoretically proven effective in controlling bias from affecting the classification results. This method can be utilized with different data types (e.g., tabular, images, graphs, etc.). We demonstrate the potency of the proposed method by testing it on UCI Adult and Heritage Health datasets (tabular), POKEC-Z, POKEC-N and NBA datasets (graph), and CelebA dataset (vision). Comparing against state-of-the-art methods proposed in the fairness literature for each dataset, we show that our model is superior to these methods in minimizing bias and maintaining accuracy.

翻訳日:2023-11-08 02:09:01 公開日:2023-11-04

# 新規探索に基づく粒子群最適化

Particle Swarm Optimization based on Novelty Search ( http://arxiv.org/abs/2203.05674v2 )

ライセンス: Link先を確認

Mr. Rajesh Misra and Dr. Kumar S Ray

(参考訳) 本稿では,ノベルティ探索と組み合わせた粒子群最適化アルゴリズムを提案する。 Novelty Searchは、検索ドメインで検索する新しい場所を見つけ、次にParticle Swarm Optimizationはその領域を厳格に検索して、グローバルな最適解を求める。この方法は、客観的な自由であるノベルティサーチによって制御されるため、ローカルオプティマではブロックされない。より局所的な最適値と第二大域的最適値がより多く存在する関数に対して、本手法はうまく機能する。現在のアルゴリズムは、検索エリア全体を検索するまで停止しない。一連の実験により、複素最適化テスト関数に対する現在のアルゴリズムの堅牢性と有効性が証明された。

In this paper, we propose a Particle Swarm Optimization algorithm combined with Novelty Search. Novelty Search finds novel places to explore in the search domain, and Particle Swarm Optimization then rigorously searches each such area for the global optimum solution. This method is never trapped in local optima because it is controlled by Novelty Search, which is objective-free. The method works successfully on functions that have many local optima and whose second global optimum is far from the true optimum. The algorithm does not stop until it has searched the entire search area. A series of experimental trials demonstrates the robustness and effectiveness of the algorithm on complex optimization test functions.

翻訳日:2023-11-08 02:08:34 公開日:2023-11-04
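
The abstract above pairs an objective-free novelty measure with a standard particle swarm. The sketch below shows the two building blocks in isolation: a novelty score computed as the mean distance to the k nearest points in an archive of visited positions, and the usual PSO velocity update. How the paper couples the two is not stated in the abstract, so the coupling is left out; parameter names and default values are assumptions of this sketch.

```python
from math import dist


def novelty(candidate, archive, k=3):
    """Mean distance from `candidate` to its k nearest archive points."""
    if not archive:
        return float("inf")  # nothing visited yet: everything is novel
    nearest = sorted(dist(candidate, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)


def pso_velocity(velocity, position, personal_best, global_best,
                 r1, r2, inertia=0.7, c1=1.5, c2=1.5):
    """Standard PSO velocity update for one particle.

    r1 and r2 are random numbers in [0, 1], passed in by the caller.
    """
    return [
        inertia * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        for v, x, pb, gb in zip(velocity, position, personal_best, global_best)
    ]


# Hypothetical archive of already explored positions.
archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
score_near = novelty((0.0, 0.0), archive, k=2)
score_far = novelty((10.0, 10.0), archive, k=2)
new_velocity = pso_velocity([0.0], [0.0], [1.0], [2.0],
                            r1=1.0, r2=1.0, inertia=0.5, c1=1.0, c2=1.0)
```

A region with a high novelty score would be a natural place to seed or redirect the swarm, while the velocity update handles the local, objective-driven search inside that region.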

# ニューラルタンジェントカーネルを用いたグラフ畳み込みネットワークの新しい展望

New Insights into Graph Convolutional Networks using Neural Tangent Kernels ( http://arxiv.org/abs/2110.04060v2 )

ライセンス: Link先を確認

Mahalakshmi Sabanayagam, Pascal Esser, Debarghya Ghoshdastidar

(参考訳) Graph Convolutional Networks (GCN)は、ネットワーク構造化データを学ぶための強力なツールとして登場した。実験的に成功したが、GCNは厳密な説明を持たない特定の振る舞いを示す。例えば、GCNのパフォーマンスはネットワーク深さの増加とともに著しく低下する。本稿では,グラフに関する半教師付き学習に注目し,その観察をNutral Tangent Kernels (NTK) のレンズを通して説明する。我々は(スキップ接続なしで)無限に広いgcnに対応するntkを導出する。その後、得られたNTKを用いて、適切な正規化を行うと、ネットワーク深さがGCNの性能を劇的に低下させるとは限らないことを確認する。さらに,超パラメータ自由決定性カーネルであるため,超パラメータチューニングによる性能変動に悩まされないGCNに対する効率的な「代理モデル」としてNTKを提案する。このアイデアの有効性は、サロゲートNTKを用いたGCNに対する異なるスキップ接続の比較によって示される。

Graph Convolutional Networks (GCNs) have emerged as powerful tools for learning on network structured data. Although empirically successful, GCNs exhibit certain behaviour that has no rigorous explanation -- for instance, the performance of GCNs significantly degrades with increasing network depth, whereas it improves marginally with depth using skip connections. This paper focuses on semi-supervised learning on graphs, and explains the above observations through the lens of Neural Tangent Kernels (NTKs). We derive NTKs corresponding to infinitely wide GCNs (with and without skip connections). Subsequently, we use the derived NTKs to identify that, with suitable normalisation, network depth does not always drastically reduce the performance of GCNs -- a fact that we also validate through extensive simulation. Furthermore, we propose NTK as an efficient 'surrogate model' for GCNs that does not suffer from performance fluctuations due to hyper-parameter tuning since it is a hyper-parameter free deterministic kernel. The efficacy of this idea is demonstrated through a comparison of different skip connections for GCNs using the surrogate NTKs.

翻訳日:2023-11-08 02:05:11 公開日:2023-11-04

# 分布外サンプルを含む非IIDデータからのグレイ学習

Gray Learning from Non-IID Data with Out-of-distribution Samples ( http://arxiv.org/abs/2206.09375v2 )

ライセンス: Link先を確認

Zhilin Zhao and Longbing Cao and Chang-Dong Wang

(参考訳) 専門家がアノテートしても、トレーニングデータの完全性は保証されていない。特に、分布内(in-distribution)サンプルと分布外(out-of-distribution)サンプルの両方で構成される非IIDデータセットではそうである。理想的なシナリオでは、サンプルの大部分は分布内であり、意味的に逸脱したサンプルは分布外と識別され、アノテーションプロセス中に除外される。しかし、専門家は誤ってこれらの分布外サンプルを分布内として分類し、本質的に信頼できないラベルを割り当てることがある。この信頼できないラベルとさまざまなデータ型の組み合わせは、堅牢なニューラルネットワークを学習するタスクを特に困難にしている。信頼性の低い真値ラベルに対応するクラスを除けば、分布内・分布外のいずれのサンプルも、ほぼ例外なく特定のクラスには属さないと判断できる。これは、サンプルが属していないクラスを示す信頼できる補完ラベルを利用する可能性を開く。この知見に導かれて,本研究では,真値ラベルと補完ラベルの両方を活用した新しいアプローチである「Gray Learning (GL)」を導入する。重要なことに、GLは予測信頼度に基づいてこれらの2種類のラベルの損失重みを適応的に調整する。統計学習理論のアプローチを基礎として一般化誤差の境界を導出し,非IID設定においてもGLがタイトな境界を達成できることを実証する。実験結果から,本手法はロバスト統計に基づく代替手法よりも優れていることがわかった。

The integrity of training data, even when annotated by experts, is far from guaranteed, especially for non-IID datasets comprising both in- and out-of-distribution samples. In an ideal scenario, the majority of samples would be in-distribution, while samples that deviate semantically would be identified as out-of-distribution and excluded during the annotation process. However, experts may erroneously classify these out-of-distribution samples as in-distribution, assigning them labels that are inherently unreliable. This mixture of unreliable labels and varied data types makes the task of learning robust neural networks notably challenging. We observe that both in- and out-of-distribution samples can almost invariably be ruled out from belonging to certain classes, aside from those corresponding to unreliable ground-truth labels. This opens the possibility of utilizing reliable complementary labels that indicate the classes to which a sample does not belong. Guided by this insight, we introduce a novel approach, termed Gray Learning (GL), which leverages both ground-truth and complementary labels. Crucially, GL adaptively adjusts the loss weights for these two label types based on prediction confidence levels. By grounding our approach in statistical learning theory, we derive bounds for the generalization error, demonstrating that GL achieves tight constraints even in non-IID settings. Extensive experimental evaluations reveal that our method significantly outperforms alternative approaches grounded in robust statistics.

翻訳日:2023-11-08 01:55:58 公開日:2023-11-04
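
The abstract above says that the loss weights for ground-truth and complementary labels are adjusted according to prediction confidence. The sketch below shows one way such a mix could look. It is an assumption for illustration, not the loss defined in the paper: the weight is simply the predicted probability of the given label, the complementary term is the usual negative log of one minus the probability, and the name `gray_style_loss` is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gray_style_loss(logits, label, complementary):
    """Confidence-weighted mix of a ground-truth loss and a complementary-label loss.

    `label` is the (possibly unreliable) ground-truth class index.
    `complementary` lists classes the sample is known NOT to belong to.
    Returns (loss, weight), where weight is the confidence placed on `label`.
    """
    p = softmax(logits)
    eps = 1e-12
    ce = -math.log(p[label] + eps)  # standard cross-entropy on the given label
    comp = -sum(math.log(1.0 - p[c] + eps) for c in complementary) / len(complementary)
    w = p[label]  # assumed weighting rule: trust the label as much as the model does
    return w * ce + (1.0 - w) * comp, w


loss, w = gray_style_loss([0.0, 0.0, 0.0], label=0, complementary=[1, 2])
print(loss, w)
```

When the model is confident in the given label, the loss is dominated by the ground-truth term; when it is not, the more reliable "not this class" information takes over.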

# ソースコード要約のための抽出型・抽象型フレームワーク

An Extractive-and-Abstractive Framework for Source Code Summarization ( http://arxiv.org/abs/2206.07245v2 )

ライセンス: Link先を確認

Weisong Sun and Chunrong Fang and Yuchen Chen and Quanjun Zhang and Guanhong Tao and Tingxu Han and Yifei Ge and Yudu You and Bin Luo

(参考訳) (ソース)コード要約は、与えられたコードスニペットの要約/コメントを自然言語の形式で自動的に生成することを目的としている。このような要約は、開発者がソースコードを理解し維持するのを手助けする上で重要な役割を果たす。既存のコード要約技術は抽出型手法と抽象型手法に分類できる。抽出型手法は、検索技術を用いてコードスニペットから重要な文とキーワードのサブセットを抽出し、重要な文とキーワードの事実的詳細を保持する要約を生成する。しかし、そのようなサブセットは識別子やエンティティの命名を見逃す可能性があり、その結果、生成された要約の自然性は通常貧弱である。抽象型手法は、ニューラル機械翻訳分野のエンコーダ・デコーダモデルを利用して、人間が書いたような要約を生成することができる。しかし、生成された要約は、しばしば重要な事実の詳細を見逃す。事実的詳細を保持した、人間が書いたような要約を生成するために,新しい抽出型・抽象型フレームワークを提案する。フレームワークの抽出モジュールは、コードスニペットを取り込んで、重要な事実の詳細を含む重要な文を予測する、抽出型コード要約のタスクを実行する。フレームワークの抽象モジュールは、コードスニペット全体と重要な文を並行して取り込んで、簡潔で人間が書いたような自然言語要約を生成する、抽象型コード要約のタスクを実行する。6つのプログラミング言語を含む3つのデータセットに対して広範な実験を行うことで、EACSと呼ばれる手法の有効性を評価する。実験の結果, EACSはBLEU, METEOR, ROUGE-Lの3つの指標すべてにおいて, 最先端技術を大きく上回った。

(Source) Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code summarization techniques can be categorized into extractive methods and abstractive methods. The extractive methods extract a subset of important statements and keywords from the code snippet using retrieval techniques, and generate a summary that preserves factual details in important statements and keywords. However, such a subset may miss identifier or entity naming, and consequently, the naturalness of the generated summary is usually poor. The abstractive methods can generate human-written-like summaries leveraging encoder-decoder models from the neural machine translation domain. The generated summaries, however, often miss important factual details. To generate human-written-like summaries with preserved factual details, we propose a novel extractive-and-abstractive framework. The extractive module in the framework performs a task of extractive code summarization, which takes in the code snippet and predicts important statements containing key factual details. The abstractive module in the framework performs a task of abstractive code summarization, which takes in the entire code snippet and important statements in parallel and generates a succinct and human-written-like natural language summary. We evaluate the effectiveness of our technique, called EACS, by conducting extensive experiments on three datasets involving six programming languages. Experimental results show that EACS significantly outperforms state-of-the-art techniques in terms of all three widely used metrics: BLEU, METEOR, and ROUGE-L.

翻訳日:2023-11-08 01:55:33 公開日:2023-11-04

# D2D対応ヘテロジニアスネットワークにおける分散機械学習:アーキテクチャ、パフォーマンス、オープンチャレンジ

Distributed Machine Learning in D2D-Enabled Heterogeneous Networks: Architectures, Performance, and Open Challenges ( http://arxiv.org/abs/2206.01906v2 )

ライセンス: Link先を確認

Zhipeng Cheng, Xuwei Fan, Minghui Liwang, Ning Chen, Xiaoyu Xia, Xianbin Wang

(参考訳) データプライバシに関する懸念は、機械学習(ML)アーキテクチャを集中型から分散型に移行させ、プライバシを保護する2つの主要なメカニズムとして、連合学習(FL)とスプリットラーニング(SL)を生み出した。しかしながら、多様なクライアントを持つデバイス間(D2D)対応のヘテロジニアスネットワークにおけるFLやSLの実装は、アーキテクチャのスケーラビリティやトレーニングの遅延の長期化など、大きな課題となっている。これらの課題に対処するため、本稿では、ハイブリッドスプリットFL(HSFL)とハイブリッドフェデレーションSL(HFSL)という、2つの革新的なハイブリッド分散MLアーキテクチャを紹介する。このようなアーキテクチャは、D2D対応ヘテロジニアス無線ネットワークにおけるFLとSLの長所を組み合わせたものである。 HSFLとHFSLの性能と利点を包括的に分析するとともに,今後の探索に向けたオープンな課題も強調する。我々は,非独立かつ非同一分布の設定で3つのデータセットを用いて予備シミュレーションを行い,アーキテクチャの実現可能性を示す。シミュレーションの結果,従来のFLおよびSLと比較して通信/計算コストとトレーニング遅延が著しく低減した。

The ever-growing concerns regarding data privacy have led to a paradigm shift in machine learning (ML) architectures from centralized to distributed approaches, giving rise to federated learning (FL) and split learning (SL) as the two predominant privacy-preserving ML mechanisms. However, implementing FL or SL in device-to-device (D2D)-enabled heterogeneous networks with diverse clients presents substantial challenges, including architecture scalability and prolonged training delays. To address these challenges, this article introduces two innovative hybrid distributed ML architectures, namely, hybrid split FL (HSFL) and hybrid federated SL (HFSL). Such architectures combine the strengths of both FL and SL in D2D-enabled heterogeneous wireless networks. We provide a comprehensive analysis of the performance and advantages of HSFL and HFSL, while also highlighting open challenges for future exploration. We support our proposals with preliminary simulations using three datasets in non-independent and non-identically distributed settings, demonstrating the feasibility of our architectures. Our simulations reveal notable reductions in communication/computation costs and training delays as compared to conventional FL and SL.

翻訳日:2023-11-08 01:54:40 公開日:2023-11-04

# Topical: アテンションを用いたソースコードからのリポジトリ埋め込みの学習

Topical: Learning Repository Embeddings from Source Code using Attention ( http://arxiv.org/abs/2208.09495v4 )

ライセンス: Link先を確認

Agathe Lherondelle, Varun Babbar, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran

(参考訳) 本稿では,リポジトリレベルの埋め込みのための新しいディープニューラルネットワークである Topical を提案する。自然言語ドキュメンテーションやナイーブな集約技術に依存した既存の手法を、Topical は注意機構の活用によって上回る。このメカニズムはソースコード、完全な依存グラフ、スクリプトレベルのテキストデータからリポジトリレベルの表現を生成する。公開アクセス可能なGitHubリポジトリでトレーニングされた Topical は,リポジトリの自動タグ付けなどのタスクにおいて複数のベースラインを上回り,従来の集約手法に対する注意機構の有効性を示している。 Topicalはスケーラビリティと効率性も実証しており、リポジトリレベルの表現計算に価値ある貢献をする。さらなる研究のために、関連するツール、コード、トレーニングデータセットがhttps://github.com/jpmorganchase/topicalで提供されている。

This paper presents Topical, a novel deep neural network for repository level embeddings. Existing methods, reliant on natural language documentation or naive aggregation techniques, are outperformed by Topical's utilization of an attention mechanism. This mechanism generates repository-level representations from source code, full dependency graphs, and script level textual data. Trained on publicly accessible GitHub repositories, Topical surpasses multiple baselines in tasks such as repository auto-tagging, highlighting the attention mechanism's efficacy over traditional aggregation methods. Topical also demonstrates scalability and efficiency, making it a valuable contribution to repository-level representation computation. For further research, the accompanying tools, code, and training dataset are provided at: https://github.com/jpmorganchase/topical.

翻訳日:2023-11-08 01:41:53 公開日:2023-11-04

# ポスト量子非展性への新しいアプローチ

A New Approach to Post-Quantum Non-Malleability ( http://arxiv.org/abs/2207.05861v3 )

ライセンス: Link先を確認

Xiao Liang, Omkant Pandey, Takashi Yamakawa

(参考訳) 我々は、ポスト量子一方向性関数が存在するという最小限の仮定の下で、ポスト量子非展性コミットメントの最初の定数ラウンド構成を与える。コミットメントに関する非展性の標準的な概念を達成する。以前の構成では同じ仮定で$\Omega(\log^*\lambda)$ラウンドが必要だった。我々は、ポスト量子設定で扱いやすい、定数ラウンド非展性コミットメントのための新しい手法によって結果を得る。この手法はまた、古典的設定における定数ラウンド非展性コミットメントの安全性について、ほぼ初等的な証明を与える。既存の研究と組み合わせると、我々の結果は、量子完全準同型暗号と量子Learning with Errorsの多項式困難性の下で、古典的機能と量子的機能の両方に対する、プレーンモデルにおける最初の定数ラウンド量子安全マルチパーティ計算をもたらす。

We provide the first constant-round construction of post-quantum non-malleable commitments under the minimal assumption that post-quantum one-way functions exist. We achieve the standard notion of non-malleability with respect to commitments. Prior constructions required $\Omega(\log^*\lambda)$ rounds under the same assumption. We achieve our results through a new technique for constant-round non-malleable commitments which is easier to use in the post-quantum setting. The technique also yields an almost elementary proof of security for constant-round non-malleable commitments in the classical setting, which may be of independent interest. When combined with existing work, our results yield the first constant-round quantum-secure multiparty computation for both classical and quantum functionalities in the plain model, under the polynomial hardness of quantum fully-homomorphic encryption and quantum learning with errors.

翻訳日:2023-11-08 01:39:54 公開日:2023-11-04

# 線形マルコフ決定過程に対するほぼミニマックス最適な強化学習

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes ( http://arxiv.org/abs/2212.06132v3 )

ライセンス: Link先を確認

Jiafan He and Heyang Zhao and Dongruo Zhou and Quanquan Gu

(参考訳) 線形関数近似による強化学習(RL)について検討した。所与の特徴写像の線形関数として遷移確率をパラメータ化できるエピソディック時間不均質線形マルコフ決定過程(線形MDP)に対して、ほぼミニマックス最適な後悔である$\tilde O(d\sqrt{H^3K})$ を達成する最初の計算効率の高いアルゴリズムを提案する。ここで$d$ は特徴写像の次元、$H$ は計画の地平線、$K$ はエピソード数である。本アルゴリズムは,注意深く設計された重みを用いる重み付き線形回帰スキームに基づいており,その重みは,(1)最適値関数の分散を直接推定し,(2)エピソード数に対して単調に減少して推定精度を高め,(3)推定値関数クラスの複雑性を制御するために,値関数推定器の更新にレアスイッチングポリシを用いる新しい分散推定器に依存する。本研究は,線形MDPを用いた最適RLに対する完全な回答を提供するとともに,開発したアルゴリズムと理論的ツールは独立した興味の対象となりうる。

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.

翻訳日:2023-11-08 01:31:40 公開日:2023-11-04
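
As background for the weighted linear regression scheme mentioned in the abstract above, the generic variance-weighted ridge regression estimator can be written as follows. This is the textbook form only; the symbols $\phi_k$ (feature vector), $y_k$ (regression target), $\bar\sigma_k$ (variance-based weight), and $\lambda$ (regularization) are notation assumed here, and the paper's specific choice of weights and targets is not reproduced.

$$
\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^d} \; \lambda \|w\|_2^2 + \sum_{k=1}^{K} \bar\sigma_k^{-2} \bigl( \langle w, \phi_k \rangle - y_k \bigr)^2
\;=\; \Lambda^{-1} \sum_{k=1}^{K} \bar\sigma_k^{-2} \, \phi_k \, y_k,
\qquad
\Lambda \;=\; \lambda I + \sum_{k=1}^{K} \bar\sigma_k^{-2} \, \phi_k \phi_k^{\top}.
$$

Setting the gradient $2\lambda w + 2\sum_k \bar\sigma_k^{-2} \phi_k (\phi_k^{\top} w - y_k)$ to zero gives $\Lambda w = \sum_k \bar\sigma_k^{-2} \phi_k y_k$, which is the closed form above. Samples with a larger estimated variance receive a smaller weight, so noisier targets influence the estimate less.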

# 飛行機雲(Contrail)検出のための深層セグメンテーションモデルの性能評価

Performance evaluation of deep segmentation models for Contrails detection ( http://arxiv.org/abs/2211.14851v4 )

ライセンス: Link先を確認

Akshat Bhandari and Sriya Rallabandi and Sanchit Singhal and Aditya Kasliwal and Pratinav Seth

(参考訳) 飛行機雲(Contrail、condensation trailの略)は、航空機が冷たく湿った空気中を飛ぶ際にエンジンの排気によって生じる線状の氷雲である。放射される外向き長波放射の約33%を吸収または地球に向けて戻すことで温室効果を発生させる。それらは航空活動による気候変動の半分以上を占める。飛行機雲の回避と飛行経路の調整は、その影響を減らすための安価で効果的な方法である可能性がある。飛行機雲回避戦略の開発と評価には,正確で自動化された信頼性の高い検出アルゴリズムが必要である。飛行機雲検出の進歩は、いくつかの要因、主に高品質なラベル付きデータの欠如により、著しく制限されている。近年,人手でラベル付けされた大規模なLandsat-8飛行機雲データセットが提案されている。各飛行機雲は、Landsat-8衛星画像の様々な場面で様々な入力を用いて慎重にラベル付けされている。本研究では,様々な損失関数とエンコーダのバックボーンを組み合わせた,いくつかの一般的なセグメンテーションモデルをベンチマークする。この研究は、低軌道衛星画像の飛行機雲を検出するために最先端のセグメンテーション技術を適用した最初のものである。本研究は、飛行機雲セグメンテーションのオープンベンチマークとしても使用でき、公開されている。

Contrails, short for condensation trails, are line-shaped ice clouds produced by aircraft engine exhaust when aircraft fly through cold and humid air. They generate a greenhouse effect by absorbing or directing back to Earth approximately 33% of emitted outgoing longwave radiation. They account for over half of the climate change resulting from aviation activities. Avoiding contrails and adjusting flight routes could be an inexpensive and effective way to reduce their impact. An accurate, automated, and reliable detection algorithm is required to develop and evaluate contrail avoidance strategies. Advancement in contrail detection has been severely limited by several factors, primarily a lack of quality-labeled data. Recently, a large human-labeled Landsat-8 contrails dataset was proposed. Each contrail is carefully labeled with various inputs in various scenes of Landsat-8 satellite imagery. In this work, we benchmark several popular segmentation models with combinations of different loss functions and encoder backbones. This work is the first to apply state-of-the-art segmentation techniques to detect contrails in low-orbit satellite imagery. Our work can also be used as an open benchmark for contrail segmentation and is publicly available.

翻訳日:2023-11-08 01:30:38 公開日:2023-11-04

# 連合学習とメタ学習:アプローチ、応用、方向性

Federated Learning and Meta Learning: Approaches, Applications, and Directions ( http://arxiv.org/abs/2210.13111v2 )

ライセンス: Link先を確認

Xiaonan Liu and Yansha Deng and Arumugam Nallanathan and Mehdi Bennis

(参考訳) ここ数年、無線ネットワークにおけるリソース管理、干渉管理、自律性、意思決定に対処するため、機械学習(ML)の分野で大きな進歩が遂げられてきた。従来のMLアプローチは、トレーニングのために中央サーバでデータを収集する集中型メソッドに依存している。しかし、このアプローチはデバイスのデータのプライバシを維持するという点で課題となる。この問題に対処するため、連合学習(FL)は、データプライバシを損なうことなく、エッジデバイスが協調的にMLモデルをトレーニングできる効果的なソリューションとして浮上した。 FLでは、ローカルデータセットは共有されず、すべてのデバイスを含む特定のタスクのグローバルモデル学習に重点を置いている。しかし、FLは、異なるデータ分布を持つデバイスにモデルを適応することに関して制限がある。このような場合、メタラーニングは、少数のデータサンプルを用いて異なるデータ分布に学習モデルの適応を可能にするため、考慮される。本稿では,FL,メタラーニング,連合メタラーニング (FedMeta)の包括的レビューを紹介する。他のチュートリアルと異なり、私たちの目標はFL、メタラーニング、FedMetaの方法論をどのように設計、最適化、進化させ、無線ネットワーク上で応用するかを探ることです。また、これらの学習アルゴリズム間の関係を分析し、実世界の応用におけるそれらの利点と欠点について検討する。

Over the past few years, significant advancements have been made in the field of machine learning (ML) to address resource management, interference management, autonomy, and decision-making in wireless networks. Traditional ML approaches rely on centralized methods, where data is collected at a central server for training. However, this approach poses a challenge in terms of preserving the data privacy of devices. To address this issue, federated learning (FL) has emerged as an effective solution that allows edge devices to collaboratively train ML models without compromising data privacy. In FL, local datasets are not shared, and the focus is on learning a global model for a specific task involving all devices. However, FL has limitations when it comes to adapting the model to devices with different data distributions. In such cases, meta learning is considered, as it enables the adaptation of learning models to different data distributions using only a few data samples. In this tutorial, we present a comprehensive review of FL, meta learning, and federated meta learning (FedMeta). Unlike other tutorial papers, our objective is to explore how FL, meta learning, and FedMeta methodologies can be designed, optimized, and evolved, and their applications over wireless networks. We also analyze the relationships among these learning algorithms and examine their advantages and disadvantages in real-world applications.

翻訳日:2023-11-08 01:27:36 公開日:2023-11-04
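
The abstract above notes that in FL the local datasets are never shared and only a global model is learned. A common way to do this is federated averaging, where the server combines client parameters weighted by local dataset size. The sketch below shows only that aggregation step, with flat Python lists standing in for model parameters; the function name `fedavg` and the toy numbers are illustrative assumptions, not code from the tutorial.

```python
def fedavg(client_weights, client_sizes):
    """Aggregate client parameter vectors, weighting each by its local sample count.

    client_weights: list of equal-length parameter vectors, one per client.
    client_sizes:   number of local training samples held by each client.
    Only parameters and counts reach the server; raw data stays on the devices.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


# Two clients: the second holds three times as much data, so it dominates the average.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_model)
```

Because every client ends up with the same averaged model, this step alone does not adapt to clients whose data distributions differ, which is the limitation the abstract gives as the motivation for meta learning.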

# 大規模言語モデルにおける言語と思考の解離

Dissociating language and thought in large language models ( http://arxiv.org/abs/2301.06627v2 )

ライセンス: Link先を確認

Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko

(参考訳) 大規模言語モデル(LLM)は、これまでのあらゆるモデルの中で人間の言語の習得に最も近づいているが、その言語的および認知的能力に関する意見は相変わらず分かれている。本稿では,形式的言語能力(言語規則とパターンの知識)と機能的言語能力(世界における言語の理解と活用)の区別を用いてLLMを評価する。我々はこの区別を人間の神経科学に基礎づけ、形式的能力と機能的能力が異なる神経機構に依存していることを示す。 LLMの形式的能力は驚くほど優れているが、機能的能力のタスクにおけるパフォーマンスにはむらがあり、しばしば特別な微調整や外部モジュールとの結合を必要とする。要するに、LLMは言語の優れたモデルであるが、人間の思考の不完全なモデルである。

Large language models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split. Here, we evaluate LLMs using a distinction between formal linguistic competence--knowledge of linguistic rules and patterns--and functional linguistic competence--understanding and using language in the world. We ground this distinction in human neuroscience, showing that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. In short, LLMs are good models of language but incomplete models of human thought.

翻訳日:2023-11-08 01:18:19 公開日:2023-11-04

# (ほぼ)完全な敵対的検出のための局所成長率推定の展開

Unfolding Local Growth Rate Estimates for (Almost) Perfect Adversarial Detection ( http://arxiv.org/abs/2212.06776v3 )

ライセンス: Link先を確認

Peter Lorenz, Margret Keuper and Janis Keuper

(参考訳) 畳み込みニューラルネットワーク(CNN)は、多くの知覚的タスクにおける最先端のソリューションを定義する。しかし、現在のCNNアプローチは、人間の目に準知覚できない状態でシステムを騙すために特別に作られた入力の敵の摂動に対して脆弱なままである。近年、モデル硬化や明示的な防御機構の追加など、CNNをこのような攻撃から守るための様々なアプローチが提案されている。これにより、ネットワークに小さな「検出器」が含まれ、真データと逆摂動を含むデータとを区別する二分分類タスクで訓練される。本研究では,ネットワークの局所固有次元(LID)と敵攻撃の関係について,最近の知見を生かした,シンプルで軽量な検出器を提案する。 LID測度の再解釈といくつかの単純な適応に基づいて、敵検出の最先端をかなりのマージンで超越し、複数のネットワークやデータセットのF1スコアでほぼ完璧な結果を得る。出典: https://github.com/adverML/multiLID

Convolutional neural networks (CNN) define the state-of-the-art solution on many perceptual tasks. However, current CNN approaches largely remain vulnerable against adversarial perturbations of the input that have been crafted specifically to fool the system while being quasi-imperceptible to the human eye. In recent years, various approaches have been proposed to defend CNNs against such attacks, for example by model hardening or by adding explicit defence mechanisms. Thereby, a small "detector" is included in the network and trained on the binary classification task of distinguishing genuine data from data containing adversarial perturbations. In this work, we propose a simple and light-weight detector, which leverages recent findings on the relation between networks' local intrinsic dimensionality (LID) and adversarial attacks. Based on a re-interpretation of the LID measure and several simple adaptations, we surpass the state-of-the-art on adversarial detection by a significant margin and reach almost perfect results in terms of F1-score for several networks and datasets. Sources available at: https://github.com/adverML/multiLID

翻訳日:2023-11-08 01:14:57 公開日:2023-11-04
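
The detector described above builds on local intrinsic dimensionality (LID). For orientation, the sketch below computes the standard maximum-likelihood LID estimate from the distances between a sample and its k nearest neighbours, and also returns the per-neighbour log-ratio terms before they are averaged. Using those unaveraged terms as a feature vector is this sketch's reading of "unfolding" the estimate and should be treated as an assumption; the function names are hypothetical and the linked repository is the authoritative implementation.

```python
import math


def lid_mle(dists):
    """Maximum-likelihood LID estimate from k nearest-neighbour distances.

    LID = -1 / mean_i( log(r_i / r_k) ), where r_k is the largest distance.
    Assumes all distances are positive and not all equal.
    """
    r = sorted(dists)
    r_k = r[-1]
    mean_log = sum(math.log(ri / r_k) for ri in r) / len(r)
    return -1.0 / mean_log


def unfolded_features(dists):
    """Per-neighbour terms -log(r_i / r_k), kept separate instead of averaged."""
    r = sorted(dists)
    r_k = r[-1]
    return [-math.log(ri / r_k) for ri in r]


neighbour_dists = [1.0, 2.0, 4.0]  # toy distances to the 3 nearest neighbours
print(lid_mle(neighbour_dists))
print(unfolded_features(neighbour_dists))
```

In a detector these quantities would typically be computed on intermediate network activations and fed to a small binary classifier that separates genuine inputs from adversarial ones.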

# 分極正規化とワンパス学習による合意サブネットワークの学習

Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training ( http://arxiv.org/abs/2302.10798v4 )

ライセンス: Link先を確認

Xiaoying Zhi, Varun Babbar, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran

(参考訳) 最近の大規模で複雑なニューラルネットワークモデルの動向を考えると、グリーンAIの主題はディープラーニングコミュニティ内で注目を集めている。推論時のトレーニングの計算負荷を削減する既存のソリューションは、通常ネットワークパラメータの刈り込みを伴う。プルーニングスキームは、静的プルーニングにおける反復的なトレーニングと微調整、あるいは動的プルーニンググラフの反復計算によって、しばしば余分なオーバーヘッドを生み出す。そこで本研究では, エネルギーコストを最小にしつつ, 所与の下流タスクで完全パラメータ化ネットワークと同等の性能を維持する軽量サブネットワークを学習するための新しいパラメータプルーニング手法を提案する。提案手法はグリーン指向であり,動的プルーニング法により最適な静的サブネットワークを発見するために,一度きりのトレーニングのみを必要とする。プルーニング方式は、二値ゲーティングモジュールと、ユーザ定義のスパース性を持つサブネットワークを見つけ出す新しい損失関数から構成される。提案手法はプルーニングとトレーニングを同時に行うことを可能にし,訓練段階と推論段階の両方でエネルギーを節約するとともに,推論時のゲーティングモジュールによる余分な計算オーバーヘッドを回避する。 CIFAR-10 と CIFAR-100 での結果は,分類精度の低下を1%未満に抑えつつ,深層ネットワークの接続の50%を除去できることを示唆している。本手法は他の関連するプルーニング法と比較して,同等の計算コスト削減に対する精度の低下が小さい。

The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of training at inference time usually involve pruning the network parameters. Pruning schemes often create extra overhead either by iterative training and fine-tuning for static pruning or repeated computation of a dynamic pruning graph. We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks. Our proposed pruning scheme is green-oriented, as it only requires a one-off training to discover the optimal static sub-networks by dynamic pruning methods. The pruning scheme consists of a binary gating module and a novel loss function to uncover sub-networks with user-defined sparsity. Our method enables pruning and training simultaneously, which saves energy in both the training and inference phases and avoids extra computational overhead from gating modules at inference time. Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy. Compared to other related pruning methods, our method demonstrates a lower drop in accuracy for equivalent reductions in computational cost.

翻訳日:2023-11-08 01:05:42 公開日:2023-11-04

# AfriSenti: アフリカの言語に対するTwitterの感情分析ベンチマーク

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages ( http://arxiv.org/abs/2302.08956v5 )

ライセンス: Link先を確認

Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur

(参考訳) アフリカには6以上の言語族から2000以上の言語があり、全大陸で最高の言語多様性がある。その中には、それぞれ少なくとも100万人の話者を持つ言語が75ある。しかし、アフリカの言語に関するNLP研究はほとんど行われていない。このような研究を可能にする上で重要なのは、高品質な注釈付きデータセットの提供だ。本稿では,4つの言語族から,14のアフリカの言語(アムハラ語,アルジェリア・アラビア語,ハウサ語,イグボ語,キニャルワンダ語,モロッコ・アラビア語,モザンビーク・ポルトガル語,ナイジェリア・ピジン語,オロモ語,スワヒリ語,ティグリニャ語,トウィ語,シツォンガ語,ヨルバ語)で合計110,000以上のツイートを含む感情分析ベンチマークであるAfriSentiを紹介する。ツイートはネイティブスピーカーによって注釈付けされ、AfriSenti-SemEval共有タスクで使用された(AfriSenti Shared Taskには200人以上の参加者がいた)。各データセットのキュレーションにおける,データ収集の方法論,アノテーションプロセス,対処した課題について述べる。さらに,異なるデータセット上で実施したベースライン実験を報告し,その有用性について考察する。

Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of more than 110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task (The AfriSenti Shared Task had over 200 participants. See website at https://afrisenti-semeval.github.io). We describe the data collection methodology, annotation process, and the challenges we dealt with when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness.

翻訳日:2023-11-08 01:05:20 公開日:2023-11-04

# 病理組織学的全スライド画像におけるメラノーマ皮膚癌の検出と局在

Detection and Localization of Melanoma Skin Cancer in Histopathological Whole Slide Images ( http://arxiv.org/abs/2302.03014v4 )

ライセンス: Link先を確認

Neel Kanwal, Roger Amundsen, Helga Hardardottir, Luca Tomasetti, Erling Sandoy Undersrud, Emiel A.M. Janssen, Kjersti Engan

(参考訳) 早期に診断および治療を行ったメラノーマは生存率を高めることができる。皮膚がんの発生の増加予測と皮膚病理医の不足は、計算病理学(CPATH)システムの必要性を強調している。深層学習(DL)モデルを持つCPATHシステムは、基礎となる形態学的および細胞的特徴を利用してメラノーマの存在を識別する可能性がある。本論文は,WSI(Whole Slide Images)におけるメラノーマの検出と,正常皮膚と良性/悪性のメラノサイト病変の鑑別を目的としたDL法を提案する。本手法は, 病変を高精度に検出し, 病理医にとっての潜在的な関心領域を特定するためにWSI上で局在化する。興味深いことに,本手法では,まず1つのCNNネットワークを用いて局所化マップを作成し,それを用いてスライドレベルの予測を行い,メラノーマ患者を判定する。ベストモデルでは、未知のデータに対して0.992のF1スコアと0.99の感度という良好なパッチ単位の分類結果が得られる。ソースコードはhttps://github.com/RogerAmundsen/Melanoma-Diagnosis-and-Localization-from-Whole-Slide-Images-using-Convolutional-Neural-Networks で入手できる。

Melanoma diagnosed and treated in its early stages can increase the survival rate. A projected increase in skin cancer incidents and a dearth of dermatopathologists have emphasized the need for computational pathology (CPATH) systems. CPATH systems with deep learning (DL) models have the potential to identify the presence of melanoma by exploiting underlying morphological and cellular features. This paper proposes a DL method to detect melanoma and distinguish between normal skin and benign/malignant melanocytic lesions in Whole Slide Images (WSI). Our method detects lesions with high accuracy and localizes them on a WSI to identify potential regions of interest for pathologists. Interestingly, our DL method relies on using a single CNN network to create localization maps first and use them to perform slide-level predictions to determine patients who have melanoma. Our best model provides favorable patch-wise classification results with a 0.992 F1 score and 0.99 sensitivity on unseen data. The source code is available at https://github.com/RogerAmundsen/Melanoma-Diagnosis-and-Localization-from-Whole-Slide-Images-using-Convolutional-Neural-Networks.

翻訳日:2023-11-08 01:03:20 公開日:2023-11-04

# GQE-Net:ポイントクラウドカラー属性のためのグラフベースの品質向上ネットワーク

GQE-Net: A Graph-based Quality Enhancement Network for Point Cloud Color Attribute ( http://arxiv.org/abs/2303.13764v2 )

ライセンス: Link先を確認

Jinrui Xing, Hui Yuan, Raouf Hamzaoui, Hao Liu, and Junhui Hou

(参考訳) 近年、点雲は3次元(3D)の視覚オブジェクトやシーンを表現するために人気が高まっている。点雲を効率的に保存・送信するために圧縮法が開発されているが、品質が劣化することが多い。点雲の色歪みを低減するため,幾何学情報を補助入力とし,グラフ畳み込みブロックを用いて局所特徴を効率的に抽出するグラフベース品質向上ネットワーク(GQE-Net)を提案する。具体的には,マルチヘッドグラフアテンション機構を備えた並列シリアルグラフアテンションモジュールを用いて重要な点や特徴に着目し,それらを融合させる。さらに,点間の法線と幾何学的距離を考慮に入れた特徴改善モジュールを設計する。 GPUメモリ容量の制限の中で機能するために、歪んだポイントクラウドはオーバーラップ可能な3Dパッチに分割され、品質向上のためにGQE-Netに送られる。異なる色成分間のデータ分布の違いを考慮するため、3つの色成分について3つのモデルを訓練する。実験結果から,本手法は最先端性能を実現することが示された。例えば、幾何ベースのポイントクラウド圧縮(G-PCC)標準の最近のテストモデル上でGQE-Netを実装すると、高密度ポイントクラウドのY、Cb、Crコンポーネントについて、それぞれ0.43 dB、0.25 dB、0.36 dBのBjontegaard delta (BD) ピーク信号対雑音比(PSNR)が得られ、これは14.0%、9.3%、14.5%のBDレート節約に相当する。この手法のソースコードはhttps://github.com/xjr998/GQE-Net で入手できる。

In recent years, point clouds have become increasingly popular for representing three-dimensional (3D) visual objects and scenes. To efficiently store and transmit point clouds, compression methods have been developed, but they often result in a degradation of quality. To reduce color distortion in point clouds, we propose a graph-based quality enhancement network (GQE-Net) that uses geometry information as an auxiliary input and graph convolution blocks to extract local features efficiently. Specifically, we use a parallel-serial graph attention module with a multi-head graph attention mechanism to focus on important points or features and help them fuse together. Additionally, we design a feature refinement module that takes into account the normals and geometry distance between points. To work within the limitations of GPU memory capacity, the distorted point cloud is divided into overlap-allowed 3D patches, which are sent to GQE-Net for quality enhancement. To account for differences in data distribution among different color components, three models are trained for the three color components. Experimental results show that our method achieves state-of-the-art performance. For example, when implementing GQE-Net on a recent test model of the geometry-based point cloud compression (G-PCC) standard, 0.43 dB, 0.25 dB, and 0.36 dB Bjontegaard delta (BD)-peak-signal-to-noise ratio (PSNR), corresponding to 14.0%, 9.3%, and 14.5% BD-rate savings can be achieved on dense point clouds for the Y, Cb, and Cr components, respectively. The source code of our method is available at https://github.com/xjr998/GQE-Net.

翻訳日:2023-11-07 23:21:01 公開日:2023-11-04

# 暗黙的ニューラル表現を用いたタスク指向型ヒューマンオブジェクトインタラクション生成

Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations ( http://arxiv.org/abs/2303.13129v2 )

ライセンス: Link先を確認

Quanzhou Li, Jingbo Wang, Chen Change Loy, Bo Dai

(参考訳) デジタルヒューマンモーション合成は、映画、AR/VR、ビデオゲームに応用される活発な研究分野である。自然で現実的な人間の動きを生成する方法が提案されたが、ほとんどは人間のモデリングに焦点を合わせ、物体の動きを無視した。シミュレーションにおけるタスク指向の人間-物体相互作用運動の生成は困難である。物体の使用の異なる意図のために、人間は様々な動きを行うため、人間はまず物体に接近し、そこに留まる代わりに人間と連続して動くように要求する。また、下流アプリケーションに展開するためには、合成された動きは、様々な目的のために予測された動きをパーソナライズするオプションを提供するために、長めの柔軟性が望まれる。この目的のために,タスクタイプ,オブジェクト,および開始状態のみを与えられた特定のタスクを実行するために,完全なヒューマン・オブジェクトインタラクション動作を生成する暗黙の神経表現によるタスク指向のヒューマン・オブジェクトインタラクション生成を提案する。 TOHOは3ステップで人物体の動きを生成する。 1) タスクの種類と対象情報を与えられたタスクを実行する際のキーフレームのポーズを最初に見積もる。 2) キーフレームを満たし,連続的な動作を生成する。 3) 最後に,コンパクトな閉形式物体運動推定を適用し,物体運動を生成する。本手法では,時間座標のみによってパラメータ化される連続運動を生成し,任意のフレームへのシーケンスのアップサンプリングやダウンサンプリングを可能にし,時間座標ベクトルの設計による動き速度の調整を行う。本手法の有効性を質的および定量的に実証する。この研究は、一般の人間とシーンの相互作用シミュレーションに向けてさらに一歩前進する。

Digital human motion synthesis is a vibrant research field with applications in movies, AR/VR, and video games. Whereas methods were proposed to generate natural and realistic human motions, most only focus on modeling humans and largely ignore object movements. Generating task-oriented human-object interaction motions in simulation is challenging. For different intents of using the objects, humans conduct various motions, which requires the human first to approach the objects and then make them move consistently with the human instead of staying still. Also, to deploy in downstream applications, the synthesized motions are desired to be flexible in length, providing options to personalize the predicted motions for various purposes. To this end, we propose TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations, which generates full human-object interaction motions to conduct specific tasks, given only the task type, the object, and a starting human status. TOHO generates human-object motions in three steps: 1) it first estimates the keyframe poses of conducting a task given the task type and object information; 2) then, it infills the keyframes and generates continuous motions; 3) finally, it applies a compact closed-form object motion estimation to generate the object motion. Our method generates continuous motions that are parameterized only by the temporal coordinate, which allows for upsampling or downsampling of the sequence to arbitrary frames and adjusting the motion speeds by designing the temporal coordinate vector. We demonstrate the effectiveness of our method, both qualitatively and quantitatively. This work takes a step further toward general human-scene interaction simulation.

翻訳日:2023-11-07 23:20:23 公開日:2023-11-04

# 触媒による熱過程の階層崩壊

A hierarchy of thermal processes collapses under catalysis ( http://arxiv.org/abs/2303.13020v2 )

ライセンス: Link先を確認

Jeongrak Son, Nelly H.Y. Ng

(参考訳) 熱操作は、熱力学的制約下での許容状態遷移の一般的な記述である。しかし、これらのプロセスをすべて包含するシンプルな方法の探求は未完成のままである。この課題は、容易に利用できると仮定された熱浴の触媒利用によって解決する。基本熱操作とマルコフ熱操作の2つの簡易操作を選択した。彼らは実験的な実現性で知られているが、生来のマルコヴィアン性のために熱活動の完全な範囲を捉えられなかった。しかし, 環境温度のギブス状態触媒によって操作が強化されると, この制限を克服できることを示す。以上の結果から, 熱的操作における自由状態は, より単純な操作に必要な非マルコビアン性を与える触媒として機能することが示唆された。さらに, 触媒が適用できる場合, 異なる熱過程(熱操作, 初等熱操作, マルコフ熱操作)が収束することを示す。特に,エネルギー固有ベイシスにおける初期状態のコヒーレンスに関するシナリオは,その特徴付けが難しいことで悪名高い。

Thermal operations are a generic description for allowed state transitions under thermodynamic restrictions. However, the quest for simpler methods to encompass all these processes remains unfulfilled. We resolve this challenge through the catalytic use of thermal baths, which are assumed to be easily accessible. We select two sets of simplified operations: elementary thermal operations and Markovian thermal operations. They are known for their experimental feasibility, but fail to capture the full extent of thermal operations due to their innate Markovianity. We nevertheless demonstrate that this limitation can be overcome when the operations are enhanced by ambient-temperature Gibbs state catalysts. In essence, our result indicates that free states within thermal operations can act as catalysts that provide the necessary non-Markovianity for simpler operations. Furthermore, we prove that when any catalyst can be employed, different thermal processes (thermal operations, elementary thermal operations, and Markovian thermal operations) converge. Notably, our results extend to scenarios involving initial states with coherence in the energy eigenbasis, a notoriously difficult process to characterise.

翻訳日:2023-11-07 23:19:58 公開日:2023-11-04

# ネットワークシナリオにおける局所モデルの数値支援決定

Numerically assisted determination of local models in network scenarios ( http://arxiv.org/abs/2303.09954v3 )

ライセンス: Link先を確認

José Mário da Silva and Fernando Parisio

(参考訳) ネットワークシナリオにおける隠れ変数の濃度が一般性を失うことなく有限であると仮定できるという事実を生かして、与えられた統計的振る舞いを再現する明示的な局所モデルを見つけるための数値ツールを開発した。次に,ネットワーク局所境界が知られている統計的行動の家族を用いて,二元的シナリオを用いて数値計算を行った。さらに,入力のない三角形ネットワークにおいて,均一なランダムノイズを混合した3つの顕著な分布の臨界可視性について検討した。グリーンベルガー・ホルン・ザイリンガー(GHZ)およびW分布(第4次多項式の根である)の臨界可視性についての予想と、エレガント関節計測分布の臨界可視性の低い境界推定を提供する。開発されたコードとドキュメントはgithub.com/mariofilho281/localmodelsで公開されている

Taking advantage of the fact that the cardinalities of hidden variables in network scenarios can be assumed to be finite without loss of generality, a numerical tool for finding explicit local models that reproduce a given statistical behaviour was developed. The numerical procedure was then validated using families of statistical behaviours for which the network-local boundary is known, in the bilocal scenario. Furthermore, the critical visibility for 3 notable distributions mixed with a uniform random noise is investigated in the triangle network without inputs. We provide conjectures for the critical visibilities of the Greenberger-Horne-Zeilinger (GHZ) and W distributions (which are roots of 4th degree polynomials), as well as a lower bound estimate of the critical visibility of the Elegant Joint Measurement distribution. The developed codes and documentation are publicly available at github.com/mariofilho281/localmodels
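
To make "mixed with a uniform random noise" concrete, a minimal sketch follows. It assumes the common convention that a visibility v interpolates linearly between the target distribution and the uniform distribution over the eight outcome triples. The function names are illustrative and are not taken from the authors' repository.

```python
from itertools import product

# All outcome triples (a, b, c) for three parties with binary outputs.
OUTCOMES = list(product((0, 1), repeat=3))


def ghz_distribution():
    """GHZ distribution: outcomes 000 and 111, each with probability 1/2."""
    return {abc: (0.5 if abc in ((0, 0, 0), (1, 1, 1)) else 0.0) for abc in OUTCOMES}


def w_distribution():
    """W distribution: exactly one party outputs 1, each case with probability 1/3."""
    return {abc: (1.0 / 3.0 if sum(abc) == 1 else 0.0) for abc in OUTCOMES}


def with_uniform_noise(dist, visibility):
    """Return visibility * dist + (1 - visibility) * uniform (assumed noise model)."""
    uniform = 1.0 / len(dist)
    return {abc: visibility * p + (1.0 - visibility) * uniform for abc, p in dist.items()}


noisy_ghz = with_uniform_noise(ghz_distribution(), 0.5)
noisy_w = with_uniform_noise(w_distribution(), 0.5)
```

Under this convention the critical visibility is the largest v for which the noisy distribution still admits a network-local model; finding that value is the job of the numerical tool and is not attempted here.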

翻訳日:2023-11-07 23:18:49 公開日:2023-11-04

# 多値拡散:画像生成のための無限次元スコアベース拡散モデル

Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation ( http://arxiv.org/abs/2303.04772v3 )

ライセンス: Link先を確認

Paul Hagemann, Sophie Mildenberger, Lars Ruthotto, Gabriele Steidl, Nicole Tianjiao Yang

(参考訳) スコアベース拡散モデル(SBDM)は画像生成のための最先端のアプローチとして最近登場した。既存のSBDMは通常有限次元の設定で定式化され、画像は有限サイズのテンソルと見なされる。本稿では,SBDMを無限次元設定,すなわち矩形領域でサポートされている関数としてトレーニングデータをモデル化する。より高解像度で画像を生成することの探求に加えて、私たちの一番の動機は、複数の解像度レベルで一貫した識別を可能にするために、よく考えられた無限次元の学習問題を作ることです。そこで我々は,様々な解像度レベルで一般化し,学習過程の効率化を図る拡散モデルを得る。無限次元設定におけるsbdmアプローチの2つの欠点を克服する方法を示す。まず, 潜在分布が無限次元設定においてトレースクラス作用素の概念を用いて well-defined であることを保証するために, フォワードプロセスを修正した。有限近似に対する逆過程を導出する。第2に,オペレータネットワークでスコア関数を近似することは,多レベルトレーニングに有用であることを示す。離散化の収束とマルチレベルトレーニングの近似を導出した後、無限次元SBDM手法を実装し、MNISTとFashion-MNISTで最初の有望な結果を示す。

Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently on multiple resolution levels. We thereby intend to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process to ensure that the latent distribution is well-defined in the infinite-dimensional setting using the notion of trace class operators. We derive the reverse processes for finite approximations. Second, we illustrate that approximating the score function with an operator network is beneficial for multilevel training. After deriving the convergence of the discretization and the approximation of multilevel training, we implement an infinite-dimensional SBDM approach and show the first promising results on MNIST and Fashion-MNIST, underlining our developed theory.
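
The role of trace class operators can be illustrated with a standard fact about Gaussian noise on a separable Hilbert space. The notation below (an orthonormal basis $e_k$, covariance eigenvalues $\lambda_k$) is assumed for this note and is not taken from the paper.

```latex
\xi = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, z_k\, e_k,
\qquad z_k \sim \mathcal{N}(0,1)\ \text{i.i.d.},
\qquad
\mathbb{E}\,\|\xi\|^2 = \sum_{k=1}^{\infty} \lambda_k = \operatorname{tr}(C).
```

The expected squared norm is finite exactly when the covariance operator $C$ is trace class. White noise corresponds to $\lambda_k = 1$ for all $k$, for which the sum diverges, which is why the finite-dimensional choice of noise does not carry over unchanged to the infinite-dimensional setting.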

翻訳日:2023-11-07 23:17:02 公開日:2023-11-04

# 2層ReLU畳み込みニューラルネットワークの良性オーバーフィッティング

Benign Overfitting for Two-layer ReLU Convolutional Neural Networks ( http://arxiv.org/abs/2303.04145v2 )

ライセンス: Link先を確認

Yiwen Kou and Zixiang Chen and Yuanzhou Chen and Quanquan Gu

(参考訳) 優れた表現力を持つ現代のディープラーニングモデルは、トレーニングデータに過剰適合するように訓練されても、十分に一般化できる。この現象は benign overfitting (良性過剰適合) と呼ばれる。近年、ニューラルネットワークの良性過剰適合を理論的に理解しようとする研究がいくつかある。しかしながら、これらの研究は、滑らかな活性化関数を持つニューラルネットワークか、ニューラルタンジェントカーネル領域に限られている。ReLUニューラルネットワークにおいて良性過剰適合がどのように、いつ起こるのかは未解決のままである。本研究では、ラベルフリップ雑音を伴う2層ReLU畳み込みニューラルネットワークの学習に対するアルゴリズム依存のリスク境界を確立することにより、この問題に対処する。緩やかな条件下では、勾配降下によって訓練されたニューラルネットワークが、ほぼゼロのトレーニング損失とベイズ最適なテストリスクを達成できることを示す。また、テストリスクの観点から、データ分布に関する異なる条件下での良性と有害な過剰適合の間の急激な遷移も明らかにした。合成データによる実験は私たちの理論を裏付けている。

Modern deep learning models with great expressive power can be trained to overfit the training data but still generalize well. This phenomenon is referred to as benign overfitting. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, these works are either limited to neural networks with smooth activation functions or to the neural tangent kernel regime. How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk. Our result also reveals a sharp transition between benign and harmful overfitting under different conditions on data distribution in terms of test risk. Experiments on synthetic data back up our theory.

翻訳日:2023-11-07 23:16:38 公開日:2023-11-04

# 高レベルロボット説明のための報酬分解の詳細な検討

A Closer Look at Reward Decomposition for High-Level Robotic Explanations ( http://arxiv.org/abs/2304.12958v2 )

ライセンス: Link先を確認

Wenhao Lu, Xufeng Zhao, Sven Magg, Martin Gromniak, Mengdi Li, Stefan Wermter

(参考訳) 強化学習(RL)によって人間に学習された知的エージェントの振る舞いを説明することは、理解不能な先天受容状態、変分中間目標、そして結果として予測不可能であるために、非常に難しい。さらに、RLエージェントの1段階の説明は、各遷移におけるエージェントの将来の振る舞いを説明できないため曖昧になり、ロボットアクションを説明する複雑さが増す。タスク固有のプリミティブにマップする抽象的なアクションを活用することで、動作レベルの説明を避けることができる。ロボットシステムの透明性と説明可能性をさらに向上するために,報酬分解(RD)と抽象的な行動空間を組み合わせたQ-Map学習フレームワークを提案する。本研究では,人間の理解が容易なRD説明の出力成果から視覚的・テキスト的説明を提示する,2つのシナリオの定量的・定性的な分析を通じて,フレームワークの有効性を実証する。さらに,これらのアーティファクトを大規模言語モデル(llm)に統合し,推論と対話的なクエリを行う汎用性を示す。

Explaining the behaviour of intelligent agents learned by reinforcement learning (RL) to humans is challenging yet crucial due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability. Moreover, one-step explanations for RL agents can be ambiguous as they fail to account for the agent's future behaviour at each transition, adding to the complexity of explaining robot actions. By leveraging abstracted actions that map to task-specific primitives, we avoid explanations on the movement level. To further improve the transparency and explainability of robotic systems, we propose an explainable Q-Map learning framework that combines reward decomposition (RD) with abstracted action spaces, allowing for non-ambiguous and high-level explanations based on object properties in the task. We demonstrate the effectiveness of our framework through quantitative and qualitative analysis of two robotic scenarios, showcasing visual and textual explanations, from output artefacts of RD explanations, that are easy for humans to comprehend. Additionally, we demonstrate the versatility of integrating these artefacts with large language models (LLMs) for reasoning and interactive querying.
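
The idea behind reward decomposition can be sketched generically: the total Q-value is a sum of per-component Q-values, and a preference between two actions is explained by the component-wise differences. The component names, action names, and numbers below are hypothetical, and this is not the paper's Q-Map architecture.

```python
def total_q(component_q, action):
    """Total Q-value of an action: the sum over all reward components."""
    return sum(q[action] for q in component_q.values())


def best_action(component_q, actions):
    """Greedy action with respect to the summed Q-value."""
    return max(actions, key=lambda a: total_q(component_q, a))


def explain_preference(component_q, chosen, alternative):
    """Per-component advantage of `chosen` over `alternative`, largest first."""
    deltas = {name: q[chosen] - q[alternative] for name, q in component_q.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-component Q-values for two abstract actions.
component_q = {
    "reach_target": {"push_red_box": 0.875, "push_blue_box": 0.25},
    "avoid_collision": {"push_red_box": -0.25, "push_blue_box": -0.125},
    "energy": {"push_red_box": -0.125, "push_blue_box": -0.125},
}
actions = ["push_red_box", "push_blue_box"]

chosen = best_action(component_q, actions)
explanation = explain_preference(component_q, chosen, "push_blue_box")
```

Read top to bottom, the sorted list says which reward component argues most strongly for the chosen action and which argues against it. This kind of per-component artefact can then be rendered as a visual or textual explanation.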

翻訳日:2023-11-07 23:09:01 公開日:2023-11-04

# ChatGPT対応労働市場の将来--中国における予備的研究

The Future of ChatGPT-enabled Labor Market: A Preliminary Study in China ( http://arxiv.org/abs/2304.09823v4 )

ライセンス: Link先を確認

Lan Chen, Xi Chen, Shiyu Wu, Yaqi Yang, Meng Chang, Hengshu Zhu

(参考訳) 驚くべき大きな言語モデルとして、chatgptは様々な現実世界のタスクで並行して成功し、日々の生活や仕事においてますます重要な役割を演じています。しかし、倫理的な問題、特にChatGPTのような人工知能(AGI)が人間の仕事を置き換えるかどうかについても、大きな懸念が持ち上がっている。そこで,本稿では,人間-AIコンファレンスではなく,人間-AI共生の観点から,ChatGPTを活用した労働市場の将来に関する予備的なデータ駆動研究を紹介する。具体的には、中国最大のオンラインリクルートプラットフォームであるboss zhipinで、大規模求人データの詳細な分析をまず実施する。その結果、現在の労働市場の職業の約28%はChatGPT関連のスキルを必要とすることがわかった。さらに,大規模職業中心知識グラフに基づいて,労働市場における職業スキル関係を予測するための意味情報強化協調フィルタリングアルゴリズムを開発した。その結果,今後45%の職業がchatgpt関連のスキルを必要とすることがわかった。特に、技術、製品、オペレーションに関連する産業は、ChatGPT関連のスキルに対して高い熟練度を要求され、一方、製造、サービス、教育、健康科学関連産業は、ChatGPT関連スキルに対してより低い熟練度を要求される。

As a phenomenal large language model, ChatGPT has achieved unparalleled success in various real-world tasks and increasingly plays an important role in our daily lives and work. However, extensive concerns are also raised about the potential ethical issues, especially about whether ChatGPT-like artificial general intelligence (AGI) will replace human jobs. To this end, in this paper, we introduce a preliminary data-driven study on the future of ChatGPT-enabled labor market from the view of Human-AI Symbiosis instead of Human-AI Confrontation. To be specific, we first conduct an in-depth analysis of large-scale job posting data in BOSS Zhipin, the largest online recruitment platform in China. The results indicate that about 28% of occupations in the current labor market require ChatGPT-related skills. Furthermore, based on a large-scale occupation-centered knowledge graph, we develop a semantic information enhanced collaborative filtering algorithm to predict the future occupation-skill relations in the labor market. As a result, we find that additional 45% occupations in the future will require ChatGPT-related skills. In particular, industries related to technology, products, and operations are expected to have higher proficiency requirements for ChatGPT-related skills, while the manufacturing, services, education, and health science related industries will have lower requirements for ChatGPT-related skills.

翻訳日:2023-11-07 23:07:43 公開日:2023-11-04

# 視覚言語事前学習のためのベースラインの改善

Improved baselines for vision-language pre-training ( http://arxiv.org/abs/2305.08675v2 )

ライセンス: Link先を確認

Enrico Fini and Pietro Astolfi and Adriana Romero-Soriano and Jakob Verbeek and Michal Drozdzal

(参考訳) コントラスト学習はマルチモーダル表現を学習するための効率的なフレームワークとして登場した。この領域の独創的な研究であるクリップは、コントラスト損失を使ってペア画像テキストデータをトレーニングすることで素晴らしい結果を得た。最近の研究は、自己教師型学習にインスパイアされた非コントラスト的損失によるCLIPの改善を主張している。しかし、モデルのトレーニングに使用されるデータ拡張や正規化といった他の実装の詳細から、これらの追加的な損失の貢献を外すのは難しい場合があります。そこで本稿では,コントラスト学習と近年の自己教師型学習の進歩を組み合わせることで得られるいくつかの基本点を,まず提案し,実装し,評価する。特に,視覚的自己指導学習において得られた損失関数を用いて画像とテキストのモダリティを整列させる。これらのベースラインはCLIPの基本実装よりも優れています。しかし、より強いトレーニングレシピを採用すると、その利点は消える。実際、簡単なCLIPベースラインも大幅に改善され、他のサブフィールドで人気がある有名なトレーニング技術を使用することで、下流のゼロショットタスクを25%改善できることがわかった。また,先行研究による改善のほとんどを補うために,画像やテキストの増補を適用するだけで十分であることがわかった。 clipのトレーニングレシピが改善されたことで,4つの標準データセットで最先端のパフォーマンスが得られ,従来作業(最大データセットでは最大+4%まで)を一貫して上回っています。コードはhttps://github.com/facebookresearch/clip-rocketで入手できる。

Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired from self-supervised learning. However, it is sometimes hard to disentangle the contribution of these additional losses from other implementation details, e.g., data augmentation or regularization techniques, used to train the model. To shed light on this matter, in this paper, we first propose, implement and evaluate several baselines obtained by combining contrastive learning with recent advances in self-supervised learning. In particular, we use the loss functions that were proven successful for visual self-supervised learning to align image and text modalities. We find that these baselines outperform a basic implementation of CLIP. However, when a stronger training recipe is employed, the advantage disappears. Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields. Moreover, we discover that it is enough to apply image and text augmentations to make up for most of the improvement attained by prior works. With our improved training recipe for CLIP, we obtain state-of-the-art performance on four standard datasets, and consistently outperform prior work (up to +4% on the largest dataset), while being substantially simpler. The code is available at https://github.com/facebookresearch/clip-rocket
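
For readers unfamiliar with the contrastive loss mentioned above, here is a minimal pure-Python sketch of a symmetric image-text contrastive objective. It assumes one matching text per image at the same batch index. The temperature value is illustrative, and nothing here is taken from the clip-rocket code.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Average of the image-to-text and text-to-image cross-entropies."""
    images = [normalize(v) for v in image_embeddings]
    texts = [normalize(v) for v in text_embeddings]
    logits = [
        [sum(a * b for a, b in zip(image, text)) / temperature for text in texts]
        for image in images
    ]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))
```

The loss is small when each image embedding is most similar to its own caption and large when pairs are mismatched, which is the signal that pulls the two modalities into a shared space.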

翻訳日:2023-11-07 22:56:10 公開日:2023-11-04

# 校正説明:不確実性情報と反実仮想

Calibrated Explanations: with Uncertainty Information and Counterfactuals ( http://arxiv.org/abs/2305.02305v3 )

ライセンス: Link先を確認

Helena Lofstrom, Tuwe Lofstrom, Ulf Johansson, Cecilia Sonstrod

(参考訳) aiモデルの局所的な説明は、機能の重要性など個々の予測に対する洞察を提供するが、不安定性などの問題に苦しめられている。 MLモデルのキャリブレーションが不十分なためにしばしば歪んだ特徴量の信頼性の欠如は、これらの課題をさらに深めている。さらに、特徴の重要さの重要な側面は、説明可能なAI(XAI)にほとんど適応していない。本稿では,これらの課題に真っ向から対処するために,キャリブレート説明(CE)と呼ばれる特徴重要度説明手法を提案する。 Venn-Abersの基礎の上に構築されたCEは、基礎となるモデルを校正するだけでなく、機能重みを正確に定義した信頼性の高い機能重要な説明を提供する。 CEは出力の不確実性に対処することで、従来のソリューションを超える。これは特徴量とモデルの確率推定の両方に対して不確実な定量化を提供することによって達成される。さらに、CEはモデルに依存しず、容易に理解可能な条件付きルールと、組み込まれた不確実性定量化による反実的説明を生成する能力を備えている。 25のベンチマークデータセットによる評価の結果は、CEの有効性を裏付けるもので、高速で信頼性があり、安定しており、堅牢なソリューションである。

While local explanations for AI models can offer insights into individual predictions, such as feature importance, they are plagued by issues like instability. The unreliability of feature weights, often skewed due to poorly calibrated ML models, deepens these challenges. Moreover, the critical aspect of feature importance uncertainty remains mostly unaddressed in Explainable AI (XAI). The novel feature importance explanation method presented in this paper, called Calibrated Explanations (CE), is designed to tackle these issues head-on. Built on the foundation of Venn-Abers, CE not only calibrates the underlying model but also delivers reliable feature importance explanations with an exact definition of the feature weights. CE goes beyond conventional solutions by addressing output uncertainty. It accomplishes this by providing uncertainty quantification for both feature weights and the model's probability estimates. Additionally, CE is model-agnostic, featuring easily comprehensible conditional rules and the ability to generate counterfactual explanations with embedded uncertainty quantification. Results from an evaluation with 25 benchmark datasets underscore the efficacy of CE, making it stand as a fast, reliable, stable, and robust solution.

翻訳日:2023-11-07 22:51:48 公開日:2023-11-04

# リプシッツ性と平滑性を仮定しないオンラインポートフォリオ選択のためのデータ依存境界

Data-Dependent Bounds for Online Portfolio Selection Without Lipschitzness and Smoothness ( http://arxiv.org/abs/2305.13946v2 )

ライセンス: Link先を確認

Chung-En Tsai and Ying-Ting Lin and Yen-Huan Li

(参考訳) この研究は、オンラインポートフォリオ選択における最初の小さな損失と段階的な後悔の限界を導入し、非リプシッツ、非スムース損失によるオンライン凸最適化のためのデータ依存境界の最初の例を示している。提案するアルゴリズムは、最悪の場合におけるサブ線形後悔率を示し、データが「容易」である場合に対数後悔を達成する。後悔境界は、対数損失の新たなスムーズな特徴付け、正規化リーダ(FTRL)と必ずしも障壁ではない自己調和正則化器による局所ノルムに基づく解析、および対数バリアによる楽観的FTRLの暗黙的変種を用いて導出される。

This work introduces the first small-loss and gradual-variation regret bounds for online portfolio selection, marking the first instances of data-dependent bounds for online convex optimization with non-Lipschitz, non-smooth losses. The algorithms we propose exhibit sublinear regret rates in the worst cases and achieve logarithmic regrets when the data is "easy," with per-iteration time almost linear in the number of investment alternatives. The regret bounds are derived using novel smoothness characterizations of the logarithmic loss, a local norm-based analysis of following the regularized leader (FTRL) with self-concordant regularizers, which are not necessarily barriers, and an implicit variant of optimistic FTRL with the log-barrier.
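
The loss in online portfolio selection is the negative logarithm of the portfolio's one-round growth. The sketch below, with made-up price relatives, shows that loss and why it is not Lipschitz: the gradient grows without bound as the portfolio approaches the boundary of the simplex. It illustrates the problem setting only, not the proposed algorithms.

```python
import math


def growth(portfolio, price_relatives):
    """One-round wealth growth factor: inner product of weights and price relatives."""
    return sum(w * r for w, r in zip(portfolio, price_relatives))


def log_loss(portfolio, price_relatives):
    """Per-round loss: negative log of the growth factor."""
    return -math.log(growth(portfolio, price_relatives))


def log_loss_gradient(portfolio, price_relatives):
    """Gradient of the loss with respect to the portfolio weights."""
    g = growth(portfolio, price_relatives)
    return [-r / g for r in price_relatives]


def cumulative_loss(portfolio, history):
    """Total loss of a constant rebalanced portfolio over a price history."""
    return sum(log_loss(portfolio, r) for r in history)


# Hypothetical two-asset history: each asset doubles once and halves once.
history = [(2.0, 0.5), (0.5, 2.0)]
uniform_loss = cumulative_loss((0.5, 0.5), history)
single_asset_loss = cumulative_loss((1.0, 0.0), history)

# Near the simplex boundary the gradient magnitude is about 1 / 0.001.
steep_gradient = log_loss_gradient((0.001, 0.999), (1.0, 0.0))
```

Regret in this setting is usually measured against the best constant rebalanced portfolio in hindsight, which is why `cumulative_loss` takes a fixed weight vector.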

翻訳日:2023-11-07 22:41:52 公開日:2023-11-04

# RKHMとペロン・フロベニウス演算子によるカーネルによる深層学習

Deep Learning with Kernels through RKHM and the Perron-Frobenius Operator ( http://arxiv.org/abs/2305.13588v2 )

ライセンス: Link先を確認

Yuka Hashimoto, Masahiro Ikeda, Hachem Kadri

(参考訳) 再生カーネル Hilbert $C^*$-module (RKHM) は、$C^*$-algebra を用いた再生カーネル Hilbert 空間 (RKHS) の一般化であり、ペロン・フロベニウス作用素は関数の合成に関連する線型作用素である。これら2つの概念を組み合わせることで、カーネルメソッドのディープラーニングフレームワークである deep RKHM を提案する。この設定における新しいラデマッハ一般化境界を導出し、ペロン・フロベニウス作用素による良性過剰適合の理論的解釈を提供する。$C^*$-algebra により、出力次元に対する境界の依存性は、既存の境界よりも緩やかである。$C^*$-algebra はカーネルによるディープラーニングに適したツールであり、作用素の積構造を活用でき、畳み込みニューラルネットワークとの明確な接続を提供できることを示す。我々の理論的解析は、深いカーネルメソッドを設計、分析するための新しい視点を提供する。

Reproducing kernel Hilbert $C^*$-module (RKHM) is a generalization of reproducing kernel Hilbert space (RKHS) by means of $C^*$-algebra, and the Perron-Frobenius operator is a linear operator related to the composition of functions. Combining these two concepts, we present deep RKHM, a deep learning framework for kernel methods. We derive a new Rademacher generalization bound in this setting and provide a theoretical interpretation of benign overfitting by means of Perron-Frobenius operators. By virtue of $C^*$-algebra, the dependency of the bound on output dimension is milder than existing bounds. We show that $C^*$-algebra is a suitable tool for deep learning with kernels, enabling us to take advantage of the product structure of operators and to provide a clear connection with convolutional neural networks. Our theoretical analysis provides a new lens through which one can design and analyze deep kernel methods.

翻訳日:2023-11-07 22:41:12 公開日:2023-11-04

# NashFormer: 局所的なNash平衡を利用した意味的多元性軌道予測

NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction ( http://arxiv.org/abs/2305.17600v2 )

ライセンス: Link先を確認

Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

(参考訳) 道路エージェント間の相互作用は、特に複数のエージェントを含む場合において、軌道予測において重要な課題となる。既存の多様性を考慮した予測器はマルチエージェント予測のインタラクティブな性質を考慮しないため、これらの重要な相互作用の結果を見逃す可能性がある。本稿では、マルチモーダル予測のカバレッジ向上のために、ゲーム理論的な逆強化学習を活用する軌道予測フレームワークであるNashFormerを提案する。トレーニング時のゲーム理論的解析を補助的損失として用いることで、エージェントの行動の分類体系を仮定することなく、カバレッジと精度を向上させる。Waymo Open Motion Datasetのインタラクティブな分割、および相互作用の複雑さが高いシナリオを含む4つのサブセットで、私たちのアプローチを実証する。実験の結果、予測器は正確な予測を行いつつ、ベースラインモデルよりも$33\%$多くの潜在的な相互作用をカバーすることがわかった。

Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering $33\%$ more potential interactions versus a baseline model.

翻訳日:2023-11-07 22:31:46 公開日:2023-11-04

# コンテキスト圧縮に言語モデルを適用する

Adapting Language Models to Compress Contexts ( http://arxiv.org/abs/2305.14788v2 )

ライセンス: Link先を確認

Alexis Chevalier, Alexander Wettig, Anirudh Ajith, Danqi Chen

(参考訳) トランスフォーマティブ言語モデル(lms)は強力で広く適用可能なツールであるが、その有用性は、有限コンテキストウィンドウと長いテキスト文書を処理するための高価な計算コストによって制限されている。プリトレーニングされたlmsをオートコンプレッサーに適用する。これらの言語モデルは、長いコンテキストをコンパクトなサマリーベクトルに圧縮し、ソフトプロンプトとしてモデルにアクセスすることができる。要約ベクトルは教師なしの目的で訓練され、長い文書はセグメントで処理され、以前の全てのセグメントからの要約ベクトルは言語モデリングに使用される。最大30,720個のトークンのシーケンスでOPTとLlama-2モデルを微調整し、AutoCompressorが長いコンテキストを使ってパープレキシティを向上できることを示す。タスク実演を圧縮することで,テキスト内学習におけるAutoCompressorsの評価を行い,要約ベクトルが平文実演の代用となり,推論コストを削減しつつ精度を高めた。最後に,検索強化言語モデルに要約ベクトルを適用することで,大規模コーパスに対する要約ベクトルの事前計算の利点について検討する。全体として、オートコンプレッサーはlmsのコンテキストウィンドウを拡張し、長いコンテキストでの推論をスピードアップするためのシンプルで安価なソリューションとして現れる。

Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window and the expensive computational cost of processing long text documents. We propose to adapt pre-trained LMs into AutoCompressors. These language models are capable of compressing long contexts into compact summary vectors, which are then accessible to the model as soft prompts. Summary vectors are trained with an unsupervised objective, whereby long documents are processed in segments, and summary vectors from all previous segments are used in language modeling. We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity. We evaluate AutoCompressors on in-context learning by compressing task demonstrations and find that summary vectors are good substitutes for plain-text demonstrations, increasing accuracy while reducing inference costs. Finally, we explore the benefits of pre-computing summary vectors for large corpora by applying summary vectors to retrieval-augmented language modeling and a passage re-ranking task. Overall, AutoCompressors emerge as a simple and inexpensive solution to extend the context window of LMs while speeding up inference over long contexts.
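
The segment-by-segment control flow described above can be sketched as follows. Only the loop structure is meant to be informative: `toy_summarize` is a hypothetical placeholder for the fine-tuned language model, and its "summary vectors" are simple tuples rather than learned embeddings.

```python
def split_into_segments(tokens, segment_len):
    """Cut a token sequence into consecutive segments of at most segment_len tokens."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]


def toy_summarize(segment, prior_summaries, vectors_per_segment=2):
    """Hypothetical placeholder for the model's compression step.

    A real model would read the segment together with the earlier summary
    vectors (as soft prompts) and emit new summary vectors. Here each
    "vector" just records the segment's mean token id and a running index.
    """
    mean_id = sum(segment) / len(segment)
    offset = len(prior_summaries)
    return [(mean_id, offset + k) for k in range(vectors_per_segment)]


def compress_document(tokens, segment_len, vectors_per_segment=2):
    """Process segments in order, accumulating summary vectors from all earlier segments."""
    summaries = []
    for segment in split_into_segments(tokens, segment_len):
        summaries.extend(toy_summarize(segment, summaries, vectors_per_segment))
    return summaries


summaries = compress_document(list(range(10)), segment_len=4)
```

The point of the design is that the number of summary vectors grows with the number of segments rather than the number of tokens, so later segments see a much shorter context than the raw text would require.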

翻訳日:2023-11-07 22:27:10 公開日:2023-11-04

# Language-Model-as-an-Examinerを用いたベンチマーク基礎モデル

Benchmarking Foundation Models with Language-Model-as-an-Examiner ( http://arxiv.org/abs/2306.04181v2 )

ライセンス: Link先を確認

Yushi Bai, Jiahao Ying, Yixin Cao, Xin Lv, Yuze He, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Yijia Xiao, Haozhe Lyu, Jiayin Zhang, Juanzi Li, Lei Hou

(参考訳) 人間に似た方法で言語を理解し、生成するモデルの能力の包括的なテストとして、オープンエンドの質問応答における基礎モデルのパフォーマンスを評価するために、多くのベンチマークが確立されている。これらの研究の多くは、新しいデータセットの提案に重点を置いているが、以前のベンチマークパイプラインには2つの大きな問題がある。本稿では,lmが知識に基づいて質問を定式化し,その応答を参照のない方法で評価する,新たなベンチマークフレームワークであるlanguage-model-as-an-examinerを提案する。我々のフレームワークは、様々なlmsを検査者として採用することができ、質問はより多様なトリガートピックによって常に更新できるため、無力な拡張性を可能にする。より包括的かつ公平な評価を行うため,(1)広範囲のドメインに質問を発生させるようLM検査官に指示し,さらに詳細な評価を行うためにフォローアップ質問を提起する3つの戦略を考案した。 2)評価では,評価基準と評価基準を組み合わせ,人間のアノテーションと密接に一致して信頼性の高い結果が得られる。 (3) 単検定における偏りに対処する分散化ピア検定法も提案する。我々のデータとベンチマーク結果は以下の通りである。

Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets, however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. Our framework allows for effortless extensibility as various LMs can be adopted as the examiner, and the questions can be constantly updated given more diverse trigger topics. For a more comprehensive and equitable evaluation, we devise three strategies: (1) We instruct the LM examiner to generate questions across a multitude of domains to probe for a broad acquisition, and raise follow-up questions to engage in a more in-depth assessment. (2) Upon evaluation, the examiner combines both scoring and ranking measurements, providing a reliable result as it aligns closely with human annotations. (3) We additionally propose a decentralized Peer-examination method to address the biases in a single examiner. Our data and benchmarking results are available at: http://lmexam.xlore.cn.

翻訳日:2023-11-07 22:19:25 公開日:2023-11-04

# トルコ語テキスト可読性のためのハイブリッド言語機能の検討

Exploring Hybrid Linguistic Features for Turkish Text Readability ( http://arxiv.org/abs/2306.03774v3 )

ライセンス: Link先を確認

Ahmet Yavuz Uluslu and Gerold Schneider

(参考訳) 本稿では,トルコ語テキストの自動可読性評価に関する最初の包括的研究を行う。我々は,最先端のニューラルネットワークモデルと,語彙的,形態素的,構文的,談話的レベルでの言語的特徴を組み合わせることで,高度な可読性ツールを開発した。従来の可読性公式の有効性を,現代の自動手法と比較して評価し,トルコ語の可読性を決定する重要な言語的特徴を特定する。

This paper presents the first comprehensive study on automatic readability assessment of Turkish texts. We combine state-of-the-art neural network models with linguistic features at lexical, morphosyntactic, syntactic and discourse levels to develop an advanced readability tool. We evaluate the effectiveness of traditional readability formulas compared to modern automated methods and identify key linguistic features that determine the readability of Turkish texts.

翻訳日:2023-11-07 22:18:02 公開日:2023-11-04

# 制限付き選択バイアスによる統計的推測

Statistical Inference Under Constrained Selection Bias ( http://arxiv.org/abs/2306.03302v3 )

ライセンス: Link先を確認

Santiago Cortes-Gomez, Mateo Dulce, Carlos Patino, Bryan Wilder

(参考訳) 大規模なデータセットは、意思決定を知らせるためにますます使われています。この取り組みは、現実世界の証拠にポリシーを基礎付けることを目的としているが、選択バイアスやその他の分布シフトが観察データに支障をきたすため、課題が発生する。堅牢な推論を提供する以前の試みでは、ユーザが指定した分布シフトの量(例えば、観測された分布と対象分布の最大KLばらつき)に応じて保証が与えられていた。しかしながら、意思決定者は、可能なシフトの種類を制限するターゲット分布に関する追加の知識を持つことが多い。このような情報を活用するために,対象分布下で期待が知られている関数の形で,ユーザが特定した制約に従う選択バイアスの存在下で統計的推測を可能にする枠組みを提案する。出力は、目標分布に対する推定値に対する高確率境界である。そこで,本手法は,広い範囲の推定値を部分的に識別するために,ドメイン知識を活用する。これらの境界を推定する手法の計算・統計特性を解析し,本手法が実世界のユースケースと同様に,様々なシミュレーションおよび半合成タスクにおいて情報的境界を生成できることを示す。

Large-scale datasets are increasingly being used to inform decision making. While this effort aims to ground policy in real-world evidence, challenges have arisen as selection bias and other forms of distribution shifts often plague observational data. Previous attempts to provide robust inference have given guarantees depending on a user-specified amount of possible distribution shift (e.g., the maximum KL divergence between the observed and target distributions). However, decision makers will often have additional knowledge about the target distribution which constrains the kind of possible shifts. To leverage such information, we propose a framework that enables statistical inference in the presence of selection bias which obeys user-specified constraints in the form of functions whose expectation is known under the target distribution. The output is high-probability bounds on the value of an estimand for the target distribution. Hence, our method leverages domain knowledge in order to partially identify a wide class of estimands. We analyze the computational and statistical properties of methods to estimate these bounds and show that our method can produce informative bounds on a variety of simulated and semisynthetic tasks, as well as in a real-world use case.

翻訳日:2023-11-07 22:17:36 公開日:2023-11-04

# 言語間の感情弧の評価: 感情分析におけるグローバル分割の橋渡し

Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis ( http://arxiv.org/abs/2306.02213v3 )

ライセンス: Link先を確認

Daniela Teodorescu and Saif M. Mohammad

(参考訳) 感情は、個人(または人口)が時間とともにどのように感じるかを捉えます。産業や研究で広く使われているが、自動的に生成された弧を評価する作業はほとんどない。これは真の(金)感情の弧を確立するのが難しいためである。私たちの研究は、初めて、系統的かつ定量的に自動生成された感情弧を評価しました。また、機械学習(ML)モデルとLexicon-Only(LexO)手法の2つの感情弧を生成する一般的な方法を比較する。 9言語で18の多様なデータセットで実験を行うことで、インスタンスレベルの感情分類が著しく貧弱であるにもかかわらず、LexO法は数百のインスタンスから情報を集約する際に感情弧を生成するのに非常に正確であることを示す。また,6つのアフリカ諸言語とアラビア語,スペイン語による実験を通じて,英語感情辞書の自動翻訳により,低リソース言語における高品質な感情アークを生成することができることを示した。これは世界中の言語における感情の研究の道を開くもので、これは商業、公共政策、健康研究に欠かせない。コードとリソース:https://github.com/dteodore/EmotionArcs

Emotion arcs capture how an individual (or a population) feels over time. They are widely used in industry and research; however, there is little work on evaluating the automatically generated arcs. This is because of the difficulty of establishing the true (gold) emotion arc. Our work, for the first time, systematically and quantitatively evaluates automatically generated emotion arcs. We also compare two common ways of generating emotion arcs: Machine-Learning (ML) models and Lexicon-Only (LexO) methods. By running experiments on 18 diverse datasets in 9 languages, we show that despite being markedly poor at instance level emotion classification, LexO methods are highly accurate at generating emotion arcs when aggregating information from hundreds of instances. We also show, through experiments on six indigenous African languages, as well as Arabic, and Spanish, that automatic translations of English emotion lexicons can be used to generate high-quality emotion arcs in less-resource languages. This opens up avenues for work on emotions in languages from around the world; which is crucial for commerce, public policy, and health research in service of speakers often left behind. Code and resources: https://github.com/dteodore/EmotionArcs
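
A lexicon-only arc can be sketched in a few lines: score each instance by averaging the lexicon values of its words, then average the instance scores within consecutive bins. The four-word lexicon and the bin size below are invented for illustration; real lexicons are far larger and the authors' resources may aggregate differently.

```python
def instance_score(text, lexicon):
    """Mean lexicon value of the words in one instance, or None if no word matches."""
    values = [lexicon[word] for word in text.lower().split() if word in lexicon]
    return sum(values) / len(values) if values else None


def emotion_arc(texts, lexicon, bin_size):
    """Average instance scores over consecutive bins of bin_size scored instances.

    Instances with no lexicon match are skipped before binning.
    """
    scores = [s for s in (instance_score(t, lexicon) for t in texts) if s is not None]
    arc = []
    for start in range(0, len(scores), bin_size):
        window = scores[start:start + bin_size]
        arc.append(sum(window) / len(window))
    return arc


# Hypothetical valence lexicon and time-ordered instances.
toy_lexicon = {"happy": 1.0, "great": 1.0, "sad": -1.0, "awful": -1.0}
posts = ["Happy day", "great news", "no signal here", "sad story", "awful weather"]
arc = emotion_arc(posts, toy_lexicon, bin_size=2)
```

Individual instance scores are noisy, but averaging over a bin smooths that noise, which matches the observation that lexicon methods do better at the arc level than at the instance level.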

翻訳日:2023-11-07 22:16:59 公開日:2023-11-04

# GAD-NR 近傍再構成によるグラフ異常検出

GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction ( http://arxiv.org/abs/2306.01951v5 )

ライセンス: Link先を確認

Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets and Pan Li

(参考訳) Graph Anomaly Detection (GAD) は、グラフ内の異常ノードを識別し、ネットワークセキュリティ、不正検出、ソーシャルメディアスパム検出、その他さまざまな分野の応用を見つけるために用いられるテクニックである。 GADの一般的な方法は、グラフデータをノード表現にエンコードし、これらの表現に基づいてグラフの再構成品質を評価することによって異常を識別するグラフオートエンコーダ(GAE)である。しかし、既存のGAEモデルは直接リンク再構成に最適化されており、グラフに接続されたノードは潜在空間にクラスタ化される。その結果、クラスター型構造異常を検出するのに優れるが、クラスタに適合しないより複雑な構造異常に悩まされる。この制限に対処するため,グラフ異常検出のための近傍再構成を組み込んだGAEの新しい変種であるGAD-NRを提案する。 GAD-NRは、ノード表現に基づいて、ローカル構造、自己属性、および隣接属性を含むノードの近傍全体を再構築することを目的としている。異常ノードと正常ノード間の近傍再構成損失を比較することで、GAD-NRは任意の異常を効果的に検出できる。 6つの実世界のデータセットで実施された大規模な実験は、GAD-NRの有効性を検証し、最先端の競合相手よりも顕著な改善(AUCでは最大30%)を示す。 GAD-NRのソースコードが公開されている。比較分析の結果,既存の手法は3種類の異常から1種類または2種類の異常を検出する場合にのみ有効であることが判明した。対照的に、GAD-NRはデータセット全体の3種類の異常を検知し、その包括的な異常検出能力を示す。

Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.

翻訳日:2023-11-07 22:16:40 公開日:2023-11-04

# Pix2Repair:画像から形状を復元する

Pix2Repair: Implicit Shape Restoration from Images ( http://arxiv.org/abs/2305.18273v2 )

ライセンス: Link先を確認

Xinchao Song, Nikolas Lamb, Sean Banerjee, Natasha Kholgade Banerjee

(参考訳) Pix2Repairは、画像から復元形状を生成し、破損した物体を修復する自動形状修復手法である。従来の修復手法では、破損した物体の高解像度で水密な3Dメッシュを入力として必要としていた。入力3Dメッシュは高価な3Dスキャナーで取得しなければならず、スキャンされたメッシュには手作業によるクリーンアップが必要なため、アクセシビリティとスケーラビリティが制限される。 Pix2Repairは、破損した物体の画像を入力として、3Dプリント可能な復元形状を自動的に生成する。本稿では、破損した物体を表す潜在符号を、完全形状と破断面に分解する新しい形状関数を提案する。Geometric BreaksデータセットとBreaking Badデータセットの合成破損、QPデータセットの文化遺産オブジェクト、およびFantastic Breaksデータセットの実際の破損に対する復元結果を示す。ビュー中心の復元を予測することで、軸対称物体の復元における課題を克服する。本手法は、チャンファー距離、Earth Mover's Distance、法線一貫性、および復元形状の生成率の観点で、形状修復に適応させた形状補完手法よりも優れる。

We present Pix2Repair, an automated shape repair approach that generates restoration shapes from images to repair fractured objects. Prior repair approaches require a high-resolution watertight 3D mesh of the fractured object as input. Input 3D meshes must be obtained using expensive 3D scanners, and scanned meshes require manual cleanup, limiting accessibility and scalability. Pix2Repair takes an image of the fractured object as input and automatically generates a 3D printable restoration shape. We contribute a novel shape function that deconstructs a latent code representing the fractured object into a complete shape and a break surface. We show restorations for synthetic fractures from the Geometric Breaks and Breaking Bad datasets, and cultural heritage objects from the QP dataset, and for real fractures from the Fantastic Breaks dataset. We overcome challenges in restoring axially symmetric objects by predicting view-centered restorations. Our approach outperforms shape completion approaches adapted for shape repair in terms of chamfer distance, earth mover's distance, normal consistency, and percent restorations generated.

翻訳日:2023-11-07 22:14:31 公開日:2023-11-04

# 効率的なシーケンスモデリングのためのスパースモジュラーアクティベーション

Sparse Modular Activation for Efficient Sequence Modeling ( http://arxiv.org/abs/2306.11197v4 )

ライセンス: Link先を確認

Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai

(参考訳) 線形状態空間モデル(SSM)と自己アテンション機構を組み合わせた最近のハイブリッドモデルは、様々なシーケンスモデリングタスクにおいて印象的な結果を示した。しかし、現在のアプローチでは、アテンションモジュールを入力シーケンスのすべての要素に静的かつ均一に適用しており、品質と効率のトレードオフが最適でなくなる。この制限に対処するために、ニューラルネットワークがシーケンス要素ごとにサブモジュールを微分可能な形でスパースかつ動的に活性化できる汎用機構、Sparse Modular Activation (SMA)を導入する。各要素が活性化されていないサブモジュールをスキップできるようにすることで、SMAはトレーニングと推論の両方の段階でニューラルネットワークの計算量とメモリ消費を削減する。シーケンスモデリングにおけるSMAの有効性を検証するため、SSMから学習した状態表現に基づいてGated Attention Unit (GAU)をスパースに活性化する新しいニューラルアーキテクチャSeqBoatを設計する。 GAUが活性化された入力に対してのみ局所アテンションを行うよう制約することで、SeqBoatは理論上無限のアテンション範囲を持ちながら線形の推論複雑性を達成でき、チャンキングベースのモデルよりも大幅に優れた品質と効率のトレードオフを提供できる。長いシーケンスのモデリング、音声分類、言語モデリングを含む幅広いタスクの実験により、SeqBoatは線形複雑性を持つハイブリッドモデルの中で新たな最先端の結果をもたらし、学習されたスパース活性化パターンを通じて各タスクに必要なアテンションの量を明らかにする。私たちのコードは https://github.com/renll/SeqBoat で公開されている。

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.
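The idea of letting each sequence element skip a non-activated sub-module can be illustrated with a toy sketch. This is only a hard-threshold gate written for illustration: it is not differentiable and is not the SMA mechanism or the SeqBoat architecture. The function name `sparse_apply`, the gate scores, and the threshold are all hypothetical.

```python
def sparse_apply(elements, gate_scores, submodule, threshold=0.5):
    """Run `submodule` only on elements whose gate score exceeds the threshold.

    Elements that are not activated pass through unchanged, so the cost of
    the sub-module scales with the number of active elements.
    """
    active = [i for i, g in enumerate(gate_scores) if g > threshold]
    outputs = list(elements)  # skipped elements are copied as-is
    for i in active:
        outputs[i] = submodule(elements[i])
    return outputs, active


# Toy usage: the "expensive" sub-module is just a multiplication here.
elements = [1, 2, 3, 4]
gate_scores = [0.9, 0.1, 0.7, 0.2]  # made-up scores standing in for a learned gate
outputs, active = sparse_apply(elements, gate_scores, lambda x: x * 10)
print(outputs, active)
```

In this sketch only two of the four elements reach the sub-module; a learned, differentiable gate would replace the fixed threshold in a real model.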

翻訳日:2023-11-07 22:06:58 公開日:2023-11-04

# テキスト・画像拡散モデルにおけるベイズ文脈更新のためのエネルギーに基づく交差注意

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2306.09869v3 )

ライセンス: Link先を確認

Geon Yeong Park, Jeongsol Kim, Beomsu Kim, Sang Wan Lee, Jong Chul Ye

(参考訳) 画像生成タスクにおけるテキストから画像への拡散モデルの顕著な性能にもかかわらず、近年の研究では、生成した画像がテキストプロンプトの意図した意味的内容を捉えられない場合があるという問題が指摘されており、この現象はしばしば意味的ミスアライメント(semantic misalignment)と呼ばれる。これに対処するため、本稿では、コンテキストベクトルの事後分布をモデル化することで適応的なコンテキスト制御を行う、新しいエネルギーベースモデル(EBM)フレームワークを提案する。具体的には、まずデノイジングオートエンコーダの各クロスアテンション層において、潜在画像表現とテキスト埋め込みのEBMを定式化する。次に、コンテキストベクトルの対数事後分布の勾配を求める。コンテキストベクトルはこれにより更新されて後続のクロスアテンション層に渡され、それによってエネルギー関数のネストした階層が暗黙的に最小化される。さらに、我々の潜在EBMは、異なるコンテキストからのクロスアテンション出力の線形結合として、ゼロショットの合成的生成を可能にする。広範な実験により、本手法が、マルチコンセプト生成、テキスト誘導による画像インペインティング、実画像および合成画像の編集など、様々な画像生成タスクの処理に非常に有効であることを示す。コード: https://github.com/EnergyAttention/Energy-Based-CrossAttention

Despite the remarkable performance of text-to-image diffusion models in image generation tasks, recent studies have raised the issue that generated images sometimes cannot capture the intended semantic contents of the text prompts, which phenomenon is often called semantic misalignment. To address this, here we present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors. Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder. Then, we obtain the gradient of the log posterior of context vectors, which can be updated and transferred to the subsequent cross-attention layer, thereby implicitly minimizing a nested hierarchy of energy functions. Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts. Using extensive experiments, we demonstrate that the proposed method is highly effective in handling various image generation tasks, including multi-concept generation, text-guided image inpainting, and real and synthetic image editing. Code: https://github.com/EnergyAttention/Energy-Based-CrossAttention.

翻訳日:2023-11-07 22:05:59 公開日:2023-11-04

# VillanDiffusion:拡散モデルのための統一バックドア攻撃フレームワーク

VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models ( http://arxiv.org/abs/2306.06874v3 )

ライセンス: Link先を確認

Sheng-Yen Chou, Pin-Yu Chen, Tsung-Yi Ho

(参考訳) 拡散モデル(DM)は、反復的なノイズ付加とデノイジングから可逆な破損過程を学習する最先端の生成モデルである。これらは、テキストから画像への条件付き生成など、多くの生成AIアプリケーションのバックボーンである。しかし、最近の研究では、基本的な無条件DM(DDPMやDDIMなど)が、モデル入力に悪意を持って埋め込まれたパターンによって引き起こされる出力操作攻撃の一種であるバックドア注入に対して脆弱であることが示されている。本稿では、DMに対するバックドア解析の現在の範囲を拡大するための統一バックドア攻撃フレームワーク(VillanDiffusion)を提案する。本フレームワークは、主流の無条件および条件付きDM(デノイジングベースおよびスコアベース)と、総合的な評価のための各種トレーニングフリーサンプラーを対象とする。実験により、我々の統一フレームワークが様々なDM構成のバックドア解析を容易にし、DMに対するキャプションベースのバックドア攻撃に関する新たな洞察を提供することを示す。私たちのコードはGitHubで入手できる: https://github.com/IBM/villandiffusion

Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs. Our code is available on GitHub: https://github.com/IBM/villandiffusion

翻訳日:2023-11-07 22:04:27 公開日:2023-11-04

# MANER: クラッタ環境における物体のマルチエージェントニューラル再配置計画

MANER: Multi-Agent Neural Rearrangement Planning of Objects in Cluttered Environments ( http://arxiv.org/abs/2306.06543v2 )

ライセンス: Link先を確認

Vivek Gupta, Praphpreet Dhir, Jeegn Dani, Ahmed H. Qureshi

(参考訳) オブジェクトの再配置はロボット工学における基本的な問題であり、倉庫の管理から家庭のキッチンの清掃・整理まで、さまざまな実用的応用がある。既存の研究は主に単一エージェントによる解法に焦点を当ててきたが、現実のシナリオでは複数のロボットが協力して再配置タスクを行う必要があることが多い。本稿では、複雑な環境におけるタスクの順序付けと経路計画の課題に対処する、マルチエージェントのオブジェクト再配置計画のための包括的な学習ベースのフレームワークを提案する。提案手法は、オブジェクトを反復的に選択し、その移動先領域を決定し、運動学的実現可能性とタスク到達可能性の条件のもとで利用可能なロボットと組み合わせて実行することで、目標配置を達成する。シミュレーションおよび実世界の多様な環境での実験により、提案フレームワークの有効性と堅牢性を示す。さらに、ベースライン手法と比較して、移動時間と成功率の点で性能が向上することを結果は示している。

Object rearrangement is a fundamental problem in robotics with various practical applications ranging from managing warehouses to cleaning and organizing home kitchens. While existing research has primarily focused on single-agent solutions, real-world scenarios often require multiple robots to work together on rearrangement tasks. This paper proposes a comprehensive learning-based framework for multi-agent object rearrangement planning, addressing the challenges of task sequencing and path planning in complex environments. The proposed method iteratively selects objects, determines their relocation regions, and pairs them with available robots under kinematic feasibility and task reachability for execution to achieve the target arrangement. Our experiments on a diverse range of simulated and real-world environments demonstrate the effectiveness and robustness of the proposed framework. Furthermore, results indicate improved performance in terms of traversal time and success rate compared to baseline approaches.

翻訳日:2023-11-07 22:03:37 公開日:2023-11-04

# 表面統計の超越:潜時拡散モデルにおけるシーン表現

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model ( http://arxiv.org/abs/2306.05720v2 )

ライセンス: Link先を確認

Yida Chen, Fernanda Viégas, Martin Wattenberg

(参考訳) 潜在拡散モデル(LDM)は現実的な画像を生成する優れた能力を示すが、その内部動作は依然として謎に包まれている。明示的な深度情報のない画像のみで学習した場合でも、通常は3Dシーンの整合した画像を出力する。本研究では、LDMは単純なシーン幾何の内部表現を作成し、利用しているのかという、解釈可能性に関する基本的な問いを検討する。線形プローブを用いて、LDMの内部活性化が、3D深度データと顕著物体/背景の区別の両方の線形表現を符号化している証拠を見出した。これらの表現は、デノイジング過程の驚くほど早い段階、すなわち人間がノイズの多い画像を容易に理解できるようになるよりかなり前に現れる。さらに介入実験により、これらの表現が画像合成において因果的な役割を果たしており、LDMの出力に対する単純な高レベルの編集に利用できる可能性が示された。プロジェクトページ: https://yc015.github.io/scene-representation-diffusion-model/

Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output. Project page: https://yc015.github.io/scene-representation-diffusion-model/

翻訳日:2023-11-07 22:02:49 公開日:2023-11-04

# RDumb: 継続的なテスト時間適応の進捗に疑問を呈するシンプルなアプローチ

RDumb: A simple approach that questions our progress in continual test-time adaptation ( http://arxiv.org/abs/2306.05401v2 )

ライセンス: Link先を確認

Ori Press, Steffen Schneider, Matthias Kümmerer, Matthias Bethge

(参考訳) テスト時適応(TTA)は、デプロイ時に変化するデータ分布に合わせて、事前学習済みモデルを更新することを可能にする。初期の研究は、個々の固定された分布シフトに対してこれらのアルゴリズムを検証したが、近年の研究では、長い時間スケールにわたる継続的な適応のための手法が提案され、適用されている。この分野で報告されている進歩を検証するため、TTA手法の漸近的性能を測定するContinually Changing Corruptions (CCC)ベンチマークを提案する。その結果、性能崩壊に対して頑健であるよう特に提案されたモデルを含め、1つを除くすべての最先端手法が最終的に崩壊し、適応を行わないモデルよりも性能が悪くなることがわかった。さらに、モデルを事前学習済みの状態に定期的にリセットするシンプルなベースライン「RDumb」を導入する。 RDumbは、検討したすべてのベンチマークにおいて、これまでに提案された最先端手法と同等かそれ以上の性能を示す。以上の結果から、従来のTTAアプローチは、崩壊を避けるための適応の正則化に有効ではなく、単純なリセット戦略を上回ることもできないことが示された。

Test-Time Adaptation (TTA) allows pre-trained models to be updated to changing data distributions at deployment time. While early work tested these algorithms for individual fixed distribution shifts, recent work proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continually Changing Corruptions (CCC) benchmark to measure asymptotic performance of TTA techniques. We find that eventually all but one of the state-of-the-art methods collapse and perform worse than a non-adapting model, including models specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pretrained state. RDumb performs better or on par with the previously proposed state-of-the-art in all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.
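The resetting baseline described in this abstract (periodically restoring the model to its pretrained state) can be sketched in a few lines. The sketch below is a toy under stated assumptions, not the authors' implementation: the "model" is a single running statistic, `adapt_step` is a hypothetical stand-in for any TTA update, and the reset interval is made up.

```python
import copy


def adapt_step(params, batch_mean, lr=0.1):
    """Toy stand-in for one adaptation update: nudge a running statistic."""
    params["mean"] += lr * (batch_mean - params["mean"])
    return params


def run_with_resets(pretrained, stream, reset_every):
    """Adapt on a data stream, restoring the pretrained state every `reset_every` steps."""
    params = copy.deepcopy(pretrained)
    history = []
    for step, batch_mean in enumerate(stream, start=1):
        params = adapt_step(params, batch_mean)
        history.append(params["mean"])
        if step % reset_every == 0:
            # Periodic reset: discard everything learned since the last reset.
            params = copy.deepcopy(pretrained)
    return params, history


pretrained = {"mean": 0.0}
final, history = run_with_resets(pretrained, [1.0] * 6, reset_every=3)
print(history)
```

Because the adapted state is thrown away at fixed intervals, drift cannot accumulate beyond one interval, which is the intuition behind using a reset as a guard against collapse.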

翻訳日:2023-11-07 22:02:16 公開日:2023-11-04

# インタラクションワーピングによるワンショット模倣学習

One-shot Imitation Learning via Interaction Warping ( http://arxiv.org/abs/2306.12392v2 )

ライセンス: Link先を確認

Ondrej Biza, Skye Thompson, Kishore Reddy Pagidi, Abhinav Kumar, Elise van der Pol, Robin Walters, Thomas Kipf, Jan-Willem van de Meent, Lawson L.S. Wong, Robert Platt

(参考訳) 少数のデモンストレーションからロボットポリシーを模倣学習することは、オープンエンドな応用において極めて重要である。本稿では、1つのデモンストレーションからSE(3)ロボット操作ポリシーを学習する新しい手法、Interaction Warpingを提案する。オブジェクトインスタンス間で点群を位置合わせする手法であるシェイプワーピングを用いて、環境内の各オブジェクトの3Dメッシュを推定する。次に、操作動作をオブジェクト上のキーポイントとして表現し、これをオブジェクトの形状とともにワープさせることができる。 3つのシミュレーションおよび実世界のオブジェクト再配置タスクで、ワンショット模倣学習に成功することを示す。また、本手法が実環境(in the wild)でオブジェクトのメッシュとロボットの把持を予測できることも示す。

Imitation learning of robot policies from few demonstrations is crucial in open-ended applications. We propose a new method, Interaction Warping, for learning SE(3) robotic manipulation policies from a single demonstration. We infer the 3D mesh of each object in the environment using shape warping, a technique for aligning point clouds across object instances. Then, we represent manipulation actions as keypoints on objects, which can be warped with the shape of the object. We show successful one-shot imitation learning on three simulated and real-world object re-arrangement tasks. We also demonstrate the ability of our method to predict object meshes and robot grasps in the wild.

翻訳日:2023-11-07 21:50:06 公開日:2023-11-04

# RoMe:メッシュ表現による大規模道路表面再構築に向けて

RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation ( http://arxiv.org/abs/2306.11368v2 )

ライセンス: Link先を確認

Ruohong Mei, Wei Sui, Jiaxin Zhang, Xue Qin, Gang Wang, Tao Peng and Cong Yang

(参考訳) 自動運転アプリケーションでは、正確で効率的な路面再構築が極めて重要である。本稿では、大規模な路面の堅牢な再構築を目的とした新しいフレームワークであるRoMeを紹介する。独自のメッシュ表現を活用することで、RoMeは再構築された路面が正確であり、セマンティクスとシームレスに整合することを保証する。計算効率の課題に対処するため、ウェイポイントサンプリング戦略を提案する。これにより、RoMeはサブエリアに注目して再構築し、その後それらをマージすることで、広大な環境を再構築できる。さらに、外部パラメータのキャリブレーションにおける不正確さに対する堅牢性を高めるため、外部パラメータ最適化モジュールを組み込んだ。公開データセットと実環境データの両方に対する広範な評価は、速度、正確性、堅牢性の点でRoMeの優位性を示している。たとえば、数千枚の画像から600*600平方メートルの路面を復元するのに、わずか2 GPU時間しかかからない。特に、RoMeの能力は単なる再構築にとどまらず、自動運転アプリケーションにおける自動ラベリングタスクに大きな価値を提供する。関連するすべてのデータとコードは https://github.com/DRosemei/RoMe で入手できる。

In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600*600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for auto-labeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.

翻訳日:2023-11-07 21:49:27 公開日:2023-11-04

# トルコ語母語識別

Turkish Native Language Identification ( http://arxiv.org/abs/2307.14850v4 )

ライセンス: Link先を確認

Ahmet Yavuz Uluslu and Gerold Schneider

(参考訳) 本稿では、トルコ語に対する母語識別(Native Language Identification, NLI)の最初の適用について述べる。 NLIは、異なる言語で書かれた文章を分析することで、その書き手の第一言語を予測する。ほとんどのNLI研究は英語に重点を置いてきたが、本研究はその範囲をトルコ語に拡張する。我々は、最近構築されたTurkish Learner Corpusを用い、3つの統語的特徴(CFG生成規則、品詞n-gram、機能語)の組み合わせをL2テキストに適用して、この課題におけるそれらの有効性を示した。

In this paper, we present the first application of Native Language Identification (NLI) for the Turkish language. NLI involves predicting the writer's first language by analysing their writing in different languages. While most NLI research has focused on English, our study extends its scope to Turkish. We used the recently constructed Turkish Learner Corpus and employed a combination of three syntactic features (CFG production rules, part-of-speech n-grams, and function words) with L2 texts to demonstrate their effectiveness in this task.

翻訳日:2023-11-07 21:39:40 公開日:2023-11-04

# AlpaGasus: 少ないデータでより良いAlpacaをトレーニングする

AlpaGasus: Training A Better Alpaca with Fewer Data ( http://arxiv.org/abs/2307.08701v4 )

ライセンス: Link先を確認

Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin

(参考訳) 大規模言語モデル(LLM)は、教師ありの命令/応答データに対する命令ファインチューニング(IFT)を通じて、命令追従能力を強化する。しかし、広く使われているIFTデータセット(例えばAlpacaの52kデータ)には、不正確または無関係な応答を伴う低品質なインスタンスが驚くほど多く含まれており、これらはIFTを誤らせ、有害である。本稿では、強力なLLM(例えばChatGPT)を用いて低品質データを自動的に識別し除外する、シンプルで効果的なデータ選択戦略を提案する。この目的のために、52kのAlpacaデータからフィルタした9kの高品質データのみでファインチューニングしたAlpaGasusを導入する。 AlpaGasusは、複数のテストセットにおけるGPT-4による評価と、統制された人手評価において、オリジナルのAlpacaを大幅に上回る。 13Bの変種は、テストタスクにおいて教師LLM(すなわち52kデータを生成したText-Davinci-003)の$>90\%$の性能に達する。また、5.7倍高速なトレーニングも実現し、7B変種のトレーニング時間を80分(Alpacaの場合)から14分に短縮する。さらに、実験により、多様なデータセット、ベースモデル、LLMフィルタにわたって本手法の有効性が示された。全体として、AlpaGasusは命令チューニングデータに一般に適用可能な新しいデータ中心のIFTパラダイムを示し、より高速なトレーニングとより優れた命令追従モデルをもたらす。私たちのプロジェクトページは次の場所で公開されている: https://lichang-chen.github.io/AlpaGasus/

Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
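The data selection step in this abstract, keeping only instances that a strong LLM rates as high quality, amounts to a score-and-threshold filter. The sketch below assumes the scores are already available; the `Example` type, the scores, and the `THRESHOLD` value are hypothetical and are not taken from the abstract. It also checks the reported speed-up arithmetic (80 minutes down to 14 minutes).

```python
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str
    response: str
    score: float  # hypothetical quality score assigned by a grader LLM


def filter_by_score(examples, threshold):
    """Keep only the examples whose grader score meets the threshold."""
    return [ex for ex in examples if ex.score >= threshold]


# Toy data with made-up scores; a real pipeline would obtain them by
# prompting a strong LLM to rate each instruction/response pair.
dataset = [
    Example("Add 2 and 3.", "5", 5.0),
    Example("Name a primary color.", "Banana", 1.5),
    Example("Translate 'cat' to French.", "chat", 4.5),
    Example("Summarize the text.", "", 2.0),
]

THRESHOLD = 4.5  # hypothetical cut-off chosen for this illustration
kept = filter_by_score(dataset, THRESHOLD)
print(f"kept {len(kept)} of {len(dataset)} examples")

# Arithmetic behind the reported speed-up: 80 min -> 14 min.
speedup = 80 / 14
print(f"speed-up: {speedup:.1f}x")
```

The ratio 80/14 rounds to 5.7, which agrees with the training speed-up quoted in the abstract.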

翻訳日:2023-11-07 21:37:06 公開日:2023-11-04

# ニューラルダイナミクスの低テンソルランク学習

Low Tensor Rank Learning of Neural Dynamics ( http://arxiv.org/abs/2308.11567v2 )

ライセンス: Link先を確認

Arthur Pellegrino, N Alex Cayco-Gajic, Angus Chadwick

(参考訳) 学習は、再帰的に結合したニューロン集団における協調的なシナプス変化に依存する。したがって、学習に伴うシナプス結合の集団的な変化を理解することは、神経科学と機械学習における重要な課題である。特に、最近の研究では、タスクで訓練されたRNNの重み行列は一般に低ランクであることが示されているが、この低ランク構造が学習の過程でどのように展開するかは不明である。そこで本研究では、学習全体を通じた重み行列によって形成される3階テンソルのランクを調べる。運動学習タスク中の大規模な神経活動記録に様々なランクのRNNを当てはめることで、推定された重みは低テンソルランクであり、したがって学習の全過程を通じて固定された低次元部分空間上で変化することがわかった。次に、同じタスクを解くよう訓練されたRNNにおいて、低テンソルランク学習の観察を検証する。最後に、勾配降下法による学習ダイナミクスの行列ランクとテンソルランクを抑える一連の数学的結果を示し、低次元タスクを解くよう訓練されたRNNでは低テンソルランクの重みが自然に現れることを示す。総合すると、本研究の知見は、生物と人工の両方のニューラルネットワークにおける学習に伴う集団結合の変化に関する洞察を提供し、大規模な神経活動記録から、学習によって誘起される再帰的ダイナミクスの変化をリバースエンジニアリングすることを可能にする。

Learning relies on coordinated synaptic changes in recurrently connected populations of neurons. Therefore, understanding the collective evolution of synaptic connectivity over learning is a key challenge in neuroscience and machine learning. In particular, recent work has shown that the weight matrices of task-trained RNNs are typically low rank, but how this low rank structure unfolds over learning is unknown. To address this, we investigate the rank of the 3-tensor formed by the weight matrices throughout learning. By fitting RNNs of varying rank to large-scale neural recordings during a motor learning task, we find that the inferred weights are low-tensor-rank and therefore evolve over a fixed low-dimensional subspace throughout the entire course of learning. We next validate the observation of low-tensor-rank learning on an RNN trained to solve the same task. Finally, we present a set of mathematical results bounding the matrix and tensor ranks of gradient descent learning dynamics which show that low-tensor-rank weights emerge naturally in RNNs trained to solve low-dimensional tasks. Taken together, our findings provide insight on the evolution of population connectivity over learning in both biological and artificial neural networks, and enable reverse engineering of learning-induced changes in recurrent dynamics from large-scale neural recordings.

翻訳日:2023-11-07 21:29:42 公開日:2023-11-04

# SBSM-Pro:タンパク質のバイオシーケンスマシンをサポート

SBSM-Pro: Support Bio-sequence Machine for Proteins ( http://arxiv.org/abs/2308.10275v2 )

ライセンス: Link先を確認

Yizheng Wang, Yixiao Zhai, Yijie Ding, Quan Zou

(参考訳) タンパク質は生物学的システムにおいて極めて重要な役割を果たす。タンパク質の分類に機械学習アルゴリズムを用いることで、生物学実験を補助し、さらには導くことができ、バイオテクノロジーへの応用に重要な知見を提供する。本稿では、生物配列の分類を目的として構築したモデルであるSupport Bio-Sequence Machine for Proteins (SBSM-Pro)を紹介する。このモデルは生の配列から始まり、物理化学的性質に基づいてアミノ酸をグループ化する。配列アライメントを組み込んでタンパク質間の類似性を測定し、新しいマルチカーネル学習(MKL)アプローチを用いて様々な種類の情報を統合し、サポートベクターマシンを用いて分類予測を行う。結果は、タンパク質の機能と翻訳後修飾の同定に関して、本モデルが10個のデータセットにわたって優れた性能を示すことを示している。本研究は、タンパク質分類における最先端の取り組みを例示するだけでなく、この領域における新たな方向性への道を開くものであり、生物配列の分類に特化したプラットフォームの開発における有益な試みである。 SBSM-Proは http://lab.malab.cn/soft/SBSM-Pro/ からアクセスできる。

Proteins play a pivotal role in biological systems. The use of machine learning algorithms for protein classification can assist and even guide biological experiments, offering crucial insights for biotechnological applications. We introduce the Support Bio-Sequence Machine for Proteins (SBSM-Pro), a model purpose-built for the classification of biological sequences. This model starts with raw sequences and groups amino acids based on their physicochemical properties. It incorporates sequence alignment to measure the similarities between proteins and uses a novel multiple kernel learning (MKL) approach to integrate various types of information, utilizing support vector machines for classification prediction. The results indicate that our model demonstrates commendable performance across ten datasets in terms of the identification of protein function and posttranslational modification. This research not only exemplifies state-of-the-art work in protein classification but also paves avenues for new directions in this domain, representing a beneficial endeavor in the development of platforms tailored for the classification of biological sequences. SBSM-Pro is available for access at http://lab.malab.cn/soft/SBSM-Pro/.

翻訳日:2023-11-07 21:29:19 公開日:2023-11-04

# 教師を適応させる: エグゼンプラーフリー継続学習のための知識蒸留の改善

Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning ( http://arxiv.org/abs/2308.09544v3 )

ライセンス: Link先を確認

Filip Szatkowski, Mateusz Pyla, Marcin Przewięźlikowski, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński

(参考訳) 本研究では、忘却の防止を目的として、知識蒸留(KD)を正則化戦略として用いる、エグゼンプラーフリーのクラスインクリメンタル学習(CIL)について検討する。 KDベースの手法はCILでうまく使われているが、以前のタスクの訓練データの事例にアクセスできない場合、モデルの正則化に苦労することが多い。分析の結果、この問題は、分布外データを扱う際に教師ネットワークで生じる大きな表現シフトに起因することがわかった。これによりKD損失成分に大きな誤差が生じ、CILモデルの性能が低下する。近年のテスト時適応法に着想を得て、インクリメンタル学習中に教師モデルとメインモデルを同時に更新する手法であるTeacher Adaptation (TA)を導入する。提案手法はKDベースのCILアプローチとシームレスに統合でき、複数のエグゼンプラーフリーCILベンチマークにおいてその性能を一貫して向上させる。本手法のソースコードは https://github.com/fszatkowski/cl-teacher-adaptation で入手できる。

In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL models. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main models during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks. The source code for our method is available at https://github.com/fszatkowski/cl-teacher-adaptation.

翻訳日:2023-11-07 21:28:15 公開日:2023-11-04

# ヒッグス真空期待値がゼロの、可視宇宙の双子として実現される隠れセクターのダークマター

Hidden Sector Dark Matter Realized as a Twin of the Visible Universe With Zero Higgs Vacuum Expectation ( http://arxiv.org/abs/2308.08107v2 )

ライセンス: Link先を確認

Stephen L. Adler

(参考訳) 宇宙は、同一の粒子とゲージ相互作用の組を2つ含み、それらは重力を通じてのみ結合し、ヒッグスポテンシャルによって異なると提案する。基礎となる対称性のため、結合していないときの2つのセクターは、ヒッグス真空期待値が非ゼロの相とゼロの相の境界にあるヒッグスポテンシャルを持つと仮定する。 2つのセクター間の結合を導入すると、この縮退が破れ、一方のセクターのヒッグスポテンシャルは非ゼロのヒッグス期待値の領域に押し込まれ(可視セクターを与える)、もう一方のセクターのヒッグスポテンシャルはゼロのヒッグス期待値の領域に押し込まれる(ダークセクターを与える)可能性がある。このとき、ダークセクターで最も質量の小さいバリオンは、自己相互作用するダークマター粒子の候補となる。

We propose that the universe contains two identical sets of particles and gauge interactions, coupling only through gravitation, which differ by their Higgs potentials. We postulate that because of underlying symmetries, the two sectors when uncoupled have Higgs potentials that lie at the boundary between phases with nonzero and zero Higgs vacuum expectation. Turning on the coupling between the two sectors can break the degeneracy, pushing the Higgs potential in one sector into the domain of nonzero Higgs expectation (giving the visible sector), and pushing the Higgs potential in the other sector into the domain of zero Higgs expectation (giving the dark sector). The least massive baryon in the dark sector will then be a candidate self-interacting dark matter particle.

翻訳日:2023-11-07 21:27:58 公開日:2023-11-04

# 広帯域指向性不可視性

Broadband directional invisibility ( http://arxiv.org/abs/2308.03689v2 )

ライセンス: Link先を確認

Farhang Loran and Ali Mostafazadeh

(参考訳) 一方向不可視性の発見と、空間的クラマース・クローニッヒ関係を満たす光学媒質におけるその広帯域での実現は、非エルミートフォトニクスの重要な画期である。本稿では、この効果の高次元への一般化を正確に特徴づけ、2次元および3次元のスカラー波の散乱と3次元の電磁波の散乱においてそれを実現するための十分条件を求める。より具体的には、正の実数 $\alpha$ と単位ベクトルの連続体 $\Omega$ が与えられたとき、入射波数 $k$ が $\alpha$ を超えず(すなわち $k\in(0,\alpha]$)、かつ入射波数ベクトルの方向が $\Omega$ 上を動くときには常に完全な(近似ではない)不可視性を示すための、相互作用ポテンシャル(電磁散乱の場合は散乱媒質の誘電率テンソルおよび透磁率テンソル)に対する明示的な条件を与える。このアプローチの特徴は、有限の周波数領域において完全な指向性不可視性を示すポテンシャルおよび線形誘電体媒質の構成を可能にすることである。

The discovery of unidirectional invisibility and its broadband realization in optical media satisfying spatial Kramers-Kronig relations are important landmarks of non-Hermitian photonics. We offer a precise characterization of a higher-dimensional generalization of this effect and find sufficient conditions for its realization in the scattering of scalar waves in two and three dimensions and electromagnetic waves in three dimensions. More specifically, given a positive real number $\alpha$ and a continuum of unit vectors $\Omega$, we provide explicit conditions on the interaction potential (or the permittivity and permeability tensors of the scattering medium in the case of electromagnetic scattering) under which it displays perfect (non-approximate) invisibility whenever the incident wavenumber $k$ does not exceed $\alpha$ (i.e., $k\in(0,\alpha]$) and the direction of the incident wave vector ranges over $\Omega$. A distinctive feature of our approach is that it allows for the construction of potentials and linear dielectric media that display perfect directional invisibility in a finite frequency domain.

翻訳日:2023-11-07 21:25:38 公開日:2023-11-04

# 高応力薄膜によるナノ構造中のひずみ色中心の決定論的形成

Deterministic Creation of Strained Color Centers in Nanostructures via High-Stress Thin Films ( http://arxiv.org/abs/2309.07935v2 )

ライセンス: Link先を確認

Daniel R. Assumpcao, Chang Jin, Madison Sutula, Sophie W. Ding, Phong Pham, Can M. Knaut, Mihir K. Bhaskar, Abishrant Panday, Aaron M. Day, Dylan Renaud, Mikhail D. Lukin, Evelyn Hu, Bartholomeus Machielse, Marko Loncar

(参考訳) 色中心は、ハイブリッドなスピン・光子量子情報技術を実現するための有力な量子ビット候補として登場した。しかし、このプラットフォームの主な制限の1つは、個々の色中心の特性がしばしばひずみに依存することである。例として、ダイヤモンド中のシリコン空孔中心は、長いコヒーレンス特性を達成するために通常ミリケルビン温度を必要とするが、ひずみを受けたシリコン空孔中心は、フォノンを介したデコヒーレンスなしに1Kを超える温度で動作することが示されている。本研究では、高応力の窒化ケイ素薄膜をダイヤモンドナノ構造と組み合わせて、静的にひずんだシリコン空孔色中心(基底状態分裂の平均は608 GHz)を、$\sim 4 \times 10^{-4}$ のひずみの大きさで再現性よく形成する。モデル計算に基づくと、このひずみは、測定した試料中のシリコン空孔中心の大多数を、スピン特性を劣化させることなく、より高い温度(1.5K)で動作させるのに十分なはずである。この方法は、高温で動作する量子メモリを製造するためのスケーラブルなアプローチを提供する。シリコン空孔中心以外にも、この手法は他のプラットフォームに容易に拡張できるほど十分に一般的である。

Color centers have emerged as a leading qubit candidate for realizing hybrid spin-photon quantum information technology. One major limitation of the platform, however, is that the characteristics of individual color centers are often strain dependent. As an illustrative case, the silicon-vacancy center in diamond typically requires millikelvin temperatures in order to achieve long coherence properties, but strained silicon-vacancy centers have been shown to operate at temperatures beyond 1K without phonon-mediated decoherence. In this work we combine high-stress silicon nitride thin films with diamond nanostructures in order to reproducibly create statically strained silicon-vacancy color centers (mean ground state splitting of 608 GHz) with strain magnitudes of $\sim 4 \times 10^{-4}$. Based on modeling, this strain should be sufficient to allow for operation of a majority of the silicon-vacancy centers within the measured sample at elevated temperatures (1.5K) without any degradation of their spin properties. This method offers a scalable approach to fabricate high-temperature operation quantum memories. Beyond silicon-vacancy centers, this method is sufficiently general that it can be easily extended to other platforms as well.

翻訳日:2023-11-07 21:04:31 公開日:2023-11-04

# 分散型動的チームの信頼を満たすためのステップ

Steps Towards Satisficing Distributed Dynamic Team Trust ( http://arxiv.org/abs/2309.05378v2 )

ライセンス: Link先を確認

Edmund R. Hunt, Chris Baber, Mehdi Sobhani, Sanja Milivojevic, Sagir Yusuf, Mirco Musolesi, Patrick Waterson, Sally Maynard

Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordance with shared values. In this paper, our concern is with the definition of goals and values such that it is possible to define 'trust' in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles'. We question whether alignment is possible at the level of 'individual/team values', or only at the 'goal' and 'legal principles' levels. We argue for a set of metrics to define trust in human-robot teams that are interpretable by human or robot team members, and consider an experiment that could demonstrate the notion of 'satisficing trust' over the course of a simulated mission.

Translated: 2023-11-07 21:02:31 Published: 2023-11-04

# Uncertainty Quantification via Neural Posterior Principal Components

http://arxiv.org/abs/2309.15533v2

License: see the linked page

Elias Nehme, Omer Yair, Tomer Michaeli

Uncertainty quantification is crucial for the deployment of image restoration models in safety-critical domains, like autonomous driving and biological imaging. To date, methods for uncertainty visualization have mainly focused on per-pixel estimates. Yet, a heatmap of per-pixel variances is typically of little practical use, as it does not capture the strong correlations between pixels. A more natural measure of uncertainty corresponds to the variances along the principal components (PCs) of the posterior distribution. Theoretically, the PCs can be computed by applying PCA on samples generated from a conditional generative model for the input image. However, this requires generating a very large number of samples at test time, which is painfully slow with the current state-of-the-art (diffusion) models. In this work, we present a method for predicting the PCs of the posterior distribution for any input image, in a single forward pass of a neural network. Our method can either wrap around a pre-trained model that was trained to minimize the mean square error (MSE), or can be trained from scratch to output both a predicted image and the posterior PCs. We showcase our method on multiple inverse problems in imaging, including denoising, inpainting, super-resolution, and biological image-to-image translation. Our method reliably conveys instance-adaptive uncertainty directions, achieving uncertainty quantification comparable with posterior samplers while being orders of magnitude faster. Code and examples are available at https://eliasnehme.github.io/NPPC/
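
The sample-based baseline mentioned in the abstract (PCA over posterior samples) can be sketched in a few lines. This is only an illustration under stated assumptions: the "posterior samples" are synthetic two-pixel vectors, the leading PC is found by plain power iteration, and all names (`top_principal_component`, `samples`) are hypothetical. It is not the single-forward-pass method the paper proposes.

```python
import math
import random


def top_principal_component(samples, iters=200):
    """Return (direction, variance) of the leading PC of `samples` via power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]

    def cov_times(v):
        # Computes C v = X^T (X v) / (n - 1) without forming the covariance C.
        out = [0.0] * d
        for row in centered:
            proj = sum(r * x for r, x in zip(row, v))
            for j in range(d):
                out[j] += proj * row[j]
        return [o / (n - 1) for o in out]

    # Start from the first basis vector (assumed not orthogonal to the top PC).
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = cov_times(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, cov_times(v)))
    return v, variance


# Synthetic stand-in for posterior samples: two strongly correlated "pixels".
rng = random.Random(0)
samples = []
for _ in range(500):
    shared = rng.gauss(0.0, 2.0)
    samples.append([shared + rng.gauss(0.0, 0.1), shared + rng.gauss(0.0, 0.1)])

direction, variance = top_principal_component(samples)
print("leading PC direction:", direction)
print("variance along leading PC:", variance)
```

In this toy case each pixel has a variance of about 4 on its own, while the leading PC shows that almost all of the uncertainty lies along the direction in which both pixels move together, which is the information a per-pixel heatmap misses.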

Translated: 2023-11-07 20:50:55 Published: 2023-11-04

# DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization

http://arxiv.org/abs/2309.14032v2

License: see the linked page

Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, Yong Li

Ant Colony Optimization (ACO) is a meta-heuristic algorithm that has been successfully applied to various Combinatorial Optimization Problems (COPs). Traditionally, customizing ACO for a specific problem requires the expert design of knowledge-driven heuristics. In this paper, we propose DeepACO, a generic framework that leverages deep reinforcement learning to automate heuristic designs. DeepACO serves to strengthen the heuristic measures of existing ACO algorithms and dispense with laborious manual design in future ACO applications. As a neural-enhanced meta-heuristic, DeepACO consistently outperforms its ACO counterparts on eight COPs using a single neural architecture and a single set of hyperparameters. As a Neural Combinatorial Optimization method, DeepACO performs better than or on par with problem-specific methods on canonical routing problems. Our code is publicly available at https://github.com/henry-yeh/DeepACO.
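
To make the role of the "heuristic measure" concrete, here is a minimal sketch of a classical ACO loop on a four-city tour, assuming the textbook transition rule (next city drawn with probability proportional to pheromone^alpha times heuristic^beta). The hand-crafted heuristic `1 / distance` is the kind of expert-designed component that, per the abstract, DeepACO aims to learn instead; the function names and parameter values below are illustrative assumptions, not taken from the DeepACO code base.

```python
import math
import random


def construct_tour(pheromone, heuristic, alpha=1.0, beta=2.0, rng=random):
    """Build one tour; city j is drawn with weight tau[i][j]**alpha * eta[i][j]**beta."""
    n = len(pheromone)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        i = tour[-1]
        candidates = sorted(unvisited)
        weights = [pheromone[i][j] ** alpha * heuristic[i][j] ** beta for j in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(tour, dist):
    """Length of the closed tour (returns to the starting city)."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def run_aco(dist, heuristic, n_ants=10, n_iters=30, rho=0.1, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = [construct_tour(pheromone, heuristic, rng=rng) for _ in range(n_ants)]
        # Evaporation: every trail decays by a factor (1 - rho).
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - rho
        # Deposit: shorter tours add more pheromone to the edges they use.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length
    return best_tour, best_len


# Four cities on the corners of a unit square; the optimal tour has length 4.
cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
n = len(cities)
dist = [[math.hypot(cities[i][0] - cities[j][0], cities[i][1] - cities[j][1])
         for j in range(n)] for i in range(n)]
# Hand-crafted heuristic (inverse distance); a learned matrix would be plugged in here.
heuristic = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]

best_tour, best_len = run_aco(dist, heuristic)
print(best_tour, best_len)
```

Swapping `heuristic` for a matrix produced by a trained model leaves the rest of the loop unchanged, which is one way to read the abstract's claim that the framework strengthens existing ACO algorithms rather than replacing them.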

Translated: 2023-11-07 20:49:37 Published: 2023-11-04

# Sampling via Gradient Flows in the Space of Probability Measures

http://arxiv.org/abs/2310.03597v2

License: see the linked page

Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart

Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
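
The independence from the normalization constant can be seen in the simplest particle method associated with the Wasserstein gradient flow of the KL divergence, namely unadjusted Langevin dynamics, which only ever evaluates the gradient of the log of the unnormalized density. The sketch below assumes a one-dimensional Gaussian target with mean 1 and variance 0.25 and a fixed step size; these choices, and the function names, are illustrative and do not reproduce the affine invariant or Gaussian approximate flows studied in the paper.

```python
import math
import random


def grad_log_unnormalized(x, mu=1.0, sigma2=0.25):
    # Gradient of log(C * exp(-(x - mu)^2 / (2 * sigma2))); the unknown constant C drops out.
    return -(x - mu) / sigma2


def langevin_particles(n_particles=1000, n_steps=400, step=0.01, seed=0):
    """Evolve particles with the unadjusted Langevin update x <- x + h * grad + sqrt(2h) * noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        xs = [x + step * grad_log_unnormalized(x) + noise_scale * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


xs = langevin_particles()
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print("sample mean:", mean, "sample variance:", var)
```

The particles settle near the target's mean and variance (up to sampling noise and a small bias from the finite step size) even though the constant `C` never appears in the code.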

Translated: 2023-11-07 20:40:27 Published: 2023-11-04

# Deciphering the Crypto-shopper: Knowledge and Preferences of Consumers Using Cryptocurrencies for Purchases

http://arxiv.org/abs/2310.02911v3

License: see the linked page

Massimiliano Silenzi and Umut Can Cabuk

The fast-growing cryptocurrency sector presents both challenges and opportunities for businesses and consumers alike. This study investigates the knowledge, expertise, and buying habits of people who shop using cryptocurrencies. Our survey of 516 participants shows that knowledge levels vary from beginners to experts. Interestingly, a segment of respondents, nearly 30%, showed high purchase frequency despite their limited knowledge. Regression analyses indicated that while domain knowledge plays a role, it only accounts for 11.6% of the factors affecting purchasing frequency. A K-means cluster analysis further segmented the respondents into three distinct groups, each having unique knowledge levels and purchasing tendencies. These results challenge the conventional idea linking extensive knowledge to increased cryptocurrency usage, suggesting other factors at play. Understanding this varying crypto-shopper demographic is pivotal for businesses, emphasizing the need for tailored strategies and user-friendly experiences. This study offers insights into current crypto-shopping behaviors and discusses future research exploring the broader impacts and potential shifts in the crypto-consumer landscape.

Translated: 2023-11-07 20:39:29 Published: 2023-11-04

# De Novo Drug Design with Joint Transformers

http://arxiv.org/abs/2310.02066v2

License: see the linked page

Adam Izdebski and Ewelina Weglarz-Tomczak and Ewa Szczurek and Jakub M. Tomczak

De novo drug design requires simultaneously generating novel molecules outside of training data and predicting their target properties, making it a hard task for generative models. To address this, we propose the Joint Transformer, which combines a Transformer decoder, a Transformer encoder, and a predictor in a joint generative model with shared weights. We show that training the model with a penalized log-likelihood objective results in state-of-the-art performance in molecule generation, while decreasing the prediction error on newly sampled molecules by 42% compared to a fine-tuned decoder-only Transformer. Finally, we propose a probabilistic black-box optimization algorithm that employs the Joint Transformer to generate novel molecules with improved target properties, as compared to the training data, outperforming other SMILES-based optimization methods in de novo drug design.

Translated: 2023-11-07 20:39:14 Published: 2023-11-04

# LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

http://arxiv.org/abs/2310.01852v5

License: see the linked page

Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, HongFa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, and Li Yuan

Video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modalities, N>=3) beyond vision and language. We thus propose LanguageBind, taking the language as the bind across different modalities because the language modality is well-explored and contains rich semantics. Specifically, we freeze the language encoder acquired by VL pretraining, then train encoders for other modalities with contrastive learning. As a result, all modalities are mapped to a shared feature space, implementing multi-modal semantic alignment. While LanguageBind ensures that we can extend VL modalities to N modalities, we also need a high-quality dataset with alignment data pairs centered on language. We thus propose VIDAL-10M, a dataset of Video, Infrared, Depth, and Audio data with their corresponding Language. In VIDAL-10M, all videos are from short video platforms with complete semantics rather than truncated segments from long videos, and all the video, depth, infrared, and audio modalities are aligned to their textual descriptions. After pretraining on VIDAL-10M, we outperform ImageBind by 5.8% R@1 on the MSR-VTT dataset with only 15% of the parameters in the zero-shot video-text retrieval task. Beyond this, LanguageBind greatly improves zero-shot video, audio, depth, and infrared understanding. For instance, LanguageBind surpasses InterVideo by 1.9% on MSR-VTT, 8.8% on MSVD, 6.3% on DiDeMo, and 4.4% on ActivityNet. On the LLVIP and NYU-D datasets, LanguageBind outperforms ImageBind by 23.8% and 11.1% in top-1 accuracy. Code address: https://github.com/PKU-YuanGroup/LanguageBind.
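
The abstract says only that the other encoders are trained "with contrastive learning" against a frozen language encoder. A common form of such an objective is the InfoNCE loss, sketched below in one direction (modality to text) on toy two-dimensional embeddings. The loss form, the temperature value, and every name here are assumptions for illustration; the actual LanguageBind objective and code may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(modality_emb, text_emb, temperature=0.07):
    """Mean cross-entropy of matching each modality embedding to its paired text embedding."""
    m = [normalize(v) for v in modality_emb]
    t = [normalize(v) for v in text_emb]
    loss = 0.0
    for i, mi in enumerate(m):
        logits = [sum(a * b for a, b in zip(mi, tj)) / temperature for tj in t]
        mx = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = mx + math.log(sum(math.exp(l - mx) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(m)


# Toy "frozen" text embeddings and two candidate sets of depth-encoder outputs.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each sample is close to its own caption
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # each sample is close to the other caption

aligned_loss = info_nce(aligned, text)
misaligned_loss = info_nce(misaligned, text)
print(aligned_loss, misaligned_loss)
```

Because the text embeddings stay fixed, minimizing this loss moves only the new modality's embeddings toward their captions, so every modality trained this way ends up in the same language-anchored space.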

Translated: 2023-11-07 20:38:25 Published: 2023-11-04

# Debias the Training of Diffusion Models

http://arxiv.org/abs/2310.08442v2

License: see the linked page

Hu Yu, Li Shen, Jie Huang, Man Zhou, Hongsheng Li, Feng Zhao

Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes.
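
One way to see why a constant weight on the noise-prediction error treats timesteps unevenly is the standard DDPM forward relation x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, where a is the cumulative signal level at step t. Solving for x0 shows that a fixed error in the predicted noise turns into an x0 error scaled by sqrt(1 - a) / sqrt(a). The sketch below only illustrates that identity on scalars; it is not the weighting strategy proposed in the paper, and the function names are hypothetical.

```python
import math


def x0_error_amplification(alpha_bar):
    """Factor by which a noise-prediction error is scaled in the implied x0 estimate."""
    return math.sqrt(1.0 - alpha_bar) / math.sqrt(alpha_bar)


def estimate_x0(x_t, eps_hat, alpha_bar):
    """Invert x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps for x0."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps_hat) / math.sqrt(alpha_bar)


x0, eps, eps_error = 0.5, 1.0, 0.1
errors = {}
for alpha_bar in (0.99, 0.5, 0.01):  # low noise, medium noise, high noise
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
    x0_hat = estimate_x0(x_t, eps + eps_error, alpha_bar)
    errors[alpha_bar] = abs(x0_hat - x0)
    print(alpha_bar, errors[alpha_bar])
```

The same noise-prediction error of 0.1 is nearly harmless at low noise and is amplified roughly tenfold at high noise, which is consistent with the abstract's point that equal weighting of the noise loss does not translate into equally precise estimates of the original image.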

Translated: 2023-11-07 20:26:33 Published: 2023-11-04

# Just-in-Time Flaky Test Detection via Abstracted Failure Symptom Matching

http://arxiv.org/abs/2310.06298v2

License: see the linked page

Gabin An, Juyeon Yoon, Thomas Bach, Jingun Hong, Shin Yoo

We report our experience of using failure symptoms, such as error messages or stack traces, to identify flaky test failures in a Continuous Integration (CI) pipeline for a large industrial software system, SAP HANA. Although failure symptoms are commonly used to identify similar failures, they have not previously been employed to detect flaky test failures. Our hypothesis is that flaky failures will exhibit symptoms distinct from those of non-flaky failures. Consequently, we can identify recurring flaky failures, without rerunning the tests, by matching the failure symptoms to those of historical flaky runs. This can significantly reduce the need for test reruns, ultimately resulting in faster delivery of test results to developers. To facilitate the process of matching flaky failures across different execution instances, we abstract newer test failure symptoms before matching them to the known patterns of flaky failures, inspired by previous research in the fields of failure deduplication and log analysis. We evaluate our symptom-based flakiness detection method using actual failure symptoms gathered from CI data of SAP HANA during a six-month period. Our method shows the potential of using failure symptoms to identify recurring flaky failures, achieving a precision of at least 96%, while saving approximately 58% of the machine time compared to the traditional rerun strategy. Analysis of the false positives and the feedback from developers underscore the importance of having descriptive and informative failure symptoms for both the effective deployment of this symptom-based approach and the debugging of flaky tests.
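
The idea of abstracting a failure symptom before matching it against known flaky patterns can be sketched as follows. The abstraction rules (masking hexadecimal addresses and numbers), the example messages, and the function names are all hypothetical; the abstract does not say which rules the authors apply to SAP HANA symptoms.

```python
import re

# Hypothetical abstraction rules: mask run-specific details so that the same
# underlying failure yields the same abstracted symptom across executions.
_ABSTRACTIONS = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),  # memory addresses (applied first)
    (re.compile(r"\d+"), "<NUM>"),             # ids, ports, durations, line numbers
]


def abstract_symptom(message):
    """Replace volatile tokens with placeholders and collapse whitespace."""
    for pattern, token in _ABSTRACTIONS:
        message = pattern.sub(token, message)
    return " ".join(message.split())


def is_known_flaky(message, known_flaky_symptoms):
    """Return True if the abstracted message matches a historical flaky symptom."""
    return abstract_symptom(message) in known_flaky_symptoms


# Symptoms from historical runs that were already confirmed as flaky (made-up data).
known_flaky = {
    abstract_symptom("Connection to node 17 timed out after 3000 ms at 0x7f3a12"),
}

new_failure = "Connection to node 4  timed out after 2500 ms at 0x11bc00"
real_failure = "AssertionError: expected 5 rows, got 7"

print(abstract_symptom(new_failure))
print(is_known_flaky(new_failure, known_flaky), is_known_flaky(real_failure, known_flaky))
```

A failure that matches a known flaky symptom can be reported immediately without a rerun, while unmatched failures fall back to the usual rerun strategy, which is where the reported machine-time savings would come from.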

Translated: 2023-11-07 20:25:25 Published: 2023-11-04

# Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design

http://arxiv.org/abs/2310.05764v2

License: see the linked page

Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola

A significant amount of protein function requires binding small molecules, including enzymatic catalysis. As such, designing binding pockets for small molecules has several impactful applications ranging from drug synthesis to energy storage. Towards this goal, we first develop HarmonicFlow, an improved generative process over 3D protein-ligand binding structures based on our self-conditioned flow matching objective. FlowSite extends this flow model to jointly generate a protein pocket's discrete residue types and the molecule's binding 3D structure. We show that HarmonicFlow improves upon state-of-the-art generative processes for docking in simplicity, generality, and average sample quality in pocket-level docking. Enabled by this structure modeling, FlowSite designs binding sites substantially better than baseline approaches.

Translated: 2023-11-07 20:24:56 Published: 2023-11-04

# Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models

http://arxiv.org/abs/2310.04027v2

License: see the linked page

Boyu Zhang, Hongyang Yang, Tianyu Zhou, Ali Babar, Xiao-Yang Liu

Financial sentiment analysis is critical for valuation and investment decision-making. Traditional NLP models, however, are limited by their parameter size and the scope of their training datasets, which hampers their generalization capabilities and effectiveness in this field. Recently, Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks due to their commendable zero-shot abilities. Yet, directly applying LLMs to financial sentiment analysis presents challenges: the discrepancy between the pre-training objective of LLMs and predicting the sentiment label can compromise their predictive performance. Furthermore, the succinct nature of financial news, often devoid of sufficient context, can significantly diminish the reliability of LLMs' sentiment analysis. To address these challenges, we introduce a retrieval-augmented LLM framework for financial sentiment analysis. This framework includes an instruction-tuned LLM module, which ensures LLMs behave as predictors of sentiment labels, and a retrieval-augmentation module which retrieves additional context from reliable external sources. Benchmarked against traditional models and LLMs like ChatGPT and LLaMA, our approach achieves a 15% to 48% performance gain in accuracy and F1 score.

Translated: 2023-11-07 20:23:17 Published: 2023-11-04

# RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels

http://arxiv.org/abs/2310.03912v4

License: see the linked page

Alexander Shmakov, Avisek Naug, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Ashwin Ramesh Babu, Antonio Guillen and Soumyendu Sarkar

Bayesian Optimization (BO), guided by Gaussian process (GP) surrogates, has proven to be an invaluable technique for efficient, high-dimensional, black-box optimization, a critical problem inherent to many applications such as industrial design and scientific computing. Recent contributions have introduced reinforcement learning (RL) to improve the optimization performance on both single function optimization and few-shot multi-objective optimization. However, even few-shot techniques fail to exploit similarities shared between closely related objectives. In this paper, we combine recent developments in Deep Kernel Learning (DKL) and attention-based Transformer models to improve the modeling powers of GP surrogates with meta-learning. We propose a novel method for improving meta-learning BO surrogates by incorporating attention mechanisms into DKL, empowering the surrogates to adapt to contextual information gathered during the BO process. We combine this Transformer Deep Kernel with a learned acquisition function trained with continuous Soft Actor-Critic Reinforcement Learning to aid in exploration. This Reinforced Transformer Deep Kernel (RTDK-BO) approach yields state-of-the-art results in continuous high-dimensional optimization problems.

Translated: 2023-11-07 20:22:38 Published: 2023-11-04

# PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability

http://arxiv.org/abs/2310.03906v5

License: see the linked page

Avisek Naug, Antonio Guillen, Ricardo Luna Gutiérrez, Vineet Gundecha, Dejan Markovikj, Lekhapriya Dheeraj Kashyap, Lorenz Krause, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Soumyendu Sarkar

The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A significant challenge in this pursuit is the lack of a configurable and scalable thermal data center model that offers an end-to-end pipeline. Data centers consist of multiple IT components whose geometric configuration and heat dissipation make thermal modeling difficult. This paper presents PyDCM, a customizable Data Center Model implemented in Python, that allows users to create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current EnergyPlus modeling implementations and scales sublinearly with the number of CPUs. Also, PyDCM enables the use of Deep Reinforcement Learning via the Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes.

Translated: 2023-11-07 20:22:16 Published: 2023-11-04

# The semiclassical states excitations in the multi-rectangular billiards

http://arxiv.org/abs/2310.13166v2

License: see the linked page

Stefan Giller

The quantization of $L$-shaped billiards and similar ones, i.e. billiards in which each angle is equal to $\pi/2$ or $3\pi/2$, is considered using the Fourier series expansion method as a tool. The respective wave functions and quantization conditions are written down and examined for superscar effects in such multi-rectangular billiards (MRB). It is found that a special set of POC modes produces the superscar phenomena in MRB, in which the billiards are excited as a whole to the modes closest to the semiclassical ones existing in their approximated copies, these copies being MRB whose parallel sides are in rational relations to one another.

Translated: 2023-11-07 20:14:36 Published: 2023-11-04

# VIBE: Topic-Driven Temporal Adaptation for Twitter Classification

http://arxiv.org/abs/2310.10191v3

License: see the linked page

Yuji Zhang, Jing Li, Wenjie Li

Language features are evolving in real-world social media, resulting in deteriorating text classification performance in dynamic settings. To address this challenge, we study temporal adaptation, where models trained on past data are tested in the future. Most prior work focused on continued pretraining or knowledge updating, which may compromise their performance on noisy social media data. To tackle this issue, we reflect feature change via modeling latent topic evolution and propose a novel model, VIBE: Variational Information Bottleneck for Evolutions. Concretely, we first employ two Information Bottleneck (IB) regularizers to distinguish past and future topics. Then, the distinguished topics work as adaptive features via multi-task training with timestamp and class label prediction. In adaptive learning, VIBE utilizes unlabeled data retrieved from online streams created after the training data period. Substantial Twitter experiments on three classification tasks show that our model, with only 3% of data, significantly outperforms previous state-of-the-art continued-pretraining methods.

Translated: 2023-11-07 20:12:00 Published: 2023-11-04

# Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers

http://arxiv.org/abs/2310.17369v2

License: see the linked page

Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif M. Mohammad

Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time (emotion dynamics) are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dynamics from one's everyday utterances address many of these concerns, but it is not yet known whether these measures of utterance emotion dynamics (UED) correlate with mental health diagnoses. Here, for the first time, we study the relationship between tweet emotion dynamics and mental health disorders. We find that each of the UED metrics studied varied by the user's self-disclosed diagnosis. For example, average valence was significantly higher (i.e., more positive text) in the control group compared to users with ADHD, MDD, and PTSD. Valence variability was significantly lower in the control group compared to ADHD, depression, bipolar disorder, MDD, PTSD, and OCD but not PPD. Rise and recovery rates of valence also exhibited significant differences from the control. This work provides important early evidence for how linguistic cues pertaining to emotion dynamics can play a crucial role as biosocial markers for mental illnesses and aid in the understanding, diagnosis, and management of mental health disorders.
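
Two of the metrics named in the abstract, average valence and valence variability, can be illustrated with a toy computation over a user's utterances. The tiny word-to-valence lexicon, the example utterances, and the function names are made up for this sketch; the paper's actual lexicon, windowing, and its rise and recovery rate definitions are not reproduced here.

```python
import statistics

# Hypothetical toy lexicon mapping words to valence scores in [0, 1].
TOY_VALENCE = {"happy": 0.9, "good": 0.8, "fine": 0.6, "tired": 0.3, "sad": 0.1, "awful": 0.05}


def utterance_valence(text, lexicon=TOY_VALENCE):
    """Mean valence of the lexicon words in one utterance, or None if none occur."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def emotion_dynamics(utterances):
    """Average valence and valence variability (standard deviation) over a sequence."""
    series = [v for v in map(utterance_valence, utterances) if v is not None]
    return {
        "average_valence": statistics.mean(series),
        "valence_variability": statistics.stdev(series),
    }


stable_user = ["feeling good", "fine day", "good and fine"]
volatile_user = ["so happy", "awful sad", "happy", "sad"]

stable = emotion_dynamics(stable_user)
volatile = emotion_dynamics(volatile_user)
print(stable)
print(volatile)
```

The two toy users differ mainly in variability rather than in what they say at any single moment, which is the kind of time-series signal the study compares across self-disclosed diagnosis groups.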

翻訳日:2023-11-07 20:02:33 公開日:2023-11-04

# Data Provenance Initiative: AIにおけるデータセットライセンスと属性の大規模監査

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI ( http://arxiv.org/abs/2310.16787v3 )

ライセンス: Link先を確認

Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

(参考訳) 膨大かつ多様で、文書化が一貫していないデータセットで言語モデルを訓練する競争は、実務者にとっての法的・倫理的リスクへの差し迫った懸念を引き起こしている。データの透明性と理解を脅かすこうした慣行を是正するため、法律と機械学習の専門家による学際的な取り組みを立ち上げ、1800以上のテキストデータセットを体系的に監査し追跡する。私たちは、ソース、作成者、一連のライセンス条件、特性、その後の利用に至るまで、これらのデータセットの系譜をたどるためのツールと標準を開発する。私たちのランドスケープ分析は、商用利用が可能なオープンなデータセットとクローズドなデータセットの間で構成と焦点が大きく分かれていることを明らかにしており、低リソース言語、より創造的なタスク、より豊かなトピックの多様性、より新しくより合成的な訓練データといった重要なカテゴリをクローズドなデータセットが独占している。これは、異なるライセンス条件の下で利用可能になるデータの種類の分断が深まっていること、そして著作権とフェアユースに関する法域ごとの法的解釈への影響が高まっていることを示している。また、広く使われているデータセットホスティングサイトでライセンスの誤分類が頻繁に起きており、ライセンスの記載漏れが70%以上、誤り率が50%以上であることも確認した。これは、近年の多くのブレークスルーを支える最も人気のあるデータセットについて、帰属の誤りと十分な情報に基づく利用が危機的状況にあることを示している。データセットの透明性と責任ある利用の継続的な改善への貢献として、私たちは監査全体を、対話的なUIであるData Provenance Explorerとともに公開する。これにより実務者は、最も人気のあるオープンソースのファインチューニング用データコレクションについて、データの来歴をたどり、絞り込むことができる: www.dataprovenance.org。

The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tools and standards to trace the lineage of these datasets, from their source, creators, series of license conditions, properties, and subsequent use. Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 70%+ and error rates of 50%+. This points to a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.

翻訳日:2023-11-07 20:01:55 公開日:2023-11-04

# GlotLID:低リソース言語のための言語識別

GlotLID: Language Identification for Low-Resource Languages ( http://arxiv.org/abs/2310.16248v2 )

ライセンス: Link先を確認

Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze

(参考訳) 最近のいくつかの論文は、約300の高リソース言語および中リソース言語を対象とした優れた言語識別(LID)ソリューションを公開している。しかし、(i) 幅広い低リソース言語をカバーし、(ii) 厳密に評価されていて信頼でき、(iii) 効率的で使いやすいLIDは存在しない。本稿では、広いカバレッジ、信頼性、効率性という要件を満たすLIDモデルであるGlotLID-Mを公開する。 GlotLID-Mは1665の言語を識別し、先行研究に比べてカバレッジが大幅に増加している。実験では、F1と偽陽性率(FPR)のバランスをとる場合、GlotLID-Mは4つのベースライン(CLD3、FT176、OpenLID、NLLB)を上回った。低リソースLIDに特有の課題、すなわちコーパスメタデータの誤り、高リソース言語からの漏洩、密接に関連する言語の分離の難しさ、マクロ言語と変種の扱い、そして全般的にノイズの多いデータを分析する。 GlotLID-Mをデータセット作成パイプラインに統合することで、低リソースの言語と文化に対するNLP技術の品質とアクセシビリティが向上することを期待する。 GlotLID-Mのモデル、コード、データソースの一覧は https://github.com/cisnlp/GlotLID で利用可能である。

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigorously evaluated and reliable and (iii) efficient and easy to use. Here, we publish GlotLID-M, an LID model that satisfies the desiderata of wide coverage, reliability and efficiency. It identifies 1665 languages, a large increase in coverage compared to prior work. In our experiments, GlotLID-M outperforms four baselines (CLD3, FT176, OpenLID and NLLB) when balancing F1 and false positive rate (FPR). We analyze the unique challenges that low-resource LID poses: incorrect corpus metadata, leakage from high-resource languages, difficulty separating closely related languages, handling of macrolanguage vs varieties and in general noisy data. We hope that integrating GlotLID-M into dataset creation pipelines will improve quality and enhance accessibility of NLP technology for low-resource languages and cultures. GlotLID-M model, code, and list of data sources are available: https://github.com/cisnlp/GlotLID.
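
The abstract evaluates LID systems by balancing F1 against false positive rate (FPR). As a rough illustration of what those two numbers measure for a single language, the following Python sketch computes both from gold and predicted labels. The function name and the one-vs-rest treatment are assumptions; this is not the paper's evaluation code.

```python
def f1_and_fpr(gold, predicted, language):
    """One-vs-rest F1 and false positive rate for `language` (illustrative)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == language and p == language for g, p in pairs)
    fp = sum(g != language and p == language for g, p in pairs)
    fn = sum(g == language and p != language for g, p in pairs)
    tn = sum(g != language and p != language for g, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # FPR: share of sentences from *other* languages wrongly given this label.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return f1, fpr
```

FPR matters especially for low-resource languages: when a corpus is dominated by other languages, even a small FPR can mean that most sentences labelled as the rare language are actually something else.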

翻訳日:2023-11-07 20:01:24 公開日:2023-11-04

# ノイズ量子チャネルとしてのLandau-Streaterチャネル

The Landau-Streater Channel as a Noisy Quantum Channel ( http://arxiv.org/abs/2310.15353v3 )

ライセンス: Link先を確認

Shayan Roofeh, Vahid Karimipour

(参考訳) 3次元では、Landau-StreaterチャネルはWerner-Holevoチャネルにほかならない。このチャネルは連続パラメータを持たないため、環境ノイズをモデル化できない。そこで、恒等チャネルとの凸結合を考え、キュートリット上の1パラメータ雑音モデルとして使えるようにする。さらに、元のWerner-Holevoチャネルは完全なユニタリ群 $SU(3)$ の下で共変であるのに対し、拡張した族は群 $SO(3)$ の下でのみ共変性を保つ。この対称性の低下により、それが元のチャネルの様々な性質に与える影響を調べることができる。特に、チャネルのスペクトル、可分性(divisibility)、相補チャネル、厳密または近似的な劣化可能性(degradability)、および各種の容量への影響を調べる。具体的には、ワンショット古典容量とエンタングルメント支援容量の解析的表現を導出し、あわせて量子容量の下界と上界を確立する。

In three dimensions, the Landau-Streater channel is nothing but the Werner-Holevo channel. Such a channel has no continuous parameter and hence cannot model an environmental noise. We consider its convex combination with the identity channel, making it suitable as a one-parameter noise model on qutrits. Moreover, whereas the original Werner-Holevo channel exhibits covariance under the complete unitary group $SU(3)$, the extended family maintains covariance only under the group $SO(3)$. This symmetry reduction allows us to investigate its impact on various properties of the original channel. In particular, we examine its influence on the channel's spectrum, divisibility, complementary channel, and exact or approximate degradability, as well as its various kinds of capacities. Specifically, we derive analytical expressions for the one-shot classical capacity and the entanglement-assisted capacity, accompanied by the establishment of lower and upper bounds for the quantum capacity.
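
As a notational sketch of the construction described above: the Landau-Streater channel for spin $j$ is built from the spin operators $J_x, J_y, J_z$, and for $j=1$ (in the basis where $(J_k)_{mn} = -i\epsilon_{kmn}$) it reduces to the Werner-Holevo form. The mixing parameter $p$ and its direction are assumptions made here for illustration; the paper may parametrize the family differently.

$$
\Phi_{\mathrm{LS}}(\rho) = \frac{1}{j(j+1)} \sum_{k \in \{x,y,z\}} J_k \,\rho\, J_k,
\qquad
\Phi_{\mathrm{LS}}^{(j=1)}(\rho) = \frac{1}{2}\left(\mathrm{Tr}(\rho)\, I - \rho^{T}\right),
$$

$$
\Phi_p(\rho) = (1-p)\,\rho + p\,\Phi_{\mathrm{LS}}^{(j=1)}(\rho), \qquad 0 \le p \le 1 .
$$

At $p=0$ the map is the identity channel and at $p=1$ it is the parameter-free Werner-Holevo channel, so $p$ plays the role of a noise strength on a qutrit.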

翻訳日:2023-11-07 20:00:02 公開日:2023-11-04

# ヘビアン学習と自由エネルギー最小化による認知共通モデルの神経模倣的実現

A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization ( http://arxiv.org/abs/2310.15177v2 )

ライセンス: Link先を確認

Alexander Ororbia, Mary Alexandria Kelly

(参考訳) ここ数年で、意味的に豊かな文章を合成したり複雑な画像を生成したりできる大規模なニューラル生成モデルが、「生成人工知能」(生成AI)として知られるようになったものの代表例として登場した。生成AIの人気の高まりは、統計的機械学習の領域に新たな機会と課題をもたらすだけでなく、認知科学にも興味深い問いを投げかけている。認知科学は、心と脳を支える過程の本質を明らかにし、そうした機能が生物学的(または人工的)な基盤においてどのように獲得され具現化されるかを理解しようとする分野である。この目標を念頭に、有望な研究プログラムは、この分野の長年の伝統である認知アーキテクチャの構築を、根本的に神経模倣的(ニューロミメティック)な生成的構成要素の観点から行うことにあると論じる。具体的には、そのようなアーキテクチャとして、変分自由エネルギー汎関数の最適化に資するように働くヘッブ適応の観点から認知の共通モデル(Common Model of Cognition)を構成する COGnitive Neural GENerative system について論じる。

Over the last few years, large neural generative models, capable of synthesizing semantically rich passages of text or producing complex images, have emerged as a popular representation of what has come to be known as "generative artificial intelligence" (generative AI). Beyond opening the door to new opportunities as well as challenges for the domain of statistical machine learning, the rising popularity of generative AI brings with it interesting questions for Cognitive Science, which seeks to discover the nature of the processes that underpin minds and brains as well as to understand how such functionality might be acquired and instantiated in a biological (or artificial) substrate. With this goal in mind, we argue that a promising research program lies in the crafting of cognitive architectures, a long-standing tradition of the field, cast fundamentally in terms of neuro-mimetic generative building blocks. Concretely, we discuss the COGnitive Neural GENerative system, such an architecture that casts the Common Model of Cognition in terms of Hebbian adaptation operating in service of optimizing a variational free energy functional.

翻訳日:2023-11-07 19:59:46 公開日:2023-11-04

# Kitaevの量子二重模型のエニオンセクターの分類

Classification of the anyon sectors of Kitaev's quantum double model ( http://arxiv.org/abs/2310.19661v2 )

ライセンス: Link先を確認

Alex Bols, Siddharth Vadnerkar

(参考訳) 無限三角格子上のKitaevの量子二重模型について、非可換な場合を含む有限ゲージ群 $G$ に対するエニオンセクターの完全な分類を与える。予想されていたとおり、模型のエニオンセクターは $G$ の量子二重代数の既約表現と正確に対応する。証明は2つの主要な部分からなる。第1部では、量子二重代数の各既約表現に対して純粋状態を構成し、これらの純粋状態のGNS表現が互いに素(pairwise disjoint)なエニオンセクターであることを示す。第2部では、任意のエニオンセクターが、第1部で構成したエニオンセクターのいずれかとユニタリ同値であることを示す。証明の第1部では、対象とする状態をストリングネット凝縮として記述することが決定的な役割を果たす。純粋性は、これらの状態を、適切な局所的制約の集合を満たす唯一の状態として特徴づけることで示される。証明の核心は、局所ゲージ変換からなるある群が、局所ストリングネットの集まりに自由かつ推移的に作用するという事実である。第2部では、任意のエニオンセクターが、これらの制約のうち有限個を除くすべてを満たす純粋状態を含むことを示す。次に、既知の手法を用いて、これらの制約のうち1つを除くすべてを満たす純粋状態をそのエニオンセクター内に構成できる。最後に、そのような状態はいずれも、第1部で構成したエニオンセクターの1つにおけるベクトル状態でなければならないことを明示的に示す。

We give a complete classification of the anyon sectors of Kitaev's quantum double model on the infinite triangular lattice and for finite gauge group $G$, including the non-abelian case. As conjectured, the anyon sectors of the model correspond precisely to the irreducible representations of the quantum double algebra of $G$. Our proof consists of two main parts. In the first part, we construct for each irreducible representation of the quantum double algebra a pure state and show that the GNS representations of these pure states are pairwise disjoint anyon sectors. In the second part we show that any anyon sector is unitarily equivalent to one of the anyon sectors constructed in the first part. The first part of the proof crucially uses a description of the states in question as string-net condensates. Purity is shown by characterising these states as the unique states that satisfy appropriate sets of local constraints. At the core of the proof is the fact that certain groups of local gauge transformations act freely and transitively on collections of local string-nets. For the second part, we show that any anyon sector contains a pure state that satisfies all but a finite number of these constraints. Using known techniques we can then construct a pure state in the anyon sector that satisfies all but one of these constraints. Finally, we show explicitly that any such state must be a vector state in one of the anyon sectors constructed in the first part.

翻訳日:2023-11-07 19:50:12 公開日:2023-11-04

# 線形関数近似を用いた強化学習のための遅延フィードバック付き事後サンプリング

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation ( http://arxiv.org/abs/2310.18919v2 )

ライセンス: Link先を確認

Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma

(参考訳) 強化学習(RL)の最近の研究は、関数近似を活用してサンプル複雑性の障壁を緩和し、性能を高めることで大きく進歩してきた。こうした成功にもかかわらず、既存の証明可能に効率的なアルゴリズムは通常、行動を取った直後にフィードバックが得られることを前提としている。観測の遅延の影響を考慮しないと、リグレットの急増により、実世界のシステムの性能が著しく低下する可能性がある。本研究では、線形関数近似を用いたRLにおける遅延フィードバックの課題に、事後サンプリングを用いて取り組む。事後サンプリングは、幅広い状況で一般的なUCBアルゴリズムを経験的に上回ることが示されている。まず、事後サンプリングによるノイズ摂動を通じて価値関数空間を効果的に探索する、楽観的な価値ベースのアルゴリズムであるDelayed-PSVIを導入する。 RLにおける遅延フィードバック付き事後サンプリングアルゴリズムの初めての解析を行い、未知の確率的遅延が存在する場合に、我々のアルゴリズムが $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ の最悪ケースのリグレットを達成することを示す。ここで $E[\tau]$ は遅延の期待値である。計算効率をさらに高め、高次元のRL問題への適用可能性を広げるために、Delayed-LPSVIではランジュバン動力学による勾配ベースの近似サンプリング手法を取り入れ、$\widetilde{O}(dHK)$ の計算コストで同じオーダー最適なリグレット保証を維持する。アルゴリズムの統計的および計算的な有効性を示すために実験的評価を行う。

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $E[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.
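
To illustrate what "delayed feedback" means operationally, the Python sketch below models a buffer in which the trajectory generated in episode $k$ only becomes usable at episode $k + \tau_k$. It covers just the bookkeeping that any learner in this setting needs; it is not Delayed-PSVI itself, and the class and method names are hypothetical.

```python
import heapq
from itertools import count


class DelayedFeedbackBuffer:
    """Holds feedback until its (possibly random) delay has elapsed."""

    def __init__(self):
        self._pending = []       # min-heap ordered by arrival episode
        self._tiebreak = count() # avoids comparing feedback objects on ties

    def push(self, episode, delay, feedback):
        """Register feedback generated in `episode` that arrives `delay` later."""
        arrival = episode + delay
        heapq.heappush(
            self._pending, (arrival, next(self._tiebreak), episode, feedback)
        )

    def pop_available(self, current_episode):
        """Return all (generated_at, feedback) pairs that have arrived by now."""
        ready = []
        while self._pending and self._pending[0][0] <= current_episode:
            _, _, generated_at, feedback = heapq.heappop(self._pending)
            ready.append((generated_at, feedback))
        return ready
```

A learner would call `pop_available(k)` at the start of episode `k` and update its posterior only with what has arrived, which is why the regret bound quoted above carries an additive term in the expected delay $E[\tau]$.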

翻訳日:2023-11-07 19:47:44 公開日:2023-11-04

# TiV-NeRF:動的Neural Radiance Fieldsを用いた時間変化表現による追跡とマッピング

TiV-NeRF: Tracking and Mapping via Time-Varying Representation with Dynamic Neural Radiance Fields ( http://arxiv.org/abs/2310.18917v2 )

ライセンス: Link先を確認

Chengyao Duan and Zhiliu Yang

(参考訳) Neural Radiance Fields(NeRF)をSimultaneous Localization and Mapping(SLAM)フレームワークに統合する従来の試みは、静的シーンの仮定に依存するか、動的オブジェクトを外れ値として扱うかのいずれかである。しかし、現実世界のシナリオの大半は動的である。本稿では、動的シーンの追跡と再構成を行うための時間変化表現を提案する。本システムは、追跡プロセスとマッピングプロセスという2つのプロセスを同時に維持する。追跡プロセスでは、入力画像全体を一様にサンプリングし、RGB画像に対する学習は自己教師ありで行う。マッピングプロセスでは、既知のマスクを利用して動的オブジェクトと静的背景を区別し、2種類の領域に異なるサンプリング戦略を適用する。両プロセスのパラメータ最適化は2段階からなる。第1段階では、時刻と3次元位置を関連付けて変形場を正準場に変換する。第2段階では、時刻と正準場内の3次元位置を関連付けて、色と符号付き距離関数(SDF)を得る。さらに、重複率に基づく新しいキーフレーム選択戦略を提案する。提案手法を2つの公開合成データセットで評価し、現在の最先端の動的マッピング手法よりも有効であることを確認した。

Previous attempts to integrate Neural Radiance Fields (NeRF) into a Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or treat dynamic objects as outliers. However, most real-world scenarios are dynamic. In this paper, we propose a time-varying representation to track and reconstruct dynamic scenes. Our system simultaneously maintains two processes: a tracking process and a mapping process. For the tracking process, the entire input images are uniformly sampled and training on the RGB images is self-supervised. For the mapping process, we leverage known masks to differentiate dynamic objects from static backgrounds, and we apply distinct sampling strategies to the two types of areas. The parameter optimization for both processes consists of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field, and the second associates time with 3D positions in the canonical field to obtain colors and the Signed Distance Function (SDF). In addition, we propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two publicly available synthetic datasets and validate that our method is more effective than current state-of-the-art dynamic mapping methods.

翻訳日:2023-11-07 19:47:13 公開日:2023-11-04

# PHD: 歴史的文書のピクセルベース言語モデリング

PHD: Pixel-Based Language Modeling of Historical Documents ( http://arxiv.org/abs/2310.18343v2 )

ライセンス: Link先を確認

Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein

(参考訳) 歴史文書のデジタル化は、歴史家に前例のない研究機会をもたらした。しかし、歴史文書を分析する従来のアプローチでは、OCRを用いて画像をテキストに変換する。この処理は、文書を画像として扱うことの潜在的な利点を見落とし、高いレベルのノイズを持ち込む。このギャップを埋めるために、トークン分布を予測する代わりに、マスクされたピクセルパッチを再構成するように訓練されたピクセルベース言語モデルの最近の進歩を活用する。実際の歴史文書のスキャンが不足していることから、実際の歴史文書に似た合成スキャンを生成する新しい手法を提案する。次に、合成スキャンと1700年から1900年にかけての実際の歴史的新聞とを組み合わせて、我々のモデルであるPHDを事前学習する。実験を通じて、PHDがマスクされた画像パッチの再構成に高い能力を示すことを実証し、本モデルが注目に値する言語理解能力を持つことの証拠を示す。特に、本モデルを歴史的文書に関するQAタスクに適用することに成功し、この領域での有用性を示した。

The digitisation of historical documents has provided historians with unprecedented research opportunities. Yet, the conventional approach to analysing historical documents involves converting them from images to text using OCR, a process that overlooks the potential benefits of treating them as images and introduces high levels of noise. To bridge this gap, we take advantage of recent advancements in pixel-based language models trained to reconstruct masked patches of pixels instead of predicting token distributions. Due to the scarcity of real historical scans, we propose a novel method for generating synthetic scans to resemble real historical documents. We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period. Through our experiments, we demonstrate that PHD exhibits high proficiency in reconstructing masked image patches and provide evidence of our model's noteworthy language understanding capabilities. Notably, we successfully apply our model to a historical QA task, highlighting its usefulness in this domain.

翻訳日:2023-11-07 19:46:37 公開日:2023-11-04

# オフライン強化学習における事前学習言語モデルの活用

Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning ( http://arxiv.org/abs/2310.20587v2 )

ライセンス: Link先を確認

Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu

(参考訳) オフライン強化学習(RL)は、事前に収集されたデータセットを用いて、ほぼ最適な方策を見つけることを目的とする。現実のシナリオではデータ収集にコストとリスクが伴うことがあり、そのためドメイン内のデータが限られている場合、オフラインRLは特に困難になる。近年の大規模言語モデル(LLM)とその少数ショット学習能力の進歩を踏まえ、本稿では、事前学習済み言語モデル(LM)をオフラインRLに効果的に活用するための、Decision Transformerに基づく汎用フレームワークである $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$) を紹介する。本フレームワークは4つの重要な構成要素を特徴とする。(1) 逐次的に事前学習されたLMでDecision Transformerを初期化すること、(2) LMの事前学習知識とドメイン内知識を効果的に組み合わせるために、全重みのファインチューニングではなくLoRAによるファインチューニングを用いること、(3) 埋め込みの生成に、線形射影の代わりに非線形のMLP変換を用いること、(4) LMを安定させ、言語に関する本来の能力を保持するために、ファインチューニング中に補助的な言語予測損失を組み込むこと。実験結果から、$\textbf{LaMo}$ はスパース報酬タスクで最先端の性能を達成し、密な報酬のタスクでは価値ベースのオフラインRL手法とDecision Transformerとの差を埋めることが示された。特に本手法は、データサンプルが限られたシナリオにおいて優れた性能を示す。プロジェクトのウェブサイトは $\href{https://lamo2023.github.io}{\text{this https URL}}$ である。

Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces $\textbf{La}$nguage Models for $\textbf{Mo}$tion Control ($\textbf{LaMo}$), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) Initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP transformation instead of linear projections, to generate embeddings, and (4) integrating an auxiliary language prediction loss during fine-tuning to stabilize the LMs and retain their original abilities on languages. Empirical results indicate $\textbf{LaMo}$ achieves state-of-the-art performance in sparse-reward tasks and closes the gap between value-based offline RL methods and decision transformers in dense-reward tasks. In particular, our method demonstrates superior performance in scenarios with limited data samples. Our project website is $\href{https://lamo2023.github.io}{\text{this https URL}}$.
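
Component (2) above relies on LoRA, which keeps a pre-trained weight matrix $W$ frozen and learns only a low-rank update, giving an effective weight $W + \frac{\alpha}{r} B A$. The dependency-free Python sketch below shows that arithmetic on small nested lists. The helper names are hypothetical and the code illustrates the general LoRA formula, not LaMo's training code.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested-list matrices."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def lora_weight(w, b, a, alpha):
    """Effective weight W + (alpha / r) * B @ A with W kept frozen.

    w: d_out x d_in frozen pre-trained weights
    b: d_out x r   trainable matrix (typically initialised to zeros)
    a: r x d_in    trainable matrix
    """
    r = len(a)                 # LoRA rank = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update, same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

With `b` initialised to zeros the effective weight equals the pre-trained one, so fine-tuning starts exactly from the language model's behaviour and only `d_out * r + r * d_in` parameters are trained per adapted matrix.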

翻訳日:2023-11-07 19:35:32 公開日:2023-11-04

# 多言語慣用文脈における連続生成

Generating Continuations in Multilingual Idiomatic Contexts ( http://arxiv.org/abs/2310.20195v2 )

ライセンス: Link先を確認

Rhitabrat Pokharel, Ameeta Agrawal

(参考訳) 慣用的または字義的な複数語表現を処理する能力は、あらゆる言語の理解と生成において重要な側面である。慣用的(または字義的)な表現を含む物語に対して文脈に合った続きを生成するタスクにより、非構成的な比喩的テキストを含む微妙な言語を理解する生成言語モデル(LM)の能力を検証できる。 2つの異なる言語(英語とポルトガル語)のデータセットを用い、3つの異なる学習設定(ゼロショット、少数ショット、ファインチューニング)で一連の実験を行う。その結果、モデルは慣用的な文脈よりも字義的な文脈で続きを生成する方がわずかに優れているにすぎず、その差はごく小さいことが示唆された。さらに、本研究で調べたモデルは両言語で同等の性能を示しており、このタスクにおける生成モデルの頑健性を示している。

The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. The task of generating contextually relevant continuations for narratives containing idiomatic (or literal) expressions can allow us to test the ability of generative language models (LMs) in understanding nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two distinct languages (English and Portuguese) under three different training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are only slightly better at generating continuations for literal contexts than idiomatic contexts, with exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.

翻訳日:2023-11-07 19:33:44 公開日:2023-11-04

# 有限レベル量子メモリシステムの相互接続によるデコヒーレンス時間制御

Decoherence time control by interconnection for finite-level quantum memory systems ( http://arxiv.org/abs/2311.02292v1 )

ライセンス: Link先を確認

Igor G. Vladimirov, Ian R. Petersen

(参考訳) 本稿は、有限レベル系に対するパウリ行列と同様の代数的構造を動的変数が持つ開放量子系を扱う。系のハミルトニアンと、外部ボソン場への系の結合作用素は、系変数に線形に依存する。場は量子ウィーナー過程によって表現され、ドリフトベクトルと分散行列が系変数のアフィン関数および線形関数となる準線形のHudson-Parthasarathy量子確率微分方程式に従って系のダイナミクスを駆動する。この設定は、ハミルトニアンがゼロの孤立系のダイナミクスを特別な場合として含み、そこでは系変数が時間的に一定となるため、量子メモリとして応用できる可能性がある。系と場の結合がゼロでない、より現実的な場合については、系変数の初期値からの平均二乗偏差が、重み行列と忠実度パラメータによって指定される意味で相対的に顕著になる時刻として、メモリのデコヒーレンス時間を定義する。系のエネルギーパラメータに関するデコヒーレンス時間の最大化を考え、ゼロハミルトニアンが準最適解を与える条件を得る。この最適化問題は、そのような系の直接エネルギー結合による相互接続についても議論する。

This paper is concerned with open quantum systems whose dynamic variables have an algebraic structure, similar to that of the Pauli matrices for finite-level systems. The Hamiltonian and the operators of coupling of the system to the external bosonic fields depend linearly on the system variables. The fields are represented by quantum Wiener processes which drive the system dynamics according to a quasilinear Hudson-Parthasarathy quantum stochastic differential equation whose drift vector and dispersion matrix are affine and linear functions of the system variables. This setting includes the zero-Hamiltonian isolated system dynamics as a particular case, where the system variables are constant in time, which makes them potentially applicable as a quantum memory. In a more realistic case of nonvanishing system-field coupling, we define a memory decoherence time when a mean-square deviation of the system variables from their initial values becomes relatively significant as specified by a weighting matrix and a fidelity parameter. We consider the decoherence time maximization over the energy parameters of the system and obtain a condition under which the zero Hamiltonian provides a suboptimal solution. This optimization problem is also discussed for a direct energy coupling interconnection of such systems.

翻訳日:2023-11-07 18:36:45 公開日:2023-11-04

# 人工知能をより説明しやすいものにするための各種手法に関する調査研究

A Survey of the Various Methodologies Towards making Artificial Intelligence More Explainable ( http://arxiv.org/abs/2311.02291v1 )

ライセンス: Link先を確認

Sopam Dasgupta

(参考訳) 機械が意思決定プロセスでますます使われるようになり、その結果、決定には説明が必要だという認識が生まれている。残念ながら、実運用されているこれらのモデルのうち、決定の背後にある推論が不明な「ブラックボックス」型のものが増えている。したがって、これらの決定の根拠を明確にする必要がある。人間として、私たちはこれらの決定が説明可能な形で提示されることを望む。しかし、説明だけでは不十分である。説明は、ある結果をどうすれば達成できるかを必ずしも教えてくれるわけではなく、与えられた結果が何によってもたらされたかを示すにすぎない。このため、私の研究は、説明可能性/解釈可能性と、それが反事実的思考にどのようにつながるかに焦点を当てる。

Machines are being increasingly used in decision-making processes, resulting in the realization that decisions need explanations. Unfortunately, an increasing number of these deployed models are of a 'black-box' nature where the reasoning behind the decisions is unknown. Hence, there is a need for clarity behind the reasoning of these decisions. As humans, we would want these decisions to be presented to us in an explainable manner. However, explanations alone are insufficient. They do not necessarily tell us how to achieve an outcome but merely tell us what achieves the given outcome. For this reason, my research focuses on explainability/interpretability and how it extends to counterfactual thinking.

翻訳日:2023-11-07 18:36:24 公開日:2023-11-04

# 慣性センサによる地面反力の予測

Predicting Ground Reaction Force from Inertial Sensors ( http://arxiv.org/abs/2311.02287v1 )

ライセンス: Link先を確認

Bowen Song, Marco Paolieri, Harper E. Stewart, Leana Golubchik, Jill L. McNitt-Gray, Vishal Misra, Devavrat Shah

(参考訳) 地面反力(GRF)の研究は、ランニングなどの動作中に個人が受ける機械的負荷を特徴づけるために用いられ、ストレス関連の怪我のリスクがあるアスリートを特定するために臨床的に応用できる。本稿の目的は、アスリートが屋外でのランニング中に装着できる慣性計測ユニット(IMU)で収集したデータを用いて、GRFから導出される生体力学的変数(接触時間や負荷速度など)の分析が可能な程度に十分な精度でGRFを予測できるかどうかを明らかにすることである。本稿では、LSTMニューラルネットワークを用いた最先端の予測と対比して、軽量なアプローチを検討する。具体的には、LSTMをk近傍法(KNN)回帰と比較するとともに、IMUデータ(入力)とGRFデータ(出力)の特異値分解埋め込みの間の線形回帰を用いる新しい手法、SVD Embedding Regression(SER)を提案する。異なるアスリート、同一のアスリート、またはその両方から収集した学習データを用いた場合のこれらの手法の精度を評価し、異なる位置(仙骨と下腿)のセンサから得た加速度と角速度のデータの利用を検討する。結果は、SERやKNNのような単純な機械学習手法が、LSTMニューラルネットワークと同等以上の精度を達成でき、学習とハイパーパラメータ最適化もはるかに高速であることを示している。特に、個人の学習データが利用できる場合にはSERとKNNの方が精度が高く、KNNには予測の根拠(来歴)を示せるという利点もある。注目すべきことに、個人データの使用は、ほとんどの生体力学的変数について、すべての手法の予測誤差を低減する。

The study of ground reaction forces (GRF) is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries. Our aim in this paper is to determine if data collected with inertial measurement units (IMUs), that can be worn by athletes during outdoor runs, can be used to predict GRF with sufficient accuracy to allow the analysis of its derived biomechanical variables (e.g., contact time and loading rate). In this paper, we consider lightweight approaches in contrast to state-of-the-art prediction using LSTM neural networks. Specifically, we compare the use of LSTMs to k-Nearest Neighbors (KNN) regression as well as propose a novel solution, SVD Embedding Regression (SER), using linear regression between singular value decomposition embeddings of IMU data (input) and GRF data (output). We evaluate the accuracy of these techniques when using training data collected from different athletes, from the same athlete, or both, and we explore the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Our results illustrate that simple machine learning methods such as SER and KNN can be similarly accurate or more accurate than LSTM neural networks, with much faster training times and hyperparameter optimization; in particular, SER and KNN are more accurate when personal training data are available, and KNN comes with the benefit of providing provenance of prediction. Notably, the use of personal data reduces prediction errors of all methods for most biomechanical variables.
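
The remark that KNN provides "provenance of prediction" can be made concrete: a KNN regressor can return the prediction together with the indices of the training examples that produced it. The Python sketch below does this with Euclidean distance and an unweighted average. The feature layout, the distance choice, and the function name are assumptions for illustration, not the paper's pipeline.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict an output vector as the mean of the k nearest training outputs.

    train_x: list of input feature vectors (e.g. windows of IMU readings)
    train_y: list of output vectors (e.g. sampled GRF waveforms)
    Returns (prediction, neighbor_indices) so each prediction can be traced
    back to the training steps it was built from.
    """
    ranked = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    neighbors = [i for _, i in ranked[:k]]
    n_outputs = len(train_y[0])
    prediction = [
        sum(train_y[i][j] for i in neighbors) / len(neighbors)
        for j in range(n_outputs)
    ]
    return prediction, neighbors
```

Because the neighbor indices identify specific recorded steps (and hence specific athletes or sessions), an implausible prediction can be inspected by looking at exactly which training examples it was averaged from.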

翻訳日:2023-11-07 18:36:13 公開日:2023-11-04

# 目的関数さえあればよい: 明示的な多様性維持なしに欺瞞的な問題を解く

Objectives Are All You Need: Solving Deceptive Problems Without Explicit Diversity Maintenance ( http://arxiv.org/abs/2311.02283v1 )

ライセンス: Link先を確認

Ryan Boldi, Li Ding, Lee Spector

(参考訳) 欺瞞的(deceptive)なドメインの探索は、探索アルゴリズムが準最適な局所最適解に陥るため、機械学習においてしばしば困難であった。新規性探索(Novelty Search)や、いわゆる品質多様性(Quality Diversity)アルゴリズムのように、多様性を明示的に維持する、あるいは同等に探索を促進することでこれらのドメインを探索するアルゴリズムが数多く提案されている。本稿では、定義された目的関数の潜在的に大きな集合を最適化することにより、明示的な多様性維持なしに欺瞞的なドメインを解く見込みのあるアプローチを提示する。これらの目的関数は、個体の生の性能を様々な方法で部分集約することにより、環境から直接抽出できる。レキシケース選択は集団の多様性を暗黙的に維持することが示されているため、これらの目的関数の最適化にレキシケース選択を用いる。目的関数の数を変えながら、この手法を、広く使われている品質多様性アルゴリズムであるMAP-Elitesと、欺瞞の程度が異なる一連の離散最適化および強化学習のドメインで比較する。目的を多数の目的関数に分解してそれらを最適化すると、調査した欺瞞的なドメインにおいてMAP-Elitesを上回ることがわかった。さらに、この手法は、多様性に焦点を当てた指標であるQD-ScoreとCoverageについても、それらを明示的に最適化することなく、競争力のある性能をもたらすことがわかった。アブレーション研究は、この手法が異なる部分集約の方法に対して頑健であることを示している。しかし、非欺瞞的な、いわゆる「イルミネーション」ドメインでは、探索(exploration)に関しては(活用(exploitation)に関してはそうではないが)品質多様性手法が一般に我々の目的関数ベースの枠組みを上回っており、今後の研究の方向性を示唆している。

Navigating deceptive domains has often been a challenge in machine learning due to search algorithms getting stuck at sub-optimal local optima. Many algorithms have been proposed to navigate these domains by explicitly maintaining diversity or equivalently promoting exploration, such as Novelty Search or other so-called Quality Diversity algorithms. In this paper, we present an approach with promise to solve deceptive domains without explicit diversity maintenance by optimizing a potentially large set of defined objectives. These objectives can be extracted directly from the environment by sub-aggregating the raw performance of individuals in a variety of ways. We use lexicase selection to optimize for these objectives as it has been shown to implicitly maintain population diversity. We compare this technique with a varying number of objectives to a commonly used quality diversity algorithm, MAP-Elites, on a set of discrete optimization as well as reinforcement learning domains with varying degrees of deception. We find that decomposing objectives into many objectives and optimizing them outperforms MAP-Elites on the deceptive domains that we explore. Furthermore, we find that this technique results in competitive performance on the diversity-focused metrics of QD-Score and Coverage, without explicitly optimizing for these things. Our ablation study shows that this technique is robust to different subaggregation techniques. However, when it comes to non-deceptive, or "illumination" domains, quality diversity techniques generally outperform our objective-based framework with respect to exploration (but not exploitation), hinting at potential directions for future work.
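
For readers unfamiliar with lexicase selection, the following Python sketch shows the standard form of the procedure: objectives are considered one at a time in random order, and at each step only the candidates that are best on the current objective survive. The error-matrix layout and the function name are assumptions for illustration; how the paper derives its objectives by sub-aggregation is not reproduced here.

```python
import random


def lexicase_select(population, errors, rng=random):
    """Select one parent by lexicase selection.

    population: list of individuals
    errors:     errors[i][c] is the error of individual i on objective c
                (lower is better)
    rng:        random module or random.Random instance, for reproducibility
    """
    candidates = list(range(len(population)))
    objectives = list(range(len(errors[0])))
    rng.shuffle(objectives)  # a fresh random priority order per selection
    for obj in objectives:
        best = min(errors[i][obj] for i in candidates)
        # Keep only the candidates that are elite on this objective.
        candidates = [i for i in candidates if errors[i][obj] == best]
        if len(candidates) == 1:
            break
    return population[rng.choice(candidates)]
```

Because each selection event uses a different ordering, an individual that excels on only a few objectives can still be chosen whenever those objectives come first, which is the mechanism by which diversity is maintained implicitly rather than through an explicit novelty or archive term.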

翻訳日:2023-11-07 18:35:44 公開日:2023-11-04

# スパークプラグ故障診断のためのコントラスト型マルチモーダル表現学習

Contrastive Multi-Modal Representation Learning for Spark Plug Fault Diagnosis ( http://arxiv.org/abs/2311.02282v1 )

ライセンス: Link先を確認

Ardavan Modarres, Vahid Mohammad-Zadeh Eivaghi, Mahdi Aliyari Shoorehdeli, Ashkan Moosavian

(参考訳) 一部の複雑な産業用機構では、単一のセンサ測定だけでは状態監視に十分な情報が得られず、また単一センサの誤解を招くノイズを克服する必要もあるため、産業機器の状態監視を改善する目的で複数のセンサが設置される。そのため、効率的なデータ融合戦略が求められる。本研究では、対照学習のパラダイムに基づく独自の学習戦略を備えたDenoising Multi-Modal Autoencoderを提示する。いずれも機械の健全性監視の分野で用いられるのは初めてである。このアプローチは教師あり学習と教師なし学習の両方の利点を活用しており、データの複数のモダリティ(またはビュー)を豊かな共通表現に融合する点で優れた性能を達成するだけでなく、推論時にビューの1つを省略しても性能低下がごくわずか、あるいは全くないという、一段進んだデータ融合を実現する。提案手法により、マルチモーダル故障診断システムはセンサ故障が発生した場合でもより頑健に動作できる。また、センサの1つ(より高価なもの)を意図的に省略することで、実用上の性能を犠牲にすることなく、よりコスト効率の高い状態監視システムを構築できる。提案手法の有効性は、複雑な機構である直列4ストローク火花点火エンジンから、スパークプラグの故障診断を目的として非実験室条件下で収集された、実世界の非公開マルチモーダルデータセットで検証する。加速度計信号と音響信号を2つのモダリティとして含むこのデータセットは、含まれる故障がごくわずかであり、このようなデータセットで良好な性能を達成したことは、提案手法が他の機器でもうまく機能する見込みがあることを示している。

Due to the incapability of one sensory measurement to provide enough information for condition monitoring of some complex engineered industrial mechanisms and also for overcoming the misleading noise of a single sensor, multiple sensors are installed to improve the condition monitoring of some industrial equipment. Therefore, an efficient data fusion strategy is demanded. In this research, we presented a Denoising Multi-Modal Autoencoder with a unique training strategy based on contrastive learning paradigm, both being utilized for the first time in the machine health monitoring realm. The presented approach, which leverages the merits of both supervised and unsupervised learning, not only achieves excellent performance in fusing multiple modalities (or views) of data into an enriched common representation but also takes data fusion to the next level wherein one of the views can be omitted during inference time with very slight performance reduction, or even without any reduction at all. The presented methodology enables multi-modal fault diagnosis systems to perform more robustly in case of sensor failure occurrence, and one can also intentionally omit one of the sensors (the more expensive one) in order to build a more cost-effective condition monitoring system without sacrificing performance for practical purposes. The effectiveness of the presented methodology is examined on a real-world private multi-modal dataset gathered under non-laboratory conditions from a complex engineered mechanism, an inline four-stroke spark-ignition engine, aiming for spark plug fault diagnosis. This dataset, which contains the accelerometer and acoustic signals as two modalities, has a very slight amount of fault, and achieving good performance on such a dataset promises that the presented method can perform well on other equipment as well.

翻訳日:2023-11-07 18:35:14 公開日:2023-11-04

# 機械学習の産業革命

Machine learning's own Industrial Revolution ( http://arxiv.org/abs/2311.02278v1 )

ライセンス: Link先を確認

Yuan Luo, Song Han, Jingjing Liu

(参考訳) 機械学習は次の産業革命を可能にすると期待されている。しかし、標準化され自動化されたアセンブリネットワークを欠いているため、MLは増え続ける企業需要に応え、幅広い産業に力を与えるうえで大きな課題に直面している。本パースペクティブでは、MLはまず自らの産業革命を完遂する必要があると主張し、その目標を最もうまく達成する方法を詳述するとともに、MLのイノベーションの最前線から大量生産・利用への迅速な移行を可能にする新たな機会について論じる。

Machine learning is expected to enable the next Industrial Revolution. However, lacking standardized and automated assembly networks, ML faces significant challenges to meet ever-growing enterprise demands and empower broad industries. In the Perspective, we argue that ML needs to first complete its own Industrial Revolution, elaborate on how to best achieve its goals, and discuss new opportunities to enable rapid translation from ML's innovation frontier to mass production and utilization.

翻訳日:2023-11-07 18:34:42 公開日:2023-11-04

# 科学シミュレーションの時空間超解像のための演算子学習枠組み

An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations ( http://arxiv.org/abs/2311.02328v1 )

ライセンス: Link先を確認

Valentin Duruisseaux and Amit Chakraborty

(参考訳) 多くの場面で、小さな時空間スケールで生じる本質的な力学を忠実に捉えるために、偏微分方程式の高解像度な解が必要とされる。しかし、計算資源が限られているため、従来の手法でこうした解を得るのは非常に困難で時間がかかる。この計算上の制約を回避する最近の方向性の一つが、機械学習技術による超解像であり、より効率的に得られる低解像度シミュレーションから高解像度の数値解を再構成する。提案手法であるSuper Resolution Operator Network(SROpNet)は、超解像を演算子学習問題として定式化し、既存のアーキテクチャから着想を得て、パラメトリック微分方程式の解の連続表現を低解像度近似から学習する。学習した表現は任意の位置で評価できる。さらに、低解像度近似が与えられる(固定数の)時空間センサの位置には制約を課さないため、既存の多くの超解像手法が適さない、実際に生じる幅広い問題を扱える。

In numerous contexts, high-resolution solutions to partial differential equations are required to capture faithfully essential dynamics which occur at small spatiotemporal scales, but these solutions can be very difficult and slow to obtain using traditional methods due to limited computational resources. A recent direction to circumvent these computational limitations is to use machine learning techniques for super-resolution, to reconstruct high-resolution numerical solutions from low-resolution simulations which can be obtained more efficiently. The proposed approach, the Super Resolution Operator Network (SROpNet), frames super-resolution as an operator learning problem and draws inspiration from existing architectures to learn continuous representations of solutions to parametric differential equations from low-resolution approximations, which can then be evaluated at any desired location. In addition, no restrictions are imposed on the locations of (the fixed number of) spatiotemporal sensors at which the low-resolution approximations are provided, thereby enabling the consideration of a broader spectrum of problems arising in practice, for which many existing super-resolution approaches are not well-suited.

翻訳日:2023-11-07 18:24:29 公開日:2023-11-04

# FragXsiteDTI: トランスフォーマー駆動の解釈による薬物-標的相互作用の責任セグメントの解明

FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation ( http://arxiv.org/abs/2311.02326v1 )

ライセンス: Link先を確認

Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Niloofar Yousefi, Aida Tayebi, Sina Abdidizaji, Ozlem Ozmen Garibay

(参考訳) 薬物-標的相互作用(DTI)予測は創薬に不可欠であるが、モデルの解釈可能性の実現と性能の最適化には依然として課題が残る。本稿では、DTI予測におけるこれらの課題への対処を目指す新しいトランスフォーマーベースのモデルFragXsiteDTIを提案する。FragXsiteDTIは、薬物分子フラグメントとタンパク質ポケットを同時に活用する最初のDTIモデルである。タンパク質と薬物の双方に対する情報豊富な表現により、両者の相互作用を詳細に捉えられる。Perceiver IOフレームワークに着想を得た本モデルは学習可能な潜在配列を備える。この配列はまずクロスアテンションによりタンパク質結合部位の埋め込みと相互作用し、次に自己アテンションで洗練され、薬物側のクロスアテンション・トランスフォーマーブロックで薬物フラグメントへのクエリとして用いられる。この学習可能なクエリ配列は仲介役として機能し、薬物-タンパク質相互作用の重要なニュアンスを保ったまま、シームレスな情報の受け渡しを可能にする。3つのベンチマークデータセットでの計算結果は、本モデルが複数の最先端モデルを上回る予測能力を持つことを示している。また、薬物-標的ペアにおける標的タンパク質と薬物分子それぞれの重要な構成要素の観点から、本モデルの解釈可能性も示す。

Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our information-rich representations for both proteins and drugs offer a detailed perspective on their interaction. Inspired by the Perceiver IO framework, our model features a learnable latent array, initially interacting with protein binding site embeddings using cross-attention and later refined through self-attention and used as a query to the drug fragments in the drug's cross-attention transformer block. This learnable query array serves as a mediator and enables seamless information translation, preserving critical nuances in drug-protein interactions. Our computational results on three benchmarking datasets demonstrate the superior predictive power of our model over several state-of-the-art models. We also show the interpretability of our model in terms of the critical components of both target proteins and drug molecules within drug-target pairs.

翻訳日:2023-11-07 18:24:05 公開日:2023-11-04

# 評価集合生成のための文脈依存翻訳の同定

Identifying Context-Dependent Translations for Evaluation Set Production ( http://arxiv.org/abs/2311.02321v1 )

ライセンス: Link先を確認

Rachel Wicks, Matt Post

(参考訳) 文脈対応機械翻訳への移行を阻む大きな障害は、優れた評価指標とテストセットが存在しないことである。正しく翻訳するために文脈を必要とする文はテストセットでは稀であり、COMETやBLEUのような標準的なコーパスレベルの指標の有用性を下げている。一方、そのような文に注釈を付けたデータセットも稀で、規模が小さく、少数の言語でしか利用できない。これに対処するため、従来のアノテーションパイプラインを近代化・一般化・拡張し、CTXPROを作成した。CTXPROは、代名詞の性・フォーマリティ・有生性、動詞句省略、曖昧な名詞の屈折という5つの現象を正しく翻訳するために文脈を必要とする文を含む、並列文書のサブセットを特定するツールである。パイプラインへの入力は、言語ごとに手作業で作成した言語学的知見に基づくルールの集合であり、最先端ツールが提供する共参照、品詞、形態素の特徴を用いて文脈依存の文対を選択する。このパイプラインを7つの言語対(ENとDE, ES, FR, IT, PL, PT, RUとの双方向)と2つのデータセット(OpenSubtitlesとWMTテストセット)に適用し、先行研究との重なりと、文脈対応MTシステムを文単位のシステムと区別する能力の両面から性能を検証する。CTXPROのパイプラインとデータはオープンソースとして公開する。

A major impediment to the transition to context-aware machine translation is the absence of good evaluation metrics and test sets. Sentences that require context to be translated correctly are rare in test sets, reducing the utility of standard corpus-level metrics such as COMET or BLEU. On the other hand, datasets that annotate such sentences are also rare, small in scale, and available for only a few languages. To address this, we modernize, generalize, and extend previous annotation pipelines to produce CTXPRO, a tool that identifies subsets of parallel documents containing sentences that require context to correctly translate five phenomena: gender, formality, and animacy for pronouns, verb phrase ellipsis, and ambiguous noun inflections. The input to the pipeline is a set of hand-crafted, per-language, linguistically-informed rules that select contextual sentence pairs using coreference, part-of-speech, and morphological features provided by state-of-the-art tools. We apply this pipeline to seven language pairs (EN into and out-of DE, ES, FR, IT, PL, PT, and RU) and two datasets (OpenSubtitles and WMT test sets), and validate its performance using both overlap with previous work and its ability to discriminate a contextual MT system from a sentence-based one. We release the CTXPRO pipeline and data as open source.

翻訳日:2023-11-07 18:23:44 公開日:2023-11-04

# 空間表現の自己教師あり学習によるマルチモジュラーグリッドセルの生成

Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells ( http://arxiv.org/abs/2311.02316v1 )

ライセンス: Link先を確認

Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Crist\'obal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete

(参考訳) マッピング、自己位置推定、ナビゲーションという空間的問題を解くために、哺乳類の系統は際立った空間表現を発達させてきた。重要な空間表現の一つが、ノーベル賞の対象となったグリッド細胞である。グリッド細胞は、局所的かつ非周期的な量である自己位置を、少数の離散的な周期を持つ、一見奇妙な非局所的で空間的に周期的な活動パターンで表現するニューロンである。哺乳類の系統はなぜこの特異なグリッド表現を獲得したのか。数学的解析は、この多周期表現が高い容量と本質的な誤り訂正能力を備えた代数的符号として優れた性質を持つことを示唆するが、深層リカレントニューラルネットワークにおいてマルチモジュラーなグリッド細胞を生み出す中核原理の満足な統合は、現在のところ存在しない。本研究ではまず、グリッド細胞の問いに答える4系統のアプローチ、すなわち符号理論、力学系、関数最適化、教師あり深層学習から重要な洞察を特定する。次に、これらの洞察を活用して、4つのアプローチすべての長所を組み合わせた新しいアプローチを提案する。我々のアプローチは、データ、データ拡張、損失関数、ネットワークアーキテクチャからなる自己教師あり学習(SSL)フレームワークであり、規範的な観点から動機づけられている。従来のアプローチで必要とされた教師ありの位置情報や、特定の読み出し表現の作り込みは用いない。我々のSSLフレームワークで訓練したネットワークには複数のグリッド細胞モジュールが創発しうること、そしてネットワークと創発した表現が訓練分布の外でもよく汎化することを示す。本研究は、グリッド細胞の起源に関心を持つ神経科学者と、新しいSSLフレームワークに関心を持つ機械学習研究者の双方にとっての洞察を含む。

To solve the spatial problems of mapping, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the mammalian lineage learnt this peculiar grid representation? Mathematical analysis suggests that this multi-periodic representation has excellent properties as an algebraic code with high capacity and intrinsic error-correction, but to date, there is no satisfactory synthesis of core principles that lead to multi-modular grid cells in deep recurrent neural networks. In this work, we begin by identifying key insights from four families of approaches to answering the grid cell question: coding theory, dynamical systems, function optimization and supervised deep learning. We then leverage our insights to propose a new approach that combines the strengths of all four approaches. Our approach is a self-supervised learning (SSL) framework - including data, data augmentations, loss functions and a network architecture - motivated from a normative perspective, without access to supervised position information or engineering of particular readout representations as needed in previous approaches. We show that multiple grid cell modules can emerge in networks trained on our SSL framework and that the networks and emergent representations generalize well outside their training distribution. This work contains insights for neuroscientists interested in the origins of grid cells as well as machine learning researchers interested in novel SSL frameworks.

翻訳日:2023-11-07 18:23:18 公開日:2023-11-04

# ディープニューラルネットワークと異方性ガウスカーネルを用いたマナティーの群れのカウント

Counting Manatee Aggregations using Deep Neural Networks and Anisotropic Gaussian Kernel ( http://arxiv.org/abs/2311.02315v1 )

ライセンス: Link先を確認

Zhiqiang Wang, Yiran Pang, Cihan Ulus, Xingquan Zhu

(参考訳) マナティーは食欲旺盛な水生哺乳類である。主な食料源は海草で、1日に最大8時間を採食に費やすことも多い。動きが遅く、餌を探して浅瀬で群れ(すなわち集合体)をなしてとどまることが多いため、環境の変化やその他のリスクに対して脆弱である。ある地域内のマナティーの群れを正確にカウントすることは、その習性を観察するうえで生物学的に有意義であるだけでなく、ボート利用者やダイバーなどに向けた安全規則の策定や、保護、介入、その他の計画の立案にも欠かせない。本稿では、低画質の画像を入力として、地域内のマナティーの数を自動的にカウントする、深層学習に基づく群衆カウント手法を提案する。マナティーは独特の形状を持ち、浅瀬で群れをなしてとどまることが多いため、水面の反射、遮蔽、カモフラージュなどにより、マナティーの数を正確にカウントすることは難しい。この課題に対処するため、回転と分散を調整できる異方性ガウスカーネル(AGK)を用いて、密度関数がさまざまな群れの中の個々のマナティーの形状を最大限に捉えられるようにすることを提案する。そのうえで、VGG、SANet、Congested Scene Recognition network(CSRNet)、MARUNetなど、主に群衆カウント向けに設計された各種のディープニューラルネットワークにAGKを適用し、マナティーの密度を学習してシーン内のマナティーの数を算出する。監視映像から抽出した一般的な低画質画像を用いた実験と比較の結果、AGKに基づくマナティーのカウントが最小の平均絶対誤差(MAE)と二乗平均平方根誤差(RMSE)を達成することを示す。提案手法は、背景が複雑な環境でのマナティーの群れのカウントに特に有効である。

Manatees are aquatic mammals with voracious appetites. They rely on sea grass as their main food source and often spend up to eight hours a day grazing. They move slowly and frequently stay in groups (i.e., aggregations) in shallow water to search for food, making them vulnerable to environmental change and other risks. Accurately counting manatee aggregations within a region is not only biologically meaningful for observing their habits, but also crucial for designing safety rules for human boaters, divers, etc., as well as for scheduling nursing, intervention, and other plans. In this paper, we propose a deep learning based crowd counting approach to automatically count the number of manatees within a region, using low-quality images as input. Because manatees have a unique shape and often stay in shallow water in groups, water surface reflection, occlusion, camouflage, etc. make it difficult to count them accurately. To address these challenges, we propose to use an Anisotropic Gaussian Kernel (AGK), with tunable rotation and variances, to ensure that density functions can maximally capture the shapes of individual manatees in different aggregations. We then apply the AGK to different types of deep neural networks primarily designed for crowd counting, including VGG, SANet, Congested Scene Recognition network (CSRNet), MARUNet, etc., to learn manatee densities and calculate the number of manatees in the scene. Using generic low-quality images extracted from surveillance videos, our experimental results and comparisons show that AGK-based manatee counting achieves the minimum Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The proposed method works particularly well for counting manatee aggregations in environments with complex backgrounds.
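
As a rough illustration of the anisotropic Gaussian kernel idea described above, the following minimal sketch builds a rotated, elongated Gaussian and stamps one per annotated animal into a density map whose sum equals the count. It is not the authors' implementation: the function names, the fixed kernel size, and the per-animal `(sigma_x, sigma_y, theta)` values are all assumptions made for illustration.

```python
import math


def anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """Return a size x size kernel (list of rows) normalized to sum to 1.

    sigma_x / sigma_y are the spreads along the kernel's own axes and
    theta (radians) rotates those axes relative to the image axes.
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Rotate the pixel offset into the kernel's principal axes.
            u = x * cos_t + y * sin_t
            v = -x * sin_t + y * cos_t
            line.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
        kernel.append(line)
    total = sum(map(sum, kernel))
    return [[value / total for value in line] for line in kernel]


def density_map(height, width, annotations, size=15):
    """Sum one kernel per annotation (row, col, sigma_x, sigma_y, theta).

    The annotation format is hypothetical; kernels are clipped at the border.
    """
    density = [[0.0] * width for _ in range(height)]
    half = size // 2
    for row, col, sigma_x, sigma_y, theta in annotations:
        kernel = anisotropic_gaussian_kernel(size, sigma_x, sigma_y, theta)
        for i in range(size):
            for j in range(size):
                r, c = row + i - half, col + j - half
                if 0 <= r < height and 0 <= c < width:
                    density[r][c] += kernel[i][j]
    return density
```

Because each kernel is normalized, summing the density map recovers the number of annotated animals whenever the kernels fit inside the image, which is the property a counting network is usually trained to regress.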

翻訳日:2023-11-07 18:22:48 公開日:2023-11-04

# 深層学習技術を用いた熱顔画像分類

Thermal Face Image Classification using Deep Learning Techniques ( http://arxiv.org/abs/2311.02314v1 )

ライセンス: Link先を確認

Prosenjit Chatterjee and ANK Zaman

(参考訳) 熱画像は、セキュリティ、医療、産業分野でさまざまに応用されている。本稿では、熱画像分類のための実用的な深層学習手法を提案する。熱画像の高精度かつ効率的な分類は、画像内容の複雑さと注釈付きデータセットの不足のため、さまざまな分野で大きな課題となっている。本研究は畳み込みニューラルネットワーク(CNN)アーキテクチャ、具体的にはResNet-50とVGGNet-19を用いて、熱画像から特徴を抽出する。また、画像のノイズ除去のために、入力の熱画像にカルマンフィルタを適用した。実験結果は、精度と効率の両面で提案手法の有効性を示している。

Thermal images have various applications in security, medical and industrial domains. This paper proposes a practical deep-learning approach for thermal image classification. Accurate and efficient classification of thermal images poses a significant challenge across various fields due to the complex image content and the scarcity of annotated datasets. This work uses a convolutional neural network (CNN) architecture, specifically ResNet-50 and VGGNet-19, to extract features from thermal images. This work also applied Kalman filter on thermal input images for image denoising. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and efficiency.
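
The abstract does not say how the Kalman filter is applied to the thermal images, so the following is only a generic sketch of Kalman-style denoising: a scalar filter with a constant-state model run over a sequence of intensity readings (for example, one pixel across frames). The function name and the variance defaults are assumptions, not values from the paper.

```python
def kalman_denoise(measurements, process_var=1e-3, measurement_var=0.5):
    """Smooth a 1-D sequence with a scalar Kalman filter (constant-state model)."""
    estimate = float(measurements[0])  # initialize from the first reading
    error_var = 1.0                    # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed
```

A larger `measurement_var` relative to `process_var` makes the filter trust new readings less and smooth more aggressively; applying the same recursion per pixel is one plausible, but unconfirmed, way to denoise an image sequence.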

翻訳日:2023-11-07 18:22:15 公開日:2023-11-04

# LISNeRFマッピング: 大規模3次元シーンのためのセマンティック・ニューラルフィールドによるLiDARベースの暗黙的マッピング

LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes ( http://arxiv.org/abs/2311.02313v1 )

ライセンス: Link先を確認

Jianyuan Zhang and Zhiliu Yang

(参考訳) 大規模セマンティックマッピングは、屋外の自律エージェントが計画やナビゲーションといった高レベルのタスクを遂行するために不可欠である。本稿では、LiDAR測定のみから暗黙的表現によって大規模3次元セマンティック再構成を行う新しい手法を提案する。まず、オクツリーベースの階層構造を利用して暗黙的特徴を格納し、その暗黙的特徴を浅い多層パーセプトロン(MLP)でセマンティック情報と符号付き距離値にデコードする。点群のセマンティックラベルとインスタンスIDの予測には既存のアルゴリズムを用いる。次に、点群の幾何に対する自己教師ありパラダイムと、セマンティックラベルおよびパノプティックラベルに対する擬似教師ありパラダイムにより、暗黙的特徴とMLPのパラメータを同時に最適化する。その後、推論段階でMarching Cubesアルゴリズムを用いてシーンを細分化し可視化する。メモリに制約のあるシナリオ向けに、サブマップを完全なマップに統合するマップスティッチング戦略も開発した。我々の知る限り、本手法はLiDARのみの入力からセマンティックな暗黙的シーンを再構成する最初の研究である。実世界の3つのデータセットSemanticKITTI、SemanticPOSS、nuScenesでの実験により、現在の最先端の3次元マッピング手法と比較して、我々のフレームワークの有効性と効率が実証された。

Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation. This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from LiDAR measurements alone. We first leverage an octree-based hierarchical structure to store implicit features; these implicit features are then decoded into semantic information and signed distance values through shallow Multilayer Perceptrons (MLPs). We adopt off-the-shelf algorithms to predict the semantic labels and instance IDs of the point cloud. Then we jointly optimize the implicit features and MLP parameters with a self-supervision paradigm for point cloud geometry and a pseudo-supervision paradigm for semantic and panoptic labels. Subsequently, the Marching Cubes algorithm is exploited to subdivide and visualize the scenes in the inference stage. For scenarios with memory constraints, a map stitching strategy is also developed to merge sub-maps into a complete map. As far as we know, our method is the first work to reconstruct semantic implicit scenes from LiDAR-only input. Experiments on three real-world datasets, SemanticKITTI, SemanticPOSS and nuScenes, demonstrate the effectiveness and efficiency of our framework compared to current state-of-the-art 3D mapping methods.

翻訳日:2023-11-07 18:22:05 公開日:2023-11-04

# スタイルの一致によるゼロショットと少数ショットの機械翻訳のギャップの縮小

Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles ( http://arxiv.org/abs/2311.02310v1 )

ライセンス: Link先を確認

Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen

(参考訳) 主に単言語の設定で訓練された大規模言語モデルは、文脈内学習によるゼロショットおよび少数ショットの例を用いて、機械翻訳へ汎化できることが示されている。しかし、ゼロショット翻訳は比較的良好であるものの、少数ショット設定と比べると性能には依然として明確な差が残る。本稿では、このギャップに寄与する要因を調査し、対象コーパスの文体に合わせることで、このギャップの大部分(約70%)を埋められることを示す。さらに、並列のデモンストレーション例を必要とせずにゼロショットのベースラインを強化しうるアプローチを検討し、これらの手法が翻訳指標の改善にどう寄与するかについて有益な洞察を提供する。

Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap comparing their performance with the few-shot setting. In this paper, we investigate the factors contributing to this gap and find that this gap can largely be closed (for about 70%) by matching the writing styles of the target corpus. Additionally, we explore potential approaches to enhance zero-shot baselines without the need for parallel demonstration examples, providing valuable insights into how these methods contribute to improving translation metrics.

翻訳日:2023-11-07 18:21:38 公開日:2023-11-04

# Heteroskedastic Tensor Clustering

Heteroskedastic Tensor Clustering ( http://arxiv.org/abs/2311.02306v1 )

ライセンス: Link先を確認

Yuchen Zhou, Yuxin Chen

(参考訳) ノイズを含むテンソル観測から基礎となるクラスタ構造を抽出しようとするテンソルクラスタリングが、注目を集めている。テンソルクラスタリングで広く研究されているモデルの一つがテンソルブロックモデルであり、各モードに沿ってクラスタリング構造が存在すると仮定し、多組織遺伝子発現解析や多層ネットワーク解析などの分野で広く応用されている。しかし、現在利用できる計算可能なテンソルクラスタリング手法は、i.i.d.のサブガウスノイズしか扱えないか、統計的性能が準最適であるかのいずれかであり、不均一分散データや低い信号対雑音比(SNR)を扱う必要のある応用での有用性が制限されている。これらの課題を克服するため、2段階の手法 $\mathsf{High\text{-}order~HeteroClustering}$ ($\mathsf{HHC}$) を提案する。この手法はまず $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$ と呼ぶ新しいスペクトルアルゴリズムでテンソル部分空間を推定し、続いて近似 $k$-means によりクラスタノードを得る。本アルゴリズムは、SNRが計算限界を超えている限り(対数因子を無視して)厳密なクラスタリングを達成することが証明できる。ここでSNRはノード間のペアワイズな差異とノイズレベルとの比を指し、計算限界は多項式時間での厳密なクラスタリングを可能にする最小のSNRを指す。包括的なシミュレーションと実データ実験から、本アルゴリズムがさまざまな設定で既存のアルゴリズムを上回り、より信頼性の高いクラスタリング性能を発揮することが示唆される。

Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the existence of clustering structures along each mode and has found broad applications in areas like multi-tissue gene expression analysis and multilayer network analysis. However, currently available computationally feasible methods for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise or suffer from suboptimal statistical performance, which restrains their utility in applications that have to deal with heteroskedastic data and/or low signal-to-noise-ratio (SNR). To overcome these challenges, we propose a two-stage method, named $\mathsf{High\text{-}order~HeteroClustering}$ ($\mathsf{HHC}$), which starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster nodes. Encouragingly, our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit (ignoring logarithmic factors); here, the SNR refers to the ratio of the pairwise disparity between nodes to the noise level, and the computational limit indicates the lowest SNR that enables exact clustering with polynomial runtime. Comprehensive simulation and real-data experiments suggest that our algorithm outperforms existing algorithms across various settings, delivering more reliable clustering performance.

翻訳日:2023-11-07 18:21:25 公開日:2023-11-04

# OSM vs HDマップ:軌道予測のためのマップ表現

OSM vs HD Maps: Map Representations for Trajectory Prediction ( http://arxiv.org/abs/2311.02305v1 )

ライセンス: Link先を確認

Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen

(参考訳) High Definition (HD) マップは、静的な道路要素を正確に描写できることから長らく好まれてきたが、入手しにくさと急速な環境変化への弱さが、特に動き予測タスクにおいて、自動運転の広範な展開を妨げている。そこで本稿では、長期の動き予測においてHDマップに代わる有望な選択肢としてOpenStreetMap (OSM) を活用することを提案する。本研究の貢献は3つある。第1に、OSMの適用を長期予測に拡張し、従来研究と比べて予測の時間幅を2倍にする。第2に、受容野の拡大と交差点に関する事前知識の統合により、我々のOSMベースのアプローチは競争力のある性能を示し、HDマップベースのモデルとの差を縮める。最後に、網羅的なコンテキスト認識型の分析を行い、多様なシナリオにおける動き予測について深い洞察を提供するとともに、クラス別の比較を行う。本研究は、粗い地図表現による長期の動き予測を前進させるだけでなく、自動運転の領域でスケーラブルとなりうる解決策も提供する。

While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-term motion forecasting. The contributions of this work are threefold: firstly, we extend the application of OSM to long-horizon forecasting, doubling the forecasting horizon compared to previous studies. Secondly, through an expanded receptive field and the integration of intersection priors, our OSM-based approach exhibits competitive performance, narrowing the gap with HD Map-based models. Lastly, we conduct an exhaustive context-aware analysis, providing deeper insights in motion forecasting across diverse scenarios as well as conducting class-aware comparisons. This research not only advances long-term motion forecasting with coarse map representations but additionally offers a potential scalable solution within the domain of autonomous driving.

翻訳日:2023-11-07 18:20:49 公開日:2023-11-04

# MFTCoder: マルチタスクファインチューニングによるコードLLMの強化

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning ( http://arxiv.org/abs/2311.02303v1 )

ライセンス: Link先を確認

Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li

(参考訳) コードLLMは専門的な研究分野として台頭しており、事前学習済みモデルの微調整によってモデルのコーディング能力を高めることに特化した注目すべき研究が行われている。従来の微調整アプローチは、通常、特定の下流タスクやシナリオに合わせたものであった。そのためタスクごとに個別の微調整が必要となり、膨大なトレーニングリソースを要し、デプロイとメンテナンスの面でも課題があった。さらに、これらのアプローチは、異なるコード関連タスク間に本来存在する相互の関連性を活用できていなかった。これらの制約を克服するために、複数タスクの同時かつ並列な微調整を可能にするマルチタスク微調整フレームワークMFTCoderを提案する。さまざまな損失関数を組み込むことで、データの不均衡、難易度のばらつき、収束速度の不一致といった、マルチタスク学習に共通する課題に効果的に対処する。広範な実験により、我々のマルチタスク微調整アプローチは、単一タスクでの個別の微調整と、タスクを混合した集合での微調整の両方を上回ることが示された。さらに、MFTCoderは、効率的なデータトークン化モードやPEFT微調整などの効率的なトレーニング機能を備えており、従来の微調整手法と比べて速度が大幅に向上している。MFTCoderは、CodeLLamaやQwenなど、主流のオープンソースLLMのいくつかとシームレスに統合できる。CodeLLamaを基盤として、MFTCoderで微調整したモデル \textsc{CodeFuse-CodeLLama-34B} は、HumanEvalベンチマークで74.4\%というpass@1スコアを達成し、GPT-4の性能(67\%、ゼロショット)を上回った。MFTCoderは \url{https://github.com/codefuse-ai/MFTCOder} でオープンソース化されている。

Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive pass@1 score of 74.4\% on the HumanEval benchmark, surpassing GPT-4 performance (67\%, zero-shot). MFTCoder is open-sourced at \url{https://github.com/codefuse-ai/MFTCOder}

翻訳日:2023-11-07 18:20:27 公開日:2023-11-04

# 積極的漸進学習QAOA

Proactively incremental-learning QAOA ( http://arxiv.org/abs/2311.02302v1 )

ライセンス: Link先を確認

Lingxiao Li, Jing Li, Yanqi Song, Sujuan Qin, Qiaoyan Wen, and Fei Gao

(参考訳) 最適化問題を高い性能で解くことは、量子近似最適化アルゴリズム (Quantum Approximate Optimization Algorithm, QAOA) に関する既存研究の目標である。この目的のもと、本稿では、トレーニングの軌跡を能動的に増分フェーズへ分割する、増分学習に基づく改良型QAOAを提案する。MaxCut問題を例にとると、第1フェーズではグラフ全体から小さな部分グラフをランダムに選択し、量子回路を訓練して、その部分グラフのMaxCutに対する最適化済みパラメータを得る。続く各増分フェーズでは、残りのノードとエッジの一部を現在の部分グラフに追加し、回路を再訓練して新たな最適化済みパラメータを得る。この操作を、グラフ全体のMaxCut問題が解かれるまで繰り返す。要点は、前のフェーズで最適化したパラメータを現在のフェーズの初期パラメータに再利用することである。多数のシミュレーション実験により、本手法は近似比(AR)とトレーニング時間の面で、QAOAの一般的な既存手法よりも優れた性能を示した。具体的には、重み付きランダムグラフにおいて、ARは標準のQAOAよりも13.17%高い。

Solving optimization problems with high performance is the target of existing works of Quantum Approximate Optimization Algorithm (QAOA). With this intention, we propose an advanced QAOA based on incremental learning, where the training trajectory is proactively segmented into incremental phases. Taking the MaxCut problem as our example, we randomly select a small subgraph from the whole graph and train the quantum circuit to get optimized parameters for the MaxCut of the subgraph in the first phase. Then in each subsequent incremental phase, a portion of the remaining nodes and edges are added to the current subgraph, and the circuit is retrained to get new optimized parameters. The above operation is repeated until the MaxCut problem on the whole graph is solved. The key point is that the optimized parameters of the previous phase will be reused in the initial parameters of the current phase. Numerous simulation experiments show our method has superior performance on Approximation Ratio (AR) and training time compared to prevalent works of QAOA. Specifically, the AR is higher than standard QAOA by 13.17% on weighted random graphs.
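
The phase-by-phase warm start described above can be sketched on a toy scale with an exact statevector simulation. This is an illustration under stated assumptions rather than the authors' code: the subgraph here grows in node-index order instead of by random selection, `node_schedule` is a hypothetical parameter, and `local_search` is a simple stand-in for whatever classical optimizer the paper uses.

```python
import cmath
import math


def cut_value(assignment, edges):
    """Total weight of edges whose endpoints fall on different sides."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def qaoa_expectation(n, edges, gammas, betas):
    """Expected cut value of the QAOA state on n qubits (exact simulation)."""
    dim = 1 << n
    costs = [cut_value([(z >> q) & 1 for q in range(n)], edges) for z in range(dim)]
    state = [complex(1.0 / math.sqrt(dim))] * dim  # uniform superposition
    for gamma, beta in zip(gammas, betas):
        # Cost layer: multiply each basis state by exp(-i * gamma * C(z)).
        state = [amp * cmath.exp(-1j * gamma * c) for amp, c in zip(state, costs)]
        # Mixer layer: exp(-i * beta * X) = cos(beta) I - i sin(beta) X per qubit.
        cos_b, sin_b = math.cos(beta), math.sin(beta)
        for q in range(n):
            mask = 1 << q
            for z in range(dim):
                if not z & mask:
                    a, b = state[z], state[z | mask]
                    state[z] = cos_b * a - 1j * sin_b * b
                    state[z | mask] = cos_b * b - 1j * sin_b * a
    return sum(abs(amp) ** 2 * c for amp, c in zip(state, costs))


def local_search(objective, start, step=0.4, min_step=1e-3, max_rounds=500):
    """Greedy coordinate search (placeholder optimizer); never returns a worse value."""
    params, best = list(start), objective(start)
    for _ in range(max_rounds):
        if step <= min_step:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                value = objective(trial)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2  # refine the step once no neighbor improves
    return params, best


def incremental_qaoa(edges, node_schedule, init_params):
    """Train on growing subgraphs, reusing each phase's parameters as the next start."""
    depth = len(init_params) // 2  # params = gammas followed by betas
    params, value = list(init_params), None
    for k in node_schedule:  # this phase uses the subgraph induced by nodes 0..k-1
        sub_edges = [(u, v, w) for u, v, w in edges if u < k and v < k]

        def objective(p, k=k, sub_edges=sub_edges):
            return qaoa_expectation(k, sub_edges, p[:depth], p[depth:])

        params, value = local_search(objective, params)  # warm start
    return params, value
```

For a 4-node ring one might call `incremental_qaoa(ring_edges, [2, 3, 4], [0.1, 0.1])`; the only step that encodes the paper's key point is passing `params` from one phase straight into the next `local_search` call.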

翻訳日:2023-11-07 18:19:54 公開日:2023-11-04

# 部分エンタングルメントエントロピーの幾何学化: PEEスレッドからビットスレッドへ

Geometrizing the Partial Entanglement Entropy: from PEE Threads to Bit Threads ( http://arxiv.org/abs/2311.02301v1 )

ライセンス: Link先を確認

Jiong Lin, Yizhou Lu, Qiang Wen

(参考訳) AdS/CFTの文脈において、ホログラフィックCFTの部分エンタングルメントエントロピー(PEE)を幾何学化する手法を与える。より具体的には、点 $\textbf{x}$ が与えられたとき、$\textbf{x}$ と他の任意の点との間の2点PEEを、これら2点を結ぶバルク測地線によって幾何学化する。これらの測地線を \textit{PEE threads} と呼ぶ。PEE threads は発散のないベクトル場 $V_{\textbf{x}}^{\mu}$ の積分曲線と自然にみなすことができ、このベクトル場を \emph{PEE thread flow} と呼ぶ。PEE threads の密度を特徴づける $V_{\textbf{x}}^{\mu}$ のノルムは、PEEに対するいくつかの物理的要請から決定できる。任意の静的な区間または球状領域 $A$ に対し、状態によって決まるPEE threads の配位から一意なビットスレッドの配位を生成できることを示す。したがって、非内在的なビットスレッドは、内在的なPEE threads から創発する。静的な非連結区間の場合、発散のない流れを記述するベクトル場は、もはやRT公式の再現には適さない。そこで、PEE threads に、それが任意のホモロガスな曲面と交差する回数で重みを付ける。その代わりにRT公式は、あらゆる重みの割り当てにわたるPEE threads の和の最小化として完全に再定式化される。

We give a scheme to geometrize the partial entanglement entropy (PEE) for holographic CFT in the context of AdS/CFT. More explicitly, given a point $\textbf{x}$ we geometrize the two-point PEEs between $\textbf{x}$ and any other points in terms of the bulk geodesics connecting these two points. We refer to these geodesics as the \textit{PEE threads}, which can be naturally regarded as the integral curves of a divergenceless vector field $V_{\textbf{x}}^{\mu}$, which we call \emph{PEE thread flow}. The norm of $V_{\textbf{x}}^{\mu}$ that characterizes the density of the PEE threads can be determined by some physical requirements of the PEE. We show that, for any static interval or spherical region $A$, a unique bit thread configuration can be generated from the PEE thread configuration determined by the state. Hence, the non-intrinsic bit threads are emergent from the intrinsic PEE threads. For static disconnected intervals, the vector fields describing a divergenceless flow are no longer suitable to reproduce the RT formula. We weight the PEE threads with the number of times they intersect with any homologous surface. Instead the RT formula is perfectly reformulated to be the minimization of the summation of the PEE threads with all possible assignment of weights.

翻訳日:2023-11-07 18:19:35 公開日:2023-11-04

# Few-Shot Fault Time Series Prognosis に対する逐次モデル非依存メタラーニング

Successive Model-Agnostic Meta-Learning for Few-Shot Fault Time Series Prognosis ( http://arxiv.org/abs/2311.02300v1 )

ライセンス: Link先を確認

Hai Su, Jiajun Hu, Songsen Yu

(参考訳) メタラーニングは、近年多くの研究者の注目を集めている少数ショットの故障予測問題を解くうえで有望な手法である。時系列予測向けの既存のメタラーニング手法は、主にランダムなタスク分割や類似度マッチングに基づくタスク分割に依存しており、(1)特徴活用の非効率性、(2)準最適なタスクデータ割り当て、(3)少数サンプルでの頑健性の不足という3つの大きな制約に直面している。これらの制約を克服するため、時系列の連続した期間を、複数の連続する短い期間からなるメタタスクとして扱う、新しい「擬似メタタスク」分割方式を導入する。連続時系列を擬似メタタスクとして用いることで、データからより包括的な特徴や関係を抽出でき、より正確な予測が可能になる。さらに、異なるデータセットにわたる手法の頑健性を高めるために、差分アルゴリズムを導入する。複数の故障予測および時系列予測データセットでの広範な実験により、本手法が少数ショット条件と一般条件の両方で予測性能と汎化能力を大幅に向上させることを実証した。

Meta learning is a promising technique for solving few-shot fault prediction problems, which have attracted the attention of many researchers in recent years. Existing meta-learning methods for time series prediction, which predominantly rely on random and similarity matching-based task partitioning, face three major limitations: (1) feature exploitation inefficiency; (2) suboptimal task data allocation; and (3) limited robustness with small samples. To overcome these limitations, we introduce a novel 'pseudo meta-task' partitioning scheme that treats a continuous time period of a time series as a meta-task, composed of multiple successive short time periods. Employing continuous time series as pseudo meta-tasks allows our method to extract more comprehensive features and relationships from the data, resulting in more accurate predictions. Moreover, we introduce a differential algorithm to enhance the robustness of our method across different datasets. Through extensive experiments on several fault and time series prediction datasets, we demonstrate that our approach substantially enhances prediction performance and generalization capability under both few-shot and general conditions.
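
One way to picture the 'pseudo meta-task' partitioning is the sketch below, which cuts a series into contiguous periods and, inside each period, builds successive short windows as samples. The abstract does not give the exact construction, so the support/query split, the next-value target, and the parameter names `task_len`, `window`, and `n_query` are assumptions for illustration only.

```python
def make_pseudo_meta_tasks(series, task_len, window, n_query):
    """Split a series into contiguous meta-tasks made of successive short windows.

    Each meta-task covers task_len consecutive points. Inside it, every run of
    `window` points is paired with the next value as the prediction target.
    The last n_query samples form the query set, the earlier ones the support set.
    """
    tasks = []
    for start in range(0, len(series) - task_len + 1, task_len):
        period = series[start:start + task_len]
        samples = [
            (period[i:i + window], period[i + window])
            for i in range(len(period) - window)
        ]
        split = len(samples) - n_query
        tasks.append({"support": samples[:split], "query": samples[split:]})
    return tasks
```

Keeping each task contiguous in time, rather than sampling windows at random, is what lets the later windows of a task depend on the earlier ones, which matches the abstract's motivation for using continuous periods.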

翻訳日:2023-11-07 18:19:12 公開日:2023-11-04

# LLMは道徳を概念として把握している

LLMs grasp morality in concept ( http://arxiv.org/abs/2311.02294v1 )

ライセンス: Link先を確認

Mark Pock, Andre Ye, Jared Moore

(参考訳) AIの倫理と公平性に関する研究は、公平性、真実性、多様性といった特定の価値を反映するようLLMを規制する点で大きく進展してきた。しかし、そもそもLLMがいかにして何かを「意味」しうるのかという問題は、当然のものとして扱われてきた。この問題に取り組まない限り、LLMにそうした価値を付与することが何を意味するのかすら明確ではない。これに対し、我々は人間を超えて適用できる意味の一般理論を提示する。この理論を用いて、意味エージェントとしてのLLMの正確な性質を解き明かす。我々は、LLMが意味エージェントとしての立場ゆえに、人間社会の構成概念(道徳、ジェンダー、人種など)をすでに概念として把握していると提案する。その結果、特定の倫理的枠組みのもとでは、現在広く用いられているモデルアライメントの手法は、良くても限定的で、最悪の場合は逆効果となる。さらに、アライメントされていないモデルは、我々が道徳哲学や社会哲学をより良く発展させる助けになるかもしれない。

Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.

翻訳日:2023-11-07 18:18:50 公開日:2023-11-04

# コミュニティ検出のための対照的深層非負値行列因子分解

Contrastive Deep Nonnegative Matrix Factorization for Community Detection ( http://arxiv.org/abs/2311.02357v1 )

ライセンス: Link先を確認

Yuecheng Li, Jialong Chen, Chuan Chen, Lei Yang, Zibin Zheng

(参考訳) 近年,非負行列因子化(NMF)がコミュニティ検出に広く採用されている。しかし、既存のNMFベースの手法には以下の3つの問題がある。 1) 本来のネットワークを直接コミュニティメンバーシップ空間に変換するため,階層的な情報を把握することが困難である。 2) ネットワークのトポロジにのみ注意を払い、ノード属性を無視することが少なくない。 3)地域社会発見に必要なグローバルな構造情報を学習することは困難である。そこで我々はContrastive Deep Nonnegative Matrix Factorization (CDNMF) という新しいコミュニティ検出アルゴリズムを提案する。まず、情報抽出能力を強化するため、NMFをより深めます。その後,コントラスト学習に触発され,ネットワークトポロジーとノード属性を2つのコントラストビューとして創造的に構成する。さらに,debiased negative sampling layerを用いて,コミュニティレベルでのノード類似性を学習し,コミュニティ検出のためのモデルの適合性を高める。 3つの公開実数グラフデータセットについて実験を行い,提案手法は最先端手法よりも優れた結果を得た。コードはhttps://github.com/6lyc/CDNMF.git で入手できる。

Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection, because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection. Therefore, we propose a new community detection algorithm, named Contrastive Deep Nonnegative Matrix Factorization (CDNMF). Firstly, we deepen NMF to strengthen its capacity for information extraction. Subsequently, inspired by contrastive learning, our algorithm creatively constructs network topology and node attributes as two contrasting views. Furthermore, we utilize a debiased negative sampling layer and learn node similarity at the community level, thereby enhancing the suitability of our model for community detection. We conduct experiments on three public real graph datasets and the proposed model has achieved better results than state-of-the-art methods. Code available at https://github.com/6lyc/CDNMF.git.
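As a point of reference for the shallow NMF baseline that this abstract builds on, here is a minimal sketch in plain Python. It is not the CDNMF algorithm: it has no deep layers, no contrastive views, and no debiased negative sampling. It only factors an adjacency matrix as A ≈ UV with the standard Lee-Seung multiplicative updates and reads a community label off each row of U. The function names and the toy setup are illustrative assumptions.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(a, u, v):
    """Squared Frobenius norm of A - UV."""
    uv = matmul(u, v)
    return sum((a[i][j] - uv[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0])))

def nmf(a, k, iters=200, seed=0, eps=1e-9):
    """Shallow NMF (A ~= U V, all entries nonnegative) via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    v = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update V with U fixed: V <- V * (U^T A) / (U^T U V)
        ut = transpose(u)
        num, den = matmul(ut, a), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update U with the new V fixed: U <- U * (A V^T) / (U V V^T)
        vt = transpose(v)
        num, den = matmul(a, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return u, v

def communities(u):
    """Assign each node to the column of U with the largest membership weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

Calling `nmf(adjacency, k)` on a small graph and then `communities(u)` gives one label per node. A single factorization like this maps the network straight into membership space, which is the first limitation the abstract lists.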

翻訳日:2023-11-07 18:11:16 公開日:2023-11-04

# mata*:近似グラフ編集距離計算のための学習ノードマッチングとa*アルゴリズムを組み合わせる

MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation ( http://arxiv.org/abs/2311.02356v1 )

ライセンス: Link先を確認

Junfeng Liu, Min Zhou, Shuai Ma, Lujia Pan

(参考訳) グラフ編集距離 (Graph Edit Distance, GED) は、グラフ検索や検索タスクで広く使われているグラフ類似度を測定する一般的な、および、ドメインに依存しない尺度である。しかし、正確なGED計算はNP完全であることが知られている。例えば、広く使われているA*アルゴリズムは、必然的にスケーラビリティに悩む最適なソリューションを見つけるために、検索空間全体を探索する。学習ベースの手法は、回帰タスクを定式化してGEDを学習するためにグラフ表現技術を適用し、編集パスを復元できず、不正確なGED近似につながる(すなわち、予測されたGEDは正確なよりも小さい)。そこで本研究では,グラフニューラルネットワーク(GNN)とA*アルゴリズムに基づくGEDの近似計算のためのデータ駆動型ハイブリッドアプローチMATA*を提案する。具体的には、GED計算における構造支配的操作(ノードとエッジの挿入/削除)の性質を認識し、ノードマッチングのためのノード埋め込みのための局所および高次構造情報を共同で学習する構造強化GNNを設計する。第2に、top-k候補ノードは微分可能なtop-k操作によって生成され、gedの他の特性、すなわち複数の最適ノードマッチングに準拠したノードマッチングのトレーニングを可能にする。第3に、候補ノードの恩恵を受けたmata*は、有望な検索方向のみを実行し、効率的にソリューションに到達する。最後に、MATA*は組合せ探索法、学習法、ハイブリッド法を著しく上回り、大規模グラフに匹敵するスケール性を示す。

Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search or retrieving tasks. However, the exact GED computation is known to be NP-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution, which inevitably causes scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, but they cannot recover the edit path and lead to inaccurate GED approximations (i.e., the predicted GED is smaller than the exact one). To this end, in this work, we present a data-driven hybrid approach MATA* for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models the problem from the perspective of learning to match nodes instead of directly regressing GED. Specifically, aware of the structure-dominant operations (i.e., node and edge insertion/deletion) property in GED computation, a structure-enhanced GNN is first designed to jointly learn local and high-order structural information for node embeddings for node matchings. Second, top-k candidate nodes are produced via a differentiable top-k operation to enable the training for node matchings, which adheres to another property of GED, i.e., multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* searches only in the promising directions, reaching the solution efficiently. Finally, extensive experiments show the superiority of MATA* as it significantly outperforms the combinatorial search-based, learning-based and hybrid methods and scales well to large graphs.
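To make the search space concrete, the sketch below computes unit-cost GED for tiny unlabeled, undirected graphs by enumerating every injective node mapping. It is a brute-force baseline, not MATA*: there is no GNN and no A* heuristic. The optional `candidates` argument is a hypothetical stand-in for the learned top-k candidate lists. It only restricts which mappings are tried, so the result becomes an upper bound on the exact distance.

```python
from itertools import permutations

def edit_distance(n1, edges1, n2, edges2, candidates=None):
    """Unit-cost GED between two small unlabeled undirected graphs.

    n1, n2     : node counts (nodes are 0..n-1)
    edges1/2   : iterables of (u, v) pairs
    candidates : optional {graph-1 node: allowed graph-2 nodes}; only used
                 when n1 <= n2. Returns None if no mapping is allowed.
    """
    if n1 > n2:
        # GED is symmetric under unit costs, so map the smaller graph into
        # the larger one. Candidate lists are ignored in this case.
        return edit_distance(n2, edges2, n1, edges1)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best = None
    # Each tuple is an injective map: graph-1 node i -> image[i] in graph 2.
    for image in permutations(range(n2), n1):
        if candidates and any(image[i] not in candidates[i] for i in range(n1)):
            continue  # pruned: outside the candidate lists
        common = sum(1 for e in e1 if frozenset(image[x] for x in e) in e2)
        cost = (n2 - n1)              # node insertions
        cost += len(e1) - common      # edge deletions
        cost += len(e2) - common      # edge insertions
        if best is None or cost < best:
            best = cost
    return best
```

The loop visits n2!/(n2-n1)! mappings, which is why exact search does not scale. With good candidate lists most mappings are skipped, which is the intuition behind restricting the search to promising directions.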

翻訳日:2023-11-07 18:10:57 公開日:2023-11-04

# TreeSwap: 依存サブツリースワッピングによる機械翻訳のためのデータ拡張

TreeSwap: Data Augmentation for Machine Translation via Dependency Subtree Swapping ( http://arxiv.org/abs/2311.02355v1 )

ライセンス: Link先を確認

Attila Nagy, Dorina Lakatos, Botond Barta, Judit \'Acs

(参考訳) ニューラルネットワーク翻訳のためのデータ拡張手法は、限られた量のトレーニングデータが利用可能である場合、特に有用である。本稿では,物体と対象をバイセントで置き換えることで,新たな文を生成する手法を提案する。これはソースとターゲット文の依存関係解析木に基づいて同時に実行される。このメソッドをTreeSwapと名付けます。この結果から,TreeSwapはリソース制約付きデータセット上で,4つの言語ペアのベースラインモデルに対して一貫した改善を実現していることがわかった。ドメイン固有のコーパスについても検討するが,本手法は法,医療,ITデータに大きな改善をもたらすものではない。同様の拡張手法のスコアを報告し,treeswapが両立することを確認した。また、生成した文を定性的に分析し、ほとんどのケースで増補が正しい翻訳を生み出すことを見出した。コードはgithubから入手できます。

Data augmentation methods for neural machine translation are particularly useful when only a limited amount of training data is available, which is often the case when dealing with low-resource languages. We introduce a novel augmentation method, which generates new sentences by swapping objects and subjects across bisentences. This is performed simultaneously based on the dependency parse trees of the source and target sentences. We name this method TreeSwap. Our results show that TreeSwap achieves consistent improvements over baseline models in 4 language pairs in both directions on resource-constrained datasets. We also explore domain-specific corpora, but find that our method does not make significant improvements on law, medical and IT data. We report the scores of similar augmentation methods and find that TreeSwap performs comparably. We also analyze the generated sentences qualitatively and find that the augmentation produces a correct translation in most cases. Our code is available on GitHub.
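The swap itself can be sketched in a few lines once the subject (or object) subtrees are known. In the sketch below the subtree boundaries are given by hand as half-open token ranges. In the method described above they would come from dependency parses of both sides, which this sketch does not attempt. The dictionary layout and function names are illustrative assumptions.

```python
def swap_span(tokens_a, span_a, tokens_b, span_b):
    """Exchange the token ranges span_a and span_b between two sentences."""
    (sa, ea), (sb, eb) = span_a, span_b
    new_a = tokens_a[:sa] + tokens_b[sb:eb] + tokens_a[ea:]
    new_b = tokens_b[:sb] + tokens_a[sa:ea] + tokens_b[eb:]
    return new_a, new_b

def tree_swap(pair_a, pair_b):
    """Swap the marked subtree on the source and target side at the same time.

    Each pair is a dict with token lists 'src' and 'tgt' and half-open
    ranges 'src_span' and 'tgt_span' covering the subtree to exchange.
    """
    src_a, src_b = swap_span(pair_a["src"], pair_a["src_span"],
                             pair_b["src"], pair_b["src_span"])
    tgt_a, tgt_b = swap_span(pair_a["tgt"], pair_a["tgt_span"],
                             pair_b["tgt"], pair_b["tgt_span"])
    return (src_a, tgt_a), (src_b, tgt_b)

pair_a = {"src": ["the", "cat", "sleeps"], "src_span": (0, 2),
          "tgt": ["die", "Katze", "schläft"], "tgt_span": (0, 2)}
pair_b = {"src": ["my", "brother", "reads"], "src_span": (0, 2),
          "tgt": ["mein", "Bruder", "liest"], "tgt_span": (0, 2)}
new_a, new_b = tree_swap(pair_a, pair_b)
```

Swapping on both sides in lockstep is what keeps each new pair a plausible translation. Agreement errors, such as a plural subject meeting a singular verb, are one way such swaps can go wrong.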

翻訳日:2023-11-07 18:10:26 公開日:2023-11-04

# ネットワーク上の意見形成のサンプル複雑性

Sample Complexity of Opinion Formation on Networks ( http://arxiv.org/abs/2311.02349v1 )

ライセンス: Link先を確認

Haolin Liu, Rajmohan Rajaraman, Ravi Sundaram, Anil Vullikanti, Omer Wasim, Haifeng Xu

(参考訳) ソーシャル・ネットワークが連携する地域社会において、新たなワクチンに対する意識を広めることを目指す公衆衛生担当者について検討する。情報を最小限のリソースで分散し、実際の事実に沿ったコミュニティ全体の理解を確保するにはどうすればよいのか? この懸念は多くの現実世界の状況を反映している。本稿では,この問題を解決するために,サンプル複雑性の研究を意見形成において初期化する。我々のモデルは、認識された意見形成ゲームに基づいており、各エージェントの意見は、先行研究のような実数ではなく、データ由来のモデルパラメータであるとみなす。このような拡張は、意見形成をより深く理解し、連合学習と密接に結びついている。この定式化を通じて、任意のネットワークのサンプル複雑性境界を特徴づけ、特定のネットワーク構造に対して漸近的に密接な境界を示す。興味深いことに、最適な戦略は、しばしばその度合いに逆らってサンプルを割り当て、重要な政策含意を示唆する。本研究は,合成ネットワークと実世界のネットワークの両方で実証実験を行った。

Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, ensuring community-wide understanding that aligns with the actual facts? This concern mirrors numerous real-world situations. In this paper, we initiate the study of sample complexity in opinion formation to solve this problem. Our model is built on the recognized opinion formation game, where we regard each agent's opinion as a data-derived model parameter, not just a real number as in prior studies. Such an extension offers a wider understanding of opinion formation and ties in closely with federated learning. Through this formulation, we characterize the sample complexity bounds for any network and also show asymptotically tight bounds for specific network structures. Intriguingly, we discover that optimal strategies often allocate samples inversely to the degree, hinting at vital policy implications. Our findings are empirically validated on both synthesized and real-world networks.
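For orientation, here is a sketch of the classic scalar opinion formation game in the Friedkin-Johnsen style. That this is the game the abstract builds on is an assumption here, not something the abstract states. Each agent i holds an internal opinion s_i and pays (z_i - s_i)^2 plus the sum over neighbors of w_ij (z_i - z_j)^2. Repeated simultaneous best responses converge to the equilibrium. The paper's extension to data-derived model parameters is not covered.

```python
def equilibrium_opinions(weights, internal, iters=200):
    """Iterate best responses z_i = (s_i + sum_j w_ij z_j) / (1 + sum_j w_ij).

    weights  : symmetric matrix of nonnegative edge weights (lists of rows)
    internal : list of internal opinions s_i
    """
    n = len(internal)
    z = list(internal)
    for _ in range(iters):
        # The comprehension reads the previous z, so all agents update at once.
        z = [(internal[i] + sum(weights[i][j] * z[j] for j in range(n) if j != i))
             / (1.0 + sum(weights[i][j] for j in range(n) if j != i))
             for i in range(n)]
    return z
```

For two agents joined by a unit-weight edge with internal opinions 0 and 1, the expressed opinions settle at 1/3 and 2/3. Each agent is pulled toward its neighbor but stays anchored to its own starting point.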

翻訳日:2023-11-07 18:10:11 公開日:2023-11-04

# 質問応答のための摂動型アクティブラーニング

Perturbation-based Active Learning for Question Answering ( http://arxiv.org/abs/2311.02345v1 )

ライセンス: Link先を確認

Fan Luo, Mihai Surdeanu

(参考訳) アクティブラーニング(AL)トレーニング戦略を活用することで、アノテーションコストの少ない質問応答(QA)モデルを構築することができる。最も情報のないトレーニングデータを選択して、モデルを効果的に更新する。 ALの取得関数は、不確実性や多様性に基づくサンプリングなど、各トレーニング例がどの程度情報的であるかを決定するために使用される。本研究では,摂動型アクティブラーニングによる学習戦略を提案し,既存の一般的な学習戦略よりも効果的であることを実証する。

A question answering (QA) model can be built at lower annotation cost by using an active learning (AL) training strategy, which selects the most informative unlabeled training data to update the model effectively. Acquisition functions for AL, such as uncertainty-based or diversity-based sampling, determine how informative each training example is. In this work, we propose a perturbation-based active learning acquisition strategy and demonstrate that it is more effective than existing commonly used strategies.
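The abstract does not spell out how the perturbation-based score is computed, so the sketch below shows only the uncertainty-sampling baseline it mentions: rank unlabeled examples by the entropy of the model's predicted distribution and send the most uncertain ones for annotation. The function names and the toy probabilities are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled examples."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:budget]

# Predicted answer distributions for three unlabeled questions (toy values).
pool = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
chosen = select_for_annotation(pool, budget=2)  # -> [0, 2]
```

Any acquisition function fits the same interface: replace `entropy` with another per-example score and the selection step stays unchanged.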

翻訳日:2023-11-07 18:09:54 公開日:2023-11-04

# 1回だけ前進する - 予測と合理化を1回のフォワードパスで行う

You Only Forward Once: Prediction and Rationalization in A Single Forward Pass ( http://arxiv.org/abs/2311.02344v1 )

ライセンス: Link先を確認

Han Jiang, Junwen Duan, Zhe Qu, and Jianxin Wang

(参考訳) 教師なし論理抽出は、注釈付き推論なしでモデル予測をサポートするために、簡潔で連続したテキストスニペットを抽出することを目的としている。これまでの研究では、RNP(Rationalizing Neural Prediction)フレームワークと呼ばれる2段階のフレームワークを使用してきた。彼らは、抽出された説明は合理性と呼ばれ、ゴールデンラベルを予測するのに十分であると仮定した。しかし、上記の仮定は元の定義から外れており、うまく機能するには厳格すぎる。さらに、これらの二相モデルは連動問題とスプリアス相関に苦しむ。そこで本研究では, 予測ではなくモデル予測を支援するため, 理論の緩やかなバージョンから導出した, You Only Forward Once (YOFO) と呼ばれる新しい単相フレームワークを提案する。我々のフレームワークでは、BERTのような事前訓練された言語モデルがデプロイされ、相互ロックや素早い相関による影響が少なく、同時に予測と合理化が行われる。教師なしの方法で重要なトークンを直接選択することは難しい。 YOFOは重要なトークンを直接選択する代わりに、前方伝播中に重要でないトークンを徐々に削除する。 BeerAdvocateおよびHotel Reviewデータセットの実験を通して、我々のモデルが有理性を抽出し、RNPベースのモデルよりも正確に予測できることを示した。従来の最先端手法と比較して,トークンレベルのF1では最大18.4%の改善が見られた。また,抽出された合理性およびトークン崩壊戦略の解明と実験を行った。その結果, YOFOは, モデル中央の重要でないトークンを除去しながら, 正確かつ重要な有理を抽出できることがわかった。

Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called rationale, should be sufficient to predict the golden label. However, the assumption above deviates from the original definition and is too strict to perform well. Furthermore, these two-phase models suffer from the interlocking problem and spurious correlations. To solve the above problems, we propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale where rationales aim to support model predictions rather than make predictions. In our framework, a pre-trained language model like BERT is deployed to simultaneously perform prediction and rationalization with less impact from interlocking or spurious correlations. Directly choosing the important tokens in an unsupervised manner is intractable. Instead of directly choosing the important tokens, YOFO gradually removes unimportant tokens during forward propagation. Through experiments on the BeerAdvocate and Hotel Review datasets, we demonstrate that our model is able to extract rationales and make predictions more accurately compared to RNP-based models. We observe an improvement of up to 18.4% in token-level F1 compared to previous state-of-the-art methods. We also conducted analyses and experiments to explore the extracted rationales and token decay strategies. The results show that YOFO can extract precise and important rationales while removing unimportant tokens in the middle part of the model.
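The idea of gradually removing unimportant tokens can be illustrated with a small sketch. It assumes that a per-token importance score is available at each pruning layer and that a fixed fraction of the surviving tokens is kept each time. Both are illustrative assumptions, since the abstract does not specify how YOFO scores tokens or schedules the decay.

```python
def decay_tokens(tokens, scores_per_layer, keep_ratio=0.5):
    """Keep the highest-scoring tokens layer by layer, preserving word order.

    tokens           : list of input tokens
    scores_per_layer : one list of importance scores per pruning layer,
                       indexed by original token position (hypothetical)
    keep_ratio       : fraction of surviving tokens kept at each layer
    """
    kept = list(range(len(tokens)))
    for scores in scores_per_layer:
        n_keep = max(1, int(len(kept) * keep_ratio))
        top = sorted(kept, key=lambda i: scores[i], reverse=True)[:n_keep]
        kept = sorted(top)  # restore the original order of the survivors
    return [tokens[i] for i in kept]

tokens = ["the", "beer", "smells", "really", "fruity", "and", "sweet", "."]
layer_scores = [
    [0.1, 0.6, 0.7, 0.3, 0.9, 0.05, 0.8, 0.02],  # first pruning layer
    [0.0, 0.2, 0.9, 0.0, 0.8, 0.0, 0.3, 0.0],    # second pruning layer
]
rationale = decay_tokens(tokens, layer_scores)  # -> ["smells", "fruity"]
```

Whatever survives the last pruning layer plays the role of the rationale, and the same forward pass yields the prediction, so no separate generator is needed.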

翻訳日:2023-11-07 18:09:46 公開日:2023-11-04

# 安定拡散参照のみ:イメージプロンプトとブループリントによる2次塗装用多成分拡散モデル

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting ( http://arxiv.org/abs/2311.02343v1 )

ライセンス: Link先を確認

Hao Ai, Lu Sheng

(参考訳) 安定拡散と制御ネットは画像生成と合成の分野で優れた成果を上げている。しかし、その粒度と制御方法により、二次絵画を主な作品とする漫画やアニメーション制作などの専門的な芸術作品では、効率性の向上が限定されている。現在のワークフローでは、文字や画像のスタイルを修正するには長いテキストプロンプトが必要であり、さらにテキストインバージョンやdreamboothなどの方法によるさらなるトレーニングが必要であり、これは画家にとって非常に複雑で高価である。そこで,本論文では,2種類の条件付き画像のみを用いて,2次絵画の高速化を行う,画像から画像への自己教師付きモデルである,安定拡散参照(Stable Diffusion Reference Only)を提案する。第1タイプの条件画像は、画像プロンプトとして機能し、生成に必要な概念および色情報を提供する。第2のタイプはブループリントイメージであり、生成された画像の視覚構造を制御する。元々のUNetにネイティブに組み込まれており、ControlNetの必要性を排除している。モジュールとパイプラインのすべてのコードをリリースし、コントロール可能な文字行アートカラーリングモデルをhttps://github.com/aihao2000/stable-diffusion-reference-onlyでトレーニングしました。これにより、この構造の有効性が検証され、アニメーション、漫画、ファンワークの生産効率が大幅に向上する。

Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often needs lengthy text prompts, and even requires further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is a blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, that achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.

翻訳日:2023-11-07 18:09:19 公開日:2023-11-04

# オープンワールドアンバイアス検出器のための提案レベル非教師なし領域適応

Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector ( http://arxiv.org/abs/2311.02342v1 )

ライセンス: Link先を確認

Xuanyi Liu, Zhongqi Yue, Xian-Sheng Hua

(参考訳) Open World Object Detection (OWOD)はオープンセットのオブジェクト検出とインクリメンタルな学習機能を組み合わせて、オープンでダイナミックなビジュアル世界の課題に対処する。既存の研究では、観察されたカテゴリで訓練されたフォアグラウンド予測器は、トップkの最も自信のあるフォアグラウンド予測を選択することで、見当たらないカテゴリの場所を特定するために直接転送できると仮定している。しかし、この仮定は実際はほとんど有効ではない。これは、予測者は必然的に既知のカテゴリに偏り、見当たらないカテゴリの出現のシフト下で失敗するためである。本研究では,非教師なし領域適応の下でタスクを再フォーマットし,現在のバイアス付き予測者がドメイン形成を支援することにより,未バイアスのフォアグラウンド予測器を構築することを目的としている。次に, 単純かつ効果的な自己学習法を用いて, 領域不変のフォアグラウンド特徴に基づく予測系を学習し, 視認圏と視認圏の出現の変化に頑健な非バイアス予測を実現する。このアプローチのパイプラインは,OWOD評価によって実証的に検証された,さまざまな検出フレームワークやUDAメソッドに適応することができる。

Open World Object Detection (OWOD) combines open-set object detection with incremental learning capabilities to handle the challenge of the open and dynamic visual world. Existing works assume that a foreground predictor trained on the seen categories can be directly transferred to identify the unseen categories' locations by selecting the top-k most confident foreground predictions. However, the assumption is hardly valid in practice. This is because the predictor is inevitably biased toward the known categories, and fails under the shift in the appearance of the unseen categories. In this work, we aim to build an unbiased foreground predictor by re-formulating the task under Unsupervised Domain Adaptation, where the current biased predictor helps form the domains: the seen object locations and confident background locations as the source domain, and the remaining ambiguous ones as the target domain. Then, we adopt the simple and effective self-training method to learn a predictor based on the domain-invariant foreground features, hence achieving unbiased prediction robust to the shift in appearance between the seen and unseen categories. Our approach's pipeline can adapt to various detection frameworks and UDA methods, empirically validated by OWOD evaluation, where we achieve state-of-the-art performance.
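The domain construction described above can be sketched as a simple partition of region proposals. The field names and the background threshold are hypothetical. The abstract says only that seen object locations and confident background locations form the source domain and that the ambiguous remainder forms the target domain.

```python
def split_domains(proposals, background_threshold=0.05):
    """Partition proposals into a labeled source domain and an unlabeled target domain.

    proposals : list of dicts with
        'matches_seen' : True if the box matches a ground-truth seen object
        'fg_score'     : foreground score from the current (biased) predictor
    """
    source, target = [], []
    for p in proposals:
        if p["matches_seen"]:
            source.append((p, 1))   # seen object location: foreground label
        elif p["fg_score"] < background_threshold:
            source.append((p, 0))   # confident background: background label
        else:
            target.append(p)        # ambiguous: may contain an unseen object
    return source, target

proposals = [
    {"id": "a", "matches_seen": True, "fg_score": 0.97},
    {"id": "b", "matches_seen": False, "fg_score": 0.01},
    {"id": "c", "matches_seen": False, "fg_score": 0.40},
]
source, target = split_domains(proposals)
```

Self-training would then alternate between pseudo-labeling the target proposals and refitting the foreground predictor. That loop is outside the scope of this sketch.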

翻訳日:2023-11-07 18:08:52 公開日:2023-11-04

# 中国の多工系学生における英語の筆記能力向上 : 入力仮説の適用に関する詳細な文献レビュー

Enhancing English Writing Proficiency in China's Polytechnic Students: An In-Depth Literature Review on the Application of the Input Hypothesis ( http://arxiv.org/abs/2311.02341v1 )

ライセンス: Link先を確認

Wei Zhou

(参考訳) 英語を上手に書くことは、多技術系学生にとって非常に重要である。しかし、技術系学校の多くの生徒は、高いレベルのスキルに達するのに苦労している。入力仮説はStephen Krashen氏によって作成され、人々が既に知っているよりも少し難しい情報を受け取れば、言語をうまく学べることを示唆している。本研究は,多芸学生が英語の書き方を改善する上で,入力仮説がいかに役立つかを研究することを目的とする。この研究には、これまでの研究からの実際の観察と実験が含まれる。入力仮説が実際に筆記スキルの向上に役立つかどうかを確認するため、特殊書記指導を受ける多技術者学生のデータを調べる。この論文は、ポリテクニックの学生、教員、サポートスタッフ、そして、より大きなコミュニティのメンバーにも、その帰属、プロセス、そして、ポリテクニックの学生にとって第二言語開発の結果について、より良い情報を提供することができる。キーワード:英語書記スキル、多芸学生、入力仮説、理解可能な入力

Good English writing skills are extremely important for students in polytechnic institutions. However, many students in technical schools have difficulty reaching a high level of skill. The Input Hypothesis, created by Stephen Krashen, suggests that people learn languages well when they receive input that is a little harder than what they already know but still understandable. This research paper aims to study how the Input Hypothesis can help polytechnic students improve their English writing skills. The study will include real-life observations and experiments from previous research. We will look at data from polytechnic students who are receiving special writing instruction to see if the Input Hypothesis actually helps improve their writing skills. The paper can better inform polytechnic students, faculty members, support staff, and even members of the larger community about the attributions, the processes, and the possible outcomes of second language development for polytechnic students. Keywords: English writing skills, Polytechnic students, Input hypothesis, Comprehensible input

翻訳日:2023-11-07 18:08:27 公開日:2023-11-04

# MC-Stereo:ステレオマッチングのためのマルチピーク検索とカスケード検索範囲

MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching ( http://arxiv.org/abs/2311.02340v1 )

ライセンス: Link先を確認

Miaojie Feng, Junda Cheng, Hao Jia, Longliang Liu, Gangwei Xu, Xin Yang

(参考訳) ステレオマッチングはシーン理解における基本的なタスクである。近年,反復最適化に基づく手法がステレオマッチングに有望であることが示された。しかし、現在のイテレーションフレームワークはシングルピークルックアップを採用しており、マルチピーク問題を効果的に処理するのに苦労している。さらに、イテレーションプロセス中に使われる固定探索範囲は最終収束効果を制限する。これらの問題に対処するため、MC-Stereoと呼ばれる新しい反復最適化アーキテクチャを提案する。このアーキテクチャは、マルチピークルックアップ戦略を通したマッチングにおけるマルチピーク分布問題を緩和し、粗大な概念をカスケード探索範囲を介して反復的なフレームワークに統合する。さらに, 特徴表現学習が学習ベースステレオマッチングの成功に不可欠であることを踏まえ, 特徴抽出器として機能する事前学習ネットワークを導入し, ステレオマッチングパイプラインのフロントエンドを強化する。これらの改善に基づき、MC-Stereo は KITTI-2012 と KITTI-2015 ベンチマークで利用可能なすべてのメソッドの中で第1位であり、ETH3D の最先端性能も達成している。コードは、この論文の公開後にオープンソース化される。

Stereo matching is a fundamental task in scene comprehension. In recent years, methods based on iterative optimization have shown promise in stereo matching. However, the current iteration framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during the iteration process limits the final convergence effects. To address these issues, we present a novel iterative optimization architecture called MC-Stereo. This architecture mitigates the multi-peak distribution problem in matching through the multi-peak lookup strategy, and integrates the coarse-to-fine concept into the iterative framework via the cascade search range. Furthermore, given that feature representation learning is crucial for successful learning-based stereo matching, we introduce a pre-trained network to serve as the feature extractor, enhancing the front end of the stereo matching pipeline. Based on these improvements, MC-Stereo ranks first among all publicly available methods on the KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art performance on ETH3D. The code will be open sourced after the publication of this paper.
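A rough illustration of the two ideas, for a single pixel: given matching scores over candidate disparities, a multi-peak lookup keeps the k strongest local maxima rather than only the global one, and a cascade search range samples a window around each peak whose radius can shrink from one iteration to the next. The function names, the peak definition, and the toy scores are assumptions. The paper's actual lookup operates on learned correlation volumes.

```python
def top_k_peaks(scores, k):
    """Indices of the k highest local maxima in a 1-D matching-score profile."""
    last = len(scores) - 1
    peaks = [i for i, s in enumerate(scores)
             if (i == 0 or s > scores[i - 1]) and (i == last or s >= scores[i + 1])]
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return sorted(peaks[:k])

def lookup_indices(scores, k, radius):
    """Disparity indices within `radius` of each of the top-k peaks."""
    indices = set()
    for p in top_k_peaks(scores, k):
        indices.update(range(max(0, p - radius), min(len(scores), p + radius + 1)))
    return sorted(indices)

# Matching scores over 7 candidate disparities, with two competing peaks.
scores = [0.1, 0.8, 0.2, 0.1, 0.7, 0.3, 0.05]
coarse = lookup_indices(scores, k=2, radius=1)  # wide window early on
fine = lookup_indices(scores, k=2, radius=0)    # narrow window later
```

With `k=1` only disparity 1 would be inspected, so a correct match at disparity 4 could never be recovered. Keeping both peaks defers that decision to later iterations.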

翻訳日:2023-11-07 18:08:09 公開日:2023-11-04

# 深層学習によるPotato Leaf病の分類:畳み込みニューラルネットワークによるアプローチ

Potato Leaf Disease Classification using Deep Learning: A Convolutional Neural Network Approach ( http://arxiv.org/abs/2311.02338v1 )

ライセンス: Link先を確認

Utkarsh Yashwant Tambe, A. Shobanadevi, A. Shanthini and Hsiu-Chun Hsu

(参考訳) 本研究では、ディープラーニングを用いて、ジャガイモ葉病の分類に畳み込みニューラルネットワーク(CNN)を用いる。提案するアプローチでは、リーフイメージデータの事前処理、そのデータ上でCNNモデルをトレーニング、テストセットでのモデルの成功を評価する。実験結果によると、CNNモデル全体の精度は99.1%であり、夏疫病(Early Blight)と疫病(Late Blight)という2種類のジャガイモ葉病を健全な葉と区別する上で非常に正確である。提案手法は, 食品の安全維持と農業の財政的損失の最小化に不可欠なジャガイモ病の同定に, 信頼性と効果的な対策を提供する。モデルでは、重症感染症が存在する場合でも、さまざまな疾患タイプを正確に認識することができる。本研究は,ジャガイモ栽培における効果的かつ自動化された病害管理を支援する,ジャガイモ病を分類するための深層学習手法の可能性を強調した。

In this study, a Convolutional Neural Network (CNN) is used to classify potato leaf diseases using Deep Learning. The suggested approach entails preprocessing the leaf image data, training a CNN model on that data, and assessing the model's success on a test set. The experimental findings show that the CNN model, with an overall accuracy of 99.1%, is highly accurate in distinguishing two kinds of potato leaf disease, Early Blight and Late Blight, from healthy leaves. The suggested method may offer a trustworthy and effective remedy for identifying potato diseases, which is essential for maintaining food security and minimizing financial losses in agriculture. The model can accurately recognize the various disease types even when there are severe infections present. This work highlights the potential of deep learning methods for categorizing potato diseases, which can help with effective and automated disease management in potato farming.

翻訳日:2023-11-07 18:07:49 公開日:2023-11-04

# STOW:倉庫ピッキングロボットの離散フレームセグメンテーションと未確認物体追跡

STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots ( http://arxiv.org/abs/2311.02337v1 )

ライセンス: Link先を確認

Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox

(参考訳) 離散フレームにおける見えないオブジェクトインスタンスのセグメンテーションと追跡は、分散倉庫のような動的産業ロボットのコンテキストにおいて大きな課題となる。ここでロボットは、新しいアイテムによる移動、除去、部分的閉塞を含むオブジェクトの再配置を処理し、時間的ギャップのかなりの後にこれらのアイテムを追跡する必要がある。このタスクは、トレーニングセットで学習されていない物体にロボットが遭遇した場合、さらに複雑になる。このような環境では、連続観察がしばしばアクセスできないことを考えると、我々のタスクは、シーンに実質的な変化が生じる可能性のある、不確定な期間で区切られた離散的なフレームの集合を扱うことである。このタスクは、テーブル上のオブジェクトの並べ替えなど、国内のロボットアプリケーションにも変換される。これらの要求に対処するために、これらの産業と家庭のシナリオを再現する新しい合成および実世界のデータセットを導入します。また,効率の良いフレーム間通信を容易にするトランスフォーマーモジュールとともに,離散フレームにおけるジョイントセグメンテーションとトラッキングのための新しいパラダイムを提案する。実験の結果,我々のアプローチは最近の手法を大きく上回っていることがわかった。さらなる結果とビデオについては、 https://sites.google.com/view/stow-corl23 をご覧ください。コードとデータセットがリリースされる。

Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects not learned in their training sets, which requires the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as rearrangement of objects on a table. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. We also propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module that facilitates efficient inter-frame communication. The experiments we conduct show that our approach significantly outperforms recent methods. For additional results and videos, please visit https://sites.google.com/view/stow-corl23. Code and dataset will be released.

翻訳日:2023-11-07 18:07:32 公開日:2023-11-04

# バイトレベルの精度を持つエンコーダ・デコーダ基礎モデルを用いたDNAの自然言語理解

Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision ( http://arxiv.org/abs/2311.02333v1 )

ライセンス: Link先を確認

Aditya Malusare and Harish Kothandaraman and Dipesh Tamboli and Nadia A. Lanman and Vaneet Aggarwal

(参考訳) 本稿では、エンコーダ・デコーダトランスフォーマアーキテクチャを用いて、DNA配列をバイトレベルの精度で解析するアンサンブルヌクレオチドバイトレベルエンコーダ・デコーダ(ENBED)基礎モデルを提案する。 ENBEDは、エンコーダのみまたはデコーダのみのアーキテクチャで以前のゲノムモデルを一般化し、シーケンスからシーケンスへの変換が可能な効率的なモデルを開発するために、注意のサブクアドラルな実装を使用する。 We use Masked Language Modeling to pre-train the foundation model using reference genome sequences and apply it in the following downstream tasks: (1) identification of enhancers, promoters and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. これらの課題のそれぞれにおいて、既存の最先端の成果と比較して顕著な改善が示される。

This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pre-train the foundation model using reference genome sequences and apply it in the following downstream tasks: (1) identification of enhancers, promoters and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results.
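Byte-level tokenization and masked-language-model corruption can be sketched with the standard library alone. Each nucleotide becomes one token (its ASCII byte), so a single-base mismatch or indel changes exactly one position. The mask id, the masking rate, and the function names are hypothetical choices for illustration, not ENBED's actual vocabulary or pre-training recipe.

```python
import random

MASK_ID = 256  # hypothetical mask token id, chosen outside the 0-255 byte range

def encode(seq):
    """Map a DNA string to one token per base (its ASCII byte value)."""
    return list(seq.encode("ascii"))

def mask_tokens(ids, rate=0.15, seed=0):
    """Replace a random subset of positions with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions. The model
    would be trained to predict the original bytes at those positions.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(ids) * rate))
    positions = sorted(rng.sample(range(len(ids)), n_mask))
    masked = list(ids)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions

ids = encode("ACGTACGTACGTACGTACGT")
masked, positions = mask_tokens(ids)
```

A tokenizer that groups several bases into one token would spread a one-base insertion across every following token. That is the loss of byte-level precision the abstract refers to.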

翻訳日:2023-11-07 18:07:12 公開日:2023-11-04

# 臨床補助イメージングに基づくバイオメディカル応用のためのマルチモーダル機械学習

Multimodal Machine Learning for Clinically-Assistive Imaging-Based Biomedical Applications ( http://arxiv.org/abs/2311.02332v1 )

ライセンス: Link先を確認

Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, and Arvind Rao

(参考訳) 医療人工知能(AI)システムにおける機械学習(ML)の応用は、伝統的および統計的手法から、ディープラーニングモデルやより最近の生成モデルの適用の増加へと移行してきた。近年,特に画像を用いたマルチモーダルデータ統合をサポートする,広く利用可能なディープラーニングアーキテクチャの発見が増えている。これらのモデルに複数のモダリティを組み込むことは、独自の課題を示す、繁栄する研究トピックである。本稿では、ML(representation, fusion, alignment, translation, co-learning)に関連するマルチモーダルAIに対する5つの課題について論じ、医療画像に基づく臨床意思決定モデルにおけるこれらの課題に対処するための最近のアプローチについて調査する。結論として,この分野の将来について議論し,臨床モデルと臨床現場への翻訳についてさらに解明すべき方向性を示唆した。

Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models and even more recently generative models. Recent years have seen a rise in the discovery of widely-available deep learning architectures that support multimodal data integration, particularly with images. The incorporation of multiple modalities into these models is a thriving research topic, presenting its own unique challenges. In this work, we discuss five challenges to multimodal AI as it pertains to ML (representation, fusion, alignment, translation, and co-learning) and survey recent approaches to addressing these challenges in the context of medical image-based clinical decision support models. We conclude with a discussion of the future of the field, suggesting directions that should be elucidated further for successful clinical models and their translation to the clinical setting.

翻訳日:2023-11-07 18:06:41 公開日:2023-11-04

# 複合臓器マスクガイド放射線治療報告の作成

Complex Organ Mask Guided Radiology Report Generation ( http://arxiv.org/abs/2311.02329v1 )

ライセンス: Link先を確認

Gu Tiancheng, Liu Dongnan, Li Zhiyuan, Cai Weidong

(参考訳) The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting. However, in a real-world scenario, radiologists frequently face the challenge of producing extensive reports derived from numerous medical images, thereby medical report generation from multi-image perspective is needed. In this paper, we propose the Complex Organ Mask Guided (termed as COMG) report generation model, which incorporates masks from multiple organs (e.g., bones, lungs, heart, and mediastinum), to provide more detailed information and guide the model's attention to these crucial body regions. 具体的には, 融合過程における各臓器に対応する疾患の事前知識を活用して, 報告生成過程における疾患識別フェーズを増強する。さらに、コサイン類似度損失を目標関数として、クロスモーダル一貫性の収束を保証し、モデルの最適化を促進するとともに、COMGがそれぞれIU-Xray上のSOTAモデルKiUTとMIMICのBLEU@4スコアで11.4%と9.7%の改善を達成したことを示す。

The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting. However, in a real-world scenario, radiologists frequently face the challenge of producing extensive reports derived from numerous medical images, so medical report generation from a multi-image perspective is needed. In this paper, we propose the Complex Organ Mask Guided (termed as COMG) report generation model, which incorporates masks from multiple organs (e.g., bones, lungs, heart, and mediastinum), to provide more detailed information and guide the model's attention to these crucial body regions. Specifically, we leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase during the report generation process. Additionally, cosine similarity loss is introduced as the target function to ensure the convergence of cross-modal consistency and facilitate model optimization. Experimental results on two public datasets show that COMG achieves an 11.4% and 9.7% improvement in terms of BLEU@4 scores over the SOTA model KiUT on IU-Xray and MIMIC, respectively.
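A cosine similarity loss between an image-side feature vector and a text-side feature vector can be written as follows. The `1 - cos` form is a common convention and an assumption here. The abstract does not give the exact formula or say which features COMG compares.

```python
import math

def cosine_similarity_loss(x, y, eps=1e-8):
    """Return 1 - cos(x, y): 0 for aligned vectors, 1 for orthogonal, 2 for opposite."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / max(norm, eps)  # eps guards against zero-length vectors

aligned = cosine_similarity_loss([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity_loss([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity_loss([1.0, 0.0], [-1.0, 0.0])
```

Because the loss depends only on direction, scaling either embedding leaves it unchanged. That makes it a convenient consistency term when the two modalities produce features of different magnitude.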

翻訳日:2023-11-07 18:06:14 公開日:2023-11-04

# CDR-Adapter: クロスドメイン勧告モデルのための転送能力向上のための学習アダプタ

CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models ( http://arxiv.org/abs/2311.02398v1 )

ライセンス: Link先を確認

Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye

(参考訳) データスパーシリティとコールドスタート問題は、レコメンデーションシステムにおいて永続的な課題である。クロスドメインレコメンデーション(CDR)は、ソースドメインからの知識を利用して、ターゲットドメインのレコメンデーションパフォーマンスを改善する、有望なソリューションである。従来のCDRアプローチは主に、知識伝達を促進するためにマッピング関数を学習するEmbedding and Mapping(EMCDR)フレームワークに従ったものだ。しかし、これらのアプローチは、計算コストが高く、元の知識を壊滅的に忘れてしまう可能性がある、転送可能な知識を組み込むために、ネットワーク構造の再設計と再訓練を必要としている。本稿では、ネットワーク構造の再設計を必要とせず、元のレコメンデーションモデルをマッピング関数から切り離すことにより、CDRにおけるデータスパーシリティとコールドスタート問題に対処するスケーラブルで効率的なパラダイムを提案する。具体的には、CDR-Adapterは、アダプタモジュールを使用して特徴表現を整列させ、異なるドメイン間で柔軟な知識伝達を可能にし、トレーニングコストを最小限に抑えた効率的な微調整を可能にする。ベンチマークデータセットについて広範な実験を行い,最先端cdrアプローチの有効性を実証した。

Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that utilizes knowledge from the source domain to improve the recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches necessitate re-engineering and re-training the network structure to incorporate transferrable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm to address data sparsity and cold-start issues in CDR, named CDR-Adapter, by decoupling the original recommendation model from the mapping function, without requiring re-engineering the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, which demonstrated the effectiveness of our approach over several state-of-the-art CDR approaches.
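The plug-and-play idea can be illustrated with a generic residual bottleneck adapter: a small down-projection, a nonlinearity, an up-projection, and a skip connection around the whole thing. That CDR-Adapter uses this exact structure is an assumption. The abstract says only that adapter modules align feature representations while the original recommendation model stays decoupled from the mapping function.

```python
def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def adapter(x, w_down, w_up):
    """Residual bottleneck adapter: x + W_up * relu(W_down * x)."""
    hidden = [max(0.0, h) for h in linear(w_down, x)]
    return [xi + ui for xi, ui in zip(x, linear(w_up, hidden))]

# A frozen 4-dimensional user embedding from the source-domain model (toy values).
embedding = [1.0, -2.0, 0.5, 3.0]

# With a zero-initialized up-projection the adapter starts as the identity,
# so plugging it in does not disturb the pretrained recommender.
w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
w_up_zero = [[0.0, 0.0] for _ in range(4)]
unchanged = adapter(embedding, w_down, w_up_zero)
```

Only `w_down` and `w_up` would be trained, so the number of tuned parameters is small next to the embedding tables, and the original model's weights cannot be overwritten.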

翻訳日:2023-11-07 17:58:28 公開日:2023-11-04

# NeuroEvoBench: ディープラーニングアプリケーションのための進化的最適化のベンチマーク

NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications ( http://arxiv.org/abs/2311.02394v1 )

ライセンス: Link先を確認

Robert Tjarko Lange, Yujin Tang, Yingtao Tian

(参考訳) 近年、ディープラーニングコミュニティは、長いインナーループアンロールによるメタラーニングや非微分演算子の最適化など、ハード最適化問題に対処する手段として、進化的最適化(eo)に関心を寄せている。このトレンドの主な理由は、最近ハードウェアアクセラレーションと互換性のあるソフトウェアが革新され、分散人口評価が以前よりもずっと簡単になったことだ。しかし、勾配降下に基づく手法とは違って、EOのハイパーパラメータ理解とベストプラクティスが欠如している。さらに、進化的コミュニティの古典的なベンチマークは、ディープラーニングアプリケーションに対する実践的な洞察をほとんど提供しません。これは、新参者がハードウェアアクセラレーションのeoに挑戦し、大きな採用を妨げる。そこで我々は,Deep Learningアプリケーションに適したEO手法(NeuroEvoBench)の新たなベンチマークを構築し,従来型およびメタ学習型EOを徹底的に評価する。本稿では,資源配分,適合性形成,正規化,正規化,EOのスケーラビリティといった科学的な問題について検討する。ベンチマークはApache-2.0ライセンス下でhttps://github.com/neuroevobench/neuroevobenchで公開されている。

Recently, the Deep Learning community has become interested in evolutionary optimization (EO) as a means to address hard optimization problems, e.g. meta-learning through long inner loop unrolls or optimizing non-differentiable operators. One core reason for this trend has been the recent innovation in hardware acceleration and compatible software, which makes distributed population evaluations much easier than before. Unlike for gradient descent-based methods, though, there is a lack of hyperparameter understanding and best practices for EO, arguably because far less 'graduate student descent' and benchmarking has been performed for EO methods. Additionally, classical benchmarks from the evolutionary community provide few practical insights for Deep Learning applications. This poses challenges for newcomers to hardware-accelerated EO and hinders significant adoption. Hence, we establish a new benchmark of EO methods (NeuroEvoBench) tailored toward Deep Learning applications and exhaustively evaluate traditional and meta-learned EO. We investigate core scientific questions including resource allocation, fitness shaping, normalization, regularization, and scalability of EO. The benchmark is open-sourced at https://github.com/neuroevobench/neuroevobench under the Apache-2.0 license.

翻訳日:2023-11-07 17:58:05 公開日:2023-11-04

# ビデオからの教師なし単眼深度の連続学習

Continual Learning of Unsupervised Monocular Depth from Videos ( http://arxiv.org/abs/2311.02393v1 )

ライセンス: Link先を確認

Hemang Chawla, Arnav Varma, Elahe Arani, and Bahram Zonooz

(参考訳) 単眼深度推定を含む空間的シーン理解は、ロボット工学や自律運転といった様々な応用において重要な問題である。教師なし単眼深度推定の改善により、さまざまなクラウドソースビデオでモデルをトレーニングすることが可能になったが、ほとんどの手法では標準のトレーニングプロトコルを使用しており、新しいデータが収集された後、モデルがスクラッチからトレーニングされる。代わりに、逐次的に収集されたデータに対するモデルの連続的なトレーニングは、計算とメモリコストを大幅に削減する。それにもかかわらず、ナイーブな継続的なトレーニングは、モデル安定性と可塑性の間のトレードオフを強調しながら、古いドメインでモデルパフォーマンスが劣化する破滅的な忘れ込みにつながる。画像分類においてこの問題に対処するためにいくつかの手法が提案されているが、深度推定の高次元および時空間的相関アウトプットは別の課題となっている。私たちの知る限りでは、深さ推定における連続学習の問題に焦点をあてたフレームワークや方法は存在しない。そこで我々は,連続的教師なし深度推定(CUDE)の課題を捉え,モデルの性能を評価するために必要な指標を定義する枠組みを提案する。本稿では,カメラ内在性が不明な場合であっても,時間的一貫性を深度推定に活用するリハーサルベースデュアルメモリ法 monodepthcl を提案する。

Spatial scene understanding, including monocular depth estimation, is an important problem in various applications, such as robotics and autonomous driving. While improvements in unsupervised monocular depth estimation have potentially allowed models to be trained on diverse crowdsourced videos, this remains underexplored as most methods utilize the standard training protocol, wherein the models are trained from scratch on all data after new data is collected. Instead, continual training of models on sequentially collected data would significantly reduce computational and memory costs. Nevertheless, naive continual training leads to catastrophic forgetting, where the model performance deteriorates on older domains as it learns on newer domains, highlighting the trade-off between model stability and plasticity. While several techniques have been proposed to address this issue in image classification, the high-dimensional and spatiotemporally correlated outputs of depth estimation make it a distinct challenge. To the best of our knowledge, no framework or method currently exists focusing on the problem of continual learning in depth estimation. Thus, we introduce a framework that captures the challenges of continual unsupervised depth estimation (CUDE), and define the necessary metrics to evaluate model performance. We propose a rehearsal-based dual-memory method, MonoDepthCL, which utilizes spatiotemporal consistency for continual learning in depth estimation, even when the camera intrinsics are unknown.

翻訳日:2023-11-07 17:57:45 公開日:2023-11-04

# クロスレベル蒸留と機能劣化

Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification ( http://arxiv.org/abs/2311.02392v1 )

ライセンス: Link先を確認

Hao Zheng, Runqi Wang, Jianzhuang Liu, Asako Kanezaki

(参考訳) 従来の少数ショット分類は、大きなラベル付きベースデータセット上でモデルを学習し、ベースデータセットと同じ分布からターゲットデータセットに迅速に適応することを目的としている。しかし、実際には、いくつかのショット分類のベースとターゲットデータセットは、通常異なるドメインから作られており、これはクロスドメインのショット分類の問題である。トレーニング段階において、対象領域内のラベルなし画像のごく一部をアクセス可能にすることで、この問題に対処する。この設定では、ベースデータが十分でラベル付けされているにもかかわらず、大きなドメインシフトはベースデータセットからの知識の転送を困難にする。我々は,ネットワークの浅い層を誘導し,より高いレベルの情報を学習することで,対象データセットのより差別的な特徴を抽出する能力を高めることができるクロスレベル知識蒸留法を慎重に設計する。さらに,評価段階におけるオーバーフィッティングを緩和するために,特徴冗長性を低減し,オーバーフィッティングを緩和できる特徴デノイジング操作を提案する。 BSCD-FSLベンチマークでは,従来の動的蒸留法を1ショットで5.44%,5ショットの分類タスクで1.37%超えることができる。実装コードはhttps://github.com/jarucezh/cldfdで利用可能である。

The conventional few-shot classification aims at learning a model on a large labeled base dataset and rapidly adapting to a target dataset that is from the same distribution as the base dataset. However, in practice, the base and the target datasets of few-shot classification are usually from different domains, which is the problem of cross-domain few-shot classification. We tackle this problem by making a small proportion of unlabeled images in the target domain accessible in the training stage. In this setup, even though the base data are sufficient and labeled, the large domain shift still makes transferring the knowledge from the base dataset difficult. We meticulously design a cross-level knowledge distillation method, which can strengthen the ability of the model to extract more discriminative features in the target dataset by guiding the network's shallow layers to learn higher-level information. Furthermore, in order to alleviate the overfitting in the evaluation stage, we propose a feature denoising operation which can reduce the feature redundancy and mitigate overfitting. Our approach can surpass the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks on average in the BSCD-FSL benchmark. The implementation code will be available at https://github.com/jarucezh/cldfd.

翻訳日:2023-11-07 17:57:22 公開日:2023-11-04

# 量子重ね合わせの原理:再考

The quantum superposition principle: a reconsideration ( http://arxiv.org/abs/2311.02391v1 )

ライセンス: Link先を確認

Ivan Georgiev Koprinkov

(参考訳) 量子重ね合わせの原理は、量子力学の断熱定理、非断熱的な状態、実験的証拠に基づいて再考される。量子重ね合わせの物理的機構と物理的性質を明らかにする。

The quantum superposition principle is reconsidered based on the adiabatic theorem of quantum mechanics, nonadiabatic dressed states, and experimental evidence. The physical mechanism and physical properties of quantum superposition are revealed.

翻訳日:2023-11-07 17:56:57 公開日:2023-11-04

# セルラーネットワークに適用するAIベースの自己修復ソリューションの概要

AI-based Self-healing Solutions Applied to Cellular Networks: An Overview ( http://arxiv.org/abs/2311.02390v1 )

ライセンス: Link先を確認

Jaleh Farmani, Amirreza Khalil Zadeh

(参考訳) 本稿では,セルネットワークにおけるセル障害に対する自己修復を実装するために使用される,古典型と深層型の両方の機械学習(ml)手法の概要について述べる。自己修復はネットワーク管理に対する有望なアプローチであり、自律的な方法で細胞障害の検出と補償を目的としている。この技術は,既存の4gネットワークと5gネットワークの設置とメンテナンスに伴うコストを削減することを目的としている。本稿では,ネットワーク管理におけるSON,自己修復,ML技術の基本概念と分類について概説する。さらに, 細胞障害の文献における現状を概観し, 特にMLに基づくアプローチに注目した。

In this article, we provide an overview of machine learning (ML) methods, both classical and deep variants, that are used to implement self-healing for cell outages in cellular networks. Self-healing is a promising approach to network management, which aims to detect and compensate for cell outages in an autonomous way. This technology aims to decrease the expenses associated with the installation and maintenance of existing 4G and 5G networks, as well as emerging 6G networks, by simplifying operational tasks through its ability to heal itself. We provide an overview of the basic concepts and taxonomy for SON, self-healing, and ML techniques in network management. Moreover, we review the state of the art in the literature on cell outages, with a particular emphasis on ML-based approaches.

翻訳日:2023-11-07 17:56:52 公開日:2023-11-04

# 超長周期分散変圧器

Ultra-Long Sequence Distributed Transformer ( http://arxiv.org/abs/2311.02382v1 )

ライセンス: Link先を確認

Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley

(参考訳) 長いシーケンスで訓練されたトランスフォーマーモデルは、しばしば短いシーケンスよりも高い精度を達成する。残念なことに、従来のトランスフォーマーは、圧倒的な計算とメモリ要求のために長いシーケンストレーニングに苦労している。既存のロングシーケンストレーニングの方法は、制限されたスピードアップとメモリ削減を提供し、精度を損なう可能性がある。本稿では,長周期の変圧器を学習するための新しい分散学習手法であるLong Short-Sequence Transformer(LSS Transformer)を提案する。長いシーケンスをGPU間でセグメントに分散し、各GPUコンピューティングはそのセグメントに対して部分的な自己アテンションを持つ。そして、融合通信と新しい二重勾配平均化技術を用いて、部分的な自己注意の集約や通信オーバーヘッドの最小化を回避する。 wikipedia enwik8データセット上で,lssトランスフォーマタとnvidiaシーケンシャル並列性の性能評価を行った。その結果,提案手法はNvidia V100の144 GPUにおける最先端シーケンス並列処理と比較して,5.6倍,メモリ効率が10.2倍向上した。さらに,3,456個のGPUで50,112個の極端なシーケンス長にスケールアップし,超線形並列効率161%,スループット32ペタフロップスを実現した。

Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses a fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and minimize communication overhead. We evaluated the performance between LSS Transformer and the state-of-the-art Nvidia sequence parallelism on a Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation compared to state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.

翻訳日:2023-11-07 17:56:38 公開日:2023-11-04
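
As a rough illustration of the segmenting idea in the LSS Transformer entry above, the following Python sketch splits a sequence into contiguous query segments and computes softmax attention for each segment separately. This is not the paper's algorithm: there are no learned projections, no GPUs, no fused communication, and no gradient averaging, and each simulated worker still reads all keys and values. The names `attention`, `segmented_attention`, and `num_workers` are hypothetical. The sketch only shows that the attention output rows of different query segments can be computed independently.

```python
import math


def attention(queries, keys, values):
    """Plain softmax attention: one output row per query over all keys."""
    dim = len(values[0])
    out = []
    for q in queries:
        scale = math.sqrt(len(q))
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / total
                    for j in range(dim)])
    return out


def segmented_attention(x, num_workers):
    """Compute self-attention segment by segment (one segment per simulated worker)."""
    n = len(x)
    size = max(1, -(-n // num_workers))  # ceiling division
    out = []
    for start in range(0, n, size):
        segment = x[start:start + size]  # queries owned by this worker
        out.extend(attention(segment, x, x))
    return out


# Toy sequence of 7 tokens with 3 features each (made-up values).
x = [[0.1 * i, 0.2 - 0.05 * i, 1.0] for i in range(7)]
full = attention(x, x, x)
split = segmented_attention(x, 3)
```

Because every output row depends only on its own query, `split` matches `full` row for row; the hard part in a real distributed setting, which this sketch does not address, is avoiding the cost of sharing keys, values, and gradients between devices.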

# 大規模言語モデルからのフィードバックによるロボット操作の強化学習

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models ( http://arxiv.org/abs/2311.02379v1 )

ライセンス: Link先を確認

Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter

(参考訳) 強化学習(rl)は,環境との試行錯誤による自己学習を可能にするため,ロボット操作領域において重要な役割を果たす。それでも、サンプル効率と報酬仕様は、その可能性を大幅に制限している。ひとつの可能な解決策は、専門家の指導から学ぶことだ。しかし、RLエージェントを監督するコストが高いため、人間専門家の獲得は不可能であり、自動スーパーバイザーの開発は困難な作業である。大規模言語モデル(LLM)は、自然言語のユーザ入力に対して人間のようなフィードバックを提供する能力を示す。それでも、訓練は特定のロボットデータではなく、巨大なインターネットデータに基づいているため、低レベルのロボットの動きを直接制御するように設計されていない。本稿では,LLMのタイムリーなフィードバックを利用して,RLエージェントがロボットタスクを効率的に学習することを可能にするLafite-RL(Language Agent feedback Interactive Reinforcement Learning)フレームワークを提案する。 rlbenchタスクで行った実験は、自然言語による簡単なプロンプトデザインにより、llmに導かれると学習能力が向上することを示している。これは、学習効率と成功率の両方においてベースラインを上回り、llmによって提供される報酬の有効性を強調する。

Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.

翻訳日:2023-11-07 17:56:18 公開日:2023-11-04

# MTS-DVGAN:二変量生成対向ネットワークを用いたサイバー物理システムにおける異常検出

MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network ( http://arxiv.org/abs/2311.02378v1 )

ライセンス: Link先を確認

Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Hongle Liu, Xiang Long

(参考訳) 深層生成モデルは、ラベル付き情報に頼ることなくサイバーフィジカルシステム(CPS)の脆弱性を緩和し、新しいサイバーフィジカル攻撃を検出することを約束している。それでもこれらの生成モデルは、通常のデータによく似た攻撃行動を識別したり、通常のデータ分布から逸脱するが、潜在空間における通常のクラスタの多様体に近い攻撃行動を特定するという課題に直面している。そこで本論文では,MTS-DVGAN と呼ばれる非教師付き二重変動生成逆数モデルを提案し,CPS セキュリティのための多変量時系列データにおける異常検出を行う。中心となる概念は、再構成された異常サンプルと通常のサンプルとの区別を広げることで、モデルの識別能力を高めることである。具体的には,よりコンパクトな組込みを得るために,コントラスト制約を再構成プロセスに課すことで拡張モジュールを提案する。次に,多変量時系列の分布特性を利用して正規パターンをモデル化することにより,GAN(Generative Adversarial Network)を強制的に生成する変動オートエンコーダを導入する。さらに,2つの拡張損失関数は,拡張サンプルと原サンプルの相互誘導により,自己監督的な本質的な特徴を抽出するように設計されている。最後に、ジェネレータネットワークの安定性を高めるために、特定の特徴中心損失を導入する。 SWAT、WADI、NSL_KDDという3つの公開データセットで実証実験を行った。その結果,MTS-DVGANの安定性が向上し,一貫した性能向上が達成できた。

Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or deviate from the normal data distribution but are in close proximity to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MTS-DVGAN, to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module by imposing contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Compared with the state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and can achieve consistent performance improvement.

翻訳日:2023-11-07 17:55:58 公開日:2023-11-04

# 厳密な鞍点を避けるリーマン確率最適化法

Riemannian stochastic optimization methods avoid strict saddle points ( http://arxiv.org/abs/2311.02374v1 )

ライセンス: Link先を確認

Ya-Ping Hsieh, Mohammad Reza Karimi, Andreas Krause, Panayotis Mertikopoulos

(参考訳) オンライン主成分分析から共分散行列同定や辞書学習に至るまで、現代の機械学習アプリケーションの多くはリーマン多様体上の最小化問題として定式化され、リーマンの確率的勾配法(あるいはその変種)で解かれる。しかし、多くの場合において、結果の最小化問題は測地的に凸ではないので、選択された解の望ましい解(すなわち局所最小化)への収束は決して保証されない。本稿では,確率 1 の鞍点を避けるために,確率リーマン最適化アルゴリズムが保証されているか,という問題を正確に研究する。一般性については, リーマン勾配降下と比較して, シナリオ毎のコストがはるかに低い可能性に加えて, 自然政策勾配法や通常の凸空間におけるミラー降下法など, 広く用いられている他のアルゴリズムを含む, 引き込みに基づく手法の族について検討する。この一般的な設定では、環境多様体と勾配情報を提供するオラクルの穏やかな仮定の下で、研究中のポリシーは、任意の初期条件から、確率 1 の厳密な saddle point / submanifolds を避ける。この結果は、ほぼ常に、確率リーマンアルゴリズムの極限状態が局所的最小値のみであることを示すため、多様体上の勾配法の使用に対する重要な健全性チェックを提供する。

Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.

翻訳日:2023-11-07 17:55:31 公開日:2023-11-04
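
To make the retraction-based setting of the Riemannian entry above concrete, here is a minimal Python sketch of Riemannian stochastic gradient descent on the unit sphere, applied to a toy online PCA problem. Everything specific is an assumption chosen for illustration and does not come from the paper: the objective f(x) = -E[(a·x)^2]/2, the diagonal covariance diag(4, 1, 0.25), the step size 1/(t + 10), the seed, and the function name `riemannian_sgd_sphere`. The run starts exactly at a strict saddle point (the second eigenvector), so only the gradient noise can move the iterate away from it.

```python
import math
import random


def riemannian_sgd_sphere(sample, x0, steps, step_size):
    """Riemannian SGD on the unit sphere for f(x) = -E[(a.x)^2] / 2."""
    norm0 = math.sqrt(sum(v * v for v in x0))
    x = [v / norm0 for v in x0]
    for t in range(steps):
        a = sample()
        ax = sum(ai * xi for ai, xi in zip(a, x))
        # Euclidean stochastic gradient of -(a.x)^2 / 2 is -(a.x) a.
        g = [-ax * ai for ai in a]
        # Riemannian gradient: project g onto the tangent space at x.
        xg = sum(xi * gi for xi, gi in zip(x, g))
        rgrad = [gi - xg * xi for gi, xi in zip(g, x)]
        # Step along the negative Riemannian gradient, then retract
        # back onto the sphere by renormalizing.
        eta = step_size(t)
        y = [xi - eta * ri for xi, ri in zip(x, rgrad)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


rng = random.Random(0)
scales = [2.0, 1.0, 0.5]  # standard deviations, i.e. covariance diag(4, 1, 0.25)


def sample():
    """Draw one zero-mean Gaussian data point with the covariance above."""
    return [s * rng.gauss(0.0, 1.0) for s in scales]


# Start at the second eigenvector, a strict saddle point of f on the sphere.
x_final = riemannian_sgd_sphere(sample, [0.0, 1.0, 0.0], 20000,
                                lambda t: 1.0 / (t + 10))
```

In this toy problem the local minimizers are plus or minus the first coordinate axis, so one would expect `x_final` to end up close to one of them. Renormalization is only one possible retraction; the family of methods studied in the paper is more general.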

# トロイの木馬から城壁へ:拡散モデルにおける両側のバックドア効果

From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models ( http://arxiv.org/abs/2311.02373v1 )

ライセンス: Link先を確認

Zhuoshi Pan, Yuguang Yao, Gaowen Liu, Bingquan Shen, H. Vicky Zhao, Ramana Rao Kompella, Sijia Liu

(参考訳) 最先端拡散モデル(DM)は画像生成において優れているが、セキュリティに関する懸念は持続する。初期の研究ではDMのバックドア攻撃に対する脆弱性が強調されたが、これらの研究は画像分類における'BadNets'のような従来の方法よりも厳格な要件を課した。これは前者が拡散サンプリングと訓練手順に修正を加える必要があるためである。従来と異なり,従来の拡散過程を阻害することなくトレーニングデータセットを汚染するだけで,DMのバックドア攻撃がBadNetsと同じくらい簡単にできるかどうかを検討する。この現実的なバックドア設定では、敵の目的(DMの機能を損なう)だけでなく、防御的優位性(バックドアの防御に活用できる)をもたらす両側のバックドア効果を明らかにする。具体的には、BadNetsのようなバックドア攻撃は、(意図したテキスト条件と一致しない)不正確な画像を生成するためのDMに対して有効であり、DMを分類器として使用すると誤予測が生じる。一方,バックドアDMでは,生成画像中のバックドアトリガの割合が増加しており,この現象は「トリガー増幅」と呼ばれている。後者の知見は,バックドア・ポゾンによるトレーニングデータの検出の促進に有効であることを示す。低バックドア中毒率下においても、DMのバックドア効果を研究することは、抗バックドア画像分類器の設計にも有用である。最後に,DM固有のデータ記憶傾向を探索することにより,バックドア攻撃とデータ複製現象との間に有意義な関連性を確立する。私たちの作業のコードは https://github.com/OPTML-Group/BiBadDiff で公開されています。

While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies placed stricter requirements than conventional methods like 'BadNets' in image classification. This is because the former necessitates modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering with the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). Specifically, we find that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images (misaligned with the intended text conditions), thereby yielding incorrect predictions when DMs are used as classifiers. Meanwhile, backdoored DMs exhibit an increased ratio of backdoor triggers among the generated images, a phenomenon we refer to as 'trigger amplification'. We show that this latter insight can be used to enhance the detection of backdoor-poisoned training data. Even under a low backdoor poisoning ratio, studying the backdoor effects of DMs is also valuable for designing anti-backdoor image classifiers. Last but not least, we establish a meaningful linkage between backdoor attacks and the phenomenon of data replications by exploring DMs' inherent data memorization tendencies. The codes of our work are available at https://github.com/OPTML-Group/BiBadDiff.

翻訳日:2023-11-07 17:55:05 公開日:2023-11-04

# TACNET: テンポラルオーディオソースカウントネットワーク

TACNET: Temporal Audio Source Counting Network ( http://arxiv.org/abs/2311.02369v1 )

ライセンス: Link先を確認

Amirreza Ahmadnejad, Ahmad Mahmmodian Darviishani, Mohmmad Mehrdad Asadi, Sajjad Saffariyeh, Pedram Yousef, Emad Fatemizadeh

(参考訳) 本稿では,音声ソースカウントタスクの制限に対処する革新的なアーキテクチャであるTemporal Audio Source Counting Network(TaCNet)を紹介する。 TaCNetは生のオーディオ入力を直接操作し、複雑な前処理ステップを排除し、ワークフローを簡素化する。特に、Truncatedの入力ウィンドウでさえ、リアルタイムの話者カウントに優れています。 LibriCountデータセットを用いて行った広範囲な評価は、TaCNetの例外的なパフォーマンスを強調し、オーディオソースカウントタスクの最先端ソリューションとして位置付ける。 11のクラスで平均74.18パーセントの精度で、TaCNetは中国語とペルシア語を含む様々なシナリオでその効果を実証している。この言語間適応性は、その汎用性と潜在的影響を強調している。

In this paper, we introduce the Temporal Audio Source Counting Network (TaCNet), an innovative architecture that addresses limitations in audio source counting tasks. TaCNet operates directly on raw audio inputs, eliminating complex preprocessing steps and simplifying the workflow. Notably, it excels in real-time speaker counting, even with truncated input windows. Our extensive evaluation, conducted using the LibriCount dataset, underscores TaCNet's exceptional performance, positioning it as a state-of-the-art solution for audio source counting tasks. With an average accuracy of 74.18% over 11 classes, TaCNet demonstrates its effectiveness across diverse scenarios, including applications involving Chinese and Persian languages. This cross-lingual adaptability highlights its versatility and potential impact.

翻訳日:2023-11-07 17:54:39 公開日:2023-11-04

# 量子通信

Quantum Communications ( http://arxiv.org/abs/2311.02367v1 )

ライセンス: Link先を確認

Michal Hajdušek and Rodney Van Meter

(参考訳) 第2次量子革命は、この10年で勢いを増している。量子技術は政府、民間企業、投資家、そして公共から注目を集め始めている。情報処理とコミュニケーションのために個々の量子システムを制御する能力は、もはや理論上の夢ではないが、世界中の研究所やスタートアップで着実に日常化しつつある。これにより、量子エンジニアの次世代を教育する必要性がもたらされる。この教科書は、Quantum Academy of Science and Technologyとして知られるQ-Leap Educationプロジェクトにおける、量子コミュニケーションの概要に関するビデオ講義の仲間である。量子ネットワークへの温和な導入であり、様々な背景を持つ大学生の教科書としての使用に適している。量子物理学や量子情報の事前知識は想定されていない。各章にエクササイズが含まれている。

The second quantum revolution has been picking up momentum over the last decade. Quantum technologies are starting to attract more attention from governments, private companies, investors, and the public. The ability to control individual quantum systems for the purpose of information processing and communication is no longer a theoretical dream, but is steadily becoming routine in laboratories and startups around the world. With this comes the need to educate the future generation of quantum engineers. This textbook is a companion to our video lectures on Overview of Quantum Communications from the Q-Leap Education project known as Quantum Academy of Science and Technology. It is a gentle introduction to quantum networks, and is suitable for use as a textbook for undergraduate students of diverse backgrounds. No prior knowledge of quantum physics or quantum information is assumed. Exercises are included in each chapter.

翻訳日:2023-11-07 17:54:26 公開日:2023-11-04

# 2つの純粋状態の最適識別とドリナー型コヒーレント状態検出

Optimal Discrimination Between Two Pure States and Dolinar-Type Coherent-State Detection ( http://arxiv.org/abs/2311.02366v1 )

ライセンス: Link先を確認

Itamar Katz, Alex Samorodnitsky and Yuval Kochman

(参考訳) 我々は2つの純粋量子状態の識別の問題を考える。誤差確率とログロスの基準の両方の下での最適測定が投影であることはよく知られているが、「erasure-distortion」の基準の下では、3-outcome positive operator-valued measure (POVM) である。これらの結果は別々に導かれた。 Bhattacharyya 距離に対する凸関係を満たす任意の歪み測度の下で最適な測定値を求める統一的なアプローチを提案する。すなわち、測度が相対凸 (resp. concave) であれば、最適測定は上記の射影 (resp. three-outcome POVM) である。上記の3つの結果は、この単純な導出の特別な場合として得られる。結果が適用されるさらなる測度については、位数 $1$ 以上 (resp. $1/2$ 以下) の Renyi エントロピーが相対凸 (resp. concave) であることを証明する。実用的関心の特別な設定は、2つのコヒーレント光波形の識別である。ドリナーによる顕著な研究で、光子カウンタとフィードバック制御された局所発振器からなる単純な検出器が量子最適誤差確率を得ることを示した。後に、同じ検出器(同じ局所信号を持つ)もログロスの意味で最適であることが示される。同様の凸性アプローチを適用することで、様々な基準に対して最適な信号が統一的に得られる。

We consider the problem of discrimination between two pure quantum states. It is well known that the optimal measurement under both the error-probability and log-loss criteria is a projection, while under an "erasure-distortion" criterion it is a three-outcome positive operator-valued measure (POVM). These results were derived separately. We present a unified approach which finds the optimal measurement under any distortion measure that satisfies a convexity relation with respect to the Bhattacharyya distance. Namely, whenever the measure is relatively convex (resp. concave), the measurement is the projection (resp. three-outcome POVM) above. The three above-mentioned results are obtained as special cases of this simple derivation. As for further measures for which our result applies, we prove that Renyi entropies of order $1$ and above (resp. $1/2$ and below) are relatively convex (resp. concave). A special setting of great practical interest is the discrimination between two coherent-light waveforms. In a remarkable work by Dolinar it was shown that a simple detector consisting of a photon counter and a feedback-controlled local oscillator obtains the quantum-optimal error probability. Later it was shown that the same detector (with the same local signal) is also optimal in the log-loss sense. By applying a similar convexity approach, we obtain in a unified manner the optimal signal for a variety of criteria.

翻訳日:2023-11-07 17:54:15 公開日:2023-11-04
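
For the error-probability criterion mentioned in the entry above, the quantum-optimal value for two pure states is given by the standard Helstrom bound, which the abstract does not state explicitly. The Python sketch below evaluates it; the function names `helstrom_error` and `coherent_overlap_sq` and the example amplitudes are illustrative assumptions, not taken from the paper. It uses the standard overlap formula |<alpha|beta>|^2 = exp(-|alpha - beta|^2) for coherent states.

```python
import math


def helstrom_error(overlap_sq, p0=0.5):
    """Minimum error probability for discriminating two pure states.

    overlap_sq is |<psi0|psi1>|^2 and p0 is the prior probability of psi0.
    """
    p1 = 1.0 - p0
    return 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * p0 * p1 * overlap_sq))


def coherent_overlap_sq(alpha, beta):
    """Squared overlap |<alpha|beta>|^2 of two coherent states."""
    return math.exp(-abs(alpha - beta) ** 2)


# Example: equiprobable coherent states with amplitudes +1 and -1.
p_err = helstrom_error(coherent_overlap_sq(1.0, -1.0))
```

Identical states (overlap 1) give an error probability of 1/2 under equal priors, and orthogonal states (overlap 0) give 0. For the example amplitudes the squared overlap is exp(-4), so `p_err` comes out a little below half a percent.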

# 画像超解像における潜時空間(DTLS)の領域移動-非分解モデル

Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model ( http://arxiv.org/abs/2311.02358v1 )

ライセンス: Link先を確認

Chun-Chuen Hui, Wan-Chi Siu, Ngai-Fong Law

(参考訳) 大規模な画像スーパーレゾリューションはコンピュータビジョンの課題であり、例えばforscale x16スーパーレゾリューションのような高度に劣化した画像には膨大な情報が欠落している。拡散モデルは近年、超高分解能な応用において成功しており、ガウスノイズは潜在光写実空間を形成する手段として使われ、潜光写実空間と潜光写実空間の間のリンクとして機能する。拡散モデルを成功させるガウス雑音の統計のマッピングには、かなり洗練された数学的導出がある。本稿では,ガウス雑音を回避しつつ,画像の高分解能化に拡散モデルの基本構造を応用した簡易な手法を提案する。基本的には,統計的性質の違いを学習し,適度な品質の結果として段階的な補間を容易にする,隣接領域間のドメイン転送を行うdnnを提案する。入力LR画像を参照してドメイン転送を条件付けすることにより、さらなる品質向上を実現する。実験結果から,本手法は最先端の大規模超解像モデルだけでなく,画像超解像に対する現在の拡散モデルよりも優れていた。このアプローチは、画像の啓蒙、塗装、装飾など、他のイメージ・ツー・イメージタスクに容易に拡張できる。

Large scale image super-resolution is a challenging computer vision task, since vast information is missing in a highly degraded image, for example at scale x16 super-resolution. Diffusion models have been used successfully in recent years in extreme super-resolution applications, in which Gaussian noise is used as a means to form a latent photo-realistic space, and acts as a link between the space of latent vectors and the latent photo-realistic space. There are quite a few sophisticated mathematical derivations on mapping the statistics of Gaussian noise that make diffusion models successful. In this paper we propose a simple approach which gets away from using Gaussian noise but adopts some basic structures of diffusion models for efficient image super-resolution. Essentially, we propose a DNN to perform domain transfer between neighbor domains, which can learn the differences in statistical properties to facilitate gradual interpolation with results of reasonable quality. Further quality improvement is achieved by conditioning the domain transfer with reference to the input LR image. Experimental results show that our method outperforms not only state-of-the-art large scale super-resolution models, but also the current diffusion models for image super-resolution. The approach can readily be extended to other image-to-image tasks, such as image enlightening, inpainting, denoising, etc.

翻訳日:2023-11-07 17:53:52 公開日:2023-11-04

# 教師付き分類のためのネットワーク上の量子輸送

Quantum transport on networks for supervised classification ( http://arxiv.org/abs/2311.02442v1 )

ライセンス: Link先を確認

Shmuel Lorber, Oded Zimron, Inbal Lorena Zak, Anat Milo and Yonatan Dubi

(参考訳) 入力を既存のクラスに分類する計算プロセスである分類は、機械学習の時代における現代の計算の基礎となっている。本稿では、トレーニングされた量子ネットワークにおける粒子の量子輸送に基づく新しいタイプの量子分類器を提案する。分類器は、量子粒子をネットワークに送信し、粒子の出口点を測定することに基づいており、これは「クラス」として機能し、ネットワークパラメータを変更することで決定される。このスキームを用いて、分類の例を3つ示す: まず、波動関数は、所定の(ランダムな)群との重なりに従って分類される。第二に、波動関数はその位置化のレベルに応じて分類する。どちらの例も小さなトレーニングセットを使用し、90%以上の精度とリコールを達成している。第3の分類は、その反応性に応じて触媒芳香族アルデヒド基質の分類に関する「現実世界問題」である。実験データを用いて、量子分類器は平均86%の分類精度に達する。量子分類器はこれらの例では古典的よりも優れており、特に「小さなデータ」の体系において量子上の優位性を示す。これらの結果は、アルゴリズムとして実装でき、フォトニックネットワークのような量子ハードウェア上で実験的に実現できる新しい分類法への道を開いた。

Classification, the computational process of categorizing an input into pre-existing classes, is now a cornerstone in modern computation in the era of machine learning. Here we propose a new type of quantum classifier, based on quantum transport of particles in a trained quantum network. The classifier is based on sending a quantum particle into a network and measuring the particle's exit point, which serves as a "class" and can be determined by changing the network parameters. Using this scheme, we demonstrate three examples of classification; in the first, wave functions are classified according to their overlap with predetermined (random) groups. In the second, we classify wave functions according to their level of localization. Both examples use small training sets and achieve over 90% precision and recall. The third classification scheme is a "real-world problem", concerning classification of catalytic aromatic-aldehyde substrates according to their reactivity. Using experimental data, the quantum classifier reaches an average 86% classification accuracy. We show that the quantum classifier outperforms its classical counterpart for these examples, thus demonstrating quantum advantage, especially in the regime of "small data". These results pave the way for a novel classification scheme, which can be implemented as an algorithm, and potentially realized experimentally on quantum hardware such as photonic networks.

翻訳日:2023-11-07 17:46:12 公開日:2023-11-04

# ChatGPTはソフトウェア検証をサポートできるか?

Can ChatGPT support software verification? ( http://arxiv.org/abs/2311.02433v1 )

ライセンス: Link先を確認

Christian Janßen, Cedric Richter, Heike Wehrheim

(参考訳) 大規模な言語モデルは,コード生成やデバッグ,修復といったソフトウェアエンジニアリングタスクにおいて,ますます効果的になっています。 chatgptのような言語モデルはコードを生成するだけでなく、内部動作や特に正確性を説明することができる。これにより、ChatGPTを使って正式なソフトウェア検証をサポートできるかという疑問が持ち上がる。本稿では,この質問に答える第一歩を踏み出す。具体的には,ChatGPTがループ不変量を生成できるかどうかを検討する。ループ不変量生成はソフトウェア検証における中核的なタスクであり、有効な不変量の生成は形式的検証に役立つ可能性が高い。この仮説に関する最初の証拠を与えるため、ChatGPT にループ不変量を持つ 106 C プログラムにアノテートを依頼する。 frama-c と cpachecker の2つの検証器に渡して生成した不変量の有効性と有用性を確認した。評価の結果,ChatGPTはFrama-Cがこれまで解決できなかったタスクを検証できる有効かつ有用な不変量を生成することができることがわかった。最初の知見に基づいて,ChatGPT(あるいは大規模言語モデル)とソフトウェア検証器を組み合わせる方法を提案し,現状の限界とオープンな問題について議論する。

Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification. In this paper, we take some first steps towards answering this question. More specifically, we investigate whether ChatGPT can generate loop invariants. Loop invariant generation is a core task in software verification, and the generation of valid and useful invariants would likely help formal verifiers. To provide some first evidence on this hypothesis, we ask ChatGPT to annotate 106 C programs with loop invariants. We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker. Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before. Based on our initial insights, we propose ways of combining ChatGPT (or large language models in general) and software verifiers, and discuss current limitations and open issues.

翻訳日:2023-11-07 17:45:53 公開日:2023-11-04

# P-Age:ロバストな時空間的見かけ年齢分類のためのPexelsデータセット

P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification ( http://arxiv.org/abs/2311.02432v1 )

ライセンス: Link先を確認

Abid Ali and Ashish Marisetty and Francois Bremond

(参考訳) 年齢推定は、多くのアプリケーションを持つ難しいタスクである。本稿では, 咬合や低分解能, 照明条件などの課題に対処するために, ビデオベースモデルを用いた年齢分類の新たな方向性を提案する。これらの課題に対処するために,年齢分類において顔に基づく方法が支配される全身の動態の時空間情報を利用する AgeFormer を提案する。提案する2ストリームアーキテクチャは,timeformer と efficientnet をバックボーンとして使用し,顔と身体のダイナミクス情報の両方を効果的にキャプチャし,効率的な年齢推定を行う。さらに,映像からの年齢予測のギャップを埋めるため,年齢分類のためのPexels Age(P-Age)というビデオデータセットを構築した。提案手法は, 既存の顔年齢推定法と比較して優れた結果を得ることができ, 顔の遮蔽, ぼやけた, マスクを施した状況で評価できる。また、Charades、Smarthome、Thumos-14など、さまざまな挑戦的なビデオデータセット上でクロステストされている。

Age estimation is a challenging task that has numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low-resolution, and lighting conditions. To address these challenges, we propose AgeFormer which utilizes spatio-temporal information on the dynamics of the entire body dominating face-based methods for age classification. Our novel two-stream architecture uses TimeSformer and EfficientNet as backbones, to effectively capture both facial and body dynamics information for efficient and accurate age estimation in videos. Furthermore, to fill the gap in predicting age in real-world situations from videos, we construct a video dataset called Pexels Age (P-Age) for age classification. The proposed method achieves superior results compared to existing face-based age estimation methods and is evaluated in situations where the face is highly occluded, blurred, or masked. The method is also cross-tested on a variety of challenging video datasets such as Charades, Smarthome, and Thumos-14.

翻訳日:2023-11-07 17:45:35 公開日:2023-11-04

# 連続学習のためのLoRAを用いたタスク演算

Task Arithmetic with LoRA for Continual Learning ( http://arxiv.org/abs/2311.02428v1 )

ライセンス: Link先を確認

Rajas Chitale, Ankit Vaidya, Aditya Kane, Archana Ghotkar

(参考訳) 連続学習は、トレーニングデータが連続的なチャンクで利用可能である問題を「タスク」と呼ぶ。連続学習の進歩の大部分は、データのストリーム上でモデルを逐次訓練することによる破滅的な忘れ込みの問題によって妨げられている。さらに、大規模モデルを複数回連続的にトレーニングする計算コストも高くなる。両問題を同時に緩和するために,低ランク適応とタスク演算を用いたトランスフォーマーベース視覚モデルを継続的に学習する手法を提案する。本手法は,各タスクにおける学習モデルの計算要求を減らし,破滅的忘れの問題を完全に回避する。クラス毎に10個のサンプルを小さなメモリで支援すると,本手法はフルセットファインタニングに近い性能が得られる。本手法の長所を支援するために厳格なアブレーションを行った。

Continual learning refers to the problem where the training data is available in sequential chunks, termed "tasks". The majority of progress in continual learning has been stunted by the problem of catastrophic forgetting, which is caused by sequential training of the model on streams of data. Moreover, it becomes computationally expensive to sequentially train large models multiple times. To mitigate both of these problems at once, we propose a novel method to continually train transformer-based vision models using low-rank adaptation and task arithmetic. Our method completely bypasses the problem of catastrophic forgetting, as well as reducing the computational requirement for training models on each task. When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning. We present rigorous ablations to support the prowess of our method.
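
To make the combination of low-rank adaptation and task arithmetic more concrete, the following is a minimal pure-Python sketch. It assumes each task contributes a low-rank weight update `B @ A` and that the updates are merged by scaled addition onto the frozen base weights. The helper names (`lora_delta`, `merge_tasks`) and the scaling factor are illustrative assumptions; the abstract does not specify the paper's exact merging rule.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x r) and b (r x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_delta(b: Matrix, a: Matrix) -> Matrix:
    """Low-rank update for one task: B (d x r) times A (r x k)."""
    return matmul(b, a)


def merge_tasks(base: Matrix, deltas: List[Matrix], scale: float = 1.0) -> Matrix:
    """Task arithmetic: add the scaled per-task updates to the frozen base weights."""
    merged = [row[:] for row in base]  # copy so the base weights stay untouched
    for delta in deltas:
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged
```

Because each task only trains its own `A` and `B` while the base matrix is never overwritten, earlier tasks are not modified when a new one is added, which is the intuition behind avoiding sequential fine-tuning.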

翻訳日:2023-11-07 17:45:15 公開日:2023-11-04

# オンライン長期制約最適化

Online Long-run Constrained Optimization ( http://arxiv.org/abs/2311.02426v1 )

ライセンス: Link先を確認

Shijie Pan and Wenjie Huang

(参考訳) 本稿では,目的と制約が必ずしも凸であるとは限らないオンライン方式の長期的制約付き最適化問題を解くために,新しい追従型アルゴリズムを提案し,解析する。各期間において、ランダムな線形摂動と強い凹凸摂動をそれぞれ、オフラインのオラクルにプリマル方向とデュアル方向に組み入れ、グローバルミニマックス点を解として探索する。期待される2つの静的累積後悔の定義に基づいて、この問題のクラスに対する最初のサブ線形$O(T^{8/9})$後悔の複雑さを導き出す。提案アルゴリズムは,長期(リスク)制約のある河川汚染源同定問題に対処し,理論結果の有効性を実証し,既存手法と比較して優れた性能を示す。

In this paper, a novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in an online manner, where the objective and constraints are not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, to the offline oracle, and a global minimax point is searched as solution. Based on two particular definitions of expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to tackle a long-term (risk) constrained river pollutant source identification problem, demonstrating the validity of the theoretical results and exhibiting superior performance compared to existing methods.

翻訳日:2023-11-07 17:45:02 公開日:2023-11-04

# 二次駆動を持つ量子電池

A quantum battery with quadratic driving ( http://arxiv.org/abs/2311.02424v1 )

ライセンス: Link先を確認

C. A. Downing and M. S. Ukhtary

(参考訳) 量子バッテリ(quantum battery)は、量子力学オブジェクトを使用して構築されたエネルギー貯蔵デバイスであり、古典的バッテリを上回ることを目的として開発された。量子優位性を利用する量子電池の最適設計を提供するには、高速充電、耐久性ストレージ、効率的な作業抽出のための競合する要求のバランスをとる必要がある。ここでは,エネルギーホルダに接続された駆動型チャージャーからなる2成分量子バッテリモデルについて,線形駆動と二次駆動の2つのパラダイムケースで理論的に検討する。リニアバッテリは、バッテリーの応答を2つのレジームに分割する単一の例外点によって制御される。二次駆動はスクイーズド量子電池につながり、散逸相転移に関連する臨界点付近で多くの有用な仕事を生み出す。我々の理論的結果は、パラメトリックキャビティや非線形回路によって実現され、スクイーズを示す量子電池の出現に繋がる可能性がある。

Quantum batteries are energy storage devices built using quantum mechanical objects, which are developed with the aim of outperforming their classical counterparts. Proposing optimal designs of quantum batteries which are able to exploit quantum advantages requires balancing the competing demands for fast charging, durable storage and effective work extraction. Here we study theoretically a bipartite quantum battery model, composed of a driven charger connected to an energy holder, within two paradigmatic cases of a driven-dissipative open quantum system: linear driving and quadratic driving. The linear battery is governed by a single exceptional point which splits the response of the battery into two regimes, one of which induces a good amount of useful work. Quadratic driving leads to a squeezed quantum battery, which generates plentiful useful work near to critical points associated with dissipative phase transitions. Our theoretical results may be realized with parametric cavities or nonlinear circuits, potentially leading to the manifestation of a quantum battery exhibiting squeezing.

翻訳日:2023-11-07 17:44:47 公開日:2023-11-04

# 量子ゲームにおける行列乗法重みを用いたペイオフ学習

Payoff-based learning with matrix multiplicative weights in quantum games ( http://arxiv.org/abs/2311.02423v1 )

ライセンス: Link先を確認

Kyriakos Lotidis and Panayotis Mertikopoulos and Nicholas Bambos and Jose Blanchet

(参考訳) 本稿では,量子ゲーム(および半定値ゲーム)における学習の問題について,スカラー,ペイオフに基づくフィードバックを用いて検討する。具体的には、広く使われている行列乗算重み (MMW) アルゴリズムに焦点をあて、プレイヤーにゲーム(および/またはそれぞれの選択した状態)の完全な知識を求める代わりに、異なる情報フレームワークに合わせた最小情報行列乗算重み (3MW) 法一式を導入する。この設定において収束を達成するのが難しいのは、古典的な有限ゲームとは対照的に、量子ゲームは純粋状態(純粋戦略の量子的等価性)の無限連続体を持つため、ペイオフベクトルを推定するための標準的な重要性重み付け技術は適用できないことである。その代わり、バンディット凸最適化のアイデアを借用し、問題の半定義幾何学に適応したゼロ次勾配サンプラーを設計する。最初の結果として,決定論的ペイオフフィードバックを持つ3MW法は,プレイヤーが1つのスカラーのみを観測したとしても,バニラの収束率$\mathcal{O}(1/\sqrt{T})$であり,量子ミニマックスゲームにおける完全情報MMWアルゴリズムであることを示す。その後、アルゴリズムの情報要求をさらに緩和し、3MW法を提供し、プレイヤーは彼らのペイオフ観測可能なランダムな実現を観測するだけで、$\mathcal{O}(T^{-1/4})$レートで平衡に収束する。最後に、ゼロサムゲームを超えて、提案した3MW法の正規化変種が、ある一階安定性条件を満たす全ての平衡に対して高い確率で局所収束を保証することを示す。

In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty to attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition.

翻訳日:2023-11-07 17:44:30 公開日:2023-11-04

# 量子ウォークを用いたハイブリッド絡み合い状態の決定論的生成

Deterministic generation of hybrid entangled states using quantum walks ( http://arxiv.org/abs/2311.02419v1 )

ライセンス: Link先を確認

Jaskaran Singh, Vikash Mittal

(参考訳) 近年、量子ビットをコヒーレント状態と絡むハイブリッド絡み合い(he)が様々な量子情報処理タスク、特に量子鍵分布(arxiv:2305.18906 (2023))において優れた性能を示している。理論上の利点にもかかわらず、実験室でのこれらの状態の実際的な生成は困難であった。この文脈では、量子ウォークを用いてhe状態を生成する決定論的かつ効率的な手法を導入する。本手法は, 1次元分割ステップ量子ウォークにおいて, わずか20ステップで99.90 %の顕著な忠実度を実現する。これは、HE状態が確率的にのみ得られ、しばしば80%以下の忠実度を持つ以前のアプローチよりも顕著な改善である。我々のスキームはHE状態の生成に対する堅牢な解を提供するだけでなく、量子ウォークの独特な優位性を強調し、この急成長する分野の発展に寄与する。さらに,本手法は現在の技術で実験的に実現可能である。

Recently, hybrid entanglement (HE), which involves entangling a qubit with a coherent state, has demonstrated superior performance in various quantum information processing tasks, particularly in quantum key distribution [arXiv:2305.18906 (2023)]. Despite its theoretical advantages, the practical generation of these states in the laboratory has been a challenge. In this context, we introduce a deterministic and efficient approach for generating HE states using quantum walks. Our method achieves a remarkable fidelity of 99.90 % with just 20 time steps in a one-dimensional split-step quantum walk. This represents a significant improvement over prior approaches that yielded HE states only probabilistically, often with fidelities as low as 80 %. Our scheme not only provides a robust solution to the generation of HE states but also highlights a unique advantage of quantum walks, thereby contributing to the advancement of this burgeoning field. Moreover, our scheme is experimentally feasible with the current technology.
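
For readers unfamiliar with the one-dimensional split-step quantum walk mentioned above, here is a minimal simulation sketch. It assumes the common convention in which one time step applies a coin rotation R(theta1), shifts the "up" component one site to the right, applies R(theta2), and shifts the "down" component one site to the left. The angles, the initial state, and the function name are illustrative assumptions; the sketch does not reproduce the paper's state-generation protocol or its fidelity figures.

```python
import math
from typing import List, Tuple

Amps = List[complex]


def _rotate(up: Amps, down: Amps, theta: float) -> Tuple[Amps, Amps]:
    """Apply the real coin rotation R(theta) at every lattice site."""
    c, s = math.cos(theta), math.sin(theta)
    new_up = [c * u - s * d for u, d in zip(up, down)]
    new_down = [s * u + c * d for u, d in zip(up, down)]
    return new_up, new_down


def split_step_walk(steps: int, theta1: float, theta2: float) -> List[float]:
    """Return position probabilities after `steps` split-step walk steps.

    The walker starts at the central site with the coin in the "up" state.
    The lattice has 2 * steps + 1 sites, so no amplitude reaches the boundary.
    """
    size = 2 * steps + 1
    up: Amps = [0j] * size
    down: Amps = [0j] * size
    up[steps] = 1 + 0j
    for _ in range(steps):
        up, down = _rotate(up, down, theta1)
        up = [0j] + up[:-1]      # shift the "up" component one site right
        up, down = _rotate(up, down, theta2)
        down = down[1:] + [0j]   # shift the "down" component one site left
    return [abs(u) ** 2 + abs(d) ** 2 for u, d in zip(up, down)]
```

Calling `split_step_walk(20, math.pi / 4, math.pi / 8)` returns a probability distribution over 41 sites. Since every operation is unitary, the probabilities sum to one up to floating-point error.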

翻訳日:2023-11-07 17:43:55 公開日:2023-11-04

# P2O-Calib:点対空間閉塞関係を用いたカメラ-LiDAR校正

P2O-Calib: Camera-LiDAR Calibration Using Point-Pair Spatial Occlusion Relationship ( http://arxiv.org/abs/2311.02413v1 )

ライセンス: Link先を確認

Su Wang, Shini Zhang, Xuchong Qiu

(参考訳) センサの精度とロバストな校正結果は,自律走行・ロボット分野におけるフォローアップ研究の重要な構成要素であると考えられる。現在の3D LiDARとモノクルカメラの外部校正は、主にターゲットベースとターゲットレスの手法に焦点を当てている。ターゲットベースの手法はしばしば、追加のターゲット設計やターゲット配置制限などの制約のためにオフラインで使用される。現在のターゲットレスメソッドは、さまざまな環境で特徴的不確定性と特徴的ミスマッチに苦しむ。これらの制約を緩和するために, 3次元空間における閉塞関係を用いた2D-3Dエッジポイント抽出に基づく, ターゲットレスキャリブレーション手法を提案する。さらに,抽出した2D-3D点対に基づいて,校正精度を改善し,計算コストを削減するオクルージョン誘導点マッチング法を提案する。提案手法の有効性を検証するため,KITTIデータセットの実際の画像に対して定性的かつ定量的に評価を行った。その結果,本手法は既存のターゲットレス手法よりも優れており,低誤差・高ロバスト性を実現し,高品質カメラライダーキャリブレーションを応用できることを示す。

The accurate and robust calibration result of sensors is considered as an important building block to the follow-up research in the autonomous driving and robotics domain. The current works involving extrinsic calibration between 3D LiDARs and monocular cameras mainly focus on target-based and target-less methods. The target-based methods are often utilized offline because of restrictions, such as additional target design and target placement limits. The current target-less methods suffer from feature indeterminacy and feature mismatching in various environments. To alleviate these limitations, we propose a novel target-less calibration approach which is based on the 2D-3D edge point extraction using the occlusion relationship in 3D space. Based on the extracted 2D-3D point pairs, we further propose an occlusion-guided point-matching method that improves the calibration accuracy and reduces computation costs. To validate the effectiveness of our approach, we evaluate the method performance qualitatively and quantitatively on real images from the KITTI dataset. The results demonstrate that our method outperforms the existing target-less methods and achieves low error and high robustness that can contribute to the practical applications relying on high-quality Camera-LiDAR calibration.

翻訳日:2023-11-07 17:43:38 公開日:2023-11-04

# 科学論文のcitance文脈に基づく要約

Citance-Contextualized Summarization of Scientific Papers ( http://arxiv.org/abs/2311.02408v1 )

ライセンス: Link先を確認

Shahbaz Syed, Ahmad Dawar Hakimi, Khalid Al-Khatib, Martin Potthast

(参考訳) 科学論文の自動要約に対する現在のアプローチは、アブストラクトの形で情報的な要約を生成する。しかし、アブストラクトは論文とその中で引用された参考文献との関係を示すことを意図していない。本稿では、参考文献の引用を含む与えられた文(いわゆる「citance」)に条件付けされた情報的要約を生成する、新しい文脈的要約手法を提案する。この要約は、引用位置に関連する被引用論文の内容を概説する。そこで本手法は、論文のcitanceを抽出・モデル化し、被引用論文から関連する節を検索し、各citanceに合わせた抽象型要約を生成する。我々は、540Kのコンピュータ科学論文とそこに含まれる4.6Mのcitanceからなる新しいデータセットWebis-Context-SciSumm-2023を用いて、本手法を評価する。

Current approaches to automatic summarization of scientific papers generate informative summaries in the form of abstracts. However, abstracts are not intended to show the relationship between a paper and the references cited in it. We propose a new contextualized summarization approach that can generate an informative summary conditioned on a given sentence containing the citation of a reference (a so-called "citance"). This summary outlines the content of the cited paper relevant to the citation location. Thus, our approach extracts and models the citances of a paper, retrieves relevant passages from cited papers, and generates abstractive summaries tailored to each citance. We evaluate our approach using Webis-Context-SciSumm-2023, a new dataset containing 540K computer science papers and 4.6M citances therein.

翻訳日:2023-11-07 17:43:21 公開日:2023-11-04

# ゲームにおける正規化学習における動的・戦略的安定性の等価性

The equivalence of dynamic and strategic stability under regularized learning in games ( http://arxiv.org/abs/2311.02407v1 )

ライセンス: Link先を確認

Victor Boone and Panayotis Mertikopoulos

(参考訳) 本稿では,有限ゲームにおける正規化非回帰学習の長期実行行動について検討する。フィールドでのよく知られた結果は、ノンレグレットプレイの実証的な頻度がゲームの粗い相関均衡に収束することを示しているが、プレイヤーの実際の戦略が時間とともにどのように進化するかに対する我々の理解は、より限定的であり、多くの場合、存在しない。この問題は、厳密なナッシュ均衡のみが安定し、正規化学習の下で引き寄せられることを示し、学習とポイントワイズ・ソリューションの概念との関係を特に解明することによってさらに悪化する。これの代わりに、我々はより一般的なアプローチをとり、プレイヤーの日々のプレーの「emph{setwise}」合理性特性を特徴付けようとしている。この目的を達成するために,我々は,集合からの一方的な逸脱が,よりよい応答(club)の下での閉性(closeness)と呼ばれる特性であるデビエータ(deviator)のコストを伴うという,集合的な戦略的安定性の最も厳密な基準の1つに焦点を当てている。純粋な戦略の製品は、そのスパンが安定していて、正規化学習の下で引き寄せられる場合に限り、より良い応答の下で閉じられる。さらに、そのような集合への収束率を推定し、エントロピー正則化に基づく手法(指数重み付けアルゴリズムなど)が幾何的な速度で収束するのに対し、射影に基づく手法は、帯域幅、ペイオフベースのフィードバックであっても有限個の反復に収束することを示す。

In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited - and, in many cases, non-existent. This issue is exacerbated further by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In lieu of this, we take a more general approach and instead seek to characterize the setwise rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator - a property known as closedness under better replies (club). In so doing, we obtain a far-reaching equivalence between strategic and dynamic stability: a product of pure strategies is closed under better replies if and only if its span is stable and attracting under regularized learning. In addition, we estimate the rate of convergence to such sets, and we show that methods based on entropic regularization (like the exponential weights algorithm) converge at a geometric rate, while projection-based methods converge within a finite number of iterations, even with bandit, payoff-based feedback.

翻訳日:2023-11-07 17:43:05 公開日:2023-11-04

# 肝ステアトーシス診断のためのハイブリッド量子画像分類と連合学習

Hybrid quantum image classification and federated learning for hepatic steatosis diagnosis ( http://arxiv.org/abs/2311.02402v1 )

ライセンス: Link先を確認

Luca Lusnig, Asel Sagingalieva, Mikhail Surmach, Tatjana Protasevich, Ovidiu Michiu, Joseph McLoughlin, Christopher Mansell, Graziano de' Petris, Deborah Bonazza, Fabrizio Zanconati, Alexey Melnikov, and Fabio Cavalli

(参考訳) 深層学習技術によって成熟することで、臨床画像の日常的解釈を支援するインテリジェントなシステムは非常に重要な役割を果たすことができる。さらに、ディープラーニングに適用される量子技術は、このパフォーマンスを向上し、連合学習技術は、異なる参加者間のプライバシーフレンドリな協調学習を実現し、機密データの使用によるプライバシ問題を解決し、個々の参加者に対して収集すべきデータ数を減らすことができる。本研究では,非アルコール性肝ステアトーシスの定量化に使用可能なハイブリッド量子ニューラルネットワークを提案するとともに,従来型深層学習法に基づく連合学習アプローチを提案する。 5つの量子ビットと100以上の変分ゲートからなるハイブリッド量子resnetモデルであるハイブリッド量子ニューラルネットワークの肝ステアトーシス画像分類精度は97%に達し、これは従来のresnetよりも1.8%高い。重要なのは、データセットを減らしたとしても、私たちのハイブリッドアプローチは従来のアプローチよりもずっと優れており、より優れた一般化と医療応用における過度な適合の可能性を示していることです。さらに、複数のクライアントによるフェデレートされたアプローチは、精度は低いが90%以上であるにもかかわらず、最大32まで、各参加者に対して非常に小さなデータセット、すなわち最大30分の1まで使用することができる。実語臨床データに基づく研究は,スケーラブルで協調的な出発点と見なすことができ,臨床病理学者の日常的な診断作業を容易にする効果的で信頼性の高いコンピュータ支援システムの必要性を満たすことができる。

With the maturity achieved by deep learning techniques, intelligent systems that can assist physicians in the daily interpretation of clinical images can play a very important role. In addition, quantum techniques applied to deep learning can enhance this performance, and federated learning techniques can realize privacy-friendly collaborative learning among different participants, solving privacy issues due to the use of sensitive data and reducing the amount of data to be collected by each individual participant. We present in this study a hybrid quantum neural network that can be used to quantify non-alcoholic liver steatosis and could be useful in the diagnostic process to determine a liver's suitability for transplantation; at the same time, we propose a federated learning approach based on a classical deep learning solution to solve the same problem, but using a reduced data set in each part. The liver steatosis image classification accuracy of the hybrid quantum neural network, a hybrid quantum ResNet model consisting of 5 qubits and more than 100 variational gates, reaches 97%, which is 1.8% higher than its classical counterpart, ResNet. Crucially, even with a reduced dataset, our hybrid approach consistently outperformed its classical counterpart, indicating superior generalization and less potential for overfitting in medical applications. In addition, a federated approach with multiple clients, up to 32, despite its lower accuracy, which still exceeds 90%, would allow using, for each participant, a very small dataset, i.e., up to one-thirtieth. Our work, based on real-world clinical data, can be regarded as a scalable and collaborative starting point and could thus fulfill the need for an effective and reliable computer-assisted system that facilitates the daily diagnostic work of the clinical pathologist.

翻訳日:2023-11-07 17:42:32 公開日:2023-11-04

# BarcodeBERT:生物多様性分析用トランス

BarcodeBERT: Transformers for Biodiversity Analysis ( http://arxiv.org/abs/2311.02401v1 )

ライセンス: Link先を確認

Pablo Millan Arias and Niousha Sadjadi and Monireh Safari and ZeMing Gong and Austin T. Wang and Scott C. Lowe and Joakim Bruslund Haurum and Iuliia Zarubiieva and Dirk Steinke and Lila Kari and Angel X. Chang and Graham W. Taylor

(参考訳) 生物多様性を理解することはグローバルな課題であり、DNAのバーコードショート断片が種によってクラスター化され、重要な役割を果たす。特に、非常に多様で未調査の群である無脊椎動物は、独特の分類学的複合体を呈する。我々は、教師付きCNN、微調整された基礎モデル、複雑度の異なるデータセット間でのDNAバーコード固有のマスキング戦略など、機械学習アプローチについて検討する。単純なデータセットやタスクは教師付きcnnや微調整されたトランスフォーマーを好むが、種レベルでの識別には、自己教師付き事前トレーニングへのパラダイムシフトが必要である。本稿では, 1.5Mの無脊椎動物DNAバーコード参照ライブラリを利用した, 生物多様性解析のための初の自己管理手法BarcodeBERTを提案する。この研究は、データセットの特定とカバレッジがモデル選択にどのように影響するかを強調し、種と属レベルでの高精度なDNAバーコードに基づく識別を達成する上で、自己教師付き事前訓練の役割を強調している。実際、細調整のステップなしで、大規模なDNAバーコードデータセットで事前訓練されたBarcodeBERTは、複数の下流分類タスクでDNABERTとDNABERT-2を上回っている。コードリポジトリはhttps://github.com/Kari-Genomics-Lab/BarcodeBERTで公開されている。

Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification demands a paradigm shift towards self-supervised pretraining. We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a 1.5 M invertebrate DNA barcode reference library. This work highlights how dataset specifics and coverage impact model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus level. Indeed, without the fine-tuning step, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks. The code repository is available at https://github.com/Kari-Genomics-Lab/BarcodeBERT

翻訳日:2023-11-07 17:42:02 公開日:2023-11-04

# プレートから生産へ:現代の消費者駆動食品システムにおける人工知能

From Plate to Production: Artificial Intelligence in Modern Consumer-Driven Food Systems ( http://arxiv.org/abs/2311.02400v1 )

ライセンス: Link先を確認

Weiqing Min, Pengfei Zhou, Leyi Xu, Tao Liu, Tianhao Li, Mingyu Huang, Ying Jin, Yifan Yi, Min Wen, Shuqiang Jiang, Ramesh Jain

(参考訳) 世界の食料システムは、需要が増大する中で持続的で栄養豊かな食事を供給するという緊急の課題に直面している。 AI(Artificial Intelligence)の出現は、個人の選択革命をもたらし、AIによる個人による決定が、食卓から農場、そして皿へと、食品システムを変革する。この文脈で、aiアルゴリズムは個人の食事選択を洗練し、その後農業生産を形作り、消費から栽培まで最適なフィードバックループを促進する。最初は、食品サプライチェーンにまたがるAIツールやテクニックを調べ、その後、AIサブフィールドが機械学習、コンピュータビジョン、音声認識をどのように通過するかを評価する。 AIFSフレームワークの注目点として、デジタル化、ビッグデータ分析、バイオテクノロジー、そしてあらゆるコンポーネントの現代の食品システムで広く使用されているIoTなど、AIとAIの融合を強調しています。このパラダイムは、伝統的な「ファーム・トゥ・フォーク」の物語を循環型「消費者主導型ファーム・トゥ・フォーク」モデルにシフトさせ、持続的で栄養豊かな食事の実現に役立てる。本稿では、食品分野におけるaiの約束と本質的な課題について考察する。厳格なAIガバナンス、均一なデータアーキテクチャ、学際的なパートナーシップを推進することによって、消費者中心の戦略と相乗化するAIは、持続可能な軌道に向けて食品システムを操る可能性を秘めている、と私たちは主張する。我々は、食品システムの多様な側面における最先端技術に関する包括的な調査を行い、その後、ギャップを特定し、創発的なai方法論の公平で効果的な展開を提唱する。

Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices, subsequently shaping agricultural outputs, and promoting an optimized feedback loop from consumption to cultivation. Initially, we delve into AI tools and techniques spanning the food supply chain, and subsequently assess how AI subfields, encompassing machine learning, computer vision, and speech recognition, are harnessed within the AI-enabled Food System (AIFS) framework, which increasingly leverages Internet of Things, multimodal sensors and real-time data exchange. We spotlight the AIFS framework, emphasizing its fusion of AI with technologies such as digitalization, big data analytics, biotechnology, and IoT extensively used in modern food systems in every component. This paradigm shifts the conventional "farm to fork" narrative to a cyclical "consumer-driven farm to fork" model for better achieving sustainable, nutritious diets. This paper explores AI's promise and the intrinsic challenges it poses within the food domain. By championing stringent AI governance, uniform data architectures, and cross-disciplinary partnerships, we argue that AI, when synergized with consumer-centric strategies, holds the potential to steer food systems toward a sustainable trajectory. We furnish a comprehensive survey for the state-of-the-art in diverse facets of food systems, subsequently pinpointing gaps and advocating for the judicious and efficacious deployment of emergent AI methodologies.

翻訳日:2023-11-07 17:41:40 公開日:2023-11-04

# 高速かつ高精度な分散GNNのためのエントロピーアウェアトレーニング

Entropy Aware Training for Fast and Accurate Distributed GNN ( http://arxiv.org/abs/2311.02399v1 )

ライセンス: Link先を確認

Dhruv Deshmukh (1), Gagan Raj Gupta (1), Manisha Chawla (1), Vishwesh Jatala (1), Anirban Haldar (1) ((1) Department of CSE, IIT Bhilai, India)

(参考訳) 数十億規模のグラフ上でグラフニューラルネットワーク(GNN)をスケールするために、いくつかの分散フレームワークが開発されている。いくつかのベンチマークでは、これらのフレームワークが生成するグラフ分割が不均一なデータ分布とクラス不均衡を持ち、収束に影響し、集中型実装よりも性能が低下することを観察した。これらの課題に包括的に対処し、トレーニング時間を短縮し、精度を向上させる技術を開発する。我々は、全エントロピーを最小化することでマイクロ平均F1スコア(精度)を改善するエッジ重み付き分割法を開発した。さらに、各計算ホストのモデルをローカルデータ分布に適応させる非同期パーソナライズフェーズを追加する。我々は、収束を大幅に高速化するクラスバランスサンプラーを設計した。アルゴリズムをDistDGLフレームワーク上に実装し、既存のトレーニング手法よりもはるかに優れたスケーリングを実現することを観察した。標準ベースラインと比較して、5つの大規模グラフベンチマークで、トレーニング時間において2～3倍の高速化を達成し、マイクロF1スコアを平均4%改善した。

Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs. On several benchmarks, we observe that the graph partitions generated by these frameworks have heterogeneous data distributions and class imbalance, affecting convergence, and resulting in lower performance than centralized implementations. We holistically address these challenges and develop techniques that reduce training time and improve accuracy. We develop an Edge-Weighted partitioning technique to improve the micro average F1 score (accuracy) by minimizing the total entropy. Furthermore, we add an asynchronous personalization phase that adapts each compute-host's model to its local data distribution. We design a class-balanced sampler that considerably speeds up convergence. We implemented our algorithms on the DistDGL framework and observed that our training techniques scale much better than the existing training approach. We achieved a (2-3x) speedup in training time and 4% improvement on average in micro-F1 scores on 5 large graph benchmarks compared to the standard baselines.
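
The abstract does not define the "total entropy" that the partitioner minimizes. One plausible reading, used here purely as an assumption, is the sum over partitions of the Shannon entropy of the class-label distribution inside each partition. The sketch below computes that quantity for a candidate partitioning; the function names are hypothetical and no DistDGL API is used.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def label_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the label distribution in one partition."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_entropy(partitions: List[Sequence[Hashable]]) -> float:
    """Sum of per-partition label entropies (assumed objective, see lead-in)."""
    return sum(label_entropy(part) for part in partitions)
```

Under this reading, a partitioning that keeps each class mostly within one partition scores lower than one that spreads every class evenly across partitions.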

翻訳日:2023-11-07 17:41:10 公開日:2023-11-04

# 固体ネオン上の環状表面状態に基づく単一電子量子ビット

Single-electron qubits based on ring-shaped surface states on solid neon ( http://arxiv.org/abs/2311.02501v1 )

ライセンス: Link先を確認

Toshiaki Kanai, Dafei Jin, and Wei Guo

(参考訳) 最近の実験では、固体ネオン表面に結合した単一電子からなる電荷量子ビットは、非常に長いコヒーレンス時間を示し、量子コンピューティングのプラットフォームとして期待できる。しかし、いくつかの観測は、電子の結合機構と量子状態と応用された電気トラップポテンシャルとの直接相関に疑問を投げかけた。本研究では,電子とネオン表面地形(バンプや谷など)との相互作用を調べるための理論的枠組みを提案する。電子によって誘導される表面電荷を評価することにより、ネオン表面への強い垂直結合を示す。電子の2次元曲面上の横運動に対するシュロディンガー方程式は、広範な地形変化のために解かれる。その結果、表面バンプは電子に自然に結合し、実験的な観測と整合する一意なリング状量子状態を形成することが明らかとなった。また,電子の励起エネルギーは磁場を用いてスムーズに調整でき,量子ビット操作が容易になることを示す。本研究は、e-neon量子ビット特性の理解を深め、量子コンピューティングアーキテクチャの進化のための設計と最適化の指針となる基礎となる。

Recent experiments demonstrate that a charge qubit consisting of a single electron bound to a solid neon surface exhibits an exceptionally long coherence time, making it a promising platform for quantum computing. However, some observations cast doubt on the direct correlation between the electron's binding mechanism and quantum states with the applied electric trapping potential. In this study, we introduce a theoretical framework to examine the electron's interactions with neon surface topography, such as bumps and valleys. By evaluating the surface charges induced by the electron, we demonstrate its strong perpendicular binding to the neon surface. The Schrödinger equation for the electron's lateral motion on the curved 2D surface is then solved for extensive topographical variations. Our results reveal that surface bumps can naturally bind an electron, forming unique ring-shaped quantum states that align with experimental observations. We also show that the electron's excitation energy can be smoothly tuned using a magnetic field to facilitate qubit operation. This study offers a leap in our understanding of e-neon qubit properties, laying the groundwork to guide its design and optimization for advancing quantum computing architectures.

翻訳日:2023-11-07 17:33:11 公開日:2023-11-04

# 大規模固定点反復に対するアンダーソン加速度の収束率の改善

Improved Convergence Rates of Anderson Acceleration for a Large Class of Fixed-Point Iterations ( http://arxiv.org/abs/2311.02490v1 )

ライセンス: Link先を確認

Casey Garner and Gilad Lerman and Teng Zhang

(参考訳) 本稿では、固定点法${x}^{(k+1)}=q({x}^{(k)})$に対するアンダーソン加速度(AA)を研究する。これは作用素 $q$ が線型で対称であるとき、AA が固定点反復よりも根線型収束係数を改善するという最初の証明である。 q$ が非線形であるにもかかわらず、解に対称ヤコビアンを持つとき、少し修正されたaaアルゴリズムは、固定点反復よりも類似のルート線形収束係数が向上することが証明される。シミュレーションは我々の観察を検証する。さらに、異なるデータモデルを用いた実験により、AAはタイラーのM推定の標準的な固定点法よりもはるかに優れていることが示された。

This paper studies Anderson acceleration (AA) for fixed-point methods ${x}^{(k+1)}=q({x}^{(k)})$. It provides the first proof that when the operator $q$ is linear and symmetric, AA improves the root-linear convergence factor over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric Jacobian at the solution, a slightly modified AA algorithm is proved to have an analogous root-linear convergence factor improvement over fixed-point iterations. Simulations verify our observations. Furthermore, experiments with different data models demonstrate AA is significantly superior to the standard fixed-point methods for Tyler's M-estimation.
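
As a concrete reference point for the fixed-point setting ${x}^{(k+1)}=q({x}^{(k)})$, here is a minimal sketch of Anderson acceleration with a memory of one previous iterate (often written AA(1)) and no damping. This is a textbook-style variant chosen for brevity. It is not the slightly modified algorithm analyzed in the paper, and the function name and interface are assumptions.

```python
from typing import Callable, List

Vector = List[float]


def _sub(u: Vector, v: Vector) -> Vector:
    return [a - b for a, b in zip(u, v)]


def _dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def anderson_depth_one(q: Callable[[Vector], Vector], x0: Vector, iterations: int) -> Vector:
    """Run AA(1) on the fixed-point map q, starting from x0."""
    x_prev = list(x0)
    q_prev = q(x_prev)
    f_prev = _sub(q_prev, x_prev)       # residual f = q(x) - x
    x = q_prev                          # the first step is a plain fixed-point step
    for _ in range(iterations - 1):
        q_x = q(x)
        f = _sub(q_x, x)
        df = _sub(f, f_prev)
        denom = _dot(df, df)
        # gamma minimizes ||f - gamma * (f - f_prev)|| in the least-squares sense;
        # fall back to a plain step when the residual did not change.
        gamma = _dot(f, df) / denom if denom > 0.0 else 0.0
        x_next = [qa - gamma * (qa - qb) for qa, qb in zip(q_x, q_prev)]
        x_prev, q_prev, f_prev = x, q_x, f
        x = x_next
    return x
```

For a linear map `q(x) = A x + b` with symmetric `A`, which is the first case the abstract covers, the choice of `gamma` guarantees that each step shrinks the residual norm at least as much as a plain fixed-point step applied to the current residual would.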

翻訳日:2023-11-07 17:32:52 公開日:2023-11-04

# コードレビューのスピードは実践者にとって重要か?

Does Code Review Speed Matter for Practitioners? ( http://arxiv.org/abs/2311.02489v1 )

ライセンス: Link先を確認

Gunnar Kudrjavets (University of Groningen) and Ayushi Rastogi (University of Groningen)

(参考訳) コードベロシティの増大は、さまざまなソフトウェアプロジェクトの共通の目標である。コードレビュープロセスの効率は、コードが最終製品にマージされ、顧客に到達するまでの速度に大きく影響します。我々は、コードベロシティに関連する信念とプラクティスを研究するための調査を実施した。業界参加者39名から75名,オープンソースコミュニティから36名を対象に調査を行った。私たちの重要な発見は (a)業界とオープンソースコミュニティは同様の信念を持っている。 b) 迅速な反応時間は最も重要であり、他のエンジニアのツールインフラストラクチャや振る舞いに適用します。 c) time-to-mergeは、改善するために必要なコードレビューの基準です。 (d) エンジニアは、キャリアの成長に対するコードベロシティの増加の利点について異なる意見を持っている。 (e)コミット・then-reviewモデルの制御されたアプリケーションによってコード速度が向上する。私たちの研究は、基盤となる組織エコシステムに関係なく、コードベロシティへの投資と改善を継続する必要性をサポートします。

Increasing code velocity is a common goal for a variety of software projects. The efficiency of the code review process significantly impacts how fast the code gets merged into the final product and reaches the customers. We conducted a survey to study the code velocity-related beliefs and practices in place. We analyzed 75 completed surveys from 39 participants from the industry and 36 from the open-source community. Our critical findings are (a) the industry and open-source community hold a similar set of beliefs, (b) quick reaction time is of utmost importance and applies to the tooling infrastructure and the behavior of other engineers, (c) time-to-merge is the essential code review metric to improve, (d) engineers have differing opinions about the benefits of increased code velocity for their career growth, and (e) the controlled application of the commit-then-review model can increase code velocity. Our study supports the continued need to invest in and improve code velocity regardless of the underlying organizational ecosystem.

翻訳日:2023-11-07 17:32:40 公開日:2023-11-04

# スパースカテーテルパスを用いた左心房の神経再建

Neural Network Reconstruction of the Left Atrium using Sparse Catheter Paths ( http://arxiv.org/abs/2311.02488v1 )

ライセンス: Link先を確認

Alon Baram, Moshe Safran, Tomer Noy, Naveh Geri and Hayit Greenspan

(参考訳) 近年,カテーテルを用いた肺静脈分離用高周波アブレーションが心房細動治療の第一線となっている。これは、肺静脈のオスティアを含む左心房下心筋表面の比較的正確な地図を必要とし、表面の濃密なサンプリングと10分以上を要する。この研究の焦点は、手順の早期に左心房の可視化を提供することで、手順の複雑さを緩和し、表面のサンプリングが困難であるカテーテルの使用など、さらなるワークフローを可能にすることである。簡単なカテーテル操作から得られた部分的データから左心房の形状を再構築する新しい正規化項を持つ高密度エンコーダデコーダネットワークを提案する。ネットワークをトレーニングするために,3次元アトリア形状の大規模なデータセットを取得し,対応するカテーテル軌道を生成する。トレーニング後,提案するネットワークは,与えられた軌道に基づいて,十分なアトリリウム形状を近似できることを示す。 3次元心房再建のためのネットワークソリューションをいくつか比較した。提案手法は3分間の時間間隔で部分的取得を用いて現実的な可視化を実現する。合成およびヒトの臨床例が示される。

Catheter based radiofrequency ablation for pulmonary vein isolation has become the first line of treatment for atrial fibrillation in recent years. This requires a rather accurate map of the left atrial sub-endocardial surface including the ostia of the pulmonary veins, which requires dense sampling of the surface and takes more than 10 minutes. The focus of this work is to provide left atrial visualization early in the procedure to ease procedure complexity and enable further workflows, such as using catheters that have difficulty sampling the surface. We propose a dense encoder-decoder network with a novel regularization term to reconstruct the shape of the left atrium from partial data which is derived from simple catheter maneuvers. To train the network, we acquire a large dataset of 3D atria shapes and generate corresponding catheter trajectories. Once trained, we show that the suggested network can sufficiently approximate the atrium shape based on a given trajectory. We compare several network solutions for the 3D atrium reconstruction. We demonstrate that the solution proposed produces realistic visualization using partial acquisition within a 3-minute time interval. Synthetic and human clinical cases are shown.

翻訳日:2023-11-07 17:32:27 公開日:2023-11-04

# 時空間データに対する深層学習の不確かさの定量化--課題と機会

Uncertainty Quantification of Deep Learning for Spatiotemporal Data: Challenges and Opportunities ( http://arxiv.org/abs/2311.02485v1 )

ライセンス: Link先を確認

Wenchong He and Zhe Jiang

(参考訳) gps、リモートセンシング、計算シミュレーションの進歩により、大量の地理空間データと時空間データが高速に収集されている。このような時空間的なビッグデータ資産は、ディープラーニング技術の最近の進歩とともに、社会を変えるユニークな機会を提供する。しかし、深層学習が予期せぬ予測を不確実な自信で下し、高い意思決定アプリケーション(災害管理、医療診断、自律運転など)に重大な結果をもたらすことが広く認識されている。不確実性定量化(UQ)は、ディープラーニングモデルの信頼性を推定することを目的としている。本稿では,時空間データに対する深層学習のuqについて,その特異な課題や既存手法などについて概説する。特に不確実性源の重要性に注目する。時空間データの今後の研究方向を明らかにする。

With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed. Such emerging spatiotemporal big data assets, together with the recent progress of deep learning technologies, provide unique opportunities to transform society. However, it is widely recognized that deep learning sometimes makes unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications (e.g., disaster management, medical diagnosis, autonomous driving). Uncertainty quantification (UQ) aims to estimate a deep learning model's confidence. This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods. We particularly focus on the importance of uncertainty sources. We identify several future research directions for spatiotemporal data.

翻訳日:2023-11-07 17:32:10 公開日:2023-11-04

# 一般化されたゼロショットオーディオツーインテント分類

Generalized zero-shot audio-to-intent classification ( http://arxiv.org/abs/2311.02482v1 )

ライセンス: Link先を確認

Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

(参考訳) 音声のみのデータを用いた音声言語理解システムの人気は高まっているが、未認識の意図を扱う能力は限られている。本研究では,インテント毎に数文のサンプル文しか持たない汎用的ゼロショット音声対インテント分類フレームワークを提案する。そこで我々はまず,自己教師付き事前学習モデルを用いて教師付きオーディオ・インテリジェント分類器を訓練する。次に、ニューラルオーディオシンセサイザーを利用して、サンプルテキスト発話のためのオーディオ埋め込みを作成し、コサイン類似性を用いて、見えない意図に対する一般化ゼロショット分類を行う。また,音声表現に語彙情報を組み込んでゼロショット性能を向上させるマルチモーダルトレーニング戦略を提案する。マルチモーダルトレーニングアプローチでは,音声のみの学習に比べて,slurpの意図を意識しない場合のゼロショットインテント分類の精度が2.75%,内部目標指向ダイアログデータセットでは18.2%向上している。

Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model. We then leverage a neural audio synthesizer to create audio embeddings for sample text utterances and perform generalized zero-shot classification on unseen intents using cosine similarity. We also propose a multimodal training strategy that incorporates lexical information into the audio representation to improve zero-shot performance. Compared to audio-only training, our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively.
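
The cosine-similarity step described above can be illustrated with a minimal sketch. It assumes that each unseen intent is represented by the mean of the embeddings of its sample utterances and that a query is assigned to the most similar intent; the abstract does not say how the embeddings are aggregated, so the averaging, the function names, and the toy 3-dimensional vectors are all illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify_zero_shot(audio_embedding, intent_examples):
    """Pick the intent whose prototype is most similar to the query.

    intent_examples maps an intent name to the embeddings of its sample
    utterances (hypothetical stand-ins for synthesized audio embeddings).
    """
    prototypes = {name: mean_vector(vecs) for name, vecs in intent_examples.items()}
    scores = {name: cosine_similarity(audio_embedding, proto)
              for name, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two unseen intents with two sample embeddings each.
intent_examples = {
    "play_music": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "set_alarm": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
query = [0.7, 0.3, 0.1]
predicted, scores = classify_zero_shot(query, intent_examples)
print(predicted, scores)
```

Because cosine similarity ignores vector length, only the direction of the embedding matters, which is why no retraining is needed to add a new intent in this kind of scheme.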

翻訳日:2023-11-07 17:31:55 公開日:2023-11-04

# 医用画像の循環翻訳のための厳密な境界付きディープネットワーク

A Strictly Bounded Deep Network for Unpaired Cyclic Translation of Medical Images ( http://arxiv.org/abs/2311.02480v1 )

ライセンス: Link先を確認

Swati Rai, Jignesh S. Bhatt, and Sarat Kumar Patra

(参考訳) 医用画像翻訳は不適切な問題である。本稿では, 既存の一方向一方向翻訳ネットワークとは異なり, 不対化医療画像について検討し, 安定な双方向翻訳を実現する厳密な有界ネットワークを提供する。適応辞書学習に組み込んだパッチレベル連結巡回条件生成逆数ネットワーク(pCCGAN)を提案する。 47層のサイクリック接続された2つのCGANで構成されており、両方のジェネレータ(各32層の層)は、同じ臓器の入力とターゲットのモダリティ画像(地上の真理ではない)から異なる未対のパッチを連結して条件付けされている。鍵となる考え方は、近隣の文脈の特徴情報を利用して翻訳空間を束縛し、一般化を促進することである。ジェネレータはさらに、コンテキストパッチから学習した適応辞書を備えて、劣化の可能性を低減している。識別器は、ミニマックス関数を用いて翻訳画像を検証する15層ディープネットワークである。複合損失関数は, 対向的, 非対向的, 前向きの周期的, 同一性的損失で定式化され, 提案した学習機械の分散をさらに小さくする。定性的,定量的,アブレーション分析の結果,実際のCTおよびMRIでは良好な結果が得られた。

Medical image translation is an ill-posed problem. Unlike existing paired unbounded unidirectional translation networks, in this paper, we consider unpaired medical images and provide a strictly bounded network that yields a stable bidirectional translation. We propose a patch-level concatenated cyclic conditional generative adversarial network (pCCGAN) embedded with adaptive dictionary learning. It consists of two cyclically connected CGANs of 47 layers each, where both generators (each of 32 layers) are conditioned with concatenation of alternate unpaired patches from input and target modality images (not ground truth) of the same organ. The key idea is to exploit cross-neighborhood contextual feature information that bounds the translation space and boosts generalization. The generators are further equipped with adaptive dictionaries learned from the contextual patches to reduce possible degradation. Discriminators are 15-layer deep networks that employ a minimax function to validate the translated imagery. A combined loss function is formulated with adversarial, non-adversarial, forward-backward cyclic, and identity losses that further minimize the variance of the proposed learning machine. Qualitative, quantitative, and ablation analysis show superior results on real CT and MRI.

翻訳日:2023-11-07 17:31:38 公開日:2023-11-04

# コンピュータサイエンスの教授・学生の学術的・個人的背景に基づく成功予測

Forecasting Success of Computer Science Professors and Students Based on Their Academic and Personal Backgrounds ( http://arxiv.org/abs/2311.02476v1 )

ライセンス: Link先を確認

Ghazal Kalhor and Behnam Bahrak

(参考訳) 大学院を修了した後、多くのコンピュータサイエンス(cs)の学生が北米における競争的な大学院プログラムに応募する。彼らの長期的な目標は、大手5社のうちの1社に採用されるか、あるいは教授になることだ。したがって、受け入れ基準の役割に気付くことで、目標に向かって最良の道を選ぶのに役立つかもしれない。本稿では,北米の高名な大学に入学し,将来教授として学界に復帰する可能性について,学生の過去の大学の影響を分析した。以上の結果から,先行大学ランキングが目標達成の重要な要因であることが示された。次に、上位25のコンピュータサイエンスプログラムを受講した学部の学生に偏見があることを示す。最後に,これらの大学における教授の成功を予測するために,機械学習モデルを用いる。我々はこの予測課題に対して7.85のRMSEを達成した。

After completing their undergraduate studies, many computer science (CS) students apply for competitive graduate programs in North America. Their long-term goal is often to be hired by one of the big five tech companies or to become a faculty member. Therefore, being aware of the role of admission criteria may help them choose the best path towards their goals. In this paper, we analyze the influence of students' previous universities on their chances of being accepted to prestigious North American universities and returning to academia as professors in the future. Our findings demonstrate that the ranking of their prior universities is a significant factor in achieving their goals. We then illustrate that there is a bias in the undergraduate institutions of students admitted to the top 25 computer science programs. Finally, we employ machine learning models to forecast the success of professors at these universities. We achieved an RMSE of 7.85 for this prediction task.

翻訳日:2023-11-07 17:31:16 公開日:2023-11-04

# ロボットスキルの精度保存外挿のための制約付き方程式学習ネットワーク

Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills ( http://arxiv.org/abs/2311.02475v1 )

ライセンス: Link先を確認

Hector Perez-Villeda, Justus Piater, and Matteo Saveriano

(参考訳) デモンストレーションによるプログラミングでは、ロボットは人間のデモから新しいスキルを学ぶ。学習後、ロボットはスキルを再現するだけでなく、新たなトレーニングデータを集めることなく、シフトしたドメインに一般化できるべきである。類似領域への適応は文献で研究されているが、オープンな問題は、データ分布の外にある異なる条件に学習スキルをどのように適応するか、そしてもっと重要なことは、望ましい適応の精度を保つかである。本稿では,制約付き回帰の観点からの演題によるプログラミングにおける軌道適応問題に対処する,制約付き方程式学習ネットワークと呼ばれる新しい教師付き学習フレームワークを提案する。制約付き回帰に対する従来のアプローチでは、例えばガウスでは、方程式学習ネットワークを利用して分析式を学習し、基底関数として使用する。これらの基礎関数は、トレーニングデータからの逸脱を最小限に抑えることを目的として、新しい初期点や最終点のような望ましい適応を表す制約を課す。ロボット軌道の適応には3つの課題がある。 1) 新しい適応のための軌道の歪みを最小限にすること 2) 適応の正確性を維持すること,及び 3)基礎関数の構造に関する直観の欠如に対処すること。本研究では,環境変化による適応を必要とするロボット作業のシミュレーションと実実験の両方において,本手法の有効性を検証し,既存の2つの手法との比較を行った。実験の結果,制約付き等式学習者ネットワークは,ロボットスキルの一般化と適応性の向上により,芸術的アプローチの状態を上回っていることがわかった。

In Programming by Demonstration, the robot learns novel skills from human demonstrations. After learning, the robot should be able not only to reproduce the skill, but also to generalize it to shifted domains without collecting new training data. Adaptation to similar domains has been investigated in the literature; however, an open problem is how to adapt learned skills to different conditions that are outside of the data distribution, and, more importantly, how to preserve the precision of the desired adaptations. This paper presents a novel supervised learning framework called Constrained Equation Learner Networks that addresses the trajectory adaptation problem in Programming by Demonstration from a constrained regression perspective. While conventional approaches for constrained regression use one kind of basis function, e.g., Gaussian, we exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions. These basis functions are learned from demonstration with the objective of minimizing deviations from the training data while imposing constraints that represent the desired adaptations, like new initial or final points or maintaining the trajectory within given bounds. Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions. We validate our approach both in simulation and in real experiments in a set of robotic tasks that require adaptation due to changes in the environment, and we compare obtained results with two existing approaches. Performed experiments show that Constrained Equation Learner Networks outperform state-of-the-art approaches by increasing generalization and adaptability of robotic skills.

翻訳日:2023-11-07 17:31:02 公開日:2023-11-04

# クラスタネットワーク干渉による個別政策評価と学習

Individualized Policy Evaluation and Learning under Clustered Network Interference ( http://arxiv.org/abs/2311.02467v1 )

ライセンス: Link先を確認

Yi Zhang, Kosuke Imai

(参考訳) 現在、政策評価と学習に関する文献が多数存在するが、先行研究の多くは、ある単位の処理課題が別の単位の結果に影響を及ぼさないと仮定している。あいにく、干渉を無視して政策評価が偏り、学習方針が無効になることがある。例えば、多くの友人を持つ影響力のある個人を治療すると、ポジティブな流出効果が生じ、個別化された治療規則(ITR)の全体的な性能が向上する。本稿では,集団ネットワーク(あるいは部分的)干渉下での最適ITRの評価と学習の問題について考察する。このモデルでは、itrの実証的性能を評価するために使用できる推定器を提案する。この推定器は標準逆確率重み推定器よりも実質的に効率的であり, 流出効果についての仮定を課さない。学習ITRに対する有限サンプル残差を導出し、効率的な評価推定器の使用により学習ポリシーの性能が向上することを示す。最後に,提案手法の利点を説明するためにシミュレーションと経験的研究を行う。

While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and yield ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network (or partial) interference where clusters of units are sampled from a population and units may influence one another within each cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
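
For orientation, the standard inverse probability weighting (IPW) baseline mentioned above can be sketched as follows. This is the baseline, not the estimator proposed in the paper. The sketch assumes known, independent unit-level propensities and a deterministic policy, and weights each cluster by the probability that its whole treatment vector agrees with the policy; the data layout and all names are hypothetical.

```python
def cluster_ipw_value(clusters, policy, propensity):
    """Cluster-level IPW estimate of the value of a deterministic policy.

    clusters:   list of clusters, each a list of (x, a, y) tuples with
                covariate x, observed treatment a, and outcome y.
    policy:     function x -> treatment the policy would assign.
    propensity: function (x, a) -> probability of observing a given x.
    """
    total = 0.0
    for units in clusters:
        weight = 1.0
        for x, a, _ in units:
            if a != policy(x):
                # One disagreement zeroes out the whole cluster.
                weight = 0.0
                break
            weight /= propensity(x, a)
        mean_outcome = sum(y for _, _, y in units) / len(units)
        total += weight * mean_outcome
    return total / len(clusters)


# Toy example: treat when x > 0, with a fair-coin propensity.
policy = lambda x: 1 if x > 0 else 0
propensity = lambda x, a: 0.5
clusters = [
    [(1.0, 1, 2.0), (-1.0, 0, 1.0)],  # agrees with the policy: weight 4
    [(0.5, 0, 1.0), (2.0, 1, 3.0)],   # first unit disagrees: weight 0
]
value = cluster_ipw_value(clusters, policy, propensity)
print(value)  # 3.0
```

The weight is a product over all units in a cluster, so it grows or vanishes quickly as clusters get larger. That is one intuition for why an estimator that exploits structure in the spillover effects can be more efficient.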

翻訳日:2023-11-07 17:30:35 公開日:2023-11-04

# 多状態脳ネットワーク発見

Multi-State Brain Network Discovery ( http://arxiv.org/abs/2311.02466v1 )

ライセンス: Link先を確認

Hang Yin and Yao Su and Xinyue Liu and Thomas Hartvigsen and Yanhua Li and Xiangnan Kong

(参考訳) 脳ネットワーク発見は、人間の脳のfMRIスキャンなどの神経画像データから得られる時空間信号からノードとエッジを見つけることを目的としている。既存の方法は、観測された信号が単一の脳活動状態によってのみ生成されると仮定して、代表的または平均的な脳ネットワークを導出する傾向がある。しかし、ヒトの脳は通常複数の活動状態を含み、協調して脳の活動を決定する。脳の領域とその接続は通常、単一の状態のネットワークだけでは捉えにくい複雑なパターンを示す。最近の研究では、脳の活動状態に応じて脳のパーセレーションと接続が変化している。このような脳ネットワークをマルチステートと呼び、この混合物は人間の行動を理解するのに役立ちます。したがって、単一状態ネットワークと比較して、複数状態ネットワークは認知脳ネットワークの重要な情報を失うことを防げる。そこで我々は,CGL(コヒーレントなグラフィカルラッソ)とGMM(ガウス混合モデル)を組み合わせることで,多状態脳ネットワークのモデル化に成功したMNGL(Multi-state Network Graphical Lasso)という新しいモデルを提案する。合成および実世界のADHD 200 fMRIデータセットを用いて、MNGLがより説明的で現実的な結果を発見することによって、最近の最先端の代替品より優れていることを示す。

Brain network discovery aims to find nodes and edges from the spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains. Existing methods tend to derive representative or average brain networks, assuming observed signals are generated by only a single brain activity state. However, the human brain usually involves multiple activity states, which jointly determine the brain activities. The brain regions and their connectivity usually exhibit intricate patterns that are difficult to capture with only a single-state network. Recent studies find that brain parcellation and connectivity change according to the brain activity state. We refer to such brain networks as multi-state, and this mixture can help us understand human behavior. Thus, compared to a single-state network, a multi-state network can prevent us from losing crucial information about cognitive brain networks. To achieve this, we propose a new model called MNGL (Multi-state Network Graphical Lasso), which successfully models multi-state brain networks by combining CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model). Using both synthetic and real-world ADHD 200 fMRI datasets, we demonstrate that MNGL outperforms recent state-of-the-art alternatives by discovering more explanatory and realistic results.

翻訳日:2023-11-07 17:30:15 公開日:2023-11-04

# AGIのレベル:AGIへの道のりをめざして

Levels of AGI: Operationalizing Progress on the Path to AGI ( http://arxiv.org/abs/2311.02462v1 )

ライセンス: Link先を確認

Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg

(参考訳) 本稿では,人工知能(AGI)モデルとその前駆体の性能と動作を分類する枠組みを提案する。このフレームワークは、AGIパフォーマンス、一般性、自律性のレベルを導入します。モデルの比較,リスク評価,AGIへの道程の進捗測定を行う共通言語を提供することで,この枠組みが自律運転のレベルに類似した形で有用になることを願っている。フレームワークを開発するために、既存のAGIの定義を分析し、AGIにとって有用なオントロジーが満たすべき6つの原則を抽出する。これらの原則には、メカニズムよりも能力にフォーカスすること、汎用性とパフォーマンスを別々に評価すること、エンドポイントではなくagiに向かう段階を定義することが含まれる。これらの原則を念頭に置いて,奥行き(性能)と能力の広さ(一般性)に基づく「アギのレベル」を提案し,このオントロジーに現在のシステムがどのように適合するかを考察する。これらのレベルに対してAGIモデルの振る舞いと能力を定量化する将来のベンチマークの課題について論じる。最後に、これらのAGIのレベルが自律性やリスクといったデプロイメント上の考慮事項とどのように相互作用するかについて議論し、高機能なAIシステムの責任と安全なデプロイメントにおいて、ヒューマン・AIインタラクションパラダイムを慎重に選択することの重要性を強調します。

We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose 'Levels of AGI' based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.

翻訳日:2023-11-07 17:29:51 公開日:2023-11-04

# SPHEAR: 完全統計的3次元モデリングのための球面頭部登録

SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling ( http://arxiv.org/abs/2311.02461v1 )

ライセンス: Link先を確認

Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir and Cristian Sminchisescu

(参考訳) 本研究では,球面埋め込みに基づく新しい3次元登録法により,高精度で微分可能な3次元人頭モデルである SPHEAR を提案する。従来の非リジッド登録法からパラダイムを移行し,様々な表面前処理の下で動作し,再構築の忠実性を高め,必要な介入を最小化する。さらに、SPHEAR は complete モデルであり、多様な合成頭の形や表情をサンプリングするだけでなく、視線方向、高解像度のカラーテクスチャ、表面の正常な地図、細部で表現されたヘアカットをストランドとしてサンプリングすることができる。 SPHEARは、自動現実的な視覚データ生成、セマンティックアノテーション、一般的な再構築タスクに使用できる。最先端のアプローチと比較して,我々のコンポーネントは高速かつメモリ効率が高く,設計選択の妥当性と登録,再構築,生成の精度を実験でサポートしています。

We present SPHEAR, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a complete model that allows not only to sample diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and hair cuts represented in detail, as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of registration, reconstruction and generation techniques.

翻訳日:2023-11-07 17:29:26 公開日:2023-11-04

# ヒューリスティック画像処理による企業組織図からのネットワーク構造抽出

Extracting Network Structures from Corporate Organization Charts Using Heuristic Image Processing ( http://arxiv.org/abs/2311.02460v1 )

ライセンス: Link先を確認

Hiroki Sayama and Junichi Yamanoi

(参考訳) 企業の組織構造は、企業運営のダイナミクスとパフォーマンスに影響を及ぼす可能性がある。しかし、このテーマは、簡単に利用できる組織ネットワークデータセットが不足しているため、未調査のままである。このギャップを克服するため,我々は組織図から組織ネットワークデータを抽出・再構成する新しいヒューリスティック画像処理手法を開発した。本手法は,企業組織図のPDFファイルを解析し,テキストラベル,ボックス,接続線,その他のオブジェクトをヒューリスティックに実装した複数ステップの画像処理により検出する。検出されたコンポーネントは、視覚化、バリデーション、さらにネットワーク分析のために、PythonのNetworkX Graphオブジェクトにまとめられる。 2008年から2011年までdiamond, inc.が発行した「組織図/システム図手帳」に示す日本全上場企業の組織図に本手法を適用した。 10,008の組織図PDFファイルのうち,4,606の組織ネットワークを再構築することができた(データ取得成功率:46%)。再建された組織ネットワーク毎にいくつかのネットワーク診断を行い,企業行動とパフォーマンスの関連性を調べるために,さらなる統計分析に活用する。

Organizational structure of corporations has potential to provide implications for dynamics and performance of corporate operations. However, this subject has remained unexplored because of the lack of readily available organization network datasets. To overcome this gap, we developed a new heuristic image-processing method to extract and reconstruct organization network data from published organization charts. Our method analyzes a PDF file of a corporate organization chart and detects text labels, boxes, connecting lines, and other objects through multiple steps of heuristically implemented image processing. The detected components are reorganized together into a Python NetworkX Graph object for visualization, validation and further network analysis. We applied the developed method to the organization charts of all the listed firms in Japan shown in the ``Organization Chart/System Diagram Handbook'' published by Diamond, Inc., from 2008 to 2011. Out of the 10,008 organization chart PDF files, our method was able to reconstruct 4,606 organization networks (data acquisition success rate: 46%). For each reconstructed organization network, we measured several network diagnostics, which will be used for further statistical analysis to investigate their potential correlations with corporate behavior and performance.
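
The last stage of such a pipeline, turning detected boxes and connecting lines into a graph and computing simple diagnostics, might look like the sketch below. The paper stores the result in a NetworkX Graph object; this sketch uses a plain adjacency dictionary so that it runs with the standard library alone. The unit names, the edge list, and the chosen diagnostics are hypothetical and are not taken from the paper.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (unit, unit) pairs."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set()).add(parent)
    return adjacency


def depth_from(adjacency, root):
    """Breadth-first distance of every reachable unit from the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


# Hypothetical output of the image-processing stage: one pair per
# connecting line detected between two labeled boxes.
detected_edges = [
    ("Board", "President"),
    ("President", "Sales Division"),
    ("President", "R&D Division"),
    ("Sales Division", "Domestic Sales"),
]
graph = build_graph(detected_edges)
depths = depth_from(graph, "Board")

num_nodes = len(graph)
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
max_depth = max(depths.values())
print(num_nodes, num_edges, max_depth)  # 5 4 3
```

Checks such as whether every detected box is reachable from the root are one plausible way to flag charts whose reconstruction failed.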

翻訳日:2023-11-07 17:29:08 公開日:2023-11-04

# 注意に基づくマルチインスタンス混合モデル

Attention-based Multi-instance Mixed Models ( http://arxiv.org/abs/2311.02455v1 )

ライセンス: Link先を確認

Jan P. Engelmann, Alessandro Palma, Jakub M. Tomczak, Fabian J Theis, Francesco Paolo Casale

(参考訳) 単細胞データから患者の特徴を予測することは、健康や疾患にかかわる細胞状態を明らかにすることができる。線形モデルと平均的な細胞型表現は、その効率性と頑健性のためにこのタスクに好まれるが、単細胞データに固有の豊富な細胞多様性を見落としている。このギャップに対処するために,汎用線形混合モデル (GLMM) と多重インスタンス学習 (MIL) を統合したフレームワークであるGMILを導入し,セル状態の不均一性をモデル化しながら線形モデルの利点を裏付ける。 GMILは、事前に定義されたセル埋め込みを活用することにより、計算効率を高め、シングルセル表現学習の最近の進歩と整合する。実験の結果,GMILは単一セルデータセットにおいて既存のMILモデルよりも優れており,新たな関連性を明らかにし,異なる領域にわたる生物学的機構を明らかにする。

Predicting patient features from single-cell data can unveil cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce GMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell-state heterogeneity. By leveraging predefined cell embeddings, GMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that GMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.

翻訳日:2023-11-07 17:28:48 公開日:2023-11-04

# QOCO:モバイルエッジコンピューティングのための深層強化学習に基づくQoE指向計算オフロードアルゴリズム

QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing ( http://arxiv.org/abs/2311.02525v1 )

ライセンス: Link先を確認

Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar

(参考訳) モバイルエッジコンピューティング(MEC)の領域では、効率的な計算タスクのオフロードは、ユーザにとってシームレスな品質のエクスペリエンス(QoE)を保証する上で重要な役割を果たす。ユーザが応答性と信頼性の高いサービスを要求する、今日の相互接続の世界では、高いQoEを維持することが最重要である。この課題は、動的で不確実なモバイル環境の処理に寄与する最も重要な要因の1つである。本研究では,厳密なタスク処理期限とエネルギー制約がシステム性能に悪影響を及ぼすようなMECシステムの計算オフロードについて検討する。計算タスクオフロード問題をマルコフ決定プロセス(mdp)として定式化し,各ユーザの長期qoeを個別に最大化する。本稿では、モバイルデバイスが他のデバイスによる意思決定の知識を必要とせずに、そのオフロード決定を行うことを可能にする、深層強化学習(DRL)に基づく分散QoE指向計算オフロード(QOCO)アルゴリズムを提案する。数値解析により,QOCOの性能評価を行った。シミュレーションの結果、QOCOアルゴリズムはエッジノードの計算資源を効率的に活用することを確認した。その結果、14%のタスクを完了でき、タスクの遅延とエネルギー消費をそれぞれ9%と6%削減できる。これらを組み合わせると、既存のアルゴリズムと比較して、qoeの平均値が少なくとも37%向上する。

In the realm of mobile edge computing (MEC), efficient computation task offloading plays a pivotal role in ensuring a seamless quality of experience (QoE) for users. Maintaining a high QoE is paramount in today's interconnected world, where users demand responsive and reliable services. Meeting this demand is one of the key challenges in handling dynamic and uncertain mobile environments. In this study, we delve into computation offloading in MEC systems, where strict task processing deadlines and energy constraints can adversely affect the system performance. We formulate the computation task offloading problem as a Markov decision process (MDP) to maximize the long-term QoE of each user individually. We propose a decentralized QoE-oriented computation offloading (QOCO) algorithm based on deep reinforcement learning (DRL) that empowers mobile devices to make their offloading decisions without requiring knowledge of decisions made by other devices. Through numerical studies, we evaluate the performance of QOCO. Simulation results validate that the QOCO algorithm efficiently exploits the computational resources of edge nodes. Consequently, it can complete 14% more tasks and reduce task delay and energy consumption by 9% and 6%, respectively. These together contribute to a significant improvement of at least 37% in average QoE compared to an existing algorithm.

翻訳日:2023-11-07 17:19:58 公開日:2023-11-04

# UniTSFace: 顔認識のための統一された閾値統合型サンプル対サンプル損失

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition ( http://arxiv.org/abs/2311.02523v1 )

ライセンス: Link先を確認

Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen and Jinming Duan

(参考訳) サンプル対クラスベースの顔認識モデルは、大量の顔画像間のクロスサンプル関係を十分に調べることができない。さらに、どちらの方法も、正と負の対を分離する統一しきい値が期待できる実世界の顔認証アプリケーションの要件を満たすものではない。本稿では,正の対と負の対を区別するための明確な統一された閾値を特徴とする,試料対サンプル損失(USS損失)の統一しきい値を提案する。 USSの損失にインスパイアされ、サンプル対サンプルベースのソフトマックスとBCEの損失を導き、それらの関係を議論する。 MFR, IJB-C, LFW, CFP-FP, AgeDB, MegaFaceなど,複数のベンチマークデータセットに対する大規模な評価は,提案されたUSS損失が極めて効率的で,サンプル-クラスベースの損失とシームレスに動作することを示した。組込み損失(USSとSprint-to-class Softmax損失)は、以前のアプローチの落とし穴を克服し、訓練された顔モデルUniTSFaceは、CosFace、ArcFace、VPL、AnchorFace、UNPGといった最先端のメソッドよりも優れたパフォーマンスを示す。私たちのコードは利用可能です。

Sample-to-class-based face recognition models cannot fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive the sample-to-sample based softmax and BCE losses, and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The embedded loss (USS and sample-to-class Softmax loss) overcomes the pitfalls of previous approaches and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods, such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. Our code is available.

翻訳日:2023-11-07 17:19:34 公開日:2023-11-04

# Forward $\chi^2$ Divergence based Variational Importance Sampling

Forward $\chi^2$ Divergence Based Variational Importance Sampling ( http://arxiv.org/abs/2311.02516v1 )

ライセンス: Link先を確認

Chengrui Li, Yule Wang, Weihan Li and Anqi Wu

(参考訳) ログの最大化は潜在変数モデルを学ぶ上で重要な側面であり、変分推論(VI)は一般的に採用されている手法である。しかし、複雑な後続分布を扱う場合、VIは高いログライクな状態を達成する上で困難に直面する可能性がある。この制限に応えて,ログ類似度を直接推定し,最大化する,新しい変動重要度サンプリング(VIS)手法を導入する。 VISは、forward $\chi^2$ divergence を最小化して達成した最適な提案分布を活用し、ログ類似度推定を強化する。混合モデル、変分オートエンコーダ、部分観測可能な一般化線形モデルなど、様々な一般的な潜在変数モデルにvisを適用する。その結果,本手法は,ログ類似度とモデルパラメータ推定の両面で,最先端のベースラインを一貫して上回ることを示した。

Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, achieved by minimizing the forward $\chi^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, both in terms of log-likelihood and model parameter estimation.
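
The link between the proposal distribution and the quality of the log-likelihood estimate can be seen in a small importance-sampling sketch. It assumes a toy model chosen only because its marginal is known in closed form: z ~ N(0, 1) and x | z ~ N(z, 1), so p(x) = N(x; 0, 2) and the exact posterior is N(x/2, 1/2). The model, the function names, and the sample sizes are illustrative and are not the paper's VIS procedure. The variance of the importance weight equals p(x)^2 times the forward chi-square divergence between the posterior and the proposal, so a proposal equal to the posterior gives a zero-variance estimate.

```python
import math
import random


def log_normal_pdf(value, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)


def is_log_likelihood(x, q_mean, q_var, num_samples=1000, seed=0):
    """Importance-sampling estimate of log p(x) with proposal N(q_mean, q_var)."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_weights.append(log_joint - log_normal_pdf(z, q_mean, q_var))
    # Log-mean-exp of the weights, computed stably.
    top = max(log_weights)
    return top + math.log(sum(math.exp(w - top) for w in log_weights) / num_samples)


x = 1.3
exact = log_normal_pdf(x, 0.0, 2.0)
est_posterior = is_log_likelihood(x, q_mean=x / 2, q_var=0.5)  # optimal proposal
est_prior = is_log_likelihood(x, q_mean=0.0, q_var=1.0)        # cruder proposal
print(exact, est_posterior, est_prior)
```

With the exact posterior as proposal every weight equals p(x), so the estimate matches the closed-form value up to floating-point error. The prior proposal yields a noisier estimate.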

翻訳日:2023-11-07 17:19:06 公開日:2023-11-04

# 単層WSe2/ギャッププラズモンナノキャビティにおける高可変室温プレクシトン

Highly tunable room-temperature plexcitons in monolayer WSe2/gap-plasmon nanocavities ( http://arxiv.org/abs/2311.02513v1 )

ライセンス: Link先を確認

Thomas P. Darlington, Mahfujur Rahaman, Kevin W.C. Kwock, Emanuil Yanev, Xuehao Wu, Luke N. Holtzman, Madisen Holbrook, Gwangwoo Kim, Kyung Yeol Ma, Hyeon Suk Shin, Andrey Krayev, Matthew Strasbourg, Nicholas J. Borys, D. N. Basov, Katayun Barmak, James C. Hone, Abhay N. Pasupathy, Deep Jariwala, P. James Schuck

(参考訳) 量子フォトニック技術の進歩は、光学活性状態の自由度を正確に制御する能力に依存している。そこで我々は, ストレイン工学と電圧調整可能なプラズモンナノキャビティを組み合わせた一般手法により, 2次元半導体単層におけるリアルタイム, 室温調整可能な強プラズモン・エキシトン結合を実現する。エキシトンエネルギーとナノキャビティプラズモン共鳴は、プラズモニック・ナノプローブに圧力を印加することで、同期的に制御可能であり、ラビの分裂が100mevを超え、デチューニングと結合強度をオペロンドで制御できることを示した。相関力分光法、ナノフォトルミネッセンス(ナノPL)、ナノラマン測定を応用し、電磁シミュレーションで拡張し、異なる偏光子バンドと暗い偏光子状態を特定し、ナノギャップとひずみチューニングの関数としてそれらの進化をマッピングした。このシステムは、デチューンを劇的に変更することなく、様々な空洞パラメータの結合強度を操作できる。さらに,複数の押圧サイクルと複数のナノバブルを用いた繰り返し実験により,波長可変の強いカップリングが頑健であることが判明した。最後に, ナノギャップサイズは, 基板とプラズモニック先端間の印加直流電圧により直接変調可能であることを示し, 複合型ナノエレクトロメカニカルシステム(NEMS)としての概念の性質を強調した。本研究は,単層(1l)遷移金属ジカルコゲナイド (tmds) に局在するプレキシトン状態を正確に制御し, 調整する可能性を実証し, 量子情報処理から光化学へのオンチップ・ポラリトン系ナノフォトニクス応用への道を開く。

The advancement of quantum photonic technologies relies on the ability to precisely control the degrees of freedom of optically active states. Here, we realize real-time, room-temperature tunable strong plasmon-exciton coupling in 2D semiconductor monolayers enabled by a general approach that combines strain engineering plus force- and voltage-adjustable plasmonic nanocavities. We show that the exciton energy and nanocavity plasmon resonance can be controllably toggled in concert by applying pressure with a plasmonic nanoprobe, allowing in operando control of detuning and coupling strength, with observed Rabi splittings >100 meV. Leveraging correlated force spectroscopy, nano-photoluminescence (nano-PL) and nano-Raman measurements, augmented with electromagnetic simulations, we identify distinct polariton bands and dark polariton states, and map their evolution as a function of nanogap and strain tuning. Uniquely, the system allows for manipulation of coupling strength over a range of cavity parameters without dramatically altering the detuning. Further, we establish that the tunable strong coupling is robust under multiple pressing cycles and repeated experiments over multiple nanobubbles. Finally, we show that the nanogap size can be directly modulated via an applied DC voltage between the substrate and plasmonic tip, highlighting the inherent nature of the concept as a plexcitonic nano-electro-mechanical system (NEMS). Our work demonstrates the potential to precisely control and tailor plexciton states localized in monolayer (1L) transition metal dichalcogenides (TMDs), paving the way for on-chip polariton-based nanophotonic applications spanning quantum information processing to photochemistry.
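
As background for the terms detuning, coupling strength, and Rabi splitting, the textbook lossless two-coupled-oscillator model is sketched below. It is a generic illustration and not necessarily the model fitted in the paper, which may include damping. The symbols $E_{\mathrm{pl}}$ (plasmon energy), $E_{\mathrm{ex}}$ (exciton energy), $g$ (coupling strength), and $\delta$ (detuning) are notation introduced here.

```latex
\[
E_{\pm} = \frac{E_{\mathrm{pl}} + E_{\mathrm{ex}}}{2}
          \pm \sqrt{g^{2} + \frac{\delta^{2}}{4}},
\qquad
\delta = E_{\mathrm{pl}} - E_{\mathrm{ex}},
\qquad
\hbar\Omega_{R} = \left.(E_{+} - E_{-})\right|_{\delta = 0} = 2g .
\]
```

In this simplified picture, an observed splitting above 100 meV at zero detuning would correspond to $g > 50$ meV.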

翻訳日:2023-11-07 17:18:49 公開日:2023-11-04

# ニューラルオブジェクト形状コンプリートを用いた擬似グラスピング

Anthropomorphic Grasping with Neural Object Shape Completion ( http://arxiv.org/abs/2311.02510v1 )

ライセンス: Link先を確認

Diego Hidalgo-Carvajal, Hanzhi Chen, Gemma C. Bettelani, Jaesug Jung, Melissa Zavaglia, Laura Busse, Abdeldjallil Naceri, Stefan Leutenegger, Sami Haddadin

The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, in which dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing the full geometry of objects from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach improves grasping success rates by nearly 30% over baselines that use only partial reconstruction, and achieved over 150 successful grasps with three different object categories. This demonstrates our approach's consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.

Translation date: 2023-11-07 17:18:12 Publication date: 2023-11-04

# MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters ( http://arxiv.org/abs/2311.02502v1 )

License: see the linked page

Mohamed Younes, Ewa Kijak, Richard Kulpa, Simon Malinowski, Franck Multon

Simulating realistic interaction and motions for physics-based characters is of great interest for interactive applications, and for automatic secondary character animation in the movie and video game industries. Recent works in reinforcement learning have produced impressive results for single-character simulation, especially those that use imitation-learning-based techniques. However, imitating the interactions and motions of multiple characters also requires modeling their interactions. In this paper, we propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach that generalizes the idea of motion imitation for one character to deal with both the interaction and the motions of multiple physics-based characters. Two unstructured datasets are given as inputs: 1) a single-actor dataset containing motions of a single actor performing a set of motions linked to a specific application, and 2) an interaction dataset containing a few examples of interactions between multiple actors. Based on these datasets, our system trains control policies allowing each character to imitate the interactive skills associated with each actor, while preserving the intrinsic style. This approach has been tested on two different fighting styles, boxing and full-body martial art, to demonstrate the ability of the method to imitate different styles.

Translation date: 2023-11-07 17:17:53 Publication date: 2023-11-04

# Can Chat GPT solve a Linguistics Exam? ( http://arxiv.org/abs/2311.02499v1 )

License: see the linked page

Patricia Ronan, Gerold Schneider

The present study asks if ChatGPT4, the version of ChatGPT which uses the language model GPT4, can successfully solve introductory linguistic exams. Previous exam questions of an Introduction to Linguistics course at a German university are used to test this. The exam questions were fed into ChatGPT4 with only minimal preprocessing. The results show that the language model is very successful in the interpretation even of complex and nested tasks. It proved surprisingly successful in the task of broad phonetic transcription, but performed less well in the analysis of morphemes and phrases. In simple cases it performs sufficiently well, but rarer cases, particularly with missing one-to-one correspondence, are currently treated with mixed results. The model is not yet able to deal with visualisations, such as the analysis or generation of syntax trees. More extensive preprocessing, which translates these tasks into text data, allows the model to also solve these tasks successfully.

Translation date: 2023-11-07 17:17:31 Publication date: 2023-11-04

# Polaritonic ultracold reactions: cavity controlled molecular photoassociation ( http://arxiv.org/abs/2311.02497v1 )

License: see the linked page

Vasil Rokaj, Simeon I. Mistakidis, H. R. Sadeghpour

We introduce a prototypical model for cavity polaritonic control of ultracold photochemistry by considering the resonant vibrational strong coupling of a rubidium dimer to a terahertz cavity. We demonstrate that at avoided crossings between a vibrational excitation and the vacuum photon absorption, the resulting polaritonic states between the molecule and photons can efficiently control the molecular vibrational Franck-Condon (FC) factors. Due to the entanglement between light and matter, the FC factor is transferred from one polaritonic branch to the other, leading to a polariton with a substantially enhanced FC factor. Utilizing this polariton state for photoassociation results in the enhanced formation of ultracold molecules. This work suggests a path to controlling photoassociation with cavity vacuum fields, and lays the ground for the emerging subfield of polaritonic ultracold chemistry.

Translation date: 2023-11-07 17:17:14 Publication date: 2023-11-04

# LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion ( http://arxiv.org/abs/2311.02496v1 )

License: see the linked page

Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo

Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents. However, many existing locomotion benchmarks primarily focus on simplified toy tasks, often failing to capture the complexity of real-world scenarios and steering research toward unrealistic domains. To advance research in IL for locomotion, we present a novel benchmark designed to facilitate rigorous evaluation and comparison of IL algorithms. This benchmark encompasses a diverse set of environments, including quadrupeds, bipeds, and musculoskeletal human models, each accompanied by comprehensive datasets, such as real noisy motion capture data, ground truth expert data, and ground truth sub-optimal data, enabling evaluation across a spectrum of difficulty levels. To increase the robustness of learned agents, we provide an easy interface for dynamics randomization and offer a wide range of partially observable tasks to train agents across different embodiments. Finally, we provide handcrafted metrics for each task and ship our benchmark with state-of-the-art baseline algorithms to ease evaluation and enable fast benchmarking.

Translation date: 2023-11-07 17:16:57 Publication date: 2023-11-04

# Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks ( http://arxiv.org/abs/2311.02495v1 )

License: see the linked page

Longze li, Jiang Chang, Aleksandar Vakanski, Min Xian

With the increased use of data-driven approaches and machine learning-based methods in material science, the importance of reliable uncertainty quantification (UQ) of the predicted variables for informed decision-making cannot be overstated. UQ in material property prediction poses unique challenges, including the multi-scale and multi-physics nature of advanced materials, intricate interactions between numerous factors, and the limited availability of large curated datasets for model training. Recently, Bayesian Neural Networks (BNNs) have emerged as a promising approach for UQ, offering a probabilistic framework for capturing uncertainties within neural networks. In this work, we introduce an approach for UQ within physics-informed BNNs, which integrates knowledge from governing laws in material modeling to guide the models toward physically consistent predictions. To evaluate the effectiveness of this approach, we present case studies for predicting the creep rupture life of steel alloys. Experimental validation with three datasets of collected measurements from creep tests demonstrates the ability of BNNs to produce accurate point and uncertainty estimates that are competitive with or exceed the performance of the conventional method of Gaussian Process Regression. Similarly, we evaluated the suitability of BNNs for UQ in an active learning application and reported competitive performance. The most promising framework for creep life prediction is BNNs based on Markov Chain Monte Carlo approximation of the posterior distribution of network parameters, as it provided more reliable results in comparison to BNNs based on variational inference approximation or related NNs with probabilistic outputs. The code is available at: https://github.com/avakanski/Creep-uncertainty-quantification.

Translation date: 2023-11-07 17:16:36 Publication date: 2023-11-04
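
The abstract above refers to point and uncertainty estimates obtained from a posterior over network parameters. A minimal sketch of how such estimates could be summarized is given below, assuming one already has a list of predictions for a single input, one per posterior draw (for example from MCMC). The function names `predictive_summary` and `interval_coverage` are hypothetical, the interval uses a Gaussian approximation, and none of this is taken from the authors' repository.

```python
import statistics


def predictive_summary(samples, z=1.96):
    """Summarize posterior-predictive samples for one input.

    samples: predictions for the same input, one per posterior draw (assumed given).
    Returns (mean, std, (lower, upper)), where the interval is mean +/- z * std,
    i.e. a Gaussian approximation rather than an exact credible interval.
    """
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean, std, (mean - z * std, mean + z * std)


def interval_coverage(intervals, targets):
    """Fraction of observed targets that fall inside their predicted interval."""
    hits = sum(1 for (lower, upper), y in zip(intervals, targets) if lower <= y <= upper)
    return hits / len(targets)
```

A coverage value close to the nominal level (about 0.95 for `z=1.96` under the Gaussian assumption) would suggest reasonably calibrated uncertainty; the metrics actually reported in the paper may differ.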

# Forecasting Post-Wildfire Vegetation Recovery in California using a Convolutional Long Short-Term Memory Tensor Regression Network ( http://arxiv.org/abs/2311.02492v1 )

License: see the linked page

Jiahe Liu, Xiaodi Wang

The study of post-wildfire plant regrowth is essential for developing successful ecosystem recovery strategies. Prior research mainly examines key ecological and biogeographical factors influencing post-fire succession. This research proposes a novel approach for predicting and analyzing post-fire plant recovery. We develop a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network that predicts future Normalized Difference Vegetation Index (NDVI) based on short-term plant growth data after fire containment. The model is trained and tested on 104 major California wildfires occurring between 2013 and 2020, each with burn areas exceeding 3000 acres. The integration of ConvLSTM with tensor regression enables the calculation of an overall logistic growth rate k using predicted NDVI. Overall, our k-value predictions demonstrate impressive performance, with 50% of predictions exhibiting an absolute error of 0.12 or less, and 75% having an error of 0.24 or less. Finally, we employ Uniform Manifold Approximation and Projection (UMAP) and KNN clustering to identify recovery trends, offering insights into regions with varying rates of recovery. This study pioneers the combined use of tensor regression and ConvLSTM, and introduces the application of UMAP for clustering similar wildfires. This advances predictive ecological modeling and could inform future post-fire vegetation management strategies.

Translation date: 2023-11-07 17:16:03 Publication date: 2023-11-04
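
The abstract above mentions a logistic growth rate k computed from NDVI values but does not state the exact formula. The sketch below shows one common way such a rate could be estimated, assuming the standard logistic curve NDVI(t) = K / (1 + A * exp(-k * t)) with a known plateau K; the function names, the known-plateau assumption, and the linearized least-squares fit are illustrative choices and not the authors' procedure.

```python
import math


def logistic(t, plateau, a, k):
    """Standard logistic curve: plateau / (1 + a * exp(-k * t))."""
    return plateau / (1.0 + a * math.exp(-k * t))


def fit_logistic_rate(times, ndvi, plateau):
    """Estimate the growth rate k by linearizing the logistic curve.

    With y = plateau / (1 + a * exp(-k * t)), we have
    ln(plateau / y - 1) = ln(a) - k * t, so k is minus the slope of an
    ordinary least-squares line through (t, ln(plateau / y - 1)).
    Points with y <= 0 or y >= plateau are skipped because the log is undefined.
    """
    xs, zs = [], []
    for t, y in zip(times, ndvi):
        if 0.0 < y < plateau:
            xs.append(t)
            zs.append(math.log(plateau / y - 1.0))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / len(xs)
    mean_z = sum(zs) / len(zs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    return -sxz / sxx
```

On noise-free synthetic data the fit recovers the rate used to generate the series; real NDVI series are noisy, so a nonlinear fit that also estimates the plateau may be more appropriate.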

# CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar ( http://arxiv.org/abs/2311.01423v2 )

License: see the linked page

Jen-Hao Cheng, Sheng-Yao Kuan, Hugo Latapie, Gaowen Liu, Jenq-Neng Hwang

Robust perception is a vital component for ensuring safe autonomous and assisted driving. Automotive radar (77 to 81 GHz), which offers weather-resilient sensing, provides a complementary capability to the vision- or LiDAR-based autonomous driving systems. Raw radio-frequency (RF) radar tensors contain rich spatiotemporal semantics besides 3D location information. The majority of previous methods take in 3D (Doppler-range-azimuth) RF radar tensors, allowing prediction of an object's location, heading angle, and size in bird's-eye-view (BEV). However, they lack the ability to simultaneously infer objects' size, orientation, and identity in the 3D space. To overcome this limitation, we propose an efficient joint architecture called CenterRadarNet, designed to facilitate high-resolution representation learning from 4D (Doppler-range-azimuth-elevation) radar data for 3D object detection and re-identification (re-ID) tasks. As a single-stage 3D object detector, CenterRadarNet directly infers the BEV object distribution confidence maps, corresponding 3D bounding box attributes, and appearance embedding for each pixel. Moreover, we build an online tracker utilizing the learned appearance embedding for re-ID. CenterRadarNet achieves state-of-the-art results on the K-Radar 3D object detection benchmark. In addition, we present the first 3D object-tracking result using radar on the K-Radar dataset V2. In diverse driving scenarios, CenterRadarNet shows consistent, robust performance, emphasizing its wide applicability.

Translation date: 2023-11-07 11:18:45 Publication date: 2023-11-04
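
The abstract above describes an online tracker that uses learned appearance embeddings for re-ID, without giving the association rule. As an illustration only, the sketch below matches new detections to existing tracks by cosine similarity of their embeddings with a greedy assignment. The function names, the greedy strategy, and the `min_similarity` threshold are assumptions made for this example and are not claimed to be the CenterRadarNet tracker.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(tracks, detections, min_similarity=0.5):
    """Greedily match detections to tracks by embedding similarity.

    tracks: dict mapping track id -> appearance embedding (list of floats).
    detections: list of appearance embeddings for the current frame.
    Returns a dict mapping detection index -> track id. Detections left out
    of the result would start new tracks in a full tracker.
    """
    pairs = []
    for track_id, track_emb in tracks.items():
        for det_idx, det_emb in enumerate(detections):
            pairs.append((cosine_similarity(track_emb, det_emb), track_id, det_idx))
    # Consider the most similar pairs first.
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches = {}
    used_tracks = set()
    for similarity, track_id, det_idx in pairs:
        if similarity < min_similarity:
            break  # all remaining pairs are even less similar
        if track_id in used_tracks or det_idx in matches:
            continue
        matches[det_idx] = track_id
        used_tracks.add(track_id)
    return matches
```

A production tracker would typically combine appearance with motion cues and use an optimal assignment instead of a greedy one; the greedy version is kept here for brevity.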

PDF registration status (publication date: 20231104)